Helix-OSDL: efficient sound and music playback for the Nintendo DS

Overview

Helix-OSDL is a GPL library that runs on the ARM7 processor of the Nintendo DS to perform sound and music playback, notably using the Helix MP3 decoder. MP3 is mostly meant for music; for short, one-shot sounds, the PCM/IMA-ADPCM support of Helix-OSDL is generally preferred.

Both of these audio content types are used here through OSDL-specific audio file formats (*.osdl.sound and *.osdl.music) for better fine-tuning and DS integration. These files can be generated by a set of OSDL tools available on the GNU/Linux platform.

Helix-OSDL offers a way of having rather high-quality music playback on the DS with various features (start, stop, fade-in, fade-out, etc.), while placing as little load as possible on the main processor (ARM9). The otherwise often under-used ARM7 takes charge of the actual decoding of the MP3 stream, instead of remaining mostly idle. Music is streamed from the ARM9 and transmitted, still encoded, to the ARM7.

Helix-OSDL is developed in pure C, not C++, to fit in the (at most) 96 kilobytes of RAM available to the ARM7 and to ease its integration in third-party code. It is packaged as a library so as not to be too intrusive in user code, although one can instead rely directly on our full-blown, dedicated ARM7 example executable.

The OSDL library has been ported to the DS, thus it offers C++ code for the ARM9 to communicate with the ARM7 Helix-OSDL code. It can be used as a high-level interface to Helix-OSDL, or as an example for alternate implementations, since of course Helix-OSDL can be used by other (non-OSDL) ARM9 code.

MP3-based encoding is used for music, as RAW/WAV files would take up a lot of space on the storage medium (ex: a linker/flash cart) and eat a lot of I/O bandwidth during playback. Ogg Vorbis is not used (although we would prefer it to MP3), as its decoding apparently requires more CPU resources than MP3, even with the Tremor implementation; it might be too demanding for the ARM7.

Helix is used because open-source, lightweight MP3 decoders for low-end devices (with no floating-point hardware and little memory) are not common, and because this one, being heavily optimized notably for ARM processors, offers quite impressive performance (realtime playback of 128 kbps 44.1 kHz joint stereo has been reported on the ARM7, using only fixed-point maths) and features (variable bitrate and joint stereo), while remaining tiny in memory.

The Helix licenses, commercial and open-source, are however a bit restrictive. This software is available under the RealNetworks Community Source License (RCSL 1.2) and the RealNetworks Public Source License (RPSL 1.0). Although they are specified and partly discussed in various places, they remain rather difficult to understand (the licensing FAQ can help). They imply that Helix-OSDL is released under the GPL, not the LGPL. One has to ensure that one's project is compatible with at least one Helix license before using Helix-OSDL.

A DS application being made of two distinct executables (one per ARM), some sources consider that a different license may apply to each piece of software, provided they are deemed loosely coupled (ex: one can run without the other). With that interpretation, Helix and, therefore, Helix-OSDL could be used on the ARM7 under, say, the GPL, while the ARM9 could run any type of application, including OSDL-based ones and/or LGPL, GPL or proprietary closed software. See this FAQ for one point of view. Is the DS FIFO a communication mechanism between two separate programs, like a UNIX pipe? Or should the two parts be considered intimate enough to actually combine into one larger program?

Helix-OSDL is based upon our Ceylan-based generic high-level IPC system, to reliably convey audio commands between the two ARMs.

See also our homebrew guide for the DS, including the discussions about sound and audio transformations.

Helix-OSDL can be tested directly: just download testHelix-OSDL.zip, which contains our music test player (testOSDLMusic-0.5-release-patched.r4.nds) and a sample music file (test.osdl.music), both to be copied at the root of your DS card. This recording, a French Christmas song performed by a family, lasts about 3 minutes and is copyright-free for non-commercial use. Its homepage is here. Beware, the full test is quite long, as it performs multiple playbacks of the music.

Usage

Sounds

To perform sound playback with Helix-OSDL, osdl.sound files must be used and, thus, generated.

Step #1: generating the osdl.sound files

Such sound files can be created from WAVE files thanks to OSDL tools running on GNU/Linux, and can be internally encoded in raw PCM data (high quality, at the expense of size) or IMA-ADPCM (compact, but rather low quality). Two different tools for that are detailed next.

In both cases, we tend to favor 22,050 Hz (a good balance), 16-bit (necessary), PCM (IMA ADPCM sounds quite bad), mono (stereo is not that interesting on the DS) samples of short duration (up to ten seconds). Longer samples should rather be considered as music.

Using wavToOSDLSound.exe (first method)

This executable (see wavToOSDLSound.cc) converts a WAVE file into an osdl.sound file:

Usage: ./wavToOSDLSound.exe [ -f frequency ] [ -m mode ] [ -b bitdepth ] X.wav
Converts a WAVE file (*.wav) into a .osdl.sound file by replacing the WAVE
header by an OSDL header filled with information specified on the
command line.

         -f: specifies the output sampling frequency, in Hz, ex: -f 22050 (the default)
         -m: specifies the output mode, mono or stereo, ex: -m mono (the default)
         -b: specifies the sample (PCM) bit depth, in bits, ex: -b 16 (the default). 
		 A bit depth of 4 corresponds by convention to the IMA ADPCM sample format.
		 
One may use the sox command-line tool to retrieve the relevant audio settings
for the source sound.

Ex: 'sox -V YourSound.wav -n' converts the sound and outputs its metadata 
that can be used to fill the next command line.
		
Then 'wavToOSDLSound.exe -f 44100 -m stereo -b 8 YourSound.wav' results in 
the creation of 'YourSound.osdl.sound'. 

Alternatively, use the wavToOSDLSound.sh script: 
'wavToOSDLSound.sh YourSound.wav' takes care of everything and results in
the YourSound.osdl.sound file.

For example:

>./wavToOSDLSound.exe -f 22050 -m mono -b 16 test.wav
Converting 'test.wav' into 'test.osdl.sound', using frequency 22050 Hz, 
mode mono (1), bit depth 16 (format 32784).

Generation of 'test.osdl.sound' succeeded!

Using wavToOSDLSound.sh (second method, recommended)

This script (see wavToOSDLSound.sh) is even more automated, as it guesses the sound settings; thus only the source WAVE file and, possibly, the target format (if wanting IMA-ADPCM instead of PCM) have to be specified:

Usage: wavToOSDLSound.sh [-h|--help] [-i|--ima-adpcm] SOURCE_WAVE_FILE
Converts specified wave file into an OSDL sound counterpart.

  Example: wavToOSDLSound.sh hello.wav uses hello.wav to generate its
  hello.osdl.sound counterpart (needs sox and wavToOSDLSound.exe).
  
    -h/--help: displays this help
    -i/--ima-adpcm: encode the wave samples into IMA-ADPCM 
	(about four times smaller but with poorer quality) [uses ffmpeg]
	
  Note that the wave file can contain usual PCM samples or IMA ADPCM samples:
  both will be managed automatically by this tool and by the OSDL player 
  on the Nintendo DS.
  
  To generate an IMA ADPCM-encoded wave file, one should prefer ffmpeg 
  or Audacity to sox, as the data produced by the latter is incorrectly
  decoded by the DS.

For example:

./wavToOSDLSound.sh test.wav
    test.osdl.sound produced, ready to be used!
-rw------- 1 sye sye 860926 2008-02-08 18:21 test.osdl.sound
-rw-rw-r-- 1 sye sye 860962 2008-02-08 18:19 test.wav

Step #2: using the osdl.sound file in your application

Once generated, the OSDL sound file can be put on the DS storage, and, from the ARM9, the OSDL::Audio::Sound constructor will load the specified osdl.sound file automatically, thanks to the Ceylan libfat-based layer. The sound instance can then be loaded/unloaded, played at will (hardware channels are managed internally by OSDL), etc., thanks to the corresponding methods of the OSDL::Audio::Sound class, whose API is defined in OSDLSound.h.

See testOSDLSound.arm9.cc for an example of sound management.

Music

Exactly like sounds, OSDL music files have to be generated before playback, using various tools.

Step #1: from WAVE to MP3

When creating a game, for example, your audio source is usually a high-quality uncompressed WAVE file. As osdl.music files are MP3-based, the first step is to encode this WAVE file into an MP3 one, preferably using the LAME encoder. As described in our homebrew guide, the MP3 encoding must be heavily tuned for the DS, resource-wise. The proper encoding can be done automatically by the wavToMP3ForDS.sh script:

Usage: wavToMP3ForDS.sh [-h|--help] SOURCE_WAVE_FILE
Converts specified WAVE file into a mp3 file appropriate for playback on the DS.

  Example: 'wavToMP3ForDS.sh hello.wav' uses hello.wav to generate its 
  hello.mp3 counterpart (needs the LAME encoder).
    -h/--help: displays this help
	
  One may use the audacity tool to preprocess the wave sound beforehand
  (cleaning, volume adjustment, correct export in format, etc.).
  
  Running this script is often the first step of a process: once having a mp3,
  one usually plays it on the Nintendo DS thanks to the getMP3Settings tool
  (copy the generated mp3 file to the root of your removable DS card, 
  under the name 'test.mp3'), which will return an upper bound to the size 
  of encoded frames for this mp3.
  
  Then this value can be used with the mp3ToOSDLMusic.exe tool (using 
  the -u parameter) to finally produce an OSDL music file ready to be 
  played back by the OSDL-Helix engine on the DS, with reduced resource needs.

For example (LAME audio details skipped):

 ./wavToMP3ForDS.sh test.wav
    Encoding test.wav into test.mp3 for DS playback.
LAME 3.97 32bits (http://www.mp3dev.org/)
CPU features: MMX (ASM used), SSE, SSE2
polyphase lowpass filter disabled
Encoding test.wav to test.mp3
Encoding as 22.05 kHz VBR(q=5) single-ch MPEG-2 Layer III (ca. 11.9x) qval=0

misc:

        scaling: 1
        ch0 (left) scaling: 0
        ch1 (right) scaling: 0
        filter type: 0
        quantization: xr^3/4
        huffman search: best (outside loop)
        experimental Y=1
        ...

stream format:

        MPEG-2.5 Layer 3
        1 channel - mono
        padding: all
        variable bitrate - VBR mtrh
        ...

psychoacoustic:
[..]

    Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
   751/751   (100%)|    0:00/    0:00|    0:00/    0:00|   67.648x|    0:00
  8 [182] ***********************************************************************************
 16 [  4] **
 24 [  7] ****
 32 [ 14] *******
 40 [ 61] ****************************
 48 [206] *********************************************************************************************
 56 [133] *************************************************************
 64 [ 67] *******************************
 80 [ 46] *********************
 96 [ 31] **************
-------------------------------------------------------------------------------------------------------
   kbps       mono %     long switch short %
   43.7      100.0        61.0  19.4  19.6
    test.mp3 produced, ready to be used!
-rw-rw-r-- 1 sye sye 106859 2008-02-08 19:04 test.mp3
-rw-rw-r-- 1 sye sye 860962 2008-02-08 18:19 test.wav

For music, we deem the most balanced settings to be 22,050 Hz (higher frequencies are not really needed), 16-bit (8-bit offers too low a quality), mono (stereo is usually not noticed by DS users), VBR (variable bitrates are more efficient than constant ones), with an average bitrate lower than 80 kbps and a peak bitrate not greater than 96 kbps (to avoid too big a frame size for our upper bound).

Step #2: determining the MP3 frame size upper bound

Once the MP3 file is available, a value has to be supplied to the osdl.music generator in order to further optimize DS playback: the upper bound of the size of an MP3 frame in this music. Knowing it, the OSDL-Helix playback module can minimize the transfers of chunks of encoded data at runtime. But how can one determine this upper bound? By using an intermediate tool (the last one, promise!), getMP3Settings (see getMP3Settings.arm9.cc and getMP3Settings.arm7.c), which runs on the DS, performs a full playback of the MP3 (rename your MP3 to test.mp3 and put it at the root of your DS card) and finally outputs its frame-size upper bound. A value of 1940 bytes could be used as a default, but the actual value is often far below (like 314), so using getMP3Settings is strongly encouraged.

Step #3: converting the MP3 file into an osdl.music file

The objective then is, knowing the upper bound, to convert the MP3 file into a proper osdl.music file, suitable for direct playback on the DS using OSDL and Helix-OSDL.

This should be done using the mp3ToOSDLMusic.exe tool (see mp3ToOSDLMusic.cc).

Usage: mp3ToOSDLMusic.exe [ -f frequency ] [ -m mode ] [ -t {CBR|VBR} ] -u BYTE_COUNT X.mp3
Converts a .mp3 file into a .osdl.music file by appending a header filled with
information specified on the command line.

         -f: specifies the output sampling frequency, in Hz, ex: -f 22050 (the default)
         -m: specifies the output mode, mono or stereo, ex: -m mono (the default)
         -t: specifies the bitrate type, either constant bit rate (CBR) 
		 or variable bitrate (VBR) (the default)
         -u: specifies the upper bound of the size of an encoded mp3 frame 
		 in this music, in bytes.
One may use the wavToMP3ForDS.sh script, or directly the lame command-line tool,
to convert beforehand a .wav into a .mp3.

        Ex: 'lame YourLongMusic.wav --verbose -m m --vbr-new -V 5 -q 0 
		-B 96 -t --resample 22.05 YourLongMusic.mp3' converts the music.
		
Then 'mp3ToOSDLMusic.exe -f 22050 -m mono -t vbr -u 314 YourLongMusic.mp3'
results in the creation of 'YourLongMusic.osdl.music'.

The upper bound of the size of an encoded mp3 frame can be determined thanks 
to the getMP3Settings OSDL media tool. It runs on the DS, plays the mp3 
and, once done, displays that upper bound.

For example:

> ./mp3ToOSDLMusic.exe -u 314 test.mp3
Converting 'test.mp3' into 'test.osdl.music', using frequency 22050 Hz, 
mode mono (1), bitrate 2, upper bound 314 bytes.

Step #4: using the osdl.music file in your application

Exactly as with sounds, one just has to use the proper constructor (OSDL::Audio::Music, whose API is defined in OSDLMusic.h) from the ARM9 to specify an osdl.music file on the DS card; this instance then allows one to play back this music, fade it in, etc.: load/unload, play, playWithFadeIn, pause/unpause, stop, fadeIn/fadeOut, and so on. Note that only a subset of the OSDL Music API has been ported to the DS; other methods may throw an exception with an appropriate error message when called.

See testOSDLMusic.arm9.cc for an example of music management.

Note that only one music at a time will be played, on the hardware channel 0.

Other helper tools

Regarding *.osdl.* files, the identifyOSDLFile.exe executable (see identifyOSDLFile.cc) can be used to output their metadata, for example:

> ./identifyOSDLFile.exe test.osdl.sound
File 'test.osdl.sound', according to its tag, is a sound (PCM or IMA ADPCM).
  + Sampling frequency: 22050 Hz.
  + Sample format: little-endian signed 16-bit (native).
  + Channel format: mono.
  + Size of all samples: 217104 bytes.
and
> ./identifyOSDLFile.exe test.osdl.music
File 'test.osdl.music', according to its tag, is a music (MP3).
  + Sampling frequency: 22050 Hz.
  + Channel format: mono.
  + Bitrate type: VBR (variable bitrate).
  + Upper bound of encoded frame size: 313 bytes.
  + Size of actual mp3 content: 17665 bytes.

Inner workings

The ARM9 is the only CPU able to access the storage medium (the linker/flash cart), at least when using libfat, since this library cannot fit in the ARM7 memory. Thus the music has to be loaded from the ARM9.

The ARM7 is the only CPU able to access the sound hardware (mixer), so the last part of the audio pipeline has to be located on the ARM7.

These two constraints led us to design the solution as described below.

Audio Pipeline

The ARM9 drives the high-level process: it decides when a given music must be played, paused, stopped, faded out, etc., and the ARM7 performs the actual decoding. Thus the user program just has to send requests to the ARM7 implementation and process incoming ones. OSDL on the ARM9 also offers a C++ layer that communicates appropriately with the ARM7 C implementation and allows one to use higher-level constructs directly, such as Music::playWithFadeIn.

Although playback is triggered by the ARM9, as soon as it is initiated the ARM7 becomes entirely focused on the hard real-time need of feeding the output PCM buffers with samples. Driven by a series of cascading timers, the ARM7 triggers double-buffered MP3 decoding and sends requests to the ARM9 to asynchronously refill another double buffer, this one holding encoded data.

All I/O operations are performed outside of IRQ handlers, and the safest, most reliable solutions have been implemented (ex: regarding IRQ management, or cache-aligned buffers on the ARM9). The smallest possible memory footprint (no intermediate buffer) and the lightest memory transfers (per-music optimized end-of-buffer moves, thanks to the frame-size upper bound) are used. This allows the full ARM7 implementation, built in (32-bit) ARM mode, to fit in its private memory, even with devkitARM versions more recent than the devkitARM-20 release.

On the ARM7, most of the generic code comes from the Ceylan core.

Audio-specific code on the ARM7 is provided by the Helix-OSDL library, built on top of the Ceylan library.

Building Helix

Sources and build files are available in Helix-OSDL SVN.

To build Helix, one should first read the readme.txt file, which gives useful explanations.

Before using the sources, one must agree to at least one of the three licenses. Even though Helix can be obtained from CVS, the most convenient method is probably to download the zip archive (ex: select the stable version for GNU/Linux). Helix itself is located in the datatype/mp3/codec/fixpt directory. One can obtain the sources from their ViewCVS as well.

Obviously we want an ARM build, not an x86 one. We target the 32-bit ARM mode, not the (16-bit) Thumb one, as Helix is a 32-bit codec. Real (RealNetworks) code is what we want here, as IPP (Intel Integrated Performance Primitives) is not expected to be available. So the library should gather, as mentioned, mp3dec.c + mp3tabs.c + real/*.c + real/arm/[files].s.

Using Helix

Helix being quite feature-rich, if targeting stereo output then joint stereo should be preferred to simple stereo, as it takes advantage of the correlation between the two channels. Use the -m j option with LAME to generate appropriate audio data.

The MPEG-2.5 extension is not useful here, as apparently it targets even lower sampling frequencies (less than 12 kHz).

Variable bitrates, as opposed to constant ones, are supported by Helix and may be preferred too, so that the bitrate can vary according to the audio content and thus be more efficient. Use the --vbr-new -V quality options with LAME, with quality between 0 and 9.

The sources of a test program using Helix should help in using it correctly.

Sources

A lot of information about Helix can be found directly on its dedicated website.

The integration of Helix and various related issues are discussed in several threads of the gbadev forums (in chronological order).

Some more information can be found in this PAlib thread: How to use helix mp3 decoder with PAlib?

Most of the work has been done by ThomasS, DekuTree64, Noda, DragonMinded, tepples, simonjhall and Lazy1; many thanks to them for sharing their knowledge.

Please react!

If you have more detailed or more recent information than that presented in this document, if you noticed errors, omissions or insufficiently discussed points, or if you would like to contribute and help us, even a little bit, drop us a line!





Last update: Sunday, April 5, 2009