Audio with SDL

Foreword: the vast majority of this information and advice comes from members of the SDL mailing list. Special thanks to David Olofson and Ryan C. Gordon for sharing their knowledge with the list. I hope that this thematic compilation will help SDL users. Many thanks to the many SDL contributors!

Overview

This section focuses on the different ways of managing sound with SDL and its helper libraries.

Due to the amount of relevant information, this is still a work in progress and any help would be appreciated.

Terms & conventions
Buffering sounds

Choosing the right library for audio output
Audio drivers

3D sound
Mods and MIDI
Troubleshooting
Hardware-dependent audio issues
Audio links

Terms & conventions

For a given audio source, all samples share the same format, just as the pixels of a surface share the pixel format of that surface. The size of a sample depends on its format, which itself depends on:

the chosen sample quantization, which is the precision of the measured sound pressure, often described with 8 or 16 bits, i.e. with 256 or 65536 different values
the number of channels (mono, stereo, 5.1, etc.)

SampleSize = (size of a quantization unit) * (number of channels)

Hence one sample of a stereo 8-bit audio source is: 1 byte [8 bits] * 2 [channels] = 2 bytes.

Frequency is the number of samples played per second, expressed in Hertz (Hz) or kilohertz (kHz, 1 kHz = 1000 Hz). You would set this to 44100 Hz (or 44.1 kHz) for "CD quality", or about 8 kHz for phone quality, for example.
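As an illustration of the formula above, here is a minimal sketch in C. The function names are hypothetical helpers, not part of SDL; they only restate the arithmetic of this section.

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of one sample, all channels included:
   (size of a quantization unit) * (number of channels).
   Hypothetical helper, not an SDL function. */
size_t sample_size_bytes(unsigned bits_per_value, unsigned channels)
{
    assert(bits_per_value >= 8 && bits_per_value % 8 == 0 && channels > 0);
    return (size_t)(bits_per_value / 8) * channels;
}

/* Bytes needed to store `seconds` of audio played at `frequency` samples per second. */
size_t bytes_for_duration(unsigned frequency, unsigned seconds,
                          unsigned bits_per_value, unsigned channels)
{
    return (size_t)frequency * seconds * sample_size_bytes(bits_per_value, channels);
}
```

For instance, sample_size_bytes(8, 2) gives the 2 bytes computed above, and one second of 16-bit stereo audio at 44100 Hz takes 176400 bytes.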

[Back to table of contents]

Buffering sounds

Understanding the runtime behaviour of the SDL sound subsystem

A typical SDL audio driver requests two buffers at whatever sample size is specified by the application, and then pre-fills them with silence. Then it double-buffers sound, filling one buffer (through the user-specified callback) while the audio hardware is reading the other.

For example, let A and B be buffers, both 4 KB large (1024 samples of 16-bit stereo data):

A and B are allocated and filled with silence
Audio hardware is triggered to playback A
Application gets a callback to fill B
Audio driver waits for A to finish playing and triggers playing B
Application gets a callback to fill A
...and so on...

This scheme assumes that the application is able to fill the buffers in time, which is usually true with current CPU and OS combinations.

[Back to table of contents]

How to size the audio buffer? Latency versus skips

These days, most sound cards seem to work best with 48 kHz, since that is what the converters are designed for and/or because the cards run at that rate all the time internally, no matter what sample rates applications use.

The larger the buffer, the more latency you will experience: it is a FIFO buffer, so the bigger it is, the longer it takes from when you put data in it to when it comes out the speaker. The latency (also known as lag) corresponds to the time taken to play back one buffer of audio data.

Conversely, if the buffer is too small, you will get skips in the audio (the callback will not be called soon enough to fill the buffer on time). It is a balancing act. No automatic size guess can be easily implemented to adapt the values to the actual platform.
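The relation between buffer size and latency can be sketched as follows. This is only the "time to play one buffer" estimate described above; the helper name is hypothetical, and the real latency may be higher because of additional buffering in the driver or OS.

```c
#include <assert.h>

/* Time, in milliseconds, needed to play back one buffer of `samples`
   samples at `frequency` Hz. Hypothetical helper, not an SDL function. */
double buffer_latency_ms(unsigned samples, unsigned frequency)
{
    assert(frequency > 0);
    return 1000.0 * (double)samples / (double)frequency;
}
```

With this estimate, a 1024-sample buffer at 44100 Hz lasts a little over 23 ms, while an 8192-sample buffer at the same frequency lasts nearly 186 ms.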

To diagnose problems caused by an unsuitable buffer size, listen to the result if you are not satisfied with your sound output:

crackles and unpleasant metallic noises: the buffer size might be set too low. Crackles happen because you are not on a real-time operating system. There is a hard deadline for delivering the audio buffer (i.e. returning from the callback), and if you miss it, audio skips. Meanwhile, a standard OS scheduler and some background system load will cause the audio callback to start anywhere from a few microseconds through tens or (occasionally) hundreds of milliseconds late, wasting the cycles you were going to use for processing, or even causing audio drop-outs before you get a chance to do anything at all. There is not much you can do about that, short of switching to Linux/lowlatency or BeOS.
lots of lag, sound effects seem to be output some time after their cause: the buffer size might be set too high

You have to be prepared for any buffer size, regardless of what you ask for. Normally, you would just design all your audio code to be callback driven with a similar interface to that of the SDL audio callback. Every function is given some I/O buffers and is told to process N samples. That way, it is extremely simple to build arbitrarily complex trees of DSP functions. The right number of samples are generated for each audio callback, which minimizes latency and makes it easier to avoid drastically varying callback execution times.

This is how all major audio plugin and I/O APIs like VST, TDM, AudioUnits, LADSPA, DirectX, CoreAudio, PortAudio, ASIO, EASI, JACK, ReWire, etc. work. It is the only design that really works in anything but the most trivial cases.
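A minimal sketch of such a callback-driven design, in plain C and independent of SDL, could look like the following. All names (dsp_fn, dsp_stage, run_chain and the two example stages) are hypothetical, and mono float samples are assumed for brevity. Each stage is told to process N samples, whatever N the audio callback happens to receive.

```c
#include <assert.h>
#include <stddef.h>

/* A processing stage: reads `frames` samples from `in`, writes them to `out`.
   `state` is private data of the stage. Hypothetical interface. */
typedef void (*dsp_fn)(void *state, const float *in, float *out, size_t frames);

typedef struct {
    dsp_fn process;
    void *state;
} dsp_stage;

/* Example stage: multiplies every sample by the gain stored in `state`. */
void gain_process(void *state, const float *in, float *out, size_t frames)
{
    const float gain = *(const float *)state;
    for (size_t i = 0; i < frames; ++i)
        out[i] = in[i] * gain;
}

/* Example stage: hard-clips every sample to the [-1, 1] range. */
void clip_process(void *state, const float *in, float *out, size_t frames)
{
    (void)state; /* this stage keeps no state */
    for (size_t i = 0; i < frames; ++i) {
        const float v = in[i];
        out[i] = v > 1.0f ? 1.0f : (v < -1.0f ? -1.0f : v);
    }
}

/* Runs all stages in order, in place, on a buffer of any size. */
void run_chain(const dsp_stage *stages, size_t count, float *buffer, size_t frames)
{
    assert(stages != NULL || count == 0);
    for (size_t s = 0; s < count; ++s)
        stages[s].process(stages[s].state, buffer, buffer, frames);
}
```

An audio callback would then simply call run_chain with the number of samples it was asked for, so no assumption about the buffer size is baked into the stages.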

Although it is not perfectly nice and clean code, this mixer example (simplemixer-1.1) demonstrates very basic mixing of multichannel sound with the "raw" SDL audio API.

To fight crackles, one can increase the buffering (latency) so that the scheduling jitter accounts for a smaller part of the audio buffer period. That way, you can use more CPU power for audio with fewer drop-outs, but on an OS like Windows, you will never get anywhere near 100%, nor totally eliminate the drop-outs, when doing low-latency audio. Maybe you can get pretty close with Windows 2000 or XP, a decent professional sound card and audio code running in kernel space as Kernel Stream filters, but such a configuration would be insane for "consumer" multimedia such as games.

No setting can be universal, since every system (OS, hardware, configuration, ...) is different. Therefore the buffer size often has to be fine-tuned for each hardware setup, which argues for letting the user specify sizes different from the hardcoded defaults (through the command line, a configuration file, etc.).

Of course, if latency is irrelevant, you can decide on some reasonably safe value (such as buffers of 4096 samples), but it would not be totally safe, and it may be way too much latency for most applications. For average playback frequencies, a buffer size of 2048 samples might be a sensible value that should work on most systems, keeping in mind that the optimum size results from a hardware-dependent compromise. If you still hear crackles, increase that buffer size.

Note you need to scale the buffer size with sample rate, sample format and number of channels, to maintain the desired amount of buffering in terms of latency. However, do not expect a common value to be totally accurate on all platforms. Internal buffering and stuff in the OS and/or drivers may affect things in weird ways, so do not rely too heavily on it.

Therefore the value for samples in the audiospec needs to be appropriate for the audio rate which is to be set. Here are some values that should be fairly good for stereo data:


Playing frequency (in kHz)	Buffer size (in bytes)
11	512
22	1024
44	2048
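The values in this table all correspond to roughly the same duration (about 46 ms), if they are read as the samples value of the audiospec. Under that assumption, the scaling can be sketched as below: the hypothetical helper picks the smallest power of two that covers a target latency at a given frequency.

```c
#include <assert.h>

/* Smallest power-of-two buffer size (in samples) lasting at least roughly
   `target_ms` milliseconds at `frequency` Hz. Hypothetical helper; the
   power-of-two rounding is an assumption matching the table above. */
unsigned buffer_samples_for_latency(unsigned frequency, unsigned target_ms)
{
    assert(frequency > 0 && target_ms > 0);
    unsigned long long wanted = (unsigned long long)frequency * target_ms / 1000ULL;
    assert(wanted <= (1ULL << 30)); /* keeps the loop below from overflowing */
    unsigned size = 1;
    while (size < wanted)
        size <<= 1;
    return size;
}
```

With a 46 ms target, this yields 512, 1024 and 2048 for 11025, 22050 and 44100 Hz respectively. As stated above, such a value is only a starting point and should remain tunable by the user.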

Finally, latency also comes from leading silences, which should be removed from all audio assets (along with trailing silences). See the trimSilence.sh script (uses SoX).

[Back to table of contents]

Callbacks

The callback is the user-provided function regularly called by the audio system so that the audio output is rendered. As already described for buffer sizes, the buffering must be small enough to stay in reasonable sync with the gameplay, but large enough to avoid excessive callback overhead and to cope with the non-real-time behaviour of modern systems. Some experimentation will be required.

The buffer contents will be played only after the callback has returned. The data that goes into the buffer should preferably be generated in the callback. Otherwise, one would need extra buffering and thread-safe communication between the audio engine and the callback. This can make sense in some cases, but definitely not when you want interactive low-latency sound effects.

Preferably never call SDL_LockAudio / SDL_UnlockAudio. Use some sort of lock-free communication between the main thread and the audio callback context instead, to avoid increasing the callback "scheduling" jitter.
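One common way of achieving such lock-free communication is a single-producer, single-consumer ring buffer, sketched below with C11 atomics. This is an assumption about how one might do it, not something SDL provides: the type and function names are hypothetical, the main thread is assumed to be the only writer and the audio callback the only reader.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define CMD_QUEUE_SIZE 64u
static_assert((CMD_QUEUE_SIZE & (CMD_QUEUE_SIZE - 1u)) == 0u,
              "CMD_QUEUE_SIZE must be a power of two");

/* Hypothetical command sent from the main thread to the audio callback. */
typedef struct {
    int sound_id;
    float volume;
} audio_cmd;

typedef struct {
    audio_cmd slots[CMD_QUEUE_SIZE];
    atomic_uint head; /* count of commands written, advanced by the main thread only */
    atomic_uint tail; /* count of commands read, advanced by the audio callback only */
} cmd_queue;

void cmd_queue_init(cmd_queue *q)
{
    atomic_init(&q->head, 0u);
    atomic_init(&q->tail, 0u);
}

/* Main thread side: returns false, without blocking, if the queue is full. */
bool cmd_queue_push(cmd_queue *q, audio_cmd cmd)
{
    unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head - tail == CMD_QUEUE_SIZE)
        return false;
    q->slots[head % CMD_QUEUE_SIZE] = cmd;
    /* Release: the slot contents become visible before the new head. */
    atomic_store_explicit(&q->head, head + 1u, memory_order_release);
    return true;
}

/* Audio callback side: returns false, without blocking, if the queue is empty. */
bool cmd_queue_pop(cmd_queue *q, audio_cmd *out)
{
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (head == tail)
        return false;
    *out = q->slots[tail % CMD_QUEUE_SIZE];
    atomic_store_explicit(&q->tail, tail + 1u, memory_order_release);
    return true;
}
```

The callback would drain the queue at the start of each call, so neither side ever waits on the other and the callback timing is not disturbed by the main thread.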

[Back to table of contents]

Resampling: changing frequency and channel output

SDL does some simple internal resampling on load, but it only handles doubling or halving of the frequency. One would therefore need an audio resampler that works with the Mix_Chunks of SDL_mixer, such as Pymedia's Resample object, or resample.c from the ffmpeg project. Both provide simple linear interpolation. Since version 0.4.8, ffmpeg supports mono and stereo resampling, as well as mono to stereo, stereo to mono and stereo to 5.1 conversion. Pymedia also supports 5.1/4/5 to stereo.
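To give an idea of what simple linear interpolation involves, here is a minimal sketch for mono 16-bit data. It is not the code of ffmpeg or Pymedia; the function name is hypothetical, and no low-pass filtering is done, so downsampling with it may alias.

```c
#include <assert.h>
#include <stddef.h>

/* Resamples `in_frames` mono 16-bit samples from `in_rate` Hz to `out_rate` Hz
   using linear interpolation. Writes at most `out_capacity` samples to `out`
   and returns the number of samples written. Hypothetical helper. */
size_t resample_linear_s16(const short *in, size_t in_frames, unsigned in_rate,
                           short *out, size_t out_capacity, unsigned out_rate)
{
    assert(in_rate > 0 && out_rate > 0);
    if (in_frames == 0)
        return 0;

    size_t out_frames = (size_t)((unsigned long long)in_frames * out_rate / in_rate);
    if (out_frames > out_capacity)
        out_frames = out_capacity;

    for (size_t i = 0; i < out_frames; ++i) {
        /* Position of output sample i, expressed in input samples. */
        double pos = (double)i * in_rate / out_rate;
        size_t i0 = (size_t)pos;
        if (i0 >= in_frames)
            i0 = in_frames - 1;
        /* The last input sample has no right neighbour: repeat it. */
        size_t i1 = (i0 + 1 < in_frames) ? i0 + 1 : in_frames - 1;
        double frac = pos - (double)i0;
        out[i] = (short)(in[i0] + frac * (in[i1] - in[i0]));
    }
    return out_frames;
}
```

Going from 22050 Hz to 44100 Hz, every other output sample is then the average of two neighbouring input samples.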

[Back to table of contents]

Choosing the right library for audio output

Main choices are:

Basic SDL API, fairly low-level
SDL_sound: lightweight, supports multiple formats, but it is not a sound mixer, so it may take more work to use, unless you want to play just one sound at a time. The mpglib packaged with SDL_sound seems to be no worse than the SMPEG audio support
SDL_mixer: the library can be built with MP3 support or, preferably, Ogg Vorbis support

SDL_mixer

SDL_mixer can handle samples, channels, groups of channels, music and special effects, etc. [documentation]

Loading sounds with SDL_mixer

"Audio files" (wav, voc, etc.) are loaded in their entirety before playing, whereas "music files" (mp3, ogg, midi) are streamed from the RWops as needed.

When you use SDL_mixer to load a Mix_Chunk, it also converts the sample to the current output format of the opened audio device. So it is not surprising to see a low-frequency, mono or 8-bit file being converted to high-frequency, stereo, 16-bit data, which may account for a 16-fold increase in memory size compared to the file size. So all you have to know to predict the size in memory is the audio output format, which you can get from a query function (Mix_QuerySpec).
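The 16-fold figure can be checked with a small calculation. The sketch below uses hypothetical names and works on raw PCM sizes only (file headers are ignored); it estimates the size of a sound once converted to the output format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a PCM format. */
typedef struct {
    unsigned frequency; /* in Hz */
    unsigned bits;      /* 8 or 16 */
    unsigned channels;  /* 1 = mono, 2 = stereo */
} pcm_format;

/* Estimated size in bytes, once converted to `dst`, of `src_bytes` bytes
   of raw PCM data stored in the `src` format. */
size_t converted_size_bytes(size_t src_bytes, pcm_format src, pcm_format dst)
{
    assert(src.frequency > 0 && src.bits >= 8 && src.channels > 0);
    size_t src_sample = (size_t)(src.bits / 8) * src.channels;
    size_t dst_sample = (size_t)(dst.bits / 8) * dst.channels;
    unsigned long long samples = src_bytes / src_sample;
    unsigned long long dst_samples = samples * dst.frequency / src.frequency;
    return (size_t)(dst_samples * dst_sample);
}
```

For example, 10000 bytes of 11025 Hz, 8-bit, mono data become 160000 bytes at 44100 Hz, 16-bit, stereo: 4 times more samples, each 4 times larger.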

If you need finer-grained control on all files, you might want to look into SDL_sound, and gluing it to SDL_mixer. There was a patch floating around for that at one point.

Using SDL_mixer

Audio files (short sounds) are usually better managed in the unencoded WAV format. The best choice for music files (longer sounds) is often the OggVorbis format, a lossy audio codec whose performance is quite similar to that of the MP3 codecs on most platforms (except maybe embedded ones with low resources, e.g. the ARM7 of the Nintendo DS) and which is not subject to licensing fees.

Most of the time, the calling code can just pass a null pointer for the obtained parameter of SDL_OpenAudio, because it does not care what the underlying audio driver is doing.

A very simple example of use would be:

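// "music.ogg" is a placeholder file name
Mix_Music * music = Mix_LoadMUS( "music.ogg" ) ;

// Mix_PlayMusic returns 0 on success
if ( music != NULL && Mix_PlayMusic( music, 0 ) == 0 )
{
	// Wait until the music has finished playing
	while ( Mix_PlayingMusic() == 1 )
	{
		// Do something
		SDL_Delay( 10 ) ;
	}
}
Mix_FreeMusic( music ) ;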

Currently, Mix_LoadWAV supports WAVE, VOC, AIFF, and OGG. SDL_mixer can play several Ogg streams simultaneously: the Mix_Chunk structure is created with either Mix_LoadWAV or Mix_LoadWAV_RW. Despite their names, both load Ogg just fine, provided SDL_mixer is built with Ogg support (through libogg and libvorbis). One would almost certainly not want to use .ogg files of any significant size with Mix_LoadWAV*, since the whole file would have to be decoded into memory up front, whereas the Mix_LoadMUS path decodes on the fly, as data is needed.

Other SDL_mixer examples: 1, 2.

Troubleshooting for SDL_mixer

During play-back, how to find out what the current position is?: There is a Mix_SetMusicPosition but no Mix_GetMusicPosition. One solution would be to call SDL_GetTicks when you start the song, and then have your application keep note of how long it has been playing. This will probably prove accurate within a few milliseconds.
How to load an ogg file with SDL_mixer?: See that gpwiki hint.
Using an external libmikmod creates music.raw files: One solution is to manually rebuild SDL_mixer whilst disabling use of an external libmikmod. SDL_mixer using an external libmikmod is the default on Gentoo.
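The tick-based approach to the playback position mentioned above can be sketched as follows. The music_clock type and its functions are hypothetical; each of them takes the current tick count in milliseconds as a parameter, which in a real program would come from SDL_GetTicks. Pauses are accounted for, which the simple "note the start time" approach would otherwise miss.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical playback clock, fed with millisecond tick counts. */
typedef struct {
    uint32_t start_ticks;  /* tick count when playback started */
    uint32_t paused_ticks; /* total time spent in finished pauses */
    uint32_t pause_start;  /* tick count when the current pause began */
    int paused;
} music_clock;

/* To be called right when the music starts playing. */
void music_clock_start(music_clock *c, uint32_t now)
{
    assert(c != NULL);
    c->start_ticks = now;
    c->paused_ticks = 0;
    c->pause_start = 0;
    c->paused = 0;
}

/* To be called when the music is paused. */
void music_clock_pause(music_clock *c, uint32_t now)
{
    if (!c->paused) {
        c->paused = 1;
        c->pause_start = now;
    }
}

/* To be called when the music is resumed. */
void music_clock_resume(music_clock *c, uint32_t now)
{
    if (c->paused) {
        c->paused = 0;
        c->paused_ticks += now - c->pause_start;
    }
}

/* Estimated playback position in milliseconds. */
uint32_t music_clock_position_ms(const music_clock *c, uint32_t now)
{
    uint32_t end = c->paused ? c->pause_start : now;
    return end - c->start_ticks - c->paused_ticks;
}
```

The result is only an estimate: it does not account for the audio buffering latency, nor for any seek done with Mix_SetMusicPosition.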

[Back to table of contents]

Audio drivers

What advantages do higher level audio drivers have?

Device-sharing
Format conversion

What disadvantages do higher-level audio drivers have?

Lag: they are an extra level of buffering
More skipping for the same amount of lag
Extra CPU workload
Instability: some people need to do killall -9 artsd at least once a day
Insecurity: artsd runs best with root permissions, but this is no longer recommended due to security flaws
Unwanted features: not everybody wants to have his default soundcard potentially be on the other side of the Internet
More error conditions that cannot be bypassed: one can try an alternative OSS device, but good luck finding an alternative artsd!

Lower-level drivers almost always offer better performance in all the ways that matter inside SDL.

For example, in some cases if an SDL program is run from a non-root user, there might be about a 3/4 second delay for all the sound effects. If the program is run as root, the sound delay goes away. This is typical of using a sound daemon like artsd or esd.

Also, perhaps the daemons have suid privileges on /dev/dsp where the user does not have access directly to it: here again, one should check file permissions.

Changing of sound daemon

This page explains how to choose a specific audio driver, and lists them. Refer to this page as well to know the appropriate environment variables.

Playing around with the available audio drivers (e.g. export SDL_AUDIODRIVER=esd in the shell, prior to running an SDL program) may resolve some audio issues.

Another work-around which is often effective is to enter in the shell export SDL_AUDIODRIVER=dsp; export SDL_DSP_NOSELECT=1, still before launching the SDL program. However this work-around has been removed starting from 1.2.8.

oss

If it fails to open a device, it tries the next ones.

alsa

ALSA and SDL disturb each other

They should not if you do not use SDL_OpenAudio, and do not use SDL_INIT_AUDIO when calling SDL_Init.

Way too high latency with ALSA

The driver defaults on Linux seem to be way too high for real-time audio, but probably fine for the cute little dings that KDE and Gnome make. Adding something like this to ~/.asoundrc may help:

period_size 1024
buffer_size 4096

esd

One may use export ESD_NO_SPAWN=1 to prevent SDL from starting esd.

arts

arts does not seem to have a good reputation: it often has to be switched off for debugging purposes.

[Back to table of contents]

3D sound

SDL_sound does not support it and fmod is not free, which limits the options to OpenAL.

The official OpenAL packages for Debian are old and buggy; compiling OpenAL from CVS can help. With OpenAL, a 3D audio engine with dynamic pitch and Ogg streaming can be built more easily.

Watch out for the ALut functions though, as they are not portable. You cannot even load a WAV using them and expect to compile the same program for Linux. OpenAL is a good library but does not support MID and MOD files. If you want to play only Ogg or PCM WAV files, try OpenAL with the ear and the source at [0,0,0].

Recent surround sound patches for SDL give hope that SDL itself will provide more audio features.

[Back to table of contents]

Mods and MIDI

If your program plays midis by calling SDL_mixer to play them, then users of your program will need an SDL_mixer installed that can somehow play midis. That's all. SDL_mixer has various means of playing midis, depending on the operating system and how it has been installed. One of those ways uses SDL_mixer's timidity library.

Both mods and midis use instrument samples to make musical notes. Mods incorporate the samples they use into the mod file, but midi files do not contain samples. Midis depend on a synthesizer which can use its own samples, commonly from the standard General Midi set of instrument samples.

Mods are more flexible in the sounds they use: they do not have to stick to imitations of ordinary musical instruments. Mods tend to be musically less conventional and less sophisticated than midis.

[Back to table of contents]

Troubleshooting

I got: open /dev/sequencer: No such device

Check you have a /dev/sequencer pseudo-file on your system. See next hints if still out of luck

I got: open /dev/sequencer: Permission denied

Check you have the right permissions (ls -l /dev/sequencer), for example your user must often be part of the audio group. See next hint if still out of luck

/dev/sequencer exists, my user is in audio's group, but I still got: open /dev/sequencer: No such device

It is possibly because there is really no such device. Whether a sequencer device is available depends, in part, on whether there is a driver for it in the kernel.

On some systems, there are ALSA drivers but no OSS drivers. The /dev/sequencer device is an OSS special file, and it exists when the appropriate ALSA driver for OSS emulation is loaded. That driver is snd-seq-oss. This driver can be loaded with modprobe snd-seq-oss.

Before the above statement is executed, /dev/sequencer does not exist. Afterwards, it does exist. If you use the ALSA drivers, rather than the OSS drivers, it may work this way on your system.

You can check which kernel drivers are loaded with lsmod. If you have ALSA, you can check the availability of /dev/sequencer with cat /proc/asound/oss/sndstat. If there is no available driver for /dev/sequencer, under Synth devices: you will see Device not enabled in config, or something similar. If there is a driver, you will see the device driver listed here. If you use OSS drivers, use cat /dev/sndstat instead, for this information.

As /dev/sequencer is provided by OSS, not ALSA, one should make sure his kernel has support for OSS and/or the ALSA OSS compatibility library, and that SDL was built with ALSA support. And of course, one should not forget to actually load the ALSA OSS compatibility module and MIDI modules. They often are not loaded by default.

An application that needs only PCM sound, and not MIDI, should not fail because /dev/sequencer is not available.

Sometimes I have Mix_OpenAudio: No available audio device, and others open /dev/sequencer: Permission denied

Check the rights to /dev/sequencer: ls -l /dev/sequencer may output something like crw-rw---- 1 root audio 14, 1 2004-04-24 13:08 /dev/sequencer. Verify that the user is in the audio group. In /etc/group one might see something like audio:x:29:yourUser,guest. If so, perhaps some other application is holding /dev/sequencer open. Try running lsof /dev/sequencer and/or fuser /dev/sequencer to check.

Problems with /dev/dsp

They are often very similar to those of /dev/sequencer: check whether a sound daemon is running with ps -edf | egrep 'artsd|esd', check the device permissions with ls -la /dev/dsp*, and check which processes use this device with fuser /dev/dsp or lsof /dev/dsp.

Then execute id and check if any returned group matches the one of the dsp, and if the dsp has group writability.

When I load a music from an Ogg Vorbis file using Mix_LoadMUS, will it completely load the file in memory and decode it when playing, or will it stream the file from the disk?

It is streamed from disk as needed during playback, as all "music" files are (as opposed to "special effects" files) [see full explanation]

At runtime, I get the message Warning: incorrect audio format, and when I play the track I just hear noise

Try AUDIO_S16SYS for endianness: developers should never use AUDIO_S16 in code, they should use AUDIO_S16SYS instead.
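The reason is byte order: in SDL 1.2, AUDIO_S16 is an alias for the little-endian format, whereas AUDIO_S16SYS designates whichever byte order the host uses. Samples interpreted with the wrong byte order sound like noise. The sketch below, with hypothetical helper names and no SDL dependency, shows what the difference amounts to for 16-bit samples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Swaps the two bytes of one 16-bit sample. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Converts little-endian 16-bit samples (as stored in WAV files)
   to the byte order of the host, in place. */
void s16le_to_host(uint16_t *samples, size_t count)
{
    assert(samples != NULL || count == 0);
    if (host_is_little_endian())
        return; /* already in host order */
    for (size_t i = 0; i < count; ++i)
        samples[i] = swap16(samples[i]);
}
```

When the audio device is opened with AUDIO_S16SYS, the data handed to the callback is expected in host order, so code that generates samples with ordinary 16-bit integer arithmetic needs no swapping at all.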

For midi files, the window opens but no sound can be heard: Timidity not working

Look in /etc/timidity/timidity.cfg, if this file exists, and see what is there. If there is a statement source freepats.cfg there, but there is no file freepats.cfg, then that is your problem.

Indeed, for timidity to work, you have to have a set of GUS .pat files on your system, and timidity has to be able to find them. One such set of .pat files is the freepats set.

Then, for timidity to find the .pat files, it has to find its configuration file, named timidity.cfg. One standard place for this file is in the directory /etc/timidity, and another standard place is in the directory /usr/local/lib/timidity. If you find the file on your system, you want to somehow fix things so timidity can find it. If, for example, you have the file timidity.cfg and the .pat files in /usr/local/lib/timidity, it might work to change directory to /etc and execute ln -s /usr/local/lib/timidity.

If you do not have timidity.cfg anywhere, you probably want to download and install the freepats set of patches, because that will have a sample timidity.cfg file. If you do have timidity.cfg, take a look at its contents. The GUS .pat files and .cfg files should either be in the same directory as timidity.cfg, or else there should be a statement dir <directory> that tells where they are to be found.

There are other sets of GUS .pat files. The eawpats set is more complete than freepats, but it does not seem to be available any more. If you want to go to the trouble, you can construct sets of .pat files from soundfonts using a utility program named unsf, which is in the distribution of the midi player gt, available here.

At the bottom of this page, one can download a freely redistributable set of GUS compatible patches: timidity.tar.gz [14 Mb]

With SDL_mixer, sounds are played differently under Windows or Linux

If buffer_size in Mix_OpenAudio(44100, MIX_DEFAULT_FORMAT, 2, buffer_size) is too high, try reducing it (e.g. from 4096 to 2048 bytes). Latency should halve. Do not reduce it too much or you will have problems such as static and crackling with some soundcards, like the SB Live for example. See our buffering section.

Poor sound quality with SDL_mixer, reverb and higher frequencies seem to be missing, everything sounds like played inside a tin can

Maybe the sound data is mono, so it has to be duplicated to two channels, for stereo output.
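Duplicating mono data to two channels is straightforward, as in this minimal sketch for 16-bit samples (the function name is hypothetical; stereo data is assumed to be interleaved, left channel first).

```c
#include <assert.h>
#include <stddef.h>

/* Expands `frames` mono 16-bit samples into interleaved stereo.
   `stereo` must have room for 2 * frames values. Hypothetical helper. */
void mono_to_stereo_s16(const short *mono, short *stereo, size_t frames)
{
    assert((mono != NULL && stereo != NULL) || frames == 0);
    for (size_t i = 0; i < frames; ++i) {
        stereo[2 * i] = mono[i];     /* left channel */
        stereo[2 * i + 1] = mono[i]; /* right channel */
    }
}
```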

When using SDL_mixer on Linux, some unexpected latency experienced

On Linux, when using for example Mix_PlayChannel to play a loaded chunk of sound, the main thread in some cases simply stalls for a short duration of time, after which it plays the sound seemingly okay.

The arts daemon (the KDE sound daemon) sometimes causes the delay. It appears that it buffers the sound and schedules it in order to share the sound device. Killing the arts process makes the delay disappear.

Sound quality & latency depending on the user running the program being root or not

If the sound seems better as root, see our sound daemons section.

Otherwise, one might be getting more "real-time" permissions as root, but not as big a timeslice as a normal user.

This message always shows up: mcop warning: user defined signal handler found for SIG_PIPE, overriding.

SDL may use arts on your machine. A solution is to switch to another audio driver, see this section.

The code appears to work properly when the esd driver is in use, but switching to alsa or pcm drivers leaves the application running slowly and the pitch of the sound far too low

Is it an AC97 chip? There are problems with this chip family (distorted sound in many SDL games). SDL can detect this in some circumstances and print a message, but it does not work for me. There is a workaround for the dsp backend: export SDL_DSP_NOSELECT=1.

SDL_mixer does not seem to play a sound at the right speed

For example, it may play samples 20% faster than expected. The rate-conversion capabilities of SDL are limited. Take a look at the sample rate of the file and the sample rate you are playing audio at. Probably it is not a whole multiple of the file rate. See also the resample.sh script, to perform most suitable high-quality resampling to arbitrary frequencies with gain control (uses SoX).

fmod seems to work badly

fmod seems to have a bad time with ALSA users; OpenAL can be used instead.

[Back to table of contents]

Hardware-dependent audio issues

Sound on Windows

The MinGW version may be using WaveOut, whereas the Visual C++ version may be using DirectSound. SDL's WaveOut implementation enforces 250 ms buffering.

At 44.1 kHz, with WaveOut, an 8 kb audio buffer is needed, which is about 185 ms. DirectSound does not need as much: 2 kb usually works fine. WaveOut should only ever be used as a compatibility fallback interface. More precisely, when using SDL_AUDIODRIVER=waveout, there is an awful lag (about 0.25 to 0.5 second) because Microsoft says the audio buffer for waveout must be above a certain minimum size, which happens to be huge. Smaller sizes will usually work anyway, but SDL keeps to the minimum to abide by the standard.

On Windows, sound output will not work unless you have set the video mode: we need to associate DirectSound with a window handle. So you need to:

initialize SDL with both SDL_INIT_VIDEO and SDL_INIT_AUDIO at the same time
call SDL_SetVideoMode before SDL_OpenAudio.

To get anywhere near the "standard" Linux, BeOS and Mac OS latencies on Windows, you need to use ASIO, EASI or similar "bypass" audio API instead of DirectSound. KernelStreams might work, but only if you disable the internal software mixer.

Choosing which DSP should be used if more than one sound card is available

To use an alternate DSP: export SDL_PATH_DSP=/dev/dspXXX where dspXXX is the proper DSP device.

On Windows, the sample rate of an audio file can be checked by right-clicking on the file and selecting Properties: for example, Windows may report that a file has a sample rate of 32000 Hz.

[Back to table of contents]

Some audio links

SDL_sound decodes several popular sound formats, such as MIDI, .wav and .mp3. With other libraries, it can manage OggVorbis and many other formats. SDL_sound can also handle sample rate, audio format, and channel conversion on-the-fly and behind-the-scenes
SDL_mixer can handle samples, channels, groups of channels, music and special effects, etc. [documentation]

[Back to table of contents]

Please react!

If you have information more detailed or more recent than what is presented in this document, or if you noticed errors, omissions or points insufficiently discussed, drop us a line!

[Top]

Last update: Friday, November 13, 2009
