Audio with SDL

Foreword: the vast majority of this information and advice comes from the various members of the SDL mailing list. Special thanks to David Olofson and Ryan C. Gordon for having shared their knowledge with the list. I hope that this thematic gathering will help SDL users. Many thanks to the several SDL contributors!

Overview

This section focuses on the different ways of managing sounds with SDL and its helper libraries.

Due to the amount of relevant information, this document is still a work in progress, and any help would be appreciated.

Table of contents

Terms & conventions

For a given audio source, all samples share the same format, just as all pixels of a surface share the surface's pixel format. The size of a sample depends on its format:

SampleSize = (size of one quantization unit) * (number of channels)

Hence one sample of a stereo 8-bit audio source is: 1 byte [8 bits] * 2 [channels] = 2 bytes.

Frequency is the number of samples played per second, expressed in Hertz (Hz) or kilo-Hertz (kHz, 1 kHz = 1000 Hz). You would set this to 44100 Hz (or 44.1 kHz) for "CD quality", or to about 8 kHz for phone quality, for example.
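The arithmetic above can be sketched in plain C (a minimal illustration of the terms, not SDL code; the function names are made up):

```c
/* Bytes per sample: size of one quantization unit times the channel count. */
static unsigned sample_size(unsigned bytes_per_unit, unsigned channels)
{
    return bytes_per_unit * channels;
}

/* Resulting data rate, in bytes per second, at a given playback frequency. */
static unsigned bytes_per_second(unsigned bytes_per_unit, unsigned channels,
                                 unsigned frequency)
{
    return sample_size(bytes_per_unit, channels) * frequency;
}
```

For instance, 16-bit stereo at 44.1 kHz amounts to 2 * 2 * 44100 = 176400 bytes per second.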


[Back to table of contents]


Buffering sounds

Understanding the runtime behaviour of the SDL sound subsystem

A typical SDL audio driver allocates two buffers of whatever size (in samples) the application specifies, and pre-fills them with silence. It then double-buffers the sound, filling one buffer (through the user-specified callback) while the audio hardware is reading the other.

For example, let A and B be two such buffers, each 4 KB in size (1024 samples of 16-bit stereo data):

  1. A and B are allocated and filled with silence
  2. Audio hardware is triggered to playback A
  3. Application gets a callback to fill B
  4. Audio driver waits for A to finish playing and triggers playing B
  5. Application gets a callback to fill A
  6. ...and so on...

This scheme assumes that the application is able to fill the buffers in time, which is usually true with current CPU and OS combinations.
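The six steps above can be mimicked in plain C (a toy model for illustration only, not the actual SDL internals; all names are made up):

```c
#include <string.h>

#define BUF_BYTES 4096   /* 1024 samples of 16-bit stereo data */
#define SILENCE   0

/* Same shape as the SDL audio callback: fill `stream` with `len` bytes. */
typedef void (*audio_callback)(void *userdata, unsigned char *stream, int len);

typedef struct {
    unsigned char buf[2][BUF_BYTES];
    int playing;              /* index of the buffer being "played" */
    audio_callback fill;
    void *userdata;
} audio_state;

/* Steps 1-2: allocate both buffers and pre-fill them with silence. */
static void audio_init(audio_state *a, audio_callback fill, void *userdata)
{
    memset(a->buf, SILENCE, sizeof a->buf);
    a->playing = 0;
    a->fill = fill;
    a->userdata = userdata;
}

/* Steps 3-6, repeated: while one buffer is read by the hardware, the
 * callback fills the other one; then the two swap roles. */
static void audio_tick(audio_state *a)
{
    a->fill(a->userdata, a->buf[1 - a->playing], BUF_BYTES);
    a->playing = 1 - a->playing;
}

/* An example callback that fills the buffer with a constant value. */
static void fill_constant(void *userdata, unsigned char *stream, int len)
{
    memset(stream, *(unsigned char *)userdata, len);
}
```

In the real SDL pipeline, `audio_tick` corresponds to the driver waiting for the playing buffer to drain and invoking your callback on the other one.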


[Back to table of contents]


How to size the audio buffer? Latency versus skips

These days, most sound cards seem to work best with 48 kHz, since that is what the converters are designed for and/or because the cards run at that rate all the time internally, no matter what sample rates applications use.

The larger the buffer, the more latency you will experience: it is a FIFO buffer, so the bigger it is, the longer it takes from when you put data in it to when it comes out the speaker. The latency (also known as lag) corresponds to the time taken to play back one buffer of audio data.
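As a quick sanity check, the latency of one buffer is simply its length in samples divided by the playback frequency:

```c
/* Latency, in milliseconds, of one audio buffer of `samples` sample frames
 * played back at `frequency` Hz. */
static double buffer_latency_ms(unsigned samples, unsigned frequency)
{
    return 1000.0 * (double)samples / (double)frequency;
}
```

For example, 1024 samples at 44100 Hz amount to roughly 23 ms of latency, and 4096 samples to roughly 93 ms.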

Conversely, if the buffer is too small, you will get skips in the audio (the callback will not be called soon enough to fill the buffer on time). It is a balancing act, and no automatic guess of the right size can easily be implemented for a given platform.

If you are not satisfied with your sound output, the simplest way to diagnose issues caused by an inappropriate buffer size is just to listen to the result and adjust:

You have to be prepared for any buffer size, regardless of what you ask for. Normally, you would just design all your audio code to be callback driven with a similar interface to that of the SDL audio callback. Every function is given some I/O buffers and is told to process N samples. That way, it is extremely simple to build arbitrarily complex trees of DSP functions. The right number of samples are generated for each audio callback, which minimizes latency and makes it easier to avoid drastically varying callback execution times.
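The callback-driven design described above can be sketched like this (illustrative names only; a fixed-point gain stage serves as an example DSP node):

```c
#include <stddef.h>
#include <stdint.h>

/* Every DSP node has the same interface as the SDL audio callback:
 * "here is a buffer, process N samples".  Nodes can then be combined
 * into arbitrarily complex processing trees. */
typedef void (*dsp_fn)(void *userdata, int16_t *stream, size_t samples);

/* A gain node: scales its input by a fixed-point factor (256 == unity). */
static void dsp_gain(void *userdata, int16_t *stream, size_t samples)
{
    int gain = *(int *)userdata;
    for (size_t i = 0; i < samples; i++)
        stream[i] = (int16_t)((stream[i] * gain) >> 8);
}

/* A chain node: runs a list of processors over the same buffer in order. */
static void dsp_chain(dsp_fn *fns, void **userdata, size_t count,
                      int16_t *stream, size_t samples)
{
    for (size_t i = 0; i < count; i++)
        fns[i](userdata[i], stream, samples);
}
```

The SDL audio callback would then simply invoke the root of such a tree on the buffer it is handed.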

This is how all major audio plugin and I/O APIs like VST, TDM, AudioUnits, LADSPA, DirectX, CoreAudio, PortAudio, ASIO, EASI, JACK, ReWire, etc. work. It is the only design that really works in anything but the most trivial cases.

Although it is not perfectly nice and clean code, this mixer example (simplemixer-1.1) demonstrates very basic mixing of multichannel sound with the "raw" SDL audio API.

To fight crackles, one can increase the buffering (and thus the latency) so that scheduling jitter accounts for a smaller fraction of the audio buffer period. That way, you can use more CPU power for audio with fewer drop-outs; but on an OS like Windows, you are never going to get anywhere near 100%, nor totally eliminate drop-outs, when doing low-latency audio. Maybe you can get pretty close with Windows 2000 or XP, a decent professional sound card, and your audio code running in kernel space as Kernel Streaming filters, but such a configuration would be insane for "consumer" multimedia such as games.

No settings can be universal, since every system (OS, hardware, configuration, ...) is different. Therefore one often has to fine-tune the buffer size for each piece of hardware, which argues for letting the user specify sizes different from the hardcoded defaults (through the command line, a configuration file, etc.)

Of course, if latency is irrelevant, you can settle on some reasonably safe value (such as buffers of 4096 samples), but it would not be totally safe, and it may be way too much latency for most applications. For average playback frequencies, a buffer size of 2048 samples might be a sensible value that should work on most systems, keeping in mind that the optimum size results from a hardware-dependent compromise. If you still hear crackles, increase the buffer size.

Note that you need to scale the buffer size with the sample rate, sample format and number of channels, to maintain the desired amount of buffering in terms of latency. However, do not expect a common value to be accurate on all platforms: internal buffering in the OS and/or drivers may affect things in unexpected ways, so do not rely too heavily on it.

Therefore the value for samples in the SDL_AudioSpec needs to be appropriate for the audio rate being set. Here are some values that should be fairly good for stereo data:

Playing frequency (in kHz)    Buffer size (in bytes)
11                            512
22                            1024
44                            2048
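The table can be generalized by scaling a reference size with the frequency and rounding up to a power of two. This is only a sketch: the reference pair (44100 Hz, 2048 bytes) comes from the table, and the power-of-two rounding is a common driver expectation, not an SDL requirement:

```c
/* Scale a reference buffer size with the playback frequency, so that the
 * buffered duration (hence the latency) stays roughly constant, then round
 * up to a power of two, as audio drivers usually expect. */
static unsigned scaled_buffer_bytes(unsigned frequency)
{
    unsigned bytes = (unsigned)(2048.0 * (double)frequency / 44100.0 + 0.5);
    unsigned pow2 = 1;
    while (pow2 < bytes)
        pow2 *= 2;
    return pow2;
}
```

For the exact rates 11025, 22050 and 44100 Hz this reproduces the table values of 512, 1024 and 2048 bytes.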

Finally, latency also comes from leading silences, which should be trimmed (as well as trailing ones) from all audio assets. See the trimSilence.sh script (which uses SoX).


[Back to table of contents]


Callbacks

The callback is the user-provided function called regularly by the audio system to render the audio output. As already discussed regarding buffer sizes, one has to make sure the buffering is small enough to stay in reasonable sync with the gameplay, yet large enough that there is not too much callback overhead and one does not run afoul of the non-realtime nature of modern operating systems. Some experimentation will be required.

The buffer contents will be played only after the callback has returned. The data that goes into the buffer should preferably be generated inside the callback. Otherwise, one needs extra buffering and thread-safe communication between the audio engine and the callback. This can make sense in some cases, but definitely not when you want interactive low-latency sound effects.

Preferably, never call SDL_LockAudio / SDL_UnlockAudio. Use some sort of lock-free communication between the main thread and the audio callback context instead, to avoid increasing the callback's "scheduling" jitter.
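One common lock-free scheme is a single-producer / single-consumer ring buffer: the main thread pushes commands, the audio callback pops them, and neither side ever blocks. The sketch below uses `volatile` indices for brevity; a production version would need real atomic operations and memory barriers:

```c
#include <stddef.h>

#define RING_SIZE 256   /* must be a power of two */

/* Single-producer / single-consumer ring buffer: the main thread writes,
 * the audio callback reads, and neither side ever blocks or locks.
 * (volatile alone is NOT sufficient on all platforms; use real atomics
 * and memory barriers in production code.) */
typedef struct {
    int data[RING_SIZE];
    volatile size_t read_pos;
    volatile size_t write_pos;
} ring_t;

static int ring_push(ring_t *r, int value)   /* called by the main thread */
{
    size_t next = (r->write_pos + 1) & (RING_SIZE - 1);
    if (next == r->read_pos)
        return 0;                            /* full: drop, never block */
    r->data[r->write_pos] = value;
    r->write_pos = next;
    return 1;
}

static int ring_pop(ring_t *r, int *value)   /* called by the audio callback */
{
    if (r->read_pos == r->write_pos)
        return 0;                            /* empty */
    *value = r->data[r->read_pos];
    r->read_pos = (r->read_pos + 1) & (RING_SIZE - 1);
    return 1;
}
```

With such a queue, the callback never waits on the main thread, so its execution time stays predictable.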


[Back to table of contents]


Resampling: changing frequency and channel output

SDL does some simple internal resampling on load, but it can only double or halve the rate. One therefore needs an audio resampler that works with the Mix_Chunks of SDL_mixer, such as Pymedia's Resample object, or resample.c from the ffmpeg project; both use simple linear interpolation. From version 0.4.8, ffmpeg supports mono and stereo resampling, as well as mono to stereo, stereo to mono, and stereo to 5.1 conversions. Pymedia also supports 5.1/4/5 to stereo.
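The kind of linear interpolation those resamplers perform can be sketched in a few lines of C (an illustration of the technique, not the actual ffmpeg or Pymedia code):

```c
#include <stddef.h>
#include <stdint.h>

/* Resample a mono 16-bit buffer from src_rate to dst_rate using simple
 * linear interpolation.  Returns the number of output samples written;
 * dst must hold at least src_len * dst_rate / src_rate + 1 samples.
 * (A sketch: no anti-aliasing filter, and src_len must be >= 2.) */
static size_t resample_linear(const int16_t *src, size_t src_len,
                              int src_rate, int dst_rate, int16_t *dst)
{
    size_t out = 0;
    double step = (double)src_rate / (double)dst_rate;
    for (double pos = 0.0; pos < (double)(src_len - 1); pos += step) {
        size_t i = (size_t)pos;
        double frac = pos - (double)i;
        dst[out++] = (int16_t)(src[i] + frac * (src[i + 1] - src[i]));
    }
    return out;
}
```

Linear interpolation is cheap but lossy at large ratio changes; for offline conversion a dedicated tool such as SoX gives better quality.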


[Back to table of contents]


Choosing the right library for audio output

Main choices are:

SDL_mixer

SDL_mixer can handle samples, channels, groups of channels, music and special effects, etc. [documentation]

Loading sounds with SDL_mixer

"Audio files" (wav, voc, etc.) are loaded in their entirety before playing, whereas "music files" (mp3, ogg, midi) are streamed from the RWops as needed.

When you use SDL_mixer to load a Mix_Chunk, it also converts the sample to the current output format of the opened audio device. So it is not surprising to see a low-frequency, mono, or 8-bit sample being converted to a high-frequency, stereo, 16-bit one, which may amount to a 16-fold increase in memory size compared with the file size. So all you need to know to predict the in-memory size is the audio output format, which you can obtain with Mix_QuerySpec.
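That prediction is just a product of three ratios; in plain C (in real code the output rate, channels and bit depth would come from Mix_QuerySpec):

```c
/* Predict the in-memory size of a sound once SDL_mixer has converted it
 * to the output format of the opened device. */
static double converted_size(double file_bytes,
                             int file_rate, int file_channels, int file_bits,
                             int out_rate,  int out_channels,  int out_bits)
{
    return file_bytes
         * ((double)out_rate     / file_rate)
         * ((double)out_channels / file_channels)
         * ((double)out_bits     / file_bits);
}
```

For example, an 11025 Hz mono 8-bit file converted to 44100 Hz stereo 16-bit grows by a factor of 4 * 2 * 2 = 16.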

If you need finer-grained control on all files, you might want to look into SDL_sound, and gluing it to SDL_mixer. There was a patch floating around for that at one point.

Using SDL_mixer

Audio files (short sounds) are usually best stored in the uncompressed WAV format. The best choice for music files (longer sounds) is often the Ogg Vorbis format, a lossy audio codec whose performance is quite similar to that of the MP3 codecs on most platforms (except maybe embedded ones with scarce resources, e.g. the ARM7 of the Nintendo DS) and which is not subject to licensing fees.

Most of the time, the calling code can just pass a null pointer for the obtained audio spec (e.g. the second argument of SDL_OpenAudio), because it does not care what the underlying audio driver is actually doing.

A very simple example of use would be:

Mix_PlayMusic( music, 0 );

while ( Mix_PlayingMusic() == 1 )
{
	// Do something useful here
	SDL_Delay( 10 );
}

Mix_FreeMusic( music );

Currently, Mix_LoadWAV supports WAVE, VOC, AIFF, and OGG. SDL_mixer can play several Ogg streams simultaneously: a Mix_Chunk structure is created with either Mix_LoadWAV or Mix_LoadWAV_RW, and despite their names both load Ogg just fine, provided SDL_mixer was built with Ogg support (via libogg and libvorbis). One would almost certainly not want to use .ogg files of any significant size with Mix_LoadWAV*, since the whole file would have to be predecoded in memory, whereas the Mix_LoadMUS path decodes on the fly, as the data is needed.
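The two loading paths look like this (a sketch with hypothetical file names, assuming Mix_OpenAudio has already succeeded and SDL_mixer was built with Ogg support):

```c
#include <stdio.h>
#include "SDL_mixer.h"

static int load_audio(void)
{
    /* Streaming path: the Ogg is decoded on the fly during playback. */
    Mix_Music *music = Mix_LoadMUS("music.ogg");

    /* Pre-decoding path: the whole file is decompressed into memory up
     * front -- fine for short effects, wasteful for long tracks. */
    Mix_Chunk *effect = Mix_LoadWAV("effect.ogg");

    if (music == NULL || effect == NULL) {
        fprintf(stderr, "SDL_mixer: %s\n", Mix_GetError());
        return -1;
    }
    return 0;
}
```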

Other SDL_mixer examples: 1, 2.

Troubleshooting for SDL_mixer

During play-back, how to find out what the current position is?
There is a Mix_SetMusicPosition but no Mix_GetMusicPosition. One solution is to record SDL_GetTicks when you start the song, and have your application keep track of how long it has been playing. This will probably be accurate to within a few milliseconds.
How to load an ogg file with SDL_mixer?
See that gpwiki hint.
Using an external libmikmod creates music.raw files
One solution is to manually rebuild SDL_mixer while disabling the use of an external libmikmod (using an external libmikmod is the default on Gentoo).


[Back to table of contents]


Audio drivers

What advantages and disadvantages do higher-level audio drivers have?

Lower-level drivers almost always offer better performance in all the ways that matter to SDL.

For example, in some cases, if an SDL program is run by a non-root user, there might be about a 3/4 second delay for all the sound effects, while the delay goes away when the program is run as root. This is typical of using a sound daemon like artsd or esd.

Also, the daemons may have suid privileges on /dev/dsp while the user has no direct access to it: here again, one should check the file permissions.

Changing the sound daemon

This page explains how to choose a specific audio driver, and lists them. Refer to this page as well to know the appropriate environment variables.

Playing around with the available audio drivers (e.g. export SDL_AUDIODRIVER=esd in the shell, prior to running an SDL program) may resolve some audio issues.

Another often-effective work-around is to enter export SDL_AUDIODRIVER=dsp; export SDL_DSP_NOSELECT=1 in the shell, again before launching the SDL program. However, this work-around has been removed starting from SDL 1.2.8.

oss

If the oss driver fails to open a device, it tries the next ones.

alsa

ALSA and SDL disturb each other
They should not if you do not use SDL_OpenAudio, and do not use SDL_INIT_AUDIO when calling SDL_Init.
Way too high latency with ALSA
The default buffer settings on Linux seem to be way too large for real-time audio, but are probably fine for the cute little dings that KDE and GNOME make. Adding something like this to ~/.asoundrc may help:
period_size 1024
buffer_size 4096

esd

One may use export ESD_NO_SPAWN=1 to prevent SDL from starting esd.

arts

arts does not seem to have a good reputation; it often has to be switched off for debugging purposes.


[Back to table of contents]


3D sound

SDL_sound does not support it, and fmod is not free, which leaves OpenAL as the main option.

The official OpenAL packages for Debian are old and buggy; compiling OpenAL from CVS can help. With OpenAL, a 3D audio engine with dynamic pitch and Ogg streaming can be written rather easily.

Watch out for the ALut functions though: they are not portable. You cannot even load a WAV using them and expect the same program to compile on Linux. OpenAL is a good library, but it does not support MID and MOD files. If you only want to play Ogg or PCM wav files, try OpenAL with the listener and source at [0,0,0].

Recent surround sound patches for SDL let us hope that SDL will eventually provide more audio features natively.


[Back to table of contents]


Mods and MIDI

If your program plays midis by calling SDL_mixer to play them, then users of your program will need an SDL_mixer installed that can somehow play midis. That's all. SDL_mixer has various means of playing midis, depending on the operating system and how it has been installed. One of those ways uses SDL_mixer's timidity library.

Both mods and midis use instrument samples to make musical notes. Mods incorporate the samples they use into the mod file, but midi files do not contain samples. Midis depend on a synthesizer which can use its own samples, commonly from the standard General Midi set of instrument samples.

Mods are more flexible in the sounds they use: they do not have to stick to imitations of ordinary musical instruments. Mods tend to be musically less conventional and less sophisticated than midis.


[Back to table of contents]


Troubleshooting

I got: open /dev/sequencer: No such device
Check that you have a /dev/sequencer pseudo-file on your system. See the next hints if still out of luck.
I got: open /dev/sequencer: Permission denied
Check that you have the right permissions (ls -l /dev/sequencer); for example, your user must often be part of the audio group. See the next hint if still out of luck.
/dev/sequencer exists, my user is in audio's group, but I still got: open /dev/sequencer: No such device

It is possibly because there is really no such device. Whether there is a sequencer device available depends on whether there is a driver for it in the kernel, in part.

On some systems, there are ALSA drivers but no OSS drivers. The /dev/sequencer device is an OSS special file, and it exists when the appropriate ALSA driver for OSS emulation is loaded. That driver is snd-seq-oss. This driver can be loaded with modprobe snd-seq-oss.

Before the above statement is executed, /dev/sequencer does not exist. Afterwards, it does exist. If you use the ALSA drivers, rather than the OSS drivers, it may work this way on your system.

You can check which kernel drivers are loaded with lsmod. If you have ALSA, you can check the availability of /dev/sequencer with cat /proc/asound/oss/sndstat. If there is no available driver for /dev/sequencer, under Synth devices: you will see Device not enabled in config, or something similar. If there is a driver, you will see the device driver listed here. If you use OSS drivers, use cat /dev/sndstat instead, for this information.

As /dev/sequencer is provided by OSS, not ALSA, one should make sure the kernel has support for OSS and/or the ALSA OSS compatibility layer, and that SDL was built with ALSA support. And of course, one should not forget to actually load the ALSA OSS compatibility module and the MIDI modules; they are often not loaded by default.

An application that needs only PCM sound, and not MIDI, should not fail because /dev/sequencer is not available.

Sometimes I have Mix_OpenAudio: No available audio device, and others open /dev/sequencer: Permission denied
Check the rights to /dev/sequencer: ls -l /dev/sequencer may output something like crw-rw---- 1 root audio 14, 1 2004-04-24 13:08 /dev/sequencer. Verify that the user is in the audio group; in /etc/group one might see something like audio:x:29:yourUser,guest. If so, perhaps some other application is holding /dev/sequencer open; try running lsof /dev/sequencer and/or fuser /dev/sequencer to check.
Problems with /dev/dsp

They are often very similar to those of /dev/sequencer: check whether a sound daemon is running (ps -edf | egrep 'artsd|esd'), check the device permissions (ls -la /dev/dsp*), and see which processes use the device (fuser /dev/dsp or lsof /dev/dsp).

Then execute id and check whether any returned group matches the group of the dsp device, and whether the device is group-writable.

When I load a music from an Ogg Vorbis file using Mix_LoadMUS, will it completely load the file in memory and decode it when playing, or will it stream the file from the disk?
It is streamed from disk as needed during playback, as all "music" files are (as opposed to "special effects" files) [see full explanation]
At runtime, I get the message Warning: incorrect audio format, and when I play the track I just hear noise
Try AUDIO_S16SYS, which selects the native endianness: developers should never use AUDIO_S16 in portable code; they should use AUDIO_S16SYS instead.
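For example, when filling the desired SDL_AudioSpec (a sketch; my_audio_callback stands for the application's own callback):

```c
#include "SDL.h"

/* The application's audio callback (hypothetical). */
extern void my_audio_callback(void *userdata, Uint8 *stream, int len);

static void setup_desired_spec(SDL_AudioSpec *desired)
{
    desired->freq     = 44100;
    desired->format   = AUDIO_S16SYS;  /* native byte order; never AUDIO_S16 */
    desired->channels = 2;
    desired->samples  = 2048;
    desired->callback = my_audio_callback;
    desired->userdata = NULL;
}
```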
For midi files, the window opens but no sound can be heard: Timidity not working

Look in /etc/timidity/timidity.cfg, if this file exists, and see what is there. If there is a statement source freepats.cfg there, but there is no file freepats.cfg, then that is your problem.

Indeed, for timidity to work, you have to have a set of GUS .pat files on your system, and timidity has to be able to find them. One such set of .pat files is the freepats set.

Then, for timidity to find the .pat files, it has to find its configuration file, named timidity.cfg. One standard place for this file is in the directory /etc/timidity, and another standard place is in the directory /usr/local/lib/timidity. If you find the file on your system, you want to somehow fix things so timidity can find it. If, for example, you have the file timidity.cfg and the .pat files in /usr/local/lib/timidity, it might work to change directory to /etc and execute ln -s /usr/local/lib/timidity.

If you do not have timidity.cfg anywhere, you probably want to download and install the freepats set of patches, because that will have a sample timidity.cfg file. If you do have timidity.cfg, take a look at its contents. The GUS .pat files and .cfg files should either be in the same directory as timidity.cfg, or else there should be a statement dir <directory> that tells where they are to be found.

There are other sets of GUS .pat files. The eawpats set is more complete than freepats, but it does not seem to be available any more. If you want to go to the trouble, you can construct sets of .pat files from soundfonts using a utility program named unsf, which is in the distribution of the midi player gt, available here.

At the bottom of this page, one can download a freely redistributable set of GUS compatible patches: timidity.tar.gz [14 Mb]

With SDL_mixer, sounds are played differently under Windows or Linux

If buffer_size in Mix_OpenAudio(44100, MIX_DEFAULT_FORMAT, 2, buffer_size) is too high, try reducing it (e.g. from 4096 down to 2048 bytes); latency should halve. Do not reduce it too much, or you will get artifacts such as static and crackling with some soundcards (the SB Live, for example). See our buffering section.

Poor sound quality with SDL_mixer, reverb and higher frequencies seem to be missing, everything sounds like played inside a tin can
Maybe the sound data is mono, in which case it has to be duplicated onto two channels for stereo output.
When using SDL_mixer on Linux, some unexpected latency experienced

On Linux, when using for example Mix_PlayChannel to play a loaded chunk of sound, the main thread in some cases simply stalls for a short duration of time, after which it plays the sound seemingly okay.

The arts daemon (the KDE sound daemon) sometimes causes the delay: it appears to buffer the sound and schedule it so as to share the sound device. Killing the arts process makes the delay disappear.

Sound quality & latency depending on the user running the program being root or not

If the sound seems better as root, see our sound daemons section.

Otherwise, one might be getting more "real-time" permissions as root, but not as big a timeslice as a normal user.

This message always shows up: mcop warning: user defined signal handler found for SIG_PIPE, overriding.
SDL may be using arts on your machine. A solution is to switch to another audio driver; see this section.
The code appears to work properly when the esd driver is in use, but switching to alsa or pcm drivers leaves the application running slowly and the pitch of the sound far too low
Is it an AC97 chip? There are known problems with this chip family (distorted sound in many SDL games). SDL can detect this in some circumstances and print a message, but the detection does not always work. There is a workaround for the dsp backend: export SDL_DSP_NOSELECT=1.
SDL_mixer does not seem to play a sound at the right speed
For example, it may play samples 20% faster than expected. The rate-conversion capabilities of SDL are limited: compare the sample rate of the file with the rate you are playing audio at; probably one is not a whole multiple of the other. See also the resample.sh script, which performs suitable high-quality resampling to arbitrary frequencies with gain control (uses SoX).
fmod seems to work badly
fmod seems to have a hard time with ALSA users; OpenAL can be used instead.

[Back to table of contents]


Hardware-dependent audio issues

Sound on Windows

The MinGW version may be using WaveOut, whereas the Visual C++ version may be using DirectSound. SDL's WaveOut implementation enforces 250 ms of buffering.

At 44.1 kHz with WaveOut, a buffer of about 8192 samples is needed, which is about 185 ms. DirectSound does not need as much: 2048 samples usually work fine. WaveOut should only ever be used as a compatibility fallback interface. More precisely, when using SDL_AUDIODRIVER=waveout, there is an awful lag (about 0.25 to 0.5 second) because Microsoft requires the waveout audio buffer to be above a certain minimum size, which happens to be huge. Smaller sizes usually work anyway, but SDL sticks to the minimum to abide by the standard.

On Windows, sound output will not work unless the video mode has been set: SDL needs to associate DirectSound with a window handle. So you need to:

  1. initialize SDL with both SDL_INIT_VIDEO and SDL_INIT_AUDIO at the same time
  2. call SDL_SetVideoMode before SDL_OpenAudio.
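The required ordering can be sketched as follows (assuming a `desired` SDL_AudioSpec has been filled in beforehand; resolution and flags are arbitrary):

```c
#include <stdio.h>
#include "SDL.h"

static int init_audio_video(SDL_AudioSpec *desired)
{
    /* 1. Initialize video and audio together. */
    if (SDL_Init(SDL_INIT_VIDEO | SDL_INIT_AUDIO) != 0) {
        fprintf(stderr, "SDL_Init: %s\n", SDL_GetError());
        return -1;
    }

    /* 2. Set the video mode first: DirectSound needs the window handle. */
    if (SDL_SetVideoMode(640, 480, 0, SDL_ANYFORMAT) == NULL) {
        fprintf(stderr, "SDL_SetVideoMode: %s\n", SDL_GetError());
        return -1;
    }

    /* 3. Only now open the audio device. */
    if (SDL_OpenAudio(desired, NULL) != 0) {
        fprintf(stderr, "SDL_OpenAudio: %s\n", SDL_GetError());
        return -1;
    }
    return 0;
}
```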

To get anywhere near the "standard" Linux, BeOS and Mac OS latencies on Windows, you need to use ASIO, EASI or a similar "bypass" audio API instead of DirectSound. Kernel Streaming might work, but only if you disable the internal software mixer.

Choosing which DSP should be used if more than one sound card is available

To use an alternate DSP: export SDL_PATH_DSP=/dev/dspXXX where dspXXX is the proper DSP device.

On Windows, you can check the sample rate of an audio file by right-clicking on it and selecting Properties (it may report, for example, a sample rate of 32000 Hz).


[Back to table of contents]




Some audio links


[Back to table of contents]




Please react!

If you have information more detailed or more recent than what is presented in this document, if you noticed errors, omissions or points insufficiently covered, drop us a line!




[Top]

Last update: Friday, November 13, 2009