Foreword: the vast majority of this information and advice comes from the various members of the SDL mailing list. Special thanks to David Olofson and Ryan C. Gordon for having shared their knowledge with the list. I hope that this thematic gathering will help SDL users. Many thanks to the several SDL contributors!
This section focuses on the different ways of managing sounds with SDL and its helper libraries.
Due to the amount of relevant information, this is still a document in progress, and any help would be appreciated.
Terms & conventions
Mods and MIDI
Hardware-dependent audio issues
For a given audio source, all samples respect the same format, just as the pixels of a surface respect that surface's pixel format. The size of a sample depends on its format, which itself depends on:
Hence one sample of a stereo 8 bit audio source is:
1 [8 bits] * 2 [channels] = 2 bytes.
Frequency is the number of samples played per second, expressed in Hertz (Hz) or kilo-Hertz (kHz, 1 kHz = 1000 Hz). You would set this to 44100 Hz (44.1 kHz) for "CD quality", or about 8 kHz for phone quality, for example.
A typical SDL audio driver requests two buffers at whatever sample size is specified by the application, and then pre-fills them with silence. Then it double-buffers sound, filling one buffer (thanks to the user-specified callback) while the audio hardware is reading the other.
For example, let A and B be buffers, both 4 kB large (1024 samples of 16-bit stereo data):
This scheme assumes that the application is able to fill the buffers in time, which is usually true with current CPU and OS combinations.
These days, most sound cards seem to work best with 48 kHz, since that is what the converters are designed for and/or because the cards run at that rate all the time internally, no matter what sample rates applications use.
The larger the buffer, the more latency you will experience: it is a FIFO buffer, so the bigger it is, the longer it takes from when you put data in it to when it comes out the speaker. The latency (also known as lag) corresponds to the time taken to play back one buffer of audio data.
Conversely, if the buffer is too small, you will get skips in the audio (the callback will not be called soon enough to fill the buffer on time). It is a balancing act. No automatic size guess can easily be implemented to adapt the values to the actual platform.
To better diagnose issues coming from an inappropriate buffer size, just listen to the result if you are not satisfied with your sound output:
You have to be prepared for any buffer size, regardless of what you ask for. Normally, you would just design all your audio code to be callback driven with a similar interface to that of the SDL audio callback. Every function is given some I/O buffers and is told to process N samples. That way, it is extremely simple to build arbitrarily complex trees of DSP functions. The right number of samples are generated for each audio callback, which minimizes latency and makes it easier to avoid drastically varying callback execution times.
This is how all major audio plugin and I/O APIs like VST, TDM, AudioUnits, LADSPA, DirectX, CoreAudio, PortAudio, ASIO, EASI, JACK, ReWire, etc. work. It is the only design that really works in anything but the most trivial cases.
Although it is not perfectly nice and clean code, this mixer example (simplemixer-1.1) demonstrates very basic mixing of multichannel sound with the "raw" SDL audio API.
To fight crackles, one can increase the buffering (latency) so that the scheduling jitter accounts for a smaller part of the audio buffer period. That way, you can use more CPU power for audio with fewer drop-outs, but on an OS like Windows you are never going to get anywhere near 100%, or totally eliminate the drop-outs, when doing low latency audio. Maybe you can get pretty close with Windows 2000 or XP, a decent professional sound card, and running your audio code in kernel space as Kernel Stream filters, but such a configuration would be insane for "consumer" multimedia such as games.
No settings can be universal, since every system (OS, hardware, configuration, ...) is different. Therefore one often has to fine-tune the buffer size for each hardware setup, which advocates for a way for the user to specify sizes different from the default hardcoded ones (through the command line, a configuration file, etc.)
Of course, if latency is irrelevant, you can decide on some reasonably safe value (such as buffers of 4096 samples), but it would not be totally safe, and it may be way too much latency for most applications. For average playback frequencies, a buffer size of 2048 samples might be a sensible value that should work on most systems, keeping in mind that the optimum size results from a hardware-dependent compromise. If you still hear crackles, increase that buffer size.
Note you need to scale the buffer size with sample rate, sample format and number of channels, to maintain the desired amount of buffering in terms of latency. However, do not expect a common value to be totally accurate on all platforms. Internal buffering and stuff in the OS and/or drivers may affect things in weird ways, so do not rely too heavily on it.
Therefore the value for samples in the audiospec needs to be appropriate for the audio rate which is to be set. Here are some values that should be fairly good for stereo data:
| Playing frequency (in kHz) | Buffer size (in bytes) |
Finally, latency also comes from leading silences, which should be removed (as should trailing silences) from all audio assets. See the trimSilence.sh script (uses SoX).
The callback is the user-provided function regularly called by the audio system so that the audio output is rendered. As already described regarding buffer sizes, one has to make sure the buffering is set up so that it is small enough to stay in reasonable sync with the gameplay, but large enough that there is not too much callback overhead and one does not fall foul of the non-realtime nature of modern day systems. Some experimentation will be required.
The buffer contents will be played only after the callback has returned. The data that goes into the buffer should preferably be generated in the callback itself. Otherwise, one would need extra buffering and thread-safe communication between the audio engine and the callback. This can make sense in some cases, but definitely not when you want interactive low latency sound effects.
Preferably never call SDL_LockAudio/SDL_UnlockAudio. Use some sort of lock-free communication between the main thread and the audio callback context instead, to avoid increasing the callback "scheduling" jitter.
SDL does some simple internal resampling on load, but it can only double or halve the rate. One would therefore need an audio resampler that works with SDL_mixer's Mix_Chunks, such as Pymedia's Resample object, or resample.c from the ffmpeg project. Both do simple linear interpolation. ffmpeg from version 0.4.8 supports mono and stereo resampling, mono to stereo, stereo to mono, and stereo to 5.1; Pymedia also supports 5.1/4/5 to stereo.
Main choices are:
mpglib, packaged with SDL_sound, seems to be no worse than SMPEG's audio support
SDL_mixer can handle samples, channels, groups of channels, music and special effects, etc. [documentation]
"Audio files" (wav, voc, etc.) are loaded in their entirety before playing, "music files" (mp3, ogg, midi) are streamed from the
RWops as needed.
When you use SDL_mixer to load a Mix_Chunk, it also converts the sample to the current output format of the opened audio device. So it is not surprising to see a low frequency, mono, or perhaps 8-bit file being resampled to a high frequency, stereo, 16-bit one, which may account for a 16-fold increase in memory size compared to the file size. So all you have to know to predict the size in memory is the audio output format, which you can get from a query function.
If you need finer-grained control on all files, you might want to look into SDL_sound, and gluing it to SDL_mixer. There was a patch floating around for that at one point.
Audio files (short sounds) are usually better managed in the uncompressed WAV format. The best choice for music files (longer sounds) is often the Ogg Vorbis format, a lossy audio codec whose performance is quite similar to that of the MP3 codecs on most platforms (except maybe embedded ones with few resources, e.g. the ARM7 of the Nintendo DS) and which is not subject to licensing fees.
Most of the time, the calling code can just pass a null pointer for obtained, because it does not care what the underlying audio driver is doing.
A very simple example of use would be:
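For instance (a sketch: error handling is minimal and sound.wav is a placeholder):

```c
#include "SDL.h"
#include "SDL_mixer.h"

/* Minimal SDL_mixer use: open the device, load a sound, play it. */
int main(int argc, char *argv[])
{
    if (SDL_Init(SDL_INIT_AUDIO) < 0)
        return 1;
    /* 44.1 kHz, default format, stereo, 1024-sample buffer. */
    if (Mix_OpenAudio(44100, MIX_DEFAULT_FORMAT, 2, 1024) < 0) {
        SDL_Quit();
        return 1;
    }
    Mix_Chunk *sound = Mix_LoadWAV("sound.wav");
    if (sound != NULL) {
        Mix_PlayChannel(-1, sound, 0);   /* first free channel, no loop */
        SDL_Delay(2000);                 /* let it play for two seconds */
        Mix_FreeChunk(sound);
    }
    Mix_CloseAudio();
    SDL_Quit();
    return 0;
}
```

Compile with the flags from sdl-config --cflags --libs, plus -lSDL_mixer.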
Mix_LoadWAV supports WAVE, VOC, AIFF, and OGG. SDL_mixer can play several Ogg streams simultaneously: the Mix_Chunk structure is created with either Mix_LoadWAV or Mix_LoadWAV_RW. Despite their names, both load Ogg just fine, provided SDL_mixer is built with Ogg support (thanks to libvorbis). One would almost certainly not want to use .ogg files of any significant size with Mix_LoadWAV*, since the whole file would have to be predecoded in memory, whereas the LoadMUS path decodes more as it needs it, i.e. on the fly.
Other SDL_mixer examples: 1, 2.
SDL_mixer offers no Mix_GetMusicPosition. One solution would be to call SDL_GetTicks when you start the song, and then have your application keep note of how long it has been playing. This will probably prove accurate to within a few milliseconds.
killall -9 artsd at least once a day
Lower-level drivers almost always offer better performance in all the ways that matter to SDL.
For example, in some cases, if an SDL program is run by a non-root user, there might be about a 3/4 second delay for all the sound effects. If the program is run as root, the sound delay goes away. This is typical of using a sound daemon like artsd or esd.
Also, perhaps the daemons have suid privileges on /dev/dsp while the user has no direct access to it: here again, one should check file permissions.
This page explains how to choose a specific audio driver, and lists them. Refer to this page as well to know the appropriate environment variables.
Playing around with the available audio drivers (e.g. export SDL_AUDIODRIVER=esd in the shell, prior to running an SDL program) may resolve some audio issues.
Another work-around which is often effective is to enter in the shell export SDL_AUDIODRIVER=dsp; export SDL_DSP_NOSELECT=1, again before launching the SDL program. However, this work-around has been removed starting with SDL 1.2.8.
If it fails to open a device, it tries the next ones.
SDL_OpenAudio, and do not use
One may use export ESD_NO_SPAWN=1 to prevent SDL from starting esd.
arts does not seem to have a good reputation; it often has to be switched off for debugging purposes.
SDL_sound does not support it, and fmod is not free, which limits the options to OpenAL.
The official OpenAL packages for Debian are old and buggy; compiling OpenAL from CVS can help. With OpenAL, a 3D audio engine with dynamic pitch and Ogg streaming can be built more easily.
Watch out for the ALut functions though; they are not portable. You cannot even load a WAV using them and expect to compile the same program for Linux. OpenAL is a good library but does not support MID and MOD files. If you want to play only Ogg or PCM WAV files, try OpenAL with ear and source at
Recent surround sound patches for SDL give hope that SDL will eventually provide more audio features natively.
If your program plays midis by calling SDL_mixer to play them, then users of your program will need an SDL_mixer installed that can somehow play midis. That's all. SDL_mixer has various means of playing midis, depending on the operating system and how it has been installed. One of those ways uses SDL_mixer's timidity library.
Both mods and midis use instrument samples to make musical notes. Mods incorporate the samples they use into the mod file, but midi files do not contain samples. Midis depend on a synthesizer which can use its own samples, commonly from the standard General Midi set of instrument samples.
Mods are more flexible in the sounds they use: they do not have to stick to imitations of ordinary musical instruments. Mods tend to be musically less conventional and less sophisticated than midis.
open /dev/sequencer: No such device
Check that there is a /dev/sequencer pseudo-file on your system. See the next hints if still out of luck.
open /dev/sequencer: Permission denied
Check the permissions (ls -l /dev/sequencer); for example, your user must often be part of the audio group. See the next hint if still out of luck.
/dev/sequencer exists, my user is in the audio group, but I still get: open /dev/sequencer: No such device
It is possibly because there really is no such device. Whether a sequencer device is available depends, in part, on whether there is a driver for it in the kernel.
On some systems, there are ALSA drivers but no OSS drivers. The /dev/sequencer device is an OSS special file, and it exists when the appropriate ALSA driver for OSS emulation is loaded. That driver is snd-seq-oss; it can be loaded with modprobe snd-seq-oss.
Before that command is executed, /dev/sequencer does not exist; afterwards, it does. If you use the ALSA drivers, rather than the OSS drivers, it may work this way on your system.
You can check which kernel drivers are loaded with lsmod. If you have ALSA, you can check the availability with cat /proc/asound/oss/sndstat. If there is no available driver for Synth devices, you will see Device not enabled in config, or something similar; if there is a driver, you will see the device driver listed there. If you use OSS drivers, use cat /dev/sndstat instead for this information.
Since /dev/sequencer is provided by OSS, not ALSA, one should make sure the kernel has support for OSS and/or the ALSA OSS compatibility layer, and that SDL was built with ALSA support. And of course, one should not forget to actually load the ALSA OSS compatibility module and the MIDI modules; they are often not loaded by default.
An application that needs only PCM sound, and not MIDI, should not fail because /dev/sequencer is not available.
Mix_OpenAudio: No available audio device, and others
open /dev/sequencer: Permission denied
ls -l /dev/sequencer may output something like crw-rw---- 1 root audio 14, 1 2004-04-24 13:08 /dev/sequencer. Verify that the user is in the audio group: in /etc/group one might see something like audio:x:29:yourUser,guest. If so, perhaps some other application is holding /dev/sequencer open; try running fuser /dev/sequencer to check.
They are often very similar to those of /dev/sequencer: run ps -edf | egrep 'artsd|esd', check the device permissions (ls -la /dev/dsp*) and which processes use the device (fuser /dev/dsp), or run id and check whether any returned group matches that of the dsp device, and whether the device is group-writable.
When using Mix_LoadMUS, will it completely load the file in memory and decode it when playing, or will it stream the file from the disk?
Warning: incorrect audio format, and when I play the track I just hear noise
Use AUDIO_S16SYS for endianness safety: developers should never use AUDIO_S16 in code; they should use AUDIO_S16SYS.
Look at /etc/timidity/timidity.cfg, if this file exists, and see what is there. If there is a statement source freepats.cfg there, but there is no file freepats.cfg, then that is your problem.
Indeed, for timidity to work, you have to have a set of GUS .pat files on your system, and timidity has to be able to find them. One such set of .pat files is the freepats set.
Then, for timidity to find the .pat files, it has to find its configuration file, named timidity.cfg. One standard place for this file is the directory /etc/timidity, and another is /usr/local/lib/timidity. If you find the file on your system, you want to somehow fix things so timidity can find it. If, for example, you have timidity.cfg and the .pat files in /usr/local/lib/timidity, it might work to change directory to /etc and execute ln -s /usr/local/lib/timidity.
If you do not have timidity.cfg anywhere, you probably want to download and install the freepats set of patches, because that will include a sample timidity.cfg file. If you do have timidity.cfg, take a look at its contents. The GUS .pat files and .cfg files should either be in the same directory as timidity.cfg, or else there should be a statement dir <directory> that tells where they are to be found.
There are other sets of GUS .pat files. The eawpats set is more complete than freepats, but it does not seem to be available any more. If you want to go to the trouble, you can construct sets of .pat files from soundfonts using a utility program named unsf, which is in the distribution of the midi player gt, available here.
At the bottom of this page, one can download a freely redistributable set of GUS-compatible patches: timidity.tar.gz [14 MB]
The buffer_size in Mix_OpenAudio(44100, MIX_DEFAULT_FORMAT, 2, buffer_size) is too high; try reducing it (e.g. from 4096 to 2048 bytes). Latency should halve. Do not reduce it too much or you will get problems such as static and crackling with some sound cards, like the SB Live for example. See our buffering section.
On Linux, when using for example Mix_PlayChannel to play a loaded chunk of sound, the main thread in some cases simply stalls for a short duration, after which it plays the sound seemingly okay.
The arts daemon (the KDE sound daemon) sometimes causes the delay. It appears to buffer the sound and schedule it to share the sound device. What can be done is to kill the arts process, so that the delay disappears.
If the sound seems better as root, see our sound daemons section.
Otherwise, one might be getting more "real-time" permissions as root, but not as big a timeslice as a normal user.
mcop warning: user defined signal handler found for SIG_PIPE, overriding.
The esd driver is in use, but switching to the pcm drivers leaves the application running slowly and the pitch of the sound far too low
An AC97 chip? There are problems with this chip family (distorted sound in many SDL games). SDL can detect this in some circumstances and print a message, but it does not work for me. There is a workaround for the
The MinGW version may be using WaveOut, whereas the Visual C++ version may be using DirectSound. SDL's WaveOut implementation enforces 250 ms of buffering.
At 44.1 kHz with WaveOut, an 8 kb audio buffer is needed, which is about 185 ms. DirectSound does not need as much: 2 kb usually works fine. WaveOut should only ever be used as a compatibility fallback interface. More precisely, when using SDL_AUDIODRIVER=waveout, there is an awful lag (about 0.25 to 0.5 second) because Microsoft says the audio buffer for waveout must be above a certain minimum size, which happens to be huge. Smaller sizes will usually work anyway, but SDL keeps to the minimum to abide by the standard.
On Windows, sound output will not work unless you have set the video mode: SDL needs to associate DirectSound with a window handle. So you need to set the video mode before opening the audio, for example by initializing SDL_INIT_VIDEO and SDL_INIT_AUDIO at the same time.
To get anywhere near the "standard" Linux, BeOS and Mac OS latencies on Windows, you need to use ASIO, EASI or similar "bypass" audio API instead of DirectSound. KernelStreams might work, but only if you disable the internal software mixer.
To use an alternate DSP: export SDL_PATH_DSP=/dev/dspXXX, where dspXXX is the proper DSP device.
On Windows, right-clicking on the file and selecting Properties, Windows told me that this file had a sample rate of 32000 Hz.
If you have information more detailed or more recent than what is presented in this document, or if you noticed errors, omissions, or points insufficiently discussed, drop us a line!