Playing and Recording Sound on Linux, Windows, FreeBSD and macOS


Hearing is one of the few fundamental senses that we humans have, along with our abilities to see, smell, taste and touch. If we couldn't hear, the world as we know it would be much less interesting and colorful to us. It would be total silence – a scary thing even to imagine. And speaking makes our life so much fun, because what can be better than talking to our friends and family? Also, we can listen to our favorite music wherever we are, thanks to computers and headphones. With the help of tiny microphones built into our phones and laptops we can now talk to people around the globe from anywhere with an Internet connection. But computer hardware alone isn't enough – it's computer software that actually defines how and when the hardware should operate. Operating systems provide the means for that to the apps that want to use the computer's audio capabilities. In real use-cases audio data usually travels a long way from one end to another, being transformed and (un)compressed on the fly, attenuated, filtered, and so on. But in the end it all comes down to just 2 basic processes: playing the sound or recording it.

Today we'll discuss how to use the APIs that popular operating systems provide: this is essential knowledge if you want to create an app that works with audio I/O. But there's just one problem standing in our way: there is no single API that all operating systems support. In fact, there are completely different APIs, different approaches, slightly different logic. We could just use some library which solves all these problems for us, but in that case we wouldn't understand what's really going on under the hood – what's the point? But humans are built in such a way that we sometimes want to dig a little deeper, to learn a little more than what just lies on the surface. That's why we're going to learn the APIs that the operating systems provide by default: ALSA (Linux), PulseAudio (Linux), WASAPI (Windows), OSS (FreeBSD), CoreAudio (macOS).

Although I try to explain every detail that I think is important, these APIs are so complex that in any case you'll need to explore the official documentation, which explains all functions, identifiers, parameters, etc., in much more detail. Going through this tutorial doesn't mean you don't need to read the official docs – you do, or you'll end up with an incomplete understanding of the code you write.

All sample code for this guide is available here: https://github.com/stsaz/audio-api-quick-start-guide. I recommend keeping an example file open in front of you while reading this tutorial, so that you better understand the purpose of each code statement in its global context. When you're ready for slightly more advanced usage of audio APIs, you can analyze the code of the ffaudio library: https://github.com/stsaz/ffaudio.


Overview

First, I'm going to describe how to work with audio devices in general, without any API specifics.

Step 1. It all starts with enumerating the available audio devices. There are 2 types of devices: a playback device, to which we write audio data, and a capture device, from which we read audio data. Each device has its own unique ID, name and other properties. An app can use this data to select the best device for its needs, or to show all devices to its user so he can manually pick the one he likes. However, most of the time we don't need to select a particular device, but rather just use the default one. In that case we don't need to enumerate devices, and we don't even need to retrieve any properties.

Note that a device can be registered in the system but unavailable, for example when the user has disabled it in the system settings. If we perform all the necessary error checks while writing our code, we normally won't see any disabled devices.

Sometimes we can retrieve the list of all supported audio formats for a particular device, but I wouldn't rely on this information very much – it isn't cross-platform, after all. Instead, it's better to try to assign an audio buffer to the device and see whether the audio format is actually supported.

Step 2. Once we've decided which device we want to use, we proceed with creating an audio buffer and assigning it to the device. At this point we must know the audio format we'd like to use: sample format (e.g. signed integer), sample width (e.g. 16 bit), sample rate (e.g. 48000 Hz) and the number of channels (e.g. 2 for stereo).

Sample format is either a signed integer, an unsigned integer or a floating-point number. Some audio drivers support all integers and floats, while others may support just 16-bit signed integers and nothing else. This doesn't always mean that the device works with them natively – it's possible that the audio device software converts samples internally.

Sample width is the size of each sample. 16 bit is the standard for CD Audio quality, but it's not the best choice for audio processing use-cases, because it can easily produce artifacts (although it's unlikely that we could really tell the difference). 24 bit is much better, and it's supported by many audio devices. On the other hand, professional sound apps don't give artifacts any chance: they use 64-bit float samples internally when performing mixing operations and other kinds of filtering.

Sample rate is the number of samples needed to make 1 full second of audio data. The most popular rates are 44.1kHz and 48kHz, but audiophiles may argue that a 96kHz sample rate is the best. Usually, audio devices can work with rates up to 192kHz, but I really don't believe that anybody can hear any difference between 48kHz and higher values. Note that the sample rate is the number of samples per second for one channel only. So in fact, for a 16-bit 48kHz stereo stream the number of bytes we have to process per second is 2 * 48000 * 2.

0 sec                                   1 sec
|      (Sample 0)  ...  (Sample 47999)  |
|       short[L]   ...     short[L]     |
|       short[R]   ...     short[R]     |

Samples are sometimes called frames, similar to video frames. An audio sample/frame is a stack of numerical values which together form the audio signal strength for each hardware device channel at the same point in time. Note, however, that some official docs don't agree with me here: they specifically define that a sample is just 1 numerical value, while a frame is the set of those values for all channels. Why don't they call the sample rate a frame rate then? Please take a look at the diagram above once more. There, the sample width (the column width) is still 16 bit, no matter how many channels (the rows below) we have. The sample format (a signed integer in our case) always stays the same too. The sample rate is the number of columns for 1 second of audio, no matter how many channels we have. So my own logic dictates this definition of an audio sample to me (no matter how others may define it), but your logic may be different – and that's perfectly fine.

Sample size (or frame size) is another property that you'll often use while working with digital audio. It's just a constant value for conveniently converting the number of bytes to the number of audio samples and vice versa:

	int sample_size = sample_width/8 * channels;
	int bytes_in_buffer = samples_in_buffer * sample_size;

Of course, we can also set additional parameters for our audio buffer, such as its length (in milliseconds or in bytes), which is the main property for controlling sound latency. But keep in mind that every device has its own limits for this parameter: we can't set the length too low if the device simply doesn't support it. I think 250ms is a fine starting point for most applications, but some real-time apps require the minimum possible latency at the cost of higher CPU usage – it all depends on your particular use-case.
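For example, here's a small sketch of the millisecond-to-byte conversion for the format used throughout this section (16-bit stereo at 48000Hz; the numbers are only an illustration):

	int sample_rate = 48000;
	int sample_size = 16/8 * 2; // 16-bit stereo
	int buffer_length_msec = 250;
	// 48000 * 4 * 250 / 1000 = 48000 bytes for 250ms of sound
	int buffer_length_bytes = sample_rate * sample_size * buffer_length_msec / 1000;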

When opening an audio device we should always be prepared for the API to return a “bad format” error, meaning that the audio format we chose isn't supported by the underlying software or by the physical device itself. In that case we should select a more suitable format and recreate the buffer.

Note that one physical device can be opened only once: we can't attach 2 or more audio buffers to the same device – what a mess it would be otherwise, right? But then we need to solve the problem of multiple audio apps wanting to play sound in parallel through a single device. Windows solves this problem in WASAPI by introducing 2 different modes: shared and exclusive. In shared mode we attach our audio buffers to a virtual device, which mixes the streams from different apps together, applies some filtering such as sound attenuation, and then passes the data to the physical device. PulseAudio on Linux works the same way on top of ALSA. Of course, the downside is that shared mode comes with higher latency and higher CPU usage. In exclusive mode, on the other hand, we have an almost direct connection to the audio device driver, which means we can achieve the maximum sound quality and minimum latency – but no other app can use the device while we're using it.

Step 3. After we have prepared and configured an audio buffer, we can start using it: writing data to it for playback, or reading data from it to record audio. The audio buffer is in fact a circular buffer, where reading and writing operations are performed in an endless circle.

But there's one problem: the CPU must stay synchronized with the audio device when performing I/O on the same memory buffer. Otherwise the CPU would run at full speed and lap the audio buffer a million times while the audio device only finishes its first turn. Therefore, the CPU must always wait for the audio device to slowly do its work. Then, from time to time, the CPU wakes up for a short while to fetch some more audio data from the device (when recording) or to feed some more data to the device (in playback mode), after which it should go back to sleep. Note that this is where the audio buffer length parameter comes into play: the smaller the buffer, the more often the CPU must wake up to do its work. A circular buffer can be in 3 different states: empty, half-full and full. We need to understand how to properly handle all these states in our code.

An empty buffer while recording means that there are no audio samples available to us at the moment. We must wait until the audio device puts something new into it.

An empty buffer while playing means that we're free to write audio data into the buffer at any time. However, if the audio device is running and reaches the point where there is no more data for it to read, it means we have failed to keep up with it; this situation is called buffer underrun. In this state we should pause the device, fill the audio buffer and then resume normal operation.

A half-full buffer while recording means that there are some audio samples inside the buffer, but it isn't completely full yet. We should process the available data as soon as we can and mark this data region as read (or useless), so that next time we don't see this data as available.

A half-full buffer for playback streams means that we can put some more data into it.

A full buffer for recording streams means that we're falling behind the audio device in reading the available data. The audio device has filled the buffer completely and there's no more room for new data. This state is called buffer overrun. In this case we must reset the buffer and resume (unpause) the device to continue normally.

A full buffer for playback streams is a normal situation; we should just wait until some free space becomes available.

A few words about how the waiting is actually done. Some APIs provide a way for us to subscribe to notifications, to achieve I/O with the lowest possible latency. For example, ALSA can send the SIGIO signal to our process after it has written some data into the audio recording buffer. WASAPI in exclusive mode can notify us via a Windows kernel event object. However, for apps that don't require much accuracy we can simply use our own timers, or just block our process with functions like usleep/Sleep. When using these methods we just need to make sure that we never sleep longer than half of our audio buffer length: e.g. for a 500ms buffer we would set the timer to 250ms and perform I/O 2 times per buffer rotation. Of course, you understand that we can't do this reliably for very small buffers, because even a slight delay can cause audio stutter. Anyway, in this tutorial we don't need high accuracy – we need small code that's easy to understand.
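For example, here's a sketch of the simple timer-based approach with usleep(), assuming a 500ms buffer (the actual I/O calls are elided):

	int buffer_length_msec = 500;
	for (;;) {
		... // read from or write to the audio buffer
		// sleep for half of the buffer length, i.e. 250ms here
		usleep(buffer_length_msec / 2 * 1000);
	}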

Step 4. For playback buffers there's one more thing. After we have finished writing all our data into the audio buffer, we must still wait until the device has processed it. In other words, we should drain the buffer. Sometimes we have to manually append silence to the buffer, otherwise it may play a chunk of some old invalid data, resulting in audio artifacts. Then, once we see that the whole buffer has become empty, we can stop the device and close the buffer.

Also, keep in mind that during normal operation these problems may arise:

  • The physical audio device may be switched off by the user and become unavailable to us.

  • The virtual audio device may be reconfigured by the user, requiring us to reopen and reconfigure our audio buffers.

  • The CPU was too busy performing some other operations (for another app or a system service), resulting in a buffer overrun/underrun condition.

A careful programmer must always account for all possible scenarios, check every return code from the API functions we call, and handle them or show an error message to the user. I don't do this in my sample code simply because the purpose of this tutorial is for you to understand an audio API – and that goal is best served by the shortest possible code; error checking everywhere would only get in the way here.

Of course, when we're done, we must close the handles to audio buffers and devices and free the allocated memory regions. But we don't want to do that if we just want to play another audio file, for example. Preparing a new audio buffer can take a lot of time, so always try to reuse it when you can.

Audio Data Representation

Now let's talk about how audio data is actually organized and how to analyze it. There are 2 kinds of audio buffers: interleaved and non-interleaved. An interleaved buffer is a single contiguous memory region where the sets of audio samples follow one another. This is what it looks like for 16-bit stereo audio:

short[0][L]
short[0][R]
short[1][L]
short[1][R]
...

Here, 0 and 1 are the sample indexes, and L and R are the channels. To read the values of both channels for sample #9, for example, we take the sample index and multiply it by the number of channels:

	short *samples = (short*)buffer;
	short sample_9_left = samples[9*2];
	short sample_9_right = samples[9*2 + 1];

These 16-bit signed values represent the signal strength, where 0 is silence. But signal strength is usually measured in dB. Here's how we can convert our integer values to dB:

	short sample = ...;
	double gain = (double)sample * (1 / 32768.0);
	double db = log10(gain) * 20;

Here we first convert the integer to a float number – this is the gain value, where 0.0 is silence and +/-1.0 is the maximum signal. Then, using the gain = 10 ^ (db / 20) formula, we convert the gain into a dB value. If we want to do the reverse conversion, we can use this code:

	#include <emmintrin.h> // SSE2 functions. All AMD64 CPUs support them.

	double db = ...;
	double gain = pow(10, db / 20);
	double d = gain * 32768.0;
	short sample;
	if (d < -32768.0)
		sample = -0x8000;
	else if (d > 32768.0 - 1)
		sample = 0x7fff;
	else
		sample = _mm_cvtsd_si32(_mm_load_sd(&d));

I'm not an expert in audio math, I'm just showing you how I do it – you may find a better solution.

The most popular audio codecs and most audio APIs use the interleaved audio data format.

A non-interleaved buffer is an array of (possibly) different memory regions, one for each channel:

L -> {
	short[0][L]
	short[1][L]
	...
}
R -> {
	short[0][R]
	short[1][R]
	...
}

For example, the mainstream Vorbis and FLAC audio codecs use this format. As you can see, it's very easy to operate on samples within a single channel in non-interleaved buffers. For example, swapping the left and right channels takes just a couple of CPU cycles to swap the pointers.
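For example, a sketch of such a swap, assuming we keep one pointer per channel (left and right here are hypothetical per-channel sample arrays):

	short *channels[2] = { left, right };
	// swapping L and R doesn't touch the samples themselves
	short *tmp = channels[0];
	channels[0] = channels[1];
	channels[1] = tmp;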

I think we've had enough theory, and we're ready for some real code with a real audio API.

Linux and ALSA

ALSA is Linux's default audio subsystem, so let's start with it. ALSA consists of 2 parts: audio drivers that live inside the kernel, and a user-mode API which provides unified access to the drivers. We're going to learn the user-mode ALSA API – it's the lowest level for accessing sound hardware from user space.

First, we must install the development package, which is alsa-lib-devel on Fedora. Now we can include it in our code:

	#include <alsa/asoundlib.h>

And when linking our binaries we add the -lasound flag.

ALSA: Enumerating Devices

First, we iterate over all sound cards available in the system, until we get a -1 index:

	int icard = -1;
	for (;;) {
		snd_card_next(&icard);
		if (icard == -1)
			break;
		...
	}

For each sound card index we prepare a NULL-terminated string, e.g. hw:0, which is the unique ID of this sound card. We obtain the sound card handle from snd_ctl_open(), which we later close with snd_ctl_close().

	char scard[32];
	snprintf(scard, sizeof(scard), "hw:%u", icard);

	snd_ctl_t *sctl = NULL;
	snd_ctl_open(&sctl, scard, 0);
	...
	snd_ctl_close(sctl);

For each sound card we walk through all its devices until we get a -1 index:

	int idev = -1;
	for (;;) {
		snd_ctl_pcm_next_device(sctl, &idev);
		if (idev == -1)
			break;
		...
	}

Now we prepare a NULL-terminated string, e.g. plughw:0,0, which is the device ID we can later use when assigning an audio buffer. The plughw: prefix means that ALSA will try to apply some audio conversion when necessary. If we want to use the hardware device directly, we should use the hw: prefix instead. For the default device we can use the plughw:0,0 string, but in theory it can be unavailable – you should provide a way for the user to select a particular device.

	char device_id[64];
	snprintf(device_id, sizeof(device_id), "plughw:%u,%u", icard, idev);

ALSA: Opening Audio Buffer

Now that we know the device ID, we can assign a new audio buffer to it with snd_pcm_open(). Note that we won't be able to open the same ALSA device twice. And if the device is used by the system PulseAudio process, no other app in the system will be able to use audio while we're holding it.

	snd_pcm_t *pcm;
	const char *device_id = "plughw:0,0";
	int mode = (playback) ? SND_PCM_STREAM_PLAYBACK : SND_PCM_STREAM_CAPTURE;
	snd_pcm_open(&pcm, device_id, mode, 0);
	...
	snd_pcm_close(pcm);

Next, we set the parameters for our buffer. Here we tell ALSA that we want to use mmap-style functions to get direct access to its buffers, and that we want an interleaved buffer. Then we set the audio format and the buffer length. Note that ALSA updates some of these values for us if the values we supplied aren't supported by the device. However, if the sample format isn't supported, we have to find the right value manually by probing with snd_pcm_hw_params_get_format_mask()/snd_pcm_format_mask_test(). In real code you should check whether your higher-level code supports this new configuration.

	snd_pcm_hw_params_t *params;
	snd_pcm_hw_params_alloca(&params);
	snd_pcm_hw_params_any(pcm, params);

	int access = SND_PCM_ACCESS_MMAP_INTERLEAVED;
	snd_pcm_hw_params_set_access(pcm, params, access);

	int format = SND_PCM_FORMAT_S16_LE;
	snd_pcm_hw_params_set_format(pcm, params, format);

	u_int channels = 2;
	snd_pcm_hw_params_set_channels_near(pcm, params, &channels);

	u_int sample_rate = 48000;
	snd_pcm_hw_params_set_rate_near(pcm, params, &sample_rate, 0);

	u_int buffer_length_usec = 500 * 1000;
	snd_pcm_hw_params_set_buffer_time_near(pcm, params, &buffer_length_usec, NULL);

	snd_pcm_hw_params(pcm, params);

Finally, we need to remember the frame size and the total buffer size (in bytes).

	int frame_size = (16/8) * channels;
	int buf_size = sample_rate * (16/8) * channels * buffer_length_usec / 1000000;

ALSA: Recording Audio

To start recording we call snd_pcm_start():

	snd_pcm_start(pcm);

During normal operation we ask ALSA for some new audio data with snd_pcm_mmap_begin(), which returns the buffer, the offset to the valid region and the number of valid frames. For this function to work correctly we should first call snd_pcm_avail_update(), which updates the buffer's internal pointers. After we have processed the data, we must discard it with snd_pcm_mmap_commit().

	for (;;) {
		snd_pcm_avail_update(pcm);

		const snd_pcm_channel_area_t *areas;
		snd_pcm_uframes_t off;
		snd_pcm_uframes_t frames = buf_size / frame_size;
		snd_pcm_mmap_begin(pcm, &areas, &off, &frames);
		...
		snd_pcm_mmap_commit(pcm, off, frames);
	}

When we get 0 available frames, it means that the buffer is empty. We start the recording stream if necessary, then wait for some more data. I use a 100ms interval, but really it should be computed from the actual buffer size.

	if (frames == 0) {
		int period_ms = 100;
		usleep(period_ms*1000);
		continue;
	}
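By the way, a more suitable period can be derived from the buffer length we negotiated earlier; a small sketch, assuming buffer_length_usec holds the value updated by snd_pcm_hw_params_set_buffer_time_near():

	// wake up at least twice per buffer rotation
	int period_ms = buffer_length_usec / 1000 / 2;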

Once we have some data, we get the pointer to the actual interleaved data and the number of available bytes in this region:

	const void *data = (char*)areas[0].addr + off * areas[0].step/8;
	int n = frames * frame_size;

ALSA: Playing Audio

Writing audio is almost the same as reading it. We get the buffer region with snd_pcm_mmap_begin(), copy our data into it and then mark it as complete with snd_pcm_mmap_commit(). When the buffer is full, we get 0 available free frames. In that case we start the playback stream for the first time, and begin waiting until some free space is available in the buffer.

	if (frames == 0) {
		if (SND_PCM_STATE_RUNNING != snd_pcm_state(pcm))
			snd_pcm_start(pcm);

		int period_ms = 100;
		usleep(period_ms*1000);
		continue;
	}

ALSA: Draining

To drain the playback buffer we don't need to do anything special. First, we check whether there's still some data in the buffer, and if so, wait until the buffer is completely empty.

	for (;;) {
		if (0 >= snd_pcm_avail_update(pcm))
			break;

		if (SND_PCM_STATE_RUNNING != snd_pcm_state(pcm))
			snd_pcm_start(pcm);

		int period_ms = 100;
		usleep(period_ms*1000);
	}

But why do we always check the state of our buffer and then call snd_pcm_start() if necessary? Because ALSA never starts streaming automatically. We need to start it initially after the buffer is full, and we need to start it every time an error such as a buffer overrun occurs. We also need to start it in case we haven't filled the buffer completely.

ALSA: Error Checking

Most of the ALSA functions we use here return integer result codes: 0 on success and a non-zero error code on failure. To translate an error code into a user-friendly error message we can use the snd_strerror() function. I also recommend storing the name of the function that returned the error, so that the user has full information about what exactly went wrong.
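For example, a minimal sketch of such reporting (the failing call here is arbitrary):

	int err = snd_pcm_hw_params(pcm, params);
	if (err != 0)
		fprintf(stderr, "snd_pcm_hw_params: (%d) %s\n", err, snd_strerror(err));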

But there's more. During normal operation, while recording or playing audio, we should handle buffer overrun/underrun conditions. Here's how. First, check whether the error code is -EPIPE. Then call snd_pcm_prepare() to reset the buffer. If it fails, we can't continue normal operation – it's a fatal error. If it completes successfully, we continue normal operation as if there were no buffer overrun. Why can't ALSA just handle this case internally? To give us more control over our program. For example, some apps may want to notify the user that a chunk of audio data was lost.

	if (err == -EPIPE)
		assert(0 == snd_pcm_prepare(pcm));

The next case where we need special error handling is after we have called the snd_pcm_mmap_commit() function. The problem is that even when it returns some data and not an error code, we still have to check whether all the data was processed. If not, we set the -EPIPE error code ourselves, and we can then handle it with the same code as shown above.

	err = snd_pcm_mmap_commit(pcm, off, frames);
	if (err >= 0 && (snd_pcm_uframes_t)err != frames)
		err = -EPIPE;

Next, the functions may return the -ESTRPIPE error code, which means that for some reason the device we're currently using has been temporarily stopped or suspended. If that happens, we should wait until the device comes online again, periodically checking its state with snd_pcm_resume(). Then we call snd_pcm_prepare() to reset the buffer and continue as usual.

	if (err == -ESTRPIPE) {
		while (-EAGAIN == snd_pcm_resume(pcm)) {
			int period_ms = 100;
			usleep(period_ms*1000);
		}
		snd_pcm_prepare(pcm);
	}

Don't forget that after handling these errors we need to call snd_pcm_start() to start the buffer again. For recording streams we do it immediately, and for playback streams we do it when the buffer is full.

Linux and PulseAudio

PulseAudio works on top of ALSA; it can't replace ALSA – it's just an audio layer with several useful features for a graphical multi-app environment, e.g. sound mixing, conversion, rerouting, playing audio notifications. Therefore, unlike ALSA, PulseAudio can share a single audio device between multiple apps – I think that's the main reason why it's useful.

Note that on Fedora PulseAudio is no longer the default audio layer; it has been replaced by PipeWire, with yet another audio API (although PulseAudio apps will continue to work via the PipeWire-PulseAudio layer). But until PipeWire becomes the default choice on other popular Linux distributions, PulseAudio remains more useful overall.

First, we must install the development package, which is pulseaudio-libs-devel on Fedora. Now we can include it in our code:

	#include <pulse/pulseaudio.h>

And when linking our binaries we add the -lpulse flag.

A couple of words about how PulseAudio differs from the others. PulseAudio has a client-server design, which means we don't operate on an audio device directly but just issue commands to the PulseAudio server and receive responses from it. Thus, we always start by connecting to the PulseAudio server. We must implement somewhat complex logic to do this, because the interaction between us and the server is asynchronous: we have to send a command to the server and then wait for it to process our command and return the result, all via a (UNIX) socket connection. Of course, this communication takes some time, and we could do some other work while waiting for the server's response. But our sample code here won't be that clever: we'll just wait for responses synchronously, which is easier to understand.

We begin by creating a separate thread which will process socket I/O operations for us. Don't forget to stop this thread and close its handles when we're done with PulseAudio.

	pa_threaded_mainloop *mloop = pa_threaded_mainloop_new();
	pa_threaded_mainloop_start(mloop);
	...
	pa_threaded_mainloop_stop(mloop);
	pa_threaded_mainloop_free(mloop);

The first thing to remember when using PulseAudio is that we must perform all operations while holding the internal lock of this I/O thread. We “lock the thread”, perform the necessary calls to PA objects, and then “unlock the thread”. Failing to lock properly may at any point result in a race condition. This lock is recursive, meaning it's safe to lock it multiple times from the same thread – just call the unlocking function the same number of times. However, I don't see how the recursiveness of the lock is useful in real life. Recursive locks usually mean bad architecture, and they can cause hard-to-find problems – I never advise using this feature.

	pa_threaded_mainloop_lock(mloop);
	...
	pa_threaded_mainloop_unlock(mloop);

Now we start the connection to the PA server. Note that pa_context_connect() usually returns immediately, even if the connection isn't established yet. We'll receive the result of the connection later, in a callback function we set via pa_context_set_state_callback(). Don't forget to disconnect from the server when we're done.

	pa_mainloop_api *mlapi = pa_threaded_mainloop_get_api(mloop);
	pa_context *ctx = pa_context_new_with_proplist(mlapi, "My App", NULL);

	void *udata = NULL;
	pa_context_set_state_callback(ctx, on_state_change, udata);

	pa_context_connect(ctx, NULL, 0, NULL);
	...
	pa_context_disconnect(ctx);
	pa_context_unref(ctx);

After we've issued the connection command there's nothing more to do except wait for the result. We ask for the connection status, and if it isn't ready yet, we call pa_threaded_mainloop_wait(), which blocks our thread until a signal is received.

	while (PA_CONTEXT_READY != pa_context_get_state(ctx)) {
		pa_threaded_mainloop_wait(mloop);
	}

And here's what our on-state-change callback function looks like. Nothing clever: we just signal our thread to exit from pa_threaded_mainloop_wait(), where it's currently hanging. Note that this function is called not from our own thread (which is still hanging), but from the I/O thread we started earlier with pa_threaded_mainloop_start(). As a general rule, try to keep the code in these callback functions as small as possible: your function is called, you receive the result and send a signal to your thread – that should be enough.

	void on_state_change(pa_context *c, void *userdata)
	{
		pa_threaded_mainloop_signal(mloop, 0);
	}

I hope this call-stack diagram makes the PA server connection logic a little clearer for you:

	[Our Thread]
	|- pa_threaded_mainloop_start()
	|                                       [PA I/O Thread]
	   |- pa_context_connect()              |
	   |- pa_threaded_mainloop_wait()       |
	   |                                    |- on_state_change()
	   |                                       |- pa_threaded_mainloop_signal()
	[pa_threaded_mainloop_wait() returns]

The same logic applies to handling the results of all other operations with our callback functions.

PulseAudio: Enumerating Devices

After the connection to the PA server is established, we continue by listing the available devices. We create a new operation with a callback function. We could also pass some pointer to our callback function, but here I just use NULL. Don't forget to release the pointer after the operation is complete. And of course, this code must be executed only while holding the mainloop thread lock.

	pa_operation *op;
	void *udata = NULL;
	if (playback)
		op = pa_context_get_sink_info_list(ctx, on_dev_sink, udata);
	else
		op = pa_context_get_source_info_list(ctx, on_dev_source, udata);
	...
	pa_operation_unref(op);

Now we wait until the operation completes.

	for (;;) {
		int r = pa_operation_get_state(op);
		if (r == PA_OPERATION_DONE || r == PA_OPERATION_CANCELLED)
			break;
		pa_threaded_mainloop_wait(mloop);
	}

Meanwhile, the I/O thread receives data from the server and makes several calls to our callback function, where we can access all the properties of each available device. When an error occurs, or when there are no more devices, the eol parameter is set to a non-zero value. When this happens we just send the signal to our thread. The function for listing playback devices looks like this:

	void on_dev_sink(pa_context *c, const pa_sink_info *info, int eol, void *udata)
	{
		if (eol != 0) {
			pa_threaded_mainloop_signal(mloop, 0);
			return;
		}

		const char *device_id = info->name;
	}

And the function for listing recording devices looks similar:

	void on_dev_source(pa_context *c, const pa_source_info *info, int eol, void *udata)

The udata value here is the value we set when calling pa_context_get_*_info_list(). In our code it's always NULL, because my mloop variable is global and we don't need anything else.

PulseAudio: Opening Audio Buffer

We create a new audio buffer with pa_stream_new(), passing to it our connection context, the name of our application and the sound format we want to use.

	pa_sample_spec spec;
	spec.format = PA_SAMPLE_S16LE;
	spec.rate = 48000;
	spec.channels = 2;
	pa_stream *stm = pa_stream_new(ctx, "My App", &spec, NULL);
	...
	pa_stream_unref(stm);

Next, we attach our buffer to the device with pa_stream_connect_*(). We set the buffer length in bytes in pa_buffer_attr::tlength, and leave all other parameters at their defaults (setting them to -1). We also assign, with pa_stream_set_*_callback(), our callback function, which will be called every time an audio I/O operation completes. We can use the device_id value we obtained while enumerating devices, or NULL for the default device.

	pa_buffer_attr attr;
	memset(&attr, 0xff, sizeof(attr));

	int buffer_length_msec = 500;
	attr.tlength = spec.rate * 16/8 * spec.channels * buffer_length_msec / 1000;

For recording streams we do:

	void *udata = NULL;
	pa_stream_set_read_callback(stm, on_io_complete, udata);
	const char *device_id = ...;
	pa_stream_connect_record(stm, device_id, &attr, 0);
	...
	pa_stream_disconnect(stm);

And for playback streams:

	void *udata = NULL;
	pa_stream_set_write_callback(stm, on_io_complete, udata);
	const char *device_id = ...;
	pa_stream_connect_playback(stm, device_id, &attr, 0, NULL, NULL);
	...
	pa_stream_disconnect(stm);

As usual, we have to wait until our operation completes. We read the current state of our buffer with pa_stream_get_state(). PA_STREAM_READY means the stream has started successfully and we can proceed with normal operation. PA_STREAM_FAILED means an error occurred.

	for (;;) {
		int r = pa_stream_get_state(stm);
		if (r == PA_STREAM_READY)
			break;
		else if (r == PA_STREAM_FAILED) {
			// error
		}

		pa_threaded_mainloop_wait(mloop);
	}

While we're hanging inside pa_threaded_mainloop_wait(), our callback function on_io_complete() will at some point be called from the I/O thread. There we just send a signal to our main thread.

	void on_io_complete(pa_stream *s, size_t nbytes, void *udata)
	{
		pa_threaded_mainloop_signal(mloop, 0);
	}

PulseAudio: Recording Audio

We obtain the data region with audio samples from PulseAudio with pa_stream_peek(), and after we have processed it, we discard the data with pa_stream_drop().

	for (;;) {
		const void *data;
		size_t n;
		pa_stream_peek(stm, &data, &n);
		if (n == 0) {
			// Buffer is empty. Process more events
			pa_threaded_mainloop_wait(mloop);
			continue;

		} else if (data == NULL && n != 0) {
			// Buffer overrun occurred

		} else {
			...
		}

		pa_stream_drop(stm);
	}

pa_stream_peek() returns 0 samples when the buffer is empty. In this case we needn't call pa_stream_drop(), and we should wait until more data arrives. When a buffer overrun occurs, we get data=NULL. This is just a notification to us, and we can continue by calling pa_stream_drop() and then pa_stream_peek() again.

PulseAudio: Playing Audio

When we write data to an audio device, we first have to get the amount of free space in the audio buffer with pa_stream_writable_size(). It returns 0 when the buffer is full, in which case we must wait until some free space is available and then try again.

	size_t n = pa_stream_writable_size(stm);
	if (n == 0) {
		pa_threaded_mainloop_wait(mloop);
		continue;
	}

We get the buffer region into which we can copy audio samples with pa_stream_begin_write(). After we've filled the buffer, we call pa_stream_write() to release this memory region.

	void *buf;
	pa_stream_begin_write(stm, &buf, &n);
	...
	pa_stream_write(stm, buf, n, NULL, 0, PA_SEEK_RELATIVE);
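The elided step between these two calls is simply filling buf with at most n bytes of interleaved samples; a sketch, where our_data and our_len are hypothetical names for whatever our app has produced:

	if (n > our_len)
		n = our_len; // write no more than we actually have
	memcpy(buf, our_data, n);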

PulseAudio: Draining

To drain the buffer we create a drain operation with pa_stream_drain(), passing our callback function, which will be called when draining is complete.

	void *udata = NULL;
	pa_operation *op = pa_stream_drain(stm, on_op_complete, udata);
	...
	pa_operation_unref(op);

Now we wait until our callback function signals us.

	for (;;) {
		int r = pa_operation_get_state(op);
		if (r == PA_OPERATION_DONE || r == PA_OPERATION_CANCELLED)
			break;
		pa_threaded_mainloop_wait(mloop);
	}

Here's what our callback function looks like:

	void on_op_complete(pa_stream *s, int success, void *udata)
	{
		pa_threaded_mainloop_signal(mloop, 0);
	}

Windows and WASAPI

WASAPI is the default sound subsystem starting with Windows Vista. It's the successor to the DirectSound API, which we don't discuss here, because I doubt you want to support the old Windows XP. But if you do, take a look at the appropriate code in ffaudio yourself. WASAPI works in 2 different modes: shared and exclusive. In shared mode multiple apps can use the same physical device; it's the right mode for normal playback/recording apps. In exclusive mode we have exclusive access to the audio device; this is suitable for professional real-time sound apps.

WASAPI include directives must be preceded by the COBJMACROS preprocessor definition, for the pure-C definitions to work correctly.

	#define COBJMACROS
	#include <mmdeviceapi.h>
	#include <audioclient.h>

Before doing anything else we must initialize the COM-interface subsystem.

	CoInitializeEx(NULL, 0);

We must link all WASAPI apps with the -lole32 linker flag.

Most WASAPI functions return 0 on success and non-zero on failure.

WASAPI: Enumerating Devices

We create a device enumerator object with CoCreateInstance(). Don't forget to release it when we're done.

	IMMDeviceEnumerator *enu;
	const GUID _CLSID_MMDeviceEnumerator = {0xbcde0395, 0xe52f, 0x467c, {0x8e,0x3d, 0xc4,0x57,0x92,0x91,0x69,0x2e}};
	const GUID _IID_IMMDeviceEnumerator = {0xa95664d2, 0x9614, 0x4f35, {0xa7,0x46, 0xde,0x8d,0xb6,0x36,0x17,0xe6}};
	CoCreateInstance(&_CLSID_MMDeviceEnumerator, NULL, CLSCTX_ALL, &_IID_IMMDeviceEnumerator, (void**)&enu);
	...
	IMMDeviceEnumerator_Release(enu);

We use this device enumerator object to get the array of available devices with IMMDeviceEnumerator_EnumAudioEndpoints().

	IMMDeviceCollection *dcoll;
	int mode = (playback) ? eRender : eCapture;
	IMMDeviceEnumerator_EnumAudioEndpoints(enu, mode, DEVICE_STATE_ACTIVE, &dcoll);
	...
	IMMDeviceCollection_Release(dcoll);

We enumerate the devices by asking IMMDeviceCollection_Item() to return the device handle for the given array index.

	for (int i = 0;  ;  i++) {
		IMMDevice *dev;
		if (0 != IMMDeviceCollection_Item(dcoll, i, &dev))
			break;
		...
		IMMDevice_Release(dev);
	}

Then we get the set of properties for this device.

	IPropertyStore *props;
	IMMDevice_OpenPropertyStore(dev, STGM_READ, &props);
	...
	IPropertyStore_Release(props);

We read a single property value with IPropertyStore_GetValue(). Here's how to get the user-friendly name of the device.

	PROPVARIANT name;
	PropVariantInit(&name);
	const PROPERTYKEY _PKEY_Device_FriendlyName = {{0xa45c254e, 0xdf1c, 0x4efd, {0x80, 0x20, 0x67, 0xd1, 0x46, 0xa8, 0x50, 0xe0}}, 14};
	IPropertyStore_GetValue(props, &_PKEY_Device_FriendlyName, &name);
	const wchar_t *device_name = name.pwszVal;
	...
	PropVariantClear(&name);

And now the main reason why we need to list devices: we get the unique device ID with IMMDevice_GetId().

	wchar_t *device_id = NULL;
	IMMDevice_GetId(dev, &device_id);
	...
	CoTaskMemFree(device_id);

To get the system default device we use IMMDeviceEnumerator_GetDefaultAudioEndpoint(). Then we can get its ID and name exactly the same way as described above.

	IMMDevice *def_dev = NULL;
	IMMDeviceEnumerator_GetDefaultAudioEndpoint(enu, mode, eConsole, &def_dev);
	IMMDevice_Release(def_dev);

WASAPI: Opening Audio Buffer in Shared Mode

Here's the easiest way to open an audio buffer in shared mode. Once again we start by creating a device enumerator object.

	IMMDeviceEnumerator *enu;
	const GUID _CLSID_MMDeviceEnumerator = {0xbcde0395, 0xe52f, 0x467c, {0x8e,0x3d, 0xc4,0x57,0x92,0x91,0x69,0x2e}};
	const GUID _IID_IMMDeviceEnumerator = {0xa95664d2, 0x9614, 0x4f35, {0xa7,0x46, 0xde,0x8d,0xb6,0x36,0x17,0xe6}};
	CoCreateInstance(&_CLSID_MMDeviceEnumerator, NULL, CLSCTX_ALL, &_IID_IMMDeviceEnumerator, (void**)&enu);
	...
	IMMDeviceEnumerator_Release(enu);

Now we either use the default device or we already know the specific device ID. In either case, we get the device descriptor.

	IMMDevice *dev;
	wchar_t *device_id = NULL;
	if (device_id == NULL) {
		int mode = (playback) ? eRender : eCapture;
		IMMDeviceEnumerator_GetDefaultAudioEndpoint(enu, mode, eConsole, &dev);
	} else {
		IMMDeviceEnumerator_GetDevice(enu, device_id, &dev);
	}
	...
	IMMDevice_Release(dev);

We create an audio buffer with IMMDevice_Activate(), passing the IID_IAudioClient identifier to it.

	IAudioClient *client;
	const GUID _IID_IAudioClient = {0x1cb9ad4c, 0xdbfa, 0x4c32, {0xb1,0x78, 0xc2,0xf5,0x68,0xa7,0x03,0xb2}};
	IMMDevice_Activate(dev, &_IID_IAudioClient, CLSCTX_ALL, NULL, (void**)&client);
	...
	IAudioClient_Release(client);

Because we want to open the WASAPI audio buffer in shared mode, we can't order it to use the audio format we want. The audio format is a matter of system-level configuration, and we just have to comply with it. Most likely this format will be 16bit/44100/stereo or 24bit/44100/stereo, but we can never be sure. To be completely honest, WASAPI can accept a different sample format from us (e.g. we can use float32 and WASAPI will automatically convert our samples to 16bit), but again, we shouldn't rely on this behaviour. The most robust way to get the right audio format is to call IAudioClient_GetMixFormat(), which creates a WAVE-format header for us. The same header format is used in .wav files, by the way. Note that there are 2 different audio format settings in Windows, one for recording and one for playback; which one applies depends on which device our buffer is assigned to.

	WAVEFORMATEX *wf;
	IAudioClient_GetMixFormat(client, &wf);
	...
	CoTaskMemFree(wf);
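The negotiated parameters can then be read from the standard WAVEFORMATEX fields, for example:

	int sample_rate = wf->nSamplesPerSec;
	int channels = wf->nChannels;
	int sample_width = wf->wBitsPerSample;
	int frame_size = wf->wBitsPerSample/8 * wf->nChannels;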

Now we just use this audio format to set up our buffer with IAudioClient_Initialize(). Note the AUDCLNT_SHAREMODE_SHARED flag here, which means we want to configure the buffer in shared mode. The buffer length parameter must be in 100-nanosecond units. Keep in mind that this is just a hint; after the function returns successfully, we should always get the actual buffer length chosen by WASAPI.

	int buffer_length_msec = 500;
	REFERENCE_TIME dur = buffer_length_msec * 1000 * 10;
	int mode = AUDCLNT_SHAREMODE_SHARED;
	int aflags = 0;
	IAudioClient_Initialize(client, mode, aflags, dur, dur, (void*)wf, NULL);

	u_int buf_frames;
	IAudioClient_GetBufferSize(client, &buf_frames);
	buffer_length_msec = buf_frames * 1000 / wf->nSamplesPerSec;

WASAPI: Recording Audio in Shared Mode

We've initialized the buffer, but it doesn't directly provide an interface we can use to perform I/O. For recording streams we have to get the IAudioCaptureClient interface object from it.

	IAudioCaptureClient *capt;
	const GUID _IID_IAudioCaptureClient = {0xc8adbd64, 0xe71e, 0x48a0, {0xa4,0xde, 0x18,0x5c,0x39,0x5c,0xd3,0x17}};
	IAudioClient_GetService(client, &_IID_IAudioCaptureClient, (void**)&capt);

Preparation is complete – we're ready to start recording.

	IAudioClient_Start(client);

To get a chunk of recorded audio data we call IAudioCaptureClient_GetBuffer(). It returns the AUDCLNT_S_BUFFER_EMPTY code when there's no unread data in the buffer; in that case we just wait and try again. After we've processed the audio samples, we release the data with IAudioCaptureClient_ReleaseBuffer().

	for (;;) {
		u_char *data;
		u_int nframes;
		u_long flags;
		int r = IAudioCaptureClient_GetBuffer(capt, &data, &nframes, &flags, NULL, NULL);

		if (r == AUDCLNT_S_BUFFER_EMPTY) {
			// Buffer is empty. Wait for more data.
			int period_ms = 100;
			Sleep(period_ms);
			continue;
		} else if (r != 0) {
			// error
		}
		...
		IAudioCaptureClient_ReleaseBuffer(capt, nframes);
	}
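One detail the elided processing step should handle: WASAPI may mark the returned region as silent. A sketch, reusing a frame_size value derived from the WAVEFORMATEX as shown earlier:

	int n_bytes = nframes * frame_size;
	if (flags & AUDCLNT_BUFFERFLAGS_SILENT) {
		// no real samples were captured: treat the region as n_bytes of zeroes
	} else {
		// process n_bytes of interleaved samples starting at 'data'
	}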

WASAPI: Playing Audio in Shared Mode

Playing audio is very similar to recording, but we need another interface for I/O. This time we pass the IID_IAudioRenderClient identifier and get the IAudioRenderClient interface object.

	IAudioRenderClient *render;
	const GUID _IID_IAudioRenderClient = {0xf294acfc, 0x3146, 0x4483, {0xa7,0xbf, 0xad,0xdc,0xa7,0xc2,0x60,0xe2}};
	IAudioClient_GetService(client, &_IID_IAudioRenderClient, (void**)&render);
	...
	IAudioRenderClient_Release(render);

Normal playback operation means adding more data into the audio buffer in a loop, as soon as there's some free space in the buffer. To get the amount of used space we call IAudioClient_GetCurrentPadding(). To get the amount of free space we use the size of our buffer (buf_frames), which we got when opening the buffer. These numbers are in samples, not bytes.

	u_int filled;
	IAudioClient_GetCurrentPadding(client, &filled);
	int n_free_frames = buf_frames - filled;

When the buffer is completely full, the amount of free space we compute this way becomes 0. Now, with a full buffer, it's time to start the playback for the first time.

	if (!started) {
		IAudioClient_Start(client);
		started = 1;
	}

We get the free buffer region with IAudioRenderClient_GetBuffer(), and after we've filled it with audio samples we release it with IAudioRenderClient_ReleaseBuffer().

	u_char *data;
	IAudioRenderClient_GetBuffer(render, n_free_frames, &data);
	...
	IAudioRenderClient_ReleaseBuffer(render, n_free_frames, 0);

WASAPI: Draining

We should never forget to drain the audio buffer before closing it, otherwise the last chunk of audio won't be played because we haven't given it enough time. The algorithm is the same as for ALSA: we get the number of samples still left to play, and when the buffer is empty the draining is complete.

	for (;;) {
		u_int filled;
		IAudioClient_GetCurrentPadding(client, &filled);
		if (filled == 0)
			break;
		...
	}
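The elided body of this loop is just a short sleep between the checks, for example:

	int period_ms = 100;
	Sleep(period_ms);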

If our input data was too small to even fill the audio buffer, we still haven't started playback at this point. We do it now, otherwise IAudioClient_GetCurrentPadding() will never signal us with the “buffer empty” condition.

	if (!started) {
		IAudioClient_Start(client);
		started = 1;
	}

WASAPI: Error Reporting

Most WASAPI functions return 0 on success and an error code on failure. The problem with this error code is that sometimes we can't convert it to a user-friendly error message directly – we have to do it manually. First, we check whether it's an AUDCLNT_E_* code. In that case we have to produce our own error message, depending on the value. For example, we can have an array of strings for every possible AUDCLNT_E_* code. Don't forget the index-out-of-bounds checks!

	int err = ...;
	if ((err & 0xffff0000) == MAKE_HRESULT(SEVERITY_ERROR, FACILITY_AUDCLNT, 0)) {
		err = err & 0xffff;
		static const char audclnt_errors[][39] = {
			"",
			"AUDCLNT_E_NOT_INITIALIZED", // 0x1
			...
			"AUDCLNT_E_RESOURCES_INVALIDATED", //  0x26
		};
		const char *error_name = audclnt_errors[err];
	}

But if it isn't an AUDCLNT_E_* code, we can get the error message from Windows the usual way.

	wchar_t buf[255];
	int n = FormatMessageW(FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS | FORMAT_MESSAGE_MAX_WIDTH_MASK
		, 0, err, 0, buf, sizeof(buf)/sizeof(*buf), 0);
	if (n == 0)
		buf[0] = '\0';

And it's always good practice to store the names of the functions that return an error. The user must know exactly which function failed and which code it returned, together with the error description.

FreeBSD and OSS

OSS is the default audio subsystem on FreeBSD and some other OS. It was the default on Linux too, before ALSA replaced it. Some docs say that the OSS layer is still supported on modern Linux, but I don't think it's useful for new software. The OSS API is very simple compared to the other APIs: we only use standard syscalls. I/O with OSS is just like I/O with regular files, which makes OSS quite easy to understand and use.

Include the necessary header files:

	#include <sys/soundcard.h>
	#include <fcntl.h>
	#include <unistd.h>
	#include <stdlib.h>
	#include <string.h>
	#include <math.h>

OSS: Enumerating Devices

We open the system mixer device exactly the same way we open regular files.

	int mixer = open("/dev/mixer", O_RDONLY, 0);
	...
	close(mixer);

We communicate with this device by issuing commands via ioctl(). We get the number of registered devices with the SNDCTL_SYSINFO device control code.

	oss_sysinfo si = {};
	ioctl(mixer, SNDCTL_SYSINFO, &si);
	int n_devs = si.numaudios;

We get the properties of each device with SNDCTL_AUDIOINFO_EX.

	for (int i = 0;  i != n_devs;  i++) {
		oss_audioinfo ainfo = {};
		ainfo.dev = i;
		ioctl(mixer, SNDCTL_AUDIOINFO_EX, &ainfo);
		...
	}

Because we iterate over all devices, both playback and recording, we must use the oss_audioinfo::caps field to filter what we need: PCM_CAP_OUTPUT means it's a playback device and PCM_CAP_INPUT means it's a recording device. We get other important information, most importantly the device ID, from the same oss_audioinfo object.

	int is_playback_device = !!(ainfo.caps & PCM_CAP_OUTPUT);
	int is_capture_device = !!(ainfo.caps & PCM_CAP_INPUT);
	const char *device_id = ainfo.devnode;
	const char *device_name = ainfo.name;

OSS: Opening Audio Buffer

We open the audio device with open() and get the device descriptor. To use the default device we pass the "/dev/dsp" string. We must pass the right flags to the function: O_WRONLY for playback, because we'll write data to the audio device, and O_RDONLY for recording, because we'll read from it. We could also use the O_NONBLOCK flag here, which makes our descriptor non-blocking, i.e. read/write functions won't block and will return immediately with the EAGAIN error.

	const char *device_id = NULL;
	if (device_id == NULL)
		device_id = "/dev/dsp";
	int flags = (playback) ? O_WRONLY : O_RDONLY;
	int dsp = open(device_id, flags | O_EXCL, 0);
	...
	close(dsp);

Let's configure the device for the audio format we want to use. We pass the value we want to ioctl(), and on return it's updated with the actual value the device driver has set. Of course, this value can differ from the one we passed. In real code we must detect such cases and notify the user about the format change, or exit with an error.

	int format = AFMT_S16_LE;
	ioctl(dsp, SNDCTL_DSP_SETFMT, &format);

	int channels = 2;
	ioctl(dsp, SNDCTL_DSP_CHANNELS, &channels);

	int sample_rate = 44100;
	ioctl(dsp, SNDCTL_DSP_SPEED, &sample_rate);

To set the audio buffer length, we first get the “fragment size” property of our device. Then we use this value to convert the buffer length into the number of fragments. Fragments are not audio frames, and the fragment size is not the size of a sample! Then we set the number of fragments with the SNDCTL_DSP_SETFRAGMENT control code. Note that we can skip this section if we don't want to set our own buffer length and are happy with the default.

	audio_buf_info info = {};
	if (playback)
		ioctl(dsp, SNDCTL_DSP_GETOSPACE, &info);
	else
		ioctl(dsp, SNDCTL_DSP_GETISPACE, &info);
	int buffer_length_msec = 500;
	int frag_num = sample_rate * 16/8 * channels * buffer_length_msec / 1000 / info.fragsize;
	int fr = (frag_num << 16) | (int)log2(info.fragsize); // buf_size = frag_num * 2^n
	ioctl(dsp, SNDCTL_DSP_SETFRAGMENT, &fr);

We've finished preparing the device. Now we get the actual buffer length with SNDCTL_DSP_GETOSPACE for playback or SNDCTL_DSP_GETISPACE for recording streams.

	audio_buf_info info = {};
	int r;
	if (playback)
		r = ioctl(dsp, SNDCTL_DSP_GETOSPACE, &info);
	else
		r = ioctl(dsp, SNDCTL_DSP_GETISPACE, &info);
	buffer_length_msec = info.fragstotal * info.fragsize * 1000 / (sample_rate * 16/8 * channels);
	int buf_size = info.fragstotal * info.fragsize;
	int frame_size = 16/8 * channels;

Finally, we allocate a buffer of the required size.

	void *buf = malloc(buf_size);
	...
	free(buf);

OSS: Recording Audio

There's nothing easier than audio I/O with OSS. We use the usual read() function, passing it our audio buffer and the maximum number of bytes available in it. It returns the number of bytes read. The function also blocks execution when the device's buffer is empty, so there's no need for us to call any sleep functions.

	int n = read(dsp, buf, buf_size);
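A whole capture loop is then just a few lines; a sketch (what to do with the data is up to you):

	for (;;) {
		int n = read(dsp, buf, buf_size);
		if (n < 0) {
			// error: see "OSS: Error Reporting" below
			break;
		}
		... // process n bytes of interleaved samples in buf
	}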

OSS: Playing Audio

For playback streams we first write audio samples to our buffer, then pass this region to the device with write(). It returns the number of bytes actually written. The function blocks execution when the device's buffer is full.

	int n = write(dsp, buf, buf_size);

OSS: Draining

To drain the buffer we just use the SNDCTL_DSP_SYNC control code. It blocks until playback is complete.

	ioctl(dsp, SNDCTL_DSP_SYNC, 0);

OSS: Error Reporting

On failure, open(), ioctl(), read() and write() return a negative value and set errno. We can convert the code into an error message as usual with strerror().

	int err = ...;
	const char *error_message = strerror(err);

macOS and CoreAudio

CoreAudio is the default sound subsystem on macOS and iOS. I have little experience with it, because I don't like Apple's products. I'm just showing you the way that worked for me, but in theory there may be better solutions than mine. The essential includes are:

	#include <CoreAudio/CoreAudio.h>
	#include <CoreFoundation/CFString.h>

When linking we pass the -framework CoreFoundation -framework CoreAudio linker flags.

CoreAudio: Enumerating Devices

We get the array of audio devices with AudioObjectGetPropertyData(). But first we need to know the minimum number of bytes to allocate for the array – we get the required size with AudioObjectGetPropertyDataSize().

	const AudioObjectPropertyAddress prop_dev_list = { kAudioHardwarePropertyDevices, kAudioObjectPropertyScopeGlobal, kAudioObjectPropertyElementMaster };
	u_int size;
	AudioObjectGetPropertyDataSize(kAudioObjectSystemObject, &prop_dev_list, 0, NULL, &size);

	AudioObjectID *devs = (AudioObjectID*)malloc(size);
	AudioObjectGetPropertyData(kAudioObjectSystemObject, &prop_dev_list, 0, NULL, &size, devs);

	int n_dev = size / sizeof(AudioObjectID);
	...
	free(devs);

Then we iterate over the array to get each device's ID.

	for (int i = 0;  i != n_dev;  i++) {
		AudioObjectID device_id = devs[i];
		...
	}

For every device we can get a user-friendly name, but we have to convert CoreFoundation's string object into a NULL-terminated string with CFStringGetCString().

	const AudioObjectPropertyAddress prop_dev_outname = { kAudioObjectPropertyName, kAudioDevicePropertyScopeOutput, kAudioObjectPropertyElementMaster };
	const AudioObjectPropertyAddress prop_dev_inname = { kAudioObjectPropertyName, kAudioDevicePropertyScopeInput, kAudioObjectPropertyElementMaster };
	const AudioObjectPropertyAddress *prop = (playback) ? &prop_dev_outname : &prop_dev_inname;
	u_int size = sizeof(CFStringRef);
	CFStringRef cfs;
	AudioObjectGetPropertyData(devs[i], prop, 0, NULL, &size, &cfs);

	CFIndex len = CFStringGetMaximumSizeForEncoding(CFStringGetLength(cfs), kCFStringEncodingUTF8);
	char *device_name = malloc(len + 1);
	CFStringGetCString(cfs, device_name, len + 1, kCFStringEncodingUTF8);
	CFRelease(cfs);
	...
	free(device_name);

CoreAudio: Opening Audio Buffer

If we want to use the default device, here's how we can get its ID.

	AudioObjectID device_id;
	const AudioObjectPropertyAddress prop_odev_default = { kAudioHardwarePropertyDefaultOutputDevice, kAudioObjectPropertyScopeGlobal, kAudioObjectPropertyElementMaster };
	const AudioObjectPropertyAddress prop_idev_default = { kAudioHardwarePropertyDefaultInputDevice, kAudioObjectPropertyScopeGlobal, kAudioObjectPropertyElementMaster };
	const AudioObjectPropertyAddress *a = (playback) ? &prop_odev_default : &prop_idev_default;
	u_int size = sizeof(AudioObjectID);
	AudioObjectGetPropertyData(kAudioObjectSystemObject, a, 0, NULL, &size, &device_id);

Get the supported audio format. It seems that CoreAudio uses float32 samples by default.

	const AudioObjectPropertyAddress prop_odev_fmt = { kAudioDevicePropertyStreamFormat, kAudioDevicePropertyScopeOutput, kAudioObjectPropertyElementMaster };
	const AudioObjectPropertyAddress prop_idev_fmt = { kAudioDevicePropertyStreamFormat, kAudioDevicePropertyScopeInput, kAudioObjectPropertyElementMaster };
	AudioStreamBasicDescription asbd = {};
	u_int size = sizeof(asbd);
	const AudioObjectPropertyAddress *a = (playback) ? &prop_odev_fmt : &prop_idev_fmt;
	AudioObjectGetPropertyData(device_id, a, 0, NULL, &size, &asbd);
	int sample_rate = asbd.mSampleRate;
	int channels = asbd.mChannelsPerFrame;

Create a buffer that holds 500ms of audio. Note that we use our own ring buffer here to transfer data between the callback function and our I/O loop (its interface is sketched below).

	int buffer_length_msec = 500;
	int buf_size = 32/8 * sample_rate * channels * buffer_length_msec / 1000;
	ring_buf = ringbuf_alloc(buf_size);
	...
	ringbuf_free(ring_buf);
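
The ring buffer itself is not a part of CoreAudio – it comes with the sample code. Roughly, its interface as used below looks like this (a sketch reconstructed from its usage here; see the repository for the real implementation):

	typedef struct ringbuf ringbuf; // opaque buffer object

	typedef struct {
		char *ptr; // contiguous region inside the ring buffer
		size_t len; // length of the region in bytes
	} ringbuffer_chunk;

	ringbuf* ringbuf_alloc(size_t cap);
	void ringbuf_free(ringbuf *b);
	// Copy `n` bytes into the buffer
	size_t ringbuf_write(ringbuf *b, const void *data, size_t n);
	// Lock a contiguous region for reading/writing;
	// pass the returned handle to the matching *_finish()
	size_t ringbuf_read_begin(ringbuf *b, size_t n, ringbuffer_chunk *c, size_t *free_space);
	void ringbuf_read_finish(ringbuf *b, size_t h);
	size_t ringbuf_write_begin(ringbuf *b, size_t n, ringbuffer_chunk *c, size_t *free_space);
	void ringbuf_write_finish(ringbuf *b, size_t h);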

Register the I/O callback function which will be called by CoreAudio when there's some more data for us (for recording) or when it wants to read some data from us (for playback). We can pass our ring buffer as a user parameter. We receive an ID which we later use to control the stream.

	AudioDeviceIOProcID io_proc_id = NULL;
	void *udata = ring_buf;
	AudioDeviceCreateIOProcID(device_id, io_callback, udata, &io_proc_id);
	...
	AudioDeviceDestroyIOProcID(device_id, io_proc_id);

The callback function looks like this:

	OSStatus io_callback(AudioDeviceID device, const AudioTimeStamp *now,
		const AudioBufferList *indata, const AudioTimeStamp *intime,
		AudioBufferList *outdata, const AudioTimeStamp *outtime,
		void *udata)
		void *udata)
	{
		...
		return 0;
	}

CoreAudio: Recording Audio

We start recording with AudioDeviceStart().

	AudioDeviceStart(device_id, io_proc_id);

Then, after a while, our callback function gets called. While inside it, we must add all new audio samples to our ring buffer.

	const float *d = indata->mBuffers[0].mData;
	size_t n = indata->mBuffers[0].mDataByteSize;

	ringbuf *ring = udata;
	ringbuf_write(ring, d, n);
	return 0;

In our I/O loop we try to read some data from the buffer. If the buffer is empty, we wait and then try again. My ring buffer implementation here allows us to use the buffer memory directly: we get a region of the buffer, process it, and then release it.

	ringbuffer_chunk buf;
	size_t h = ringbuf_read_begin(ring_buf, -1, &buf, NULL);
	if (buf.len == 0) {
		// Buffer is empty. Wait until some new data is available
		int period_ms = 100;
		usleep(period_ms*1000);
		continue;
	}
	...
	ringbuf_read_finish(ring_buf, h);

CoreAudio: Playing Audio

Inside the callback function we write audio samples from our ring buffer into CoreAudio's buffer. Note that we read from the ring buffer twice, because once we reach the end of its memory region we have to continue from the beginning. In case there wasn't enough data in our buffer, we pass silence (a region filled with zeros) so that there are no audible surprises when this data is played.

	float *d = outdata->mBuffers[0].mData;
	size_t n = outdata->mBuffers[0].mDataByteSize;

	ringbuf *ring = udata;
	ringbuffer_chunk buf;

	size_t h = ringbuf_read_begin(ring, n, &buf, NULL);
	memcpy(d, buf.ptr, buf.len);
	ringbuf_read_finish(ring, h);
	d = (float*)((char*)d + buf.len);
	n -= buf.len;

	if (n != 0) {
		h = ringbuf_read_begin(ring, n, &buf, NULL);
		memcpy(d, buf.ptr, buf.len);
		ringbuf_read_finish(ring, h);
		d = (float*)((char*)d + buf.len);
		n -= buf.len;
	}

	if (n != 0)
		memset(d, 0, n);

In our main I/O loop we first get free buffer space where we write new audio samples. When the buffer is full we start the stream for the first time and wait until our callback function is called.

	ringbuffer_chunk buf;
	size_t h = ringbuf_write_begin(ring_buf, 16*1024, &buf, NULL);

	if (buf.len == 0) {
	if (!started) {
		AudioDeviceStart(device_id, io_proc_id);
		started = 1;
		}

		// Buffer is full. Wait.
		int period_ms = 100;
		usleep(period_ms*1000);
		continue;
	}

	...
	ringbuf_write_finish(ring_buf, h);

CoreAudio: Draining

To drain the buffer we just wait until our ring buffer is empty. When it is, I stop the stream with AudioDeviceStop(). Remember that if the total input data was smaller than our buffer, the stream hasn't actually been started yet; in that case we start it with AudioDeviceStart().

	size_t free_space;
	ringbuffer_chunk d;
	ringbuf_write_begin(ring_buf, 0, &d, &free_space);

	if (free_space == ring_buf->cap) {
		AudioDeviceStop(device_id, io_proc_id);
		break;
	}

	if (!started) {
		AudioDeviceStart(device_id, io_proc_id);
		started = 1;
	}

	// Buffer is not empty. Wait.
	int period_ms = 100;
	usleep(period_ms*1000);

Final Results

I think we've covered the most common audio APIs and their use-cases, and I hope you've learned something new and useful. There are a few things, though, that didn't make it into this tutorial:

  • ALSA's SIGIO notifications. Not all devices support this, as far as I know.

  • WASAPI notifications via Windows events. This is only useful for real-time low-latency apps.

  • WASAPI exclusive mode and loopback mode. Explaining the details of opening an audio buffer in exclusive mode would require much more time than I can invest right now. And loopback mode isn't cross-platform. You can learn about these things, if you want, by reading the official docs or ffaudio's source code, for example.
