Audio Latency demystified, part I

In case you haven’t noticed, here at MIND we care a lot about audio latency. One of the core features of the Elk Audio OS is providing ultra low latency processing while still retaining the ease of development and flexibility of a traditional general-purpose Linux system.

This is the first in a series of four posts where we’ll dig into more detail on the latency problem that most general-purpose OSes suffer from, and Elk’s approach to solving it. We start by looking at the most common sources of latency in a conventional general-purpose system and how they impact the user experience.

Background: the audio processing chain

Modern CPUs are capable of running sophisticated signal processing algorithms at very high speed and should be well suited for use in digital musical instruments. If all the CPU had to do was run Digital Signal Processing (DSP) algorithms, everything would be fine. Unfortunately, modern multitasking operating systems, while extremely efficient at what they do, need to do a million things at once: networking, managing peripherals, drivers and memory, drawing graphical user interfaces and running multiple applications, interleaved so as to give the user the illusion that everything is happening simultaneously. Modern operating systems are fantastic at multitasking, but it comes at the price of being unable to give any strict guarantees on timing.

Real-time audio/signal processing is very much a use case with strict timing requirements. Real-time in this case means that a request always needs to be served within a given time frame, no exceptions. So what exactly is a request in our scenario? A diagram helps to illustrate the situation:

Audio processing usually happens in blocks of a certain number of samples (denoted as N in the picture) and with a given sampling frequency, for example 44.1 or 48 kHz. At regular time intervals equal to the block size divided by the sampling rate, these things happen: the previous output block of N samples is sent to the DAC, a new input block of N samples is captured from the ADC and handed to the CPU, and the CPU has to compute the next output block before the following one arrives.

For example, with a buffer size of 48 samples and a sampling frequency of 48 kHz, once every millisecond (= 48 / 48000 s) the CPU needs to compute the output block. This is the real-time request, and it needs to be completed before the next block arrives, i.e. within one millisecond in this example. The total round-trip latency is then 2 milliseconds due to the double-buffering scheme. The actual value is usually larger, because it also includes delays introduced by the converters (ADC and DAC) and, as is often the case in general-purpose systems, by additional buffering stages.
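To make the arithmetic concrete, here is a minimal sketch of the numbers used above; the 48-sample / 48 kHz figures are the ones from the example, and converter and extra buffering delays are deliberately left out:

```cpp
// Minimal sketch of the per-block deadline and round-trip latency arithmetic.
#include <cstdio>

int main()
{
    const double sample_rate = 48000.0; // Hz
    const int    block_size  = 48;      // samples per block (N)

    // Time available to compute one output block before the next one arrives
    const double deadline_ms = 1000.0 * block_size / sample_rate;

    // Round-trip latency of the plain double-buffering scheme:
    // one block buffered on the input side + one block on the output side
    const double round_trip_ms = 2.0 * deadline_ms;

    std::printf("Deadline per block : %.3f ms\n", deadline_ms);
    std::printf("Round-trip latency : %.3f ms\n", round_trip_ms);
    return 0;
}
```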

If the OS fails to give enough priority to the process doing the signal processing, even once, so that a block of samples is not delivered to the audio interface in time, the consequence is audible glitches and distortion in the sound. And even if the operating system manages to serve this request within the given time 99.9% of the time, at one-millisecond blocks that still means roughly one glitch per second, which is unacceptable sound quality. To be safe here, what we need are strict real-time guarantees, and that is something Linux cannot provide.
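As a rough back-of-the-envelope check, assuming the same 48-sample / 48 kHz setup and a hypothetical 0.1% deadline-miss rate:

```cpp
// Back-of-the-envelope: how often "99.9% on time" still glitches.
// The miss rate here is hypothetical; real statistics depend on the system.
#include <cstdio>

int main()
{
    const double sample_rate = 48000.0;
    const int    block_size  = 48;
    const double miss_rate   = 0.001;   // 0.1% of deadlines missed

    const double blocks_per_second   = sample_rate / block_size;      // 1000
    const double glitches_per_second = blocks_per_second * miss_rate; // ~1

    std::printf("Blocks per second            : %.0f\n", blocks_per_second);
    std::printf("Expected glitches per second : %.1f\n", glitches_per_second);
    std::printf("Expected glitches per minute : %.0f\n", glitches_per_second * 60.0);
    return 0;
}
```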

This is a common problem inherent to all modern multitasking operating systems, including Linux, Windows, macOS and others. It is also the main reason why musicians and others who crave low-latency, glitch-free sound buy specialised audio interfaces with custom drivers and frameworks (ASIO, JACK) and spend hours tweaking their systems. And still, the non-real-time nature of the OS may prevent you from stressing the CPU to more than 60-70% of its maximum load before audio glitches start to appear.

This has traditionally forced system designers towards DSP (Digital Signal Processor) based systems or similar processing units, usually running an RTOS (Real-Time Operating System). Digital signal processors are processors built specifically for signal processing, and real-time operating systems are usually targeted at embedded control systems. A combination of RTOS + DSP can therefore offer extremely good performance in terms of meeting all the timing-related requirements, but it lacks the flexibility and ease of development that a more general-purpose operating system like Linux has to offer. The development cycle on Linux can be much shorter, and Linux already has a well-established ecosystem of tools and frameworks for developers, which means lower development effort and, in turn, lower development costs. This makes a very good case for using Linux for real-time audio processing and tweaking it to match the performance of a DSP + RTOS combination, while keeping development much easier.

Typical latency sources in operating systems

It is clear that, ideally, we would like to keep the buffer size N as small as possible in order to achieve the low latency required by professional audio applications. So what are the obstacles that prevent us from achieving that in general-purpose operating systems? The following timing diagram shows what happens inside the CPU when it receives a real-time request to process an audio block:

When the audio Interrupt Request (IRQ) arrives, the CPU might be busy doing something else that often can’t be interrupted immediately. Therefore, there is a task scheduling latency before the CPU can handle the request in the context of the Audio Driver running in Kernel Space. Typically, the audio driver sets up the asynchronous transfers between the I/O buffers and the converters, plus a few other things, before signaling the userspace Audio Processing Task to start its computations on its input buffer.
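On the userspace side, the Audio Processing Task usually boils down to a callback that the driver or audio framework invokes once per block. The sketch below is generic and illustrative rather than any specific driver API; the point is that everything inside it must finish well before the next block arrives:

```cpp
// Generic sketch of a per-block audio processing callback (illustrative
// signature, not a specific driver API). No locks, heap allocation, file or
// network I/O should happen in here, since the deadline is only N/fs seconds.
#include <cstddef>

void process_block(const float* input, float* output, std::size_t num_frames)
{
    // Trivial "DSP" for illustration: copy input to output with a fixed gain.
    const float gain = 0.5f;
    for (std::size_t i = 0; i < num_frames; ++i)
    {
        output[i] = gain * input[i];
    }
}
```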

As most general-purpose OSes use some variant of preemptive scheduling, the operating system can potentially come in at any time and interrupt the audio processing task for a while before resuming its execution. From our perspective, all these interruptions are wasted CPU time: time not spent on the real work we care about, which is processing the digital audio signal. Moreover, there can be a large variance in the amount of this wasted time but, for professional audio applications, we only care about the worst-case scenario, even if it happens rarely. You can easily see how the deadline can be missed if too many of these interruptions happen within a single block period.
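Because only the worst case matters, a common first step is simply to measure it. The sketch below wraps the hypothetical process_block() callback from the previous example and records the longest wall-clock duration observed; wall-clock time is the right measure here, since it includes the time the OS steals from the task through preemption:

```cpp
// Records the worst-case wall-clock duration of the audio callback.
// process_block() is the hypothetical callback from the previous sketch.
#include <atomic>
#include <chrono>
#include <cstddef>

void process_block(const float* input, float* output, std::size_t num_frames);

std::atomic<long long> worst_case_ns{0};

void process_block_timed(const float* in, float* out, std::size_t n)
{
    const auto start = std::chrono::steady_clock::now();
    process_block(in, out, n);
    const long long elapsed =
        std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now() - start).count();

    // Keep only the maximum observed duration. The compare-exchange loop is
    // lock-free, so it is safe to run on the audio thread.
    long long current = worst_case_ns.load(std::memory_order_relaxed);
    while (elapsed > current &&
           !worst_case_ns.compare_exchange_weak(current, elapsed,
                                                std::memory_order_relaxed))
    {
        // current was reloaded by compare_exchange_weak; retry.
    }
}
```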

There is another way to look at the graph, from the perspective of efficient CPU usage. At a given buffer size N, fewer interruptions from the OS mean that the green area available for audio processing is greater. As an obvious consequence, an OS optimized for low latency also makes more efficient use of your CPU, so you can, for example, run more FX plugins or have more voices in your synthesizer. And if you are running your code on a multicore CPU, you have better opportunities for parallelization because, for example, you can split the computation of a serial audio chain across multiple cores. This typically adds latency due to the double-buffering needed between the stages, as the sketch below illustrates. However, if you are already starting with a very low buffer size, the overall latency might still be good enough for your application scenario.
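A rough sketch of that trade-off, assuming one extra block of buffering per stage boundary when a serial chain is pipelined across cores (the stage count is purely illustrative, and real frameworks may buffer differently):

```cpp
// Sketch of the latency cost of pipelining a serial FX chain across cores,
// assuming each extra pipeline stage adds one block of buffering.
#include <cstdio>

int main()
{
    const double sample_rate = 48000.0;
    const int    block_size  = 48;
    const double block_ms    = 1000.0 * block_size / sample_rate;

    const int stages = 3; // serial chain split across 3 cores (illustrative)

    const double base_round_trip_ms  = 2.0 * block_ms;          // plain double-buffering
    const double pipeline_penalty_ms = (stages - 1) * block_ms; // one block per stage boundary

    std::printf("Round-trip without pipelining : %.3f ms\n", base_round_trip_ms);
    std::printf("Extra latency from pipelining : %.3f ms\n", pipeline_penalty_ms);
    std::printf("Total                         : %.3f ms\n",
                base_round_trip_ms + pipeline_penalty_ms);
    return 0;
}
```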

Conclusions

We hope we have given you a good overview of the most important issues regarding audio and latency from an operating system’s perspective. It’s also important to note that the audio processing task might add latency of its own, for example if it uses extra buffering for certain kinds of algorithms or if it doesn’t follow real-time best practices. Ross Bencina has an excellent introduction to common real-time problems from the perspective of userspace applications such as audio plugins.


Stay tuned for the next chapter in this series, which will cover task scheduling latency and the common Linux approaches to real-time in more detail.