Libopus Internal Resampling Explained
This article explains how the libopus audio codec
manages internal resampling when input audio rates differ from its
native processing rates. It covers the architectural division between
the SILK and CELT layers, the mathematical design of its internal
resamplers, and how it maintains low latency and high audio fidelity
during conversion.
Supported Input Rates vs. Internal Processing
The Opus API natively accepts only five input sampling rates: 8, 12, 16, 24, and 48 kHz. If an application uses an unsupported rate (such as 44.1 kHz), the application itself must resample the audio before passing it to the encoder.
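As a rough illustration of that application-side step, the sketch below checks a capture rate against the five API rates and, for an unsupported rate, reports the rational ratio an external resampler would need. The helper name `plan_input_rate` and the default target of 48 kHz are assumptions made for this example; they are not part of the Opus API.

```python
from fractions import Fraction

# The five sampling rates accepted by the Opus API, in Hz.
OPUS_API_RATES_HZ = (8000, 12000, 16000, 24000, 48000)

def plan_input_rate(source_hz, target_hz=48000):
    """Return (rate to hand to the encoder, resampling ratio L/M).

    Hypothetical helper: it only plans the conversion, it does not resample.
    """
    if target_hz not in OPUS_API_RATES_HZ:
        raise ValueError("target rate must be one of the Opus API rates")
    if source_hz in OPUS_API_RATES_HZ:
        return source_hz, Fraction(1, 1)  # already supported, pass through
    # Fraction reduces to lowest terms:
    # numerator = upsampling factor L, denominator = decimation factor M.
    return target_hz, Fraction(target_hz, source_hz)

rate, ratio = plan_input_rate(44100)
print(rate, ratio.numerator, ratio.denominator)  # 48000 160 147
```

For 44.1 kHz material the external resampler therefore has to interpolate by 160 and decimate by 147 before the audio reaches the encoder.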
Once inside libopus, however, the encoder frequently
needs to convert these supported rates to match the operating
requirements of its two internal engines: SILK (optimized for voice at
8, 12, or 16 kHz) and CELT (optimized for music and general audio at 48
kHz).
The Dual-Resampler Architecture
To balance CPU efficiency and audio quality, libopus
uses two distinct internal resampling mechanisms depending on which mode
(SILK or CELT) is active.
1. SILK Resampler (IIR and FIR Hybrid)
SILK is the speech-oriented layer of Opus and runs internally at 8, 12, or 16 kHz. To keep computational cost low, its resampler (silk/resampler.c in the reference source) combines Infinite Impulse Response (IIR) and Finite Impulse Response (FIR) stages and picks a method from the input/output rate pair:
- All-pass IIR Upsampling by 2: Exact rate doubling uses short cascades of first-order all-pass IIR sections arranged as a two-branch polyphase half-band filter. This rejects the spectral image with far fewer multiplications, and less delay, than a linear-phase FIR filter of comparable selectivity.
- All-pass Stage plus FIR Interpolation: Upsampling by other ratios (e.g., 2:3) first doubles the rate with the all-pass stage and then applies low-order FIR interpolation at fractional positions.
- IIR Filter plus FIR Interpolation: Downsampling (e.g., converting 48 kHz input to 16 kHz for voice encoding) runs a second-order autoregressive IIR filter followed by FIR interpolation, using per-ratio coefficient tables stored as constants in ROM.
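The all-pass idea can be sketched as a two-branch 2x upsampler: each input sample is fed to two first-order all-pass sections, and their outputs are interleaved. This is an illustrative floating-point sketch, not the libopus code; the class and function names are hypothetical, and the coefficients 0.15 and 0.61 are example values chosen so that the two branches differ by roughly half an input sample of delay.

```python
class AllPass1:
    """First-order all-pass section: H(z) = (a + z^-1) / (1 + a*z^-1)."""

    def __init__(self, a):
        self.a = a
        self.x1 = 0.0  # previous input
        self.y1 = 0.0  # previous output

    def step(self, x):
        y = self.a * x + self.x1 - self.a * self.y1
        self.x1, self.y1 = x, y
        return y

def upsample2_allpass(samples, a_even=0.15, a_odd=0.61):
    """Double the sampling rate with two all-pass branches.

    Coefficients are illustrative assumptions, not libopus values.
    The even branch has more delay; the odd branch has about half an
    input sample less, so it represents the in-between time instant.
    """
    even, odd = AllPass1(a_even), AllPass1(a_odd)
    out = []
    for x in samples:
        out.append(even.step(x))  # output sample 2n
        out.append(odd.step(x))   # output sample 2n + 1
    return out

settled = upsample2_allpass([1.0] * 200)
print(len(settled))  # 400
```

Because every all-pass section has unity gain at DC, a constant input settles to the same constant on both branches, which means no energy is left at the image frequency (the new Nyquist rate).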
2. CELT/Standard Polyphase Resampler
When operating in CELT or hybrid modes, libopus handles
audio at a native 48 kHz processing rate. If the input sample rate is
lower (such as 16 or 24 kHz), it must be upsampled to 48 kHz.
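Polyphase FIR resampling, as implemented by the Speex resampler, is the standard technique for this kind of conversion, and the companion opus-tools programs use a Speex-derived resampler for rates such as 44.1 kHz. The CELT code in the libopus reference source does not appear to embed that resampler: the encoder zero-stuffs its input up to 48 kHz during pre-emphasis and leaves the empty upper MDCT bands uncoded, while the decoder keeps every Nth sample during de-emphasis. The polyphase method itself works as follows:
- Sinc Interpolation: A windowed sinc function serves as the low-pass prototype from which each new sample is computed.
- Polyphase Decomposition: To avoid zero-stuffing followed by filtering at the high rate, the prototype filter is split into a bank of shorter sub-filters (phases). Each output sample evaluates only the sub-filter for its fractional position.
- Rational Factors: Because the input rates (8, 12, 16, 24 kHz) are exact divisors of 48 kHz, the upsampling ratios are integers (1:6, 1:4, 1:3, 1:2), so the filter bank needs only 6, 4, 3, or 2 phases, which keeps coefficient memory and instruction counts small.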
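The polyphase decomposition described above can be sketched in a few lines. The example below is not taken from libopus or Speex; the function names, the Hann window, and the choice of 8 taps per phase are assumptions made for illustration. It upsamples by an integer factor two ways, with zero-stuffing plus a full-rate filter and with the phase bank, so the two results can be compared.

```python
import math

def windowed_sinc(factor, taps_per_phase=8):
    """Hann-windowed sinc prototype with cutoff at the input Nyquist rate."""
    n_taps = factor * taps_per_phase
    centre = (n_taps - 1) / 2.0
    h = []
    for i in range(n_taps):
        t = (i - centre) / factor  # time in input-sample units
        sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n_taps - 1))
        h.append(sinc * window)
    return h

def upsample_naive(x, factor, h):
    """Reference path: insert zeros, then filter at the high rate."""
    stuffed = []
    for s in x:
        stuffed.append(s)
        stuffed.extend([0.0] * (factor - 1))
    y = []
    for m in range(len(stuffed)):
        acc = 0.0
        for j, coeff in enumerate(h):
            if m - j >= 0:
                acc += coeff * stuffed[m - j]
        y.append(acc)
    return y

def upsample_polyphase(x, factor, h):
    """Same result, but each output touches only one sub-filter."""
    phases = [h[p::factor] for p in range(factor)]  # phase p = h[p], h[p+L], ...
    y = []
    for n in range(len(x)):
        for phase in phases:
            acc = 0.0
            for k, coeff in enumerate(phase):
                if n - k >= 0:
                    acc += coeff * x[n - k]
            y.append(acc)
    return y

signal = [math.sin(0.3 * i) for i in range(40)]  # stand-in for 16 kHz input
proto = windowed_sinc(3)                          # 16 kHz -> 48 kHz
fast = upsample_polyphase(signal, 3, proto)
slow = upsample_naive(signal, 3, proto)
print(len(fast))  # 120
```

The polyphase version performs one third of the multiplications here, because it never multiplies a coefficient by an inserted zero.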
Latency and Performance Optimization
Because Opus is designed for interactive, real-time communication, its internal resamplers rely on short filters that add only a small, fixed delay, rather than the longer delay of a high-order linear-phase FIR filter.
- SIMD Optimization: libopus includes optional hand-written assembly and SIMD intrinsics (such as ARM NEON and x86 SSE) for performance-critical inner loops, so that several audio samples are processed in parallel.
- Fixed-Point vs. Floating-Point: libopus can be compiled for either fixed-point arithmetic (for mobile/embedded CPUs lacking a fast FPU) or floating-point arithmetic. The choice is made at build time, so filter precision follows the build configuration rather than being adjusted at run time.
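To make the fixed-point option concrete, here is a minimal sketch of one FIR output sample computed in Q15 arithmetic, where a coefficient c is stored as the integer round(c * 32768). The helper names and the Q15 format are assumptions chosen for illustration; the macros and scalings that libopus itself uses are not reproduced here.

```python
def to_q15(value):
    """Quantize a coefficient in [-1, 1) to a 16-bit Q15 integer."""
    q = int(round(value * 32768))
    return max(-32768, min(32767, q))

def fir_q15(samples, coeffs_q15):
    """One FIR output sample from 16-bit samples and Q15 coefficients.

    Products are summed in a wide accumulator (32 bits or more in C),
    then rounded, shifted back by 15 bits, and saturated to 16 bits.
    """
    acc = 0
    for s, c in zip(samples, coeffs_q15):
        acc += s * c
    result = (acc + (1 << 14)) >> 15  # add half an LSB, then scale down
    return max(-32768, min(32767, result))

coeffs = [to_q15(c) for c in (0.25, 0.5, 0.25)]
print(fir_q15([1000, 2000, 3000], coeffs))  # 2000
```

The floating-point equivalent is simply 0.25 * 1000 + 0.5 * 2000 + 0.25 * 3000 = 2000; with coefficients that are not exactly representable in Q15, the fixed-point result can differ from it by a rounding step.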