Opus Bitrate Threshold for SILK to Hybrid Mode
This article explains the bitrate thresholds at which the libopus audio codec switches from its voice-optimized SILK-only mode to its Hybrid mode, and how the encoder uses these mode changes to balance bandwidth, audio quality, and compression efficiency.
The SILK to Hybrid Mode Transition Threshold
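In the Opus codec (specifically the reference libopus implementation), the transition from SILK-only mode to Hybrid mode typically occurs at a target bitrate of about 16 kbps (between 15.2 kbps and 16 kbps, depending on frame size and audio bandwidth). The Opus specification makes only the decoder normative, so this threshold is an encoder heuristic that can vary with the libopus version and encoder settings; it is not a fixed property of the format.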
To understand how this threshold functions, it is helpful to look at how libopus categorizes and processes different audio bandwidths:
- Below 16 kbps: libopus relies entirely on SILK-only mode, which encodes narrowband (NB), mediumband (MB), and wideband (WB) speech. SILK uses linear predictive coding (LPC), which is highly efficient for human voice but poorly suited to music, and it does not code content above the 8 kHz wideband limit.
- From 16 kbps to 32 kbps: Once the target bitrate reaches approximately 16 kbps, the encoder transitions to Hybrid mode for super-wideband (SWB) and fullband (FB) signals.
- Above 32 kbps: The encoder typically moves to CELT-only mode to deliver high-fidelity, low-latency mono or stereo audio. The exact crossover depends on whether the encoder classifies the signal as speech or music, so speech can remain in Hybrid mode at higher bitrates.
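The ranges in this list can be written as a small lookup. The following is a minimal sketch that uses the approximate 16 kbps and 32 kbps figures from this article as assumed constants; the function name `pick_mode` is hypothetical, and the real libopus decision also weighs signal type, channel count, and frame size.

```python
# Illustrative thresholds in bits per second, taken from the ranges listed above.
# These are assumptions for this sketch, not constants read from libopus.
SILK_TO_HYBRID_BPS = 16_000
HYBRID_TO_CELT_BPS = 32_000


def pick_mode(bitrate_bps: int) -> str:
    """Return the nominal Opus mode for a target bitrate (hypothetical helper)."""
    if bitrate_bps < SILK_TO_HYBRID_BPS:
        return "SILK-only"
    if bitrate_bps <= HYBRID_TO_CELT_BPS:
        return "Hybrid"
    return "CELT-only"


if __name__ == "__main__":
    for rate in (12_000, 16_000, 24_000, 64_000):
        print(rate, pick_mode(rate))
```

Treat the output as a first guess at what the encoder will do for SWB or FB input, not as a guarantee for any particular libopus build.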
How Hybrid Mode Works
When the 16 kbps threshold is crossed and Hybrid mode is activated, libopus splits the audio spectrum into two distinct bands:
- Lower Frequencies (0 to 8 kHz): This band is coded by the SILK layer. Because SILK is optimized for speech, whose fundamental and formant energy lies mostly in this range, it codes the core voice signal with very few bits.
- Higher Frequencies (above 8 kHz, up to 12 kHz for SWB or 20 kHz for FB): This band is coded by the CELT layer. CELT uses a transform-based approach (MDCT) to capture the upper harmonics, air, and high-frequency details that SILK does not code.
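The band split can be sketched as a lookup from audio bandwidth to the frequency range each layer covers. The upper band edges below follow the Opus bandwidth definitions (4, 6, 8, 12, and 20 kHz); the function name `hybrid_split` is hypothetical and models only the frequency ranges, not the actual signal processing.

```python
# Upper edge of the coded audio band for each Opus bandwidth, in Hz.
AUDIO_BANDWIDTH_HZ = {"NB": 4000, "MB": 6000, "WB": 8000, "SWB": 12000, "FB": 20000}

# In Hybrid mode the SILK layer covers everything up to the wideband limit.
SILK_CEILING_HZ = 8000


def hybrid_split(bandwidth: str) -> dict:
    """Return the (low, high) frequency range in Hz covered by each layer."""
    top = AUDIO_BANDWIDTH_HZ[bandwidth]
    if top <= SILK_CEILING_HZ:
        # NB, MB, and WB fit entirely inside SILK, so there is nothing to split.
        raise ValueError(f"{bandwidth} does not use Hybrid mode")
    return {"SILK": (0, SILK_CEILING_HZ), "CELT": (SILK_CEILING_HZ, top)}


if __name__ == "__main__":
    print(hybrid_split("SWB"))
    print(hybrid_split("FB"))
```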
By combining both technologies above the 16 kbps threshold, libopus achieves high-quality, fullband speech and mixed-content encoding at bitrates that would normally be too low for a pure transform codec like CELT, while covering frequencies beyond the 8 kHz limit of a pure predictive codec like SILK.
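Because the threshold depends on the encoder, it can help to check which mode was actually chosen for a given packet. Every Opus packet begins with a TOC byte whose top five bits hold a configuration number: 0 to 11 indicate SILK-only, 12 to 15 Hybrid, and 16 to 31 CELT-only. The sketch below decodes that field; the function name `opus_packet_mode` is hypothetical, and the input is assumed to be a raw Opus packet with any container framing (such as Ogg or RTP) already removed.

```python
from typing import Tuple


def opus_packet_mode(packet: bytes) -> Tuple[str, str]:
    """Return (mode, bandwidth) decoded from the TOC byte of a raw Opus packet."""
    if not packet:
        raise ValueError("empty packet")
    config = packet[0] >> 3  # top five bits of the TOC byte
    if config < 12:
        # Configs 0-11: SILK-only, four frame sizes per bandwidth.
        return "SILK-only", ("NB", "MB", "WB")[config // 4]
    if config < 16:
        # Configs 12-15: Hybrid, two frame sizes per bandwidth.
        return "Hybrid", ("SWB", "FB")[(config - 12) // 2]
    # Configs 16-31: CELT-only, four frame sizes per bandwidth.
    return "CELT-only", ("NB", "WB", "SWB", "FB")[(config - 16) // 4]


if __name__ == "__main__":
    # 0x78 = 0b01111_0_00: config 15, mono, one frame per packet.
    print(opus_packet_mode(bytes([0x78])))
```

Encoding the same speech clip at a range of bitrates and applying this check to the resulting packets shows where a particular encoder build switches from SILK-only to Hybrid.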