How Libopus Decoder Packet Loss Concealment Works

Real-time audio streaming over IP networks is highly susceptible to packet loss, which can cause jarring audio dropouts and clicks. To maintain a smooth listening experience, the standard libopus decoder implements Packet Loss Concealment (PLC), a suite of computational algorithms that reconstructs missing audio data. This article explains how the decoder uses linear prediction, waveform extrapolation, spectral shaping, and smooth cross-fading to mask lost packets depending on whether the codec is operating in its speech-focused SILK mode or its music-focused CELT mode.

Because Opus is a hybrid codec, libopus employs two distinct PLC strategies tailored to the characteristics of the audio being decoded.

SILK Mode PLC (Speech Optimization)

When processing speech in SILK mode, libopus relies on Linear Predictive Coding (LPC). Speech is highly redundant and structured around vocal tract resonances (formants) and pitch. When a packet is lost, the decoder utilizes the LPC coefficients from the last successfully received packet to maintain the vocal tract model.
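
The role of the retained coefficients can be illustrated with a small all-pole synthesis filter. This is a floating-point sketch, not libopus code: the function name `lpc_synthesis`, the sign convention (each output is the excitation plus a weighted sum of past outputs), and the coefficient values are assumptions made for illustration.

```python
def lpc_synthesis(excitation, lpc_coeffs, history=()):
    """All-pole filter: y[n] = e[n] + sum_k a[k] * y[n-1-k].

    `lpc_coeffs` stands in for the coefficients kept from the last good
    packet; `history` holds the most recent output samples, oldest first.
    """
    order = len(lpc_coeffs)
    # Keep the last `order` output samples, zero-padded if history is short.
    state = ([0.0] * order + list(history))[-order:] if order else []
    out = []
    for e in excitation:
        # state[-1] is y[n-1], state[-2] is y[n-2], and so on.
        prediction = sum(a * state[-1 - k] for k, a in enumerate(lpc_coeffs))
        y = e + prediction
        out.append(y)
        if order:
            state.append(y)
            state.pop(0)
    return out
```

With the single hypothetical coefficient `[0.5]`, an impulse `[1.0, 0.0, 0.0]` rings out as `[1.0, 0.5, 0.25]`, which shows how the filter keeps shaping the signal even when the excitation is synthetic.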

To generate the excitation signal that drives this LPC filter, the decoder needs a pitch period. SILK transmits the pitch lag in the bitstream, so the decoder can reuse the lag and long-term prediction parameters saved from the last good frame instead of estimating pitch from scratch. For voiced speech, the decoder extrapolates the previous excitation signal by repeating it at that pitch interval. To prevent the audio from sounding artificially “buzzy” or metallic if the packet loss persists, the decoder gradually attenuates the amplitude of the generated signal and mixes in a pseudo-random noise component. This simulated excitation is then passed through the LPC synthesis filter to produce natural-sounding, albeit slightly degraded, speech.
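
The excitation extrapolation described above can be sketched as follows. The function `extrapolate_excitation` and its parameters `fade`, `noise_mix`, and `seed` are hypothetical names with arbitrary default values; the real decoder works in fixed point and tunes its attenuation and noise levels differently.

```python
import random

def extrapolate_excitation(prev_excitation, pitch_lag, n_samples,
                           fade=0.999, noise_mix=0.1, seed=0):
    """Repeat the last pitch cycle, fade it out, and mix in noise.

    All parameter values are illustrative assumptions, not libopus constants.
    """
    if not 0 < pitch_lag <= len(prev_excitation):
        raise ValueError("pitch_lag must fit inside the excitation history")
    rng = random.Random(seed)
    history = list(prev_excitation)
    # Scale the noise to the RMS level of the last pitch cycle.
    last_cycle = history[-pitch_lag:]
    rms = (sum(x * x for x in last_cycle) / pitch_lag) ** 0.5
    out = []
    gain = 1.0
    for _ in range(n_samples):
        periodic = history[-pitch_lag]   # sample from one pitch period back
        history.append(periodic)         # unattenuated, so the cycle repeats
        noise = rng.uniform(-1.0, 1.0) * rms
        out.append(gain * (periodic + noise_mix * noise))
        gain *= fade                     # attenuate more as the loss continues
    return out
```

Setting `fade=1.0` and `noise_mix=0.0` reduces the sketch to pure pitch-period repetition, which is the "buzzy" case the attenuation and noise are meant to avoid.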

CELT Mode PLC (Music and General Audio Optimization)

For music and transient-heavy audio processed in CELT mode, a speech production model is a poorer fit. CELT is a transform-based codec (using the Modified Discrete Cosine Transform, or MDCT), and its PLC mechanism combines time-domain pitch extrapolation for short losses with frequency-domain noise substitution for longer ones.

When a CELT packet is lost, the decoder first attempts a time-domain, pitch-based extrapolation of its recent output to preserve the harmonic structure of music. This is used only for the first few consecutive lost frames (on the order of 100 ms in the reference implementation). If packet loss continues beyond that, the decoder transitions to noise substitution. It generates pseudo-random noise but shapes its spectrum to match the per-band energy envelope of the last good frame. This ensures that the concealed audio starts out with a spectral color and loudness similar to the preceding audio, avoiding sudden silent gaps.
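
The noise-substitution step can be sketched by filling each frequency band with random coefficients and rescaling them to a target energy. The band layout, the `decay` parameter, and the function name `shaped_noise_spectrum` are assumptions for illustration; the inverse MDCT that would turn these coefficients back into audio is omitted.

```python
import math
import random

def shaped_noise_spectrum(band_edges, band_energies, decay=1.0, seed=0):
    """Return noise coefficients whose per-band energy matches a target.

    `band_edges` lists coefficient indices delimiting each band, and
    `band_energies` holds the sum of squares measured in each band of the
    last good frame. `decay` (hypothetical) scales the energy down so that
    repeated calls can fade the noise out over a long loss.
    """
    rng = random.Random(seed)
    coeffs = []
    bands = zip(band_edges, band_edges[1:])
    for (start, end), energy in zip(bands, band_energies):
        noise = [rng.uniform(-1.0, 1.0) for _ in range(end - start)]
        # Normalise the band to unit norm, then scale to the target energy.
        norm = math.sqrt(sum(x * x for x in noise)) or 1.0
        target = math.sqrt(energy * decay)
        coeffs.extend(x / norm * target for x in noise)
    return coeffs
```

Because only the energy of each band is preserved, the result keeps the spectral envelope of the last good frame while discarding its fine harmonic detail.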

Transition and Resynchronization

A critical phase of PLC is the transition back to normal decoding once packets begin arriving again. Simply playing the newly received packet immediately after a concealed block would cause a phase mismatch, resulting in an audible click or pop.

To prevent this, libopus performs a cross-fade (overlap-add) between the extrapolated concealment signal and the newly decoded audio. Furthermore, the decoder keeps updating its internal filter memories and state variables during the concealment phase. This limits the drift between encoder and decoder state, so that when real data resumes, the decoder state is already close to what the incoming packets expect.
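
A minimal sketch of such a cross-fade is shown below. The raised-cosine weights are chosen for illustration and are not the window libopus uses; `crossfade` is a hypothetical helper name.

```python
import math

def crossfade(concealed_tail, decoded_head):
    """Blend the end of the concealed signal into the start of real audio.

    Both inputs must cover the same overlap region. The weight `w` rises
    smoothly from near 0 to near 1 across the overlap.
    """
    n = len(concealed_tail)
    if n != len(decoded_head):
        raise ValueError("overlap regions must have equal length")
    out = []
    for i in range(n):
        w = 0.5 - 0.5 * math.cos(math.pi * (i + 0.5) / n)
        out.append((1.0 - w) * concealed_tail[i] + w * decoded_head[i])
    return out
```

Because the two weights always sum to one, a signal that is identical on both sides of the overlap passes through unchanged, while any mismatch is spread smoothly across the overlap instead of appearing as a step.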