Libopus Decoding of Out-of-Band FEC Data

This article explains how the libopus decoding engine handles Forward Error Correction (FEC) data to reconstruct lost audio packets. It covers the detection of packet loss by the hosting application, the specific decoder API calls required to trigger FEC reconstruction, how the SILK and CELT engines interpret the bitstream, and the fallback to Packet Loss Concealment (PLC) when FEC data is unavailable.

In the Opus audio codec, Forward Error Correction (FEC) is embedded within the SILK layer of the bitstream as Low Bit-Rate Redundancy (LBRR). The redundant data for a frame is carried inside the subsequent packet’s bitstream, which is why it is usually called in-band FEC. Even so, the hosting application must handle it “out-of-band” in relation to the standard sequential decoding pipeline, because recovering a lost frame requires a separate decoder call on a later packet. This process relies on a coordinated handoff between the network’s jitter buffer and the libopus decoding engine.

Step 1: Packet Loss Detection and Jitter Buffer Coordination

The libopus decoder has no knowledge of the network transport; it cannot tell that a packet was lost unless the hosting application informs it. When the application’s jitter buffer detects that packet \(N-1\) is missing but packet \(N\) has arrived safely, it initiates the FEC recovery sequence. Instead of passing packet \(N\) for immediate playback, the application first uses packet \(N\) to reconstruct the missing packet \(N-1\).
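The detection rule described above can be sketched in C. This is a minimal illustration and not part of libopus: the names classify_gap and next_action are hypothetical, and the sketch assumes packets carry 16-bit sequence numbers that wrap around, as RTP sequence numbers do.

```c
#include <stdint.h>

/* What the jitter buffer should ask the decoder to do next (hypothetical). */
typedef enum {
    ACTION_DECODE_NORMAL,    /* the expected packet arrived */
    ACTION_RECOVER_WITH_FEC, /* exactly one packet (N-1) is missing, N is in hand */
    ACTION_CONCEAL,          /* more than one packet missing: packet N can only
                                cover N-1, so earlier frames need concealment */
    ACTION_DROP_STALE        /* duplicate or older than what was already played */
} next_action;

/* last_played: sequence number of the last packet handed to the decoder.
   arrived:     sequence number of the oldest packet now available.
   Assumption: 16-bit sequence numbers that wrap from 65535 to 0. */
static next_action classify_gap(uint16_t last_played, uint16_t arrived)
{
    /* Reducing the difference modulo 2^16 handles wrap-around. */
    uint16_t delta = (uint16_t)(arrived - last_played);

    if (delta == 1) return ACTION_DECODE_NORMAL;
    if (delta == 2) return ACTION_RECOVER_WITH_FEC;
    if (delta == 0 || delta >= 0x8000) return ACTION_DROP_STALE;
    return ACTION_CONCEAL;
}
```

With last_played = 65535 and arrived = 1, the wrapped difference is 2, so the sketch reports that a single packet (sequence number 0) is missing and can be recovered from the packet in hand.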

Step 2: Triggering the FEC Decoder Path

To decode the redundant FEC data contained within packet \(N\), the application calls the standard decoding function, opus_decode() or opus_decode_float(), with specific arguments:

* The data parameter is populated with the payload of the currently received packet (packet \(N\)).
* The decode_fec flag is explicitly set to 1 (true).
* The frame_size parameter must match the expected duration of the lost packet (\(N-1\)).

Setting decode_fec to 1 instructs the decoder to ignore the primary audio payload of packet \(N\) and instead look in its bitstream for the redundant LBRR data representing packet \(N-1\).
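Because frame_size is expressed in samples per channel, the application has to convert the lost packet's duration before making the FEC call. The helper below is a small sketch of that conversion. The name lost_frame_size is hypothetical, and the check reflects the libopus requirement that frame_size be a multiple of 2.5 ms for FEC and PLC calls.

```c
#include <stdint.h>

/* Convert the duration of the lost packet into the frame_size argument
   (samples per channel) for a decode_fec = 1 call.
   Returns -1 if the inputs are not positive or the duration is not a
   multiple of 2.5 ms (2500 microseconds). */
static int lost_frame_size(int32_t sample_rate_hz, int32_t duration_us)
{
    if (sample_rate_hz <= 0 || duration_us <= 0) return -1;
    if (duration_us % 2500 != 0) return -1;

    /* 64-bit intermediate avoids overflow in rate * duration. */
    int64_t samples = (int64_t)sample_rate_hz * duration_us / 1000000;
    return (int)samples;
}
```

For a decoder running at 48 kHz and a lost 20 ms packet, lost_frame_size(48000, 20000) yields 960. How the application learns the lost packet's duration is up to the application; one option is to assume it matches the previous packet, which libopus reports through the OPUS_GET_LAST_PACKET_DURATION decoder control.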

Step 3: Parsing the Bitstream (SILK Layer)

Once the FEC path is triggered, libopus parses the payload of packet \(N\).

* SILK Frames: If the packet contains SILK frames (used for voice), the decoder checks the LBRR flag in the channel header. If the encoder had in-band FEC enabled and judged the previous frame worth protecting (typically a frame of active speech sent while packet loss was expected), it will have written low-bitrate redundant parameters into this section.
* CELT Frames: The CELT layer (used for music and high-frequency audio) does not natively support LBRR/FEC due to the high bitrate overhead. If the stream is operating in CELT-only mode, the decoder will find no FEC data.
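An application can tell in advance whether a packet could carry LBRR data at all by inspecting the first byte of the packet, the TOC byte defined in RFC 6716. Its top five bits hold a configuration number: 0 to 11 select SILK-only mode, 12 to 15 select hybrid mode (which also contains a SILK layer), and 16 to 31 select CELT-only mode. The sketch below applies that table. The names toc_mode and packet_may_carry_lbrr are hypothetical, and a positive result only means the mode allows LBRR, not that the encoder actually wrote it.

```c
#include <stddef.h>

/* Coding mode signalled by the TOC byte (names are hypothetical). */
typedef enum { PKT_SILK_ONLY, PKT_HYBRID, PKT_CELT_ONLY, PKT_INVALID } pkt_mode;

/* Classify a packet by the configuration number in its TOC byte. */
static pkt_mode toc_mode(const unsigned char *data, size_t len)
{
    if (data == NULL || len == 0) return PKT_INVALID;

    unsigned config = data[0] >> 3;  /* top five bits of the TOC byte */
    if (config >= 16) return PKT_CELT_ONLY;
    if (config >= 12) return PKT_HYBRID;
    return PKT_SILK_ONLY;
}

/* Returns 1 if the packet has a SILK layer and so could hold LBRR data,
   0 otherwise. It does not check whether LBRR is actually present. */
static int packet_may_carry_lbrr(const unsigned char *data, size_t len)
{
    pkt_mode m = toc_mode(data, len);
    return m == PKT_SILK_ONLY || m == PKT_HYBRID;
}
```

A check like this is optional: when no usable redundancy is found, the decode_fec = 1 call itself produces concealment output instead.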

Step 4: Reconstructing the Lost Frame

If valid LBRR data is present in the SILK layer, libopus decodes this highly compressed representation of packet \(N-1\). The resulting audio is lower in quality than the original packet would have been, but it preserves the pitch, energy, and spectral parameters of the voice.

This reconstructed audio is written to the output buffer, and the internal states of the decoder’s prediction filters are updated. Updating these states is critical because it ensures that when packet \(N\) is eventually decoded, the transition between the recovered frame and the new frame is smooth and free of audible phase jumps or clicks.

Step 5: Fallback to Packet Loss Concealment (PLC)

If the decoder finds that packet \(N\) does not contain FEC data (either because the encoder chose not to send it, or because the stream is in CELT-only mode), it falls back to Packet Loss Concealment (PLC).

In this scenario, the decoder generates a replacement frame by extrapolating the pitch period and spectral shape of the last successfully decoded frame, gradually attenuating the output if losses persist.

Step 6: Sequential Decoding of the Current Packet

After the lost frame (\(N-1\)) is successfully reconstructed via FEC (or concealed via PLC) and played back, the application immediately calls the decoder a second time. This time, it passes packet \(N\) with the decode_fec flag set to 0. The decoder then processes packet \(N\) normally, generating the high-quality target audio and advancing the playback timeline seamlessly.
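The two-pass sequence from Steps 2 through 6 can be sketched as follows. To keep the example buildable without libopus, the decoder is reached through a hypothetical callback type, decode_fn, whose arguments mirror those of opus_decode() with the decoder handle passed as void *. In a real application the callback would simply forward to opus_decode(). The names recover_then_decode, fake_decoder, and fake_decode are also hypothetical, and the stand-in decoder exists only to show the call order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback mirroring the argument order of opus_decode().
   Returns samples per channel decoded, or a negative error code. */
typedef int (*decode_fn)(void *dec, const unsigned char *data, int32_t len,
                         int16_t *pcm, int frame_size, int decode_fec);

/* Recover lost packet N-1 from packet N, then decode packet N itself.
   pcm_capacity is in samples per channel. Returns the total samples per
   channel written, or a negative value on error. */
static int recover_then_decode(decode_fn decode, void *dec,
                               const unsigned char *pkt_n, int32_t len_n,
                               int lost_frame_size, int channels,
                               int16_t *pcm, int pcm_capacity)
{
    if (decode == NULL || pcm == NULL || channels <= 0) return -1;
    if (lost_frame_size <= 0 || lost_frame_size > pcm_capacity) return -1;

    /* Pass 1: decode_fec = 1, frame_size = duration of the lost packet.
       The decoder returns FEC output, or concealment if none is available. */
    int recovered = decode(dec, pkt_n, len_n, pcm, lost_frame_size, 1);
    if (recovered < 0) return recovered;

    /* Pass 2: the same packet again, decode_fec = 0, written after pass 1. */
    int current = decode(dec, pkt_n, len_n,
                         pcm + (size_t)recovered * (size_t)channels,
                         pcm_capacity - recovered, 0);
    if (current < 0) return current;

    return recovered + current;
}

/* Stand-in decoder for dry runs: records the decode_fec flag of each call. */
typedef struct { int calls; int fec_flags[2]; } fake_decoder;

static int fake_decode(void *dec, const unsigned char *data, int32_t len,
                       int16_t *pcm, int frame_size, int decode_fec)
{
    fake_decoder *f = (fake_decoder *)dec;
    (void)data; (void)len; (void)pcm;

    if (f->calls < 2) f->fec_flags[f->calls] = decode_fec;
    f->calls++;

    /* Assumption: every packet in this dry run lasts 960 samples. */
    int n = decode_fec ? frame_size : 960;
    return n <= frame_size ? n : -2;
}
```

Writing both frames into one buffer keeps the recovered audio and the current packet's audio contiguous for playback. An application that plays the recovered frame before decoding packet \(N\) could use two separate buffers instead.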