How Opus Handles Packet Padding and Raw Bits

This article explains how Opus, as specified in RFC 6716 and implemented in libopus, handles padding bytes and raw bits within a packet. It covers packet-level padding, which is available in Code 3 packets, and frame-level padding in the CELT layer, where the range coder and the raw bits are written from opposite ends of the frame buffer and any unused bits are left in the middle.

Packet-Level Padding (The Opus Parser)

At the packet level, Opus allows padding to be added to the end of a packet. This is primarily used to maintain a Constant Bit Rate (CBR) or to fill a fixed transport payload size, for example one chosen to fit the network's Maximum Transmission Unit (MTU). Padding is only available in “Code 3” packets (packets containing an arbitrary number of frames), which are selected by the two least significant bits of the Table of Contents (TOC) byte.

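In a Code 3 packet, the second byte of the packet (the frame count byte, immediately after the TOC byte) contains a padding flag (p). In the MSB-first bit numbering of RFC 6716 this is bit 1, the second most significant bit (mask 0x40). The most significant bit is the VBR flag (v), and the remaining six bits hold the frame count.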
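The layout of the frame count byte can be sketched as follows. This is a minimal illustration rather than the libopus parser; the function name parse_code3_header is hypothetical, and the bit masks follow the RFC 6716 layout described above.

```python
def parse_code3_header(packet: bytes):
    """Return (vbr, has_padding, frame_count) for a Code 3 packet.

    Hypothetical helper for illustration; not part of libopus.
    """
    if len(packet) < 2:
        raise ValueError("a Code 3 packet needs a TOC byte and a frame count byte")
    toc = packet[0]
    if toc & 0x03 != 3:
        raise ValueError("not a Code 3 packet")
    count_byte = packet[1]
    vbr = bool(count_byte & 0x80)          # v: most significant bit
    has_padding = bool(count_byte & 0x40)  # p: second most significant bit
    frame_count = count_byte & 0x3F        # M: low six bits
    return vbr, has_padding, frame_count
```

For example, a frame count byte of 0x41 describes a CBR packet with one frame and the padding flag set.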

When the padding flag is set, the parser reads the byte(s) immediately following the frame count byte to determine the total length of the padding. Opus uses a simple additive approach to encode this length:

1. The parser reads a byte.
2. If the byte value is 255, it adds 254 (not 255) to the running total and reads the next byte.
3. This process repeats until a byte with a value less than 255 is encountered. This final byte is added to the total.
4. The resulting sum is the number of padding bytes located at the very end of the packet. The length bytes themselves are not included in this count.
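The length decoding can be sketched as below. This is an illustrative reimplementation, not the libopus source; the function name read_padding_length and its (data, pos) interface are assumptions. Note that, per RFC 6716, a length byte of 255 contributes 254 bytes of padding and signals that another length byte follows.

```python
def read_padding_length(data: bytes, pos: int):
    """Decode the padding length field starting at data[pos].

    Returns (padding_bytes, next_pos). Hypothetical helper for illustration.
    """
    total = 0
    while True:
        if pos >= len(data):
            raise ValueError("packet truncated inside the padding length field")
        value = data[pos]
        pos += 1
        if value == 255:
            total += 254  # 255 means "254 bytes, and keep reading"
        else:
            total += value  # a value below 255 ends the field
            return total, pos
```

With this scheme the bytes 255, 255, 10 encode 254 + 254 + 10 = 518 bytes of padding, in addition to the three length bytes themselves.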

The parser safely strips or skips these designated bytes at the end of the packet, preventing them from being passed to the decoder core.
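Once the padding length is known, separating the frame data from the padding is a bounds-checked slice. The sketch below assumes the caller has already located the first byte after the header fields (body_start) and decoded the padding length; both parameter names and the function name are hypothetical.

```python
def split_padding(packet: bytes, body_start: int, padding: int):
    """Split a packet into (frame_data, padding_bytes).

    body_start: index of the first byte after the header fields.
    padding: number of trailing padding bytes, already decoded.
    """
    if padding > len(packet) - body_start:
        # A padding length that overruns the packet marks it as invalid.
        raise ValueError("padding length exceeds remaining packet size")
    end = len(packet) - padding
    return packet[body_start:end], packet[end:]
```

Only the first element of the result would be handed on for frame decoding; the second is discarded.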

Frame-Level Padding and Raw Bits (The CELT Layer)

At the individual frame level—specifically within the CELT (Constrained Energy Lapped Transform) codec layer—padding is handled through a dual-direction bit-packing architecture.

During the compression of a single frame, data is written into a fixed-size byte buffer using two different methods:

1. The Forward Stream (Range Coder)

The entropy coder (range coder) encodes the majority of the audio parameters (such as the coarse band energies, the bit allocation parameters, and most of the spectrum shape) and writes this data starting from the beginning of the frame buffer, moving forward.

2. The Backward Stream (Raw Bits)

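Certain parameters do not benefit from entropy coding because their values are close to uniformly distributed, so they are written as raw, uncompressed bits. These include the fine energy refinement bits, some sign bits, and the low-order bits of large uniformly distributed integers. To prevent these raw bits from interfering with the variable-length range-coded data, the encoder packs them starting from the very end of the frame buffer, moving backward: the first raw bit goes into the least significant bit of the last byte, the last byte fills up toward its most significant bit, and packing then continues with the least significant bit of the preceding byte.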
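The backward packing order can be sketched with a small writer class. This is a simplified illustration that sets one bit at a time and is not how libopus implements it; the class name RawBitWriter and its methods are hypothetical.

```python
class RawBitWriter:
    """Packs raw bits from the end of a fixed-size buffer, moving backward."""

    def __init__(self, size: int):
        self.buf = bytearray(size)
        self.bits_written = 0

    def write(self, value: int, nbits: int) -> None:
        # The least significant bit of each value is written first.
        for i in range(nbits):
            byte_index = len(self.buf) - 1 - self.bits_written // 8
            if byte_index < 0:
                raise ValueError("raw bits overflow the frame buffer")
            bit = (value >> i) & 1
            # Within a byte, bits fill from least to most significant.
            self.buf[byte_index] |= bit << (self.bits_written % 8)
            self.bits_written += 1
```

Writing the 3-bit value 0b101 followed by the 2-bit value 0b11 into a 4-byte buffer leaves the first three bytes at zero and sets the last byte to 0x1D.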

The Unused Bit Gap

Because the range-coded data grows forward and the raw bits grow backward, the two streams approach each other from opposite ends of the buffer. In a valid frame they never overlap. Any space remaining between the end of the range-coded stream and the start of the raw bits represents unused padding bits.
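The size of the gap is simple arithmetic over the frame's bit budget. The sketch below assumes both streams are measured in whole bits, which is a simplification (a range coder's consumption is not naturally a whole number of bits); the function name unused_bits is hypothetical.

```python
def unused_bits(frame_bytes: int, range_coded_bits: int, raw_bits: int) -> int:
    """Return the number of bits left between the two streams.

    Simplifying assumption: both streams are counted in whole bits.
    """
    total = 8 * frame_bytes
    used = range_coded_bits + raw_bits
    if used > total:
        raise ValueError("the two streams would overlap")
    return total - used
```

For a 40-byte frame with 250 range-coded bits and 60 raw bits, 320 - 310 = 10 bits remain unused in the middle.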

During decoding, the CELT decoder decodes the forward stream until all symbolic data is parsed, and pulls the raw bits from the end of the buffer moving backward. Any leftover bits in the gap between these two streams are treated as raw padding. The decoder ignores this remaining gap, ensuring that any alignment padding or unused allocation does not corrupt the decoded audio.
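Reading the raw bits back mirrors the packing order: start at the least significant bit of the last byte and move toward the front of the buffer. The following is a simplified bit-at-a-time sketch, not the libopus decoder; the class name RawBitReader is hypothetical, and raising an error on underflow is a choice made for this example.

```python
class RawBitReader:
    """Reads raw bits from the end of a frame buffer, moving backward."""

    def __init__(self, buf: bytes):
        self.buf = buf
        self.bits_read = 0

    def read(self, nbits: int) -> int:
        value = 0
        for i in range(nbits):
            byte_index = len(self.buf) - 1 - self.bits_read // 8
            if byte_index < 0:
                raise ValueError("no raw bits left in the frame buffer")
            bit = (self.buf[byte_index] >> (self.bits_read % 8)) & 1
            value |= bit << i  # the first bit read is the value's LSB
            self.bits_read += 1
        return value
```

A reader over the bytes 00 00 00 1D returns 0b101 for a 3-bit read and then 0b11 for a 2-bit read. Bits that neither this reader nor the range decoder consumes are never interpreted.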