How Libopus Encodes Audio Frame Duration

This article explains how the Opus audio codec (libopus) encodes audio frame duration in the first byte of every packet. By walking through the structure of the Table of Contents (TOC) byte, it shows how the configuration bits and the frame count code together define the duration of individual frames and of the whole packet, without a separate duration field.


The Opus codec (standardized in RFC 6716) is designed for low overhead and interactive real-time applications. To minimize latency and bandwidth, libopus does not use complex nested headers to describe packet contents. Instead, every Opus packet begins with a single, mandatory Table of Contents (TOC) byte that tells the decoder exactly how to interpret the packet, including the duration of the audio frames.

The TOC Byte Structure

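The TOC byte is divided into three distinct bitfields, shown here with bit 0 as the most significant bit:

 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
| config  |s| c |
+-+-+-+-+-+-+-+-+

config (5 bits) is the configuration number, s (1 bit) is the stereo flag (0 for mono, 1 for stereo), and c (2 bits) is the frame count code.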

The combination of the config bits and the c bits lets the decoder compute both the duration of an individual frame and the total duration of the packet. For frame count codes 0 to 2 the TOC byte alone is enough; code 3 requires reading one additional byte.
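The field split described above can be sketched in a few lines of Python. This is an illustrative sketch, not libopus code: the function name `parse_toc` and the dictionary keys are assumptions made for this example.

```python
def parse_toc(toc: int) -> dict:
    """Split an Opus TOC byte into its three fields (illustrative helper)."""
    if not 0 <= toc <= 0xFF:
        raise ValueError("TOC must be a single byte")
    return {
        "config": toc >> 3,        # five most significant bits, 0..31
        "stereo": (toc >> 2) & 1,  # s bit: 0 = mono, 1 = stereo
        "code": toc & 0x03,        # c: frame count code, 0..3
    }


# 0x78 is 0b01111000: config 15, mono, frame count code 0
print(parse_toc(0x78))  # {'config': 15, 'stereo': 0, 'code': 0}
```

Because the diagram numbers bits from the most significant end, config is extracted with a right shift and c with a mask of the two lowest bits.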


Step 1: Determining Frame Duration via the config Bits

The first five bits (config) map to a predefined table in the Opus specification. This configuration number identifies:

1. The codec mode: SILK (speech-optimized), CELT (music/low-latency-optimized), or Hybrid.
2. The audio bandwidth: Narrowband, Mediumband, Wideband, Super-wideband, or Fullband.
3. The frame size (duration): The base duration of a single audio frame in the packet.

Depending on the configuration number, the base frame duration is 2.5, 5, 10, 20, 40, or 60 milliseconds. SILK-only configurations use 10, 20, 40, or 60 ms; Hybrid configurations use 10 or 20 ms; CELT-only configurations use 2.5, 5, 10, or 20 ms. Because this mapping is fixed by the codec standard, the decoder only needs to read these five bits to know the exact duration of a single frame.
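As a sketch of the lookup, the following Python function maps a configuration number to a frame duration, following the configuration table in RFC 6716 (configurations 0 to 11 are SILK-only, 12 to 15 are Hybrid, 16 to 31 are CELT-only). The function name `frame_duration_us` is an assumption for this example, and durations are returned in microseconds so that 2.5 ms stays an integer.

```python
def frame_duration_us(config: int) -> int:
    """Return the duration of one frame, in microseconds, for config 0..31."""
    if not 0 <= config <= 31:
        raise ValueError("config must be in 0..31")
    if config < 12:
        # SILK-only: each bandwidth has four sizes, 10/20/40/60 ms
        return (10_000, 20_000, 40_000, 60_000)[config & 3]
    if config < 16:
        # Hybrid: each bandwidth has two sizes, 10/20 ms
        return (10_000, 20_000)[config & 1]
    # CELT-only: each bandwidth has four sizes, 2.5/5/10/20 ms
    return (2_500, 5_000, 10_000, 20_000)[config & 3]


print(frame_duration_us(15))  # 20000 (Hybrid, Fullband, 20 ms)
print(frame_duration_us(16))  # 2500  (CELT-only, Narrowband, 2.5 ms)
```

libopus exposes a comparable query, `opus_packet_get_samples_per_frame`, which reports the frame size in samples at a given sampling rate rather than in time units.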


Step 2: Determining Frame Count via the c Bits

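The final two bits of the TOC byte (c) hold the frame count code. This code determines how many frames, each of the duration given by the config bits, are bundled into the packet:

0: one frame
1: two frames of equal compressed size
2: two frames of different compressed sizes
3: an arbitrary number of frames, with the count stored in the low six bits of the byte that follows the TOC byte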
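A minimal sketch of the frame count logic follows. Per RFC 6716, codes 0, 1, and 2 imply one, two, and two frames respectively, while code 3 stores the count in the low six bits of the byte after the TOC byte (the two high bits of that byte are the VBR and padding flags). The function name `frame_count` is an assumption for this example.

```python
def frame_count(packet: bytes) -> int:
    """Return the number of frames in an Opus packet (illustrative helper)."""
    if len(packet) < 1:
        raise ValueError("packet is empty")
    code = packet[0] & 0x03          # c bits of the TOC byte
    if code == 0:
        return 1                     # a single frame
    if code in (1, 2):
        return 2                     # two frames (equal or different sizes)
    # Code 3: the count lives in the low six bits of the next byte.
    if len(packet) < 2:
        raise ValueError("code 3 packet is missing its frame count byte")
    count = packet[1] & 0x3F
    if count == 0:
        raise ValueError("code 3 packet must contain at least one frame")
    return count


print(frame_count(bytes([0x78])))        # 1
print(frame_count(bytes([0x7B, 0x83])))  # 3 (0x83 & 0x3F == 3)
```

libopus offers `opus_packet_get_nb_frames` for the same purpose.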


Calculating Total Packet Duration

By combining the frame duration (derived from config) and the frame count (derived from c), the decoder calculates the total packet duration using a simple formula:

\[\text{Total Packet Duration} = \text{Frame Duration} \times \text{Frame Count}\]

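The Opus specification caps the total packet duration at 120 milliseconds. Only code 3 packets can violate this limit, since codes 0 to 2 carry at most two frames of at most 60 ms each. A packet whose frame duration multiplied by its frame count exceeds 120 ms is malformed, and the decoder rejects it instead of decoding it; libopus reports such packets with the OPUS_INVALID_PACKET error code.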
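Putting the pieces together, the sketch below computes the total packet duration and applies the 120 ms limit. It is a self-contained illustration under the RFC 6716 layout, not the libopus implementation; `packet_duration_us` and `MAX_PACKET_US` are names invented for this example.

```python
MAX_PACKET_US = 120_000  # 120 ms limit from the Opus specification


def packet_duration_us(packet: bytes) -> int:
    """Return the total packet duration in microseconds, or raise ValueError."""
    if len(packet) < 1:
        raise ValueError("packet is empty")
    config = packet[0] >> 3
    code = packet[0] & 0x03

    # Frame duration from the config bits.
    if config < 12:                                   # SILK-only
        frame_us = (10_000, 20_000, 40_000, 60_000)[config & 3]
    elif config < 16:                                 # Hybrid
        frame_us = (10_000, 20_000)[config & 1]
    else:                                             # CELT-only
        frame_us = (2_500, 5_000, 10_000, 20_000)[config & 3]

    # Frame count from the c bits (and the next byte for code 3).
    if code == 0:
        count = 1
    elif code in (1, 2):
        count = 2
    else:
        if len(packet) < 2:
            raise ValueError("code 3 packet is missing its frame count byte")
        count = packet[1] & 0x3F

    total = frame_us * count
    if count == 0 or total > MAX_PACKET_US:
        raise ValueError("invalid packet: bad frame count or duration over 120 ms")
    return total


print(packet_duration_us(bytes([0x7B, 0x03])))  # 60000 (3 frames of 20 ms)
print(packet_duration_us(bytes([0x19])))        # 120000 (2 frames of 60 ms)
```

With the same 20 ms configuration, a frame count of 7 would give 140 ms, so `packet_duration_us(bytes([0x7B, 0x07]))` raises `ValueError`.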

With this compact bit allocation, every Opus packet carries all of its timing information in its first byte, or in its first two bytes for code 3 packets. A decoder can therefore size its buffers and manage audio synchronization before parsing any compressed audio data.