How the Opus Decoder Determines Audio Channels

This article explains how the libopus decoder identifies the number of audio channels in an encoded Opus payload. It covers the structure of the Opus Table of Contents (TOC) byte, including the configuration number and the stereo flag it carries, and how layouts with more than two channels are handled by multistream decoding.

The Table of Contents (TOC) Byte

At the very beginning of every encoded Opus packet lies the Table of Contents (TOC) byte. The libopus decoder parses this single byte before decoding any audio data to determine the operating mode, audio bandwidth, frame duration, number of frames, and whether the packet is coded as mono or stereo.

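The TOC byte is divided into three fields. RFC 6716 numbers the bits from the most significant (bit 0) to the least significant (bit 7):

  * Bits 0–4 (5 bits): The configuration number (config), which selects the operating mode, audio bandwidth, and frame duration.
  * Bit 5 (1 bit): The stereo flag (s): 0 for mono, 1 for stereo.
  * Bits 6–7 (2 bits): The frame count code (c): 0 for one frame, 1 for two frames of equal compressed size, 2 for two frames of different compressed sizes, and 3 for an arbitrary number of frames, whose count is stored in the next byte.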
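A minimal C sketch of splitting the TOC byte into its fields follows. Because RFC 6716 counts bits from the most significant end, "bits 0–4" are toc >> 3, bit 5 is the mask 0x04, and bits 6–7 are the mask 0x03. The names toc_fields, parse_toc, and toc_frame_count are hypothetical illustrations, not part of the libopus API.

```c
#include <stddef.h>

/* Hypothetical helper type; not part of the libopus API. */
typedef struct {
    int config;           /* bits 0-4 (RFC numbering): configuration 0..31 */
    int stereo;           /* bit 5: 0 = mono, 1 = stereo */
    int frame_count_code; /* bits 6-7: frame count code c, 0..3 */
} toc_fields;

/* Split a TOC byte into its three fields. RFC 6716 numbers bits from the
 * most significant end, so "bits 0-4" are the top five bits. */
static toc_fields parse_toc(unsigned char toc)
{
    toc_fields f;
    f.config = toc >> 3;
    f.stereo = (toc >> 2) & 0x1;
    f.frame_count_code = toc & 0x3;
    return f;
}

/* Number of frames in a packet, or -1 if the packet is too short.
 * For code 3, the count M sits in the low six bits of the second byte;
 * further validity checks (such as M != 0) are omitted here. */
static int toc_frame_count(const unsigned char *packet, size_t len)
{
    if (len < 1)
        return -1;
    switch (packet[0] & 0x3) {
    case 0:
        return 1;          /* one frame */
    case 1:
    case 2:
        return 2;          /* two frames, equal or different sizes */
    default:
        if (len < 2)
            return -1;
        return packet[1] & 0x3F;
    }
}
```

For example, a TOC byte of 0xFC splits into config 31, stereo flag 1, and frame count code 0 (a single frame).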

The Configuration Table Lookup

The 5-bit config field yields 32 possible configurations (0 to 31). The Opus specification (RFC 6716) defines a fixed lookup table that maps each configuration to a combination of:

  1. Audio engine (mode): SILK (speech-optimized), CELT (music-optimized), or Hybrid.
  2. Audio bandwidth: Narrowband, mediumband, wideband, super-wideband, or fullband.
  3. Frame duration: 2.5, 5, 10, 20, 40, or 60 ms, depending on the mode.

The channel count is not part of this table; it is carried by the separate stereo flag.

The libopus decoder extracts the config bits from the TOC byte and interprets them according to this table. The configurations fall into three groups:

  * Configurations 0–11: SILK-only mode. 0–3 are narrowband, 4–7 mediumband, and 8–11 wideband, with frame durations of 10, 20, 40, or 60 ms.
  * Configurations 12–15: Hybrid mode. 12–13 are super-wideband and 14–15 fullband, with frame durations of 10 or 20 ms.
  * Configurations 16–31: CELT-only mode. 16–19 are narrowband, 20–23 wideband, 24–27 super-wideband, and 28–31 fullband, with frame durations of 2.5, 5, 10, or 20 ms.
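The grouping of configurations in RFC 6716 can be expressed as a small lookup function. This C sketch uses hypothetical names (toc_mode, toc_bandwidth, lookup_config) rather than libopus identifiers; libopus exposes equivalent queries such as opus_packet_get_bandwidth() and opus_packet_get_samples_per_frame(). Frame durations are given as sample counts at 48 kHz (480 samples = 10 ms) so that the 2.5 ms case stays an integer.

```c
/* Hypothetical names; not the libopus API. */
typedef enum { MODE_SILK, MODE_HYBRID, MODE_CELT } toc_mode;
typedef enum { BW_NB, BW_MB, BW_WB, BW_SWB, BW_FB } toc_bandwidth;

typedef struct {
    toc_mode mode;
    toc_bandwidth bandwidth;
    int frame_samples_48k; /* frame duration in samples at 48 kHz */
} toc_config_info;

/* Interpret a 5-bit configuration number (0..31) per the RFC 6716 table.
 * Returns 0 on success, -1 if config is out of range. */
static int lookup_config(int config, toc_config_info *out)
{
    /* Frame durations in samples at 48 kHz (1 ms = 48 samples). */
    static const int silk_sizes[4] = { 480, 960, 1920, 2880 }; /* 10, 20, 40, 60 ms */
    static const int hybrid_sizes[2] = { 480, 960 };           /* 10, 20 ms */
    static const int celt_sizes[4] = { 120, 240, 480, 960 };   /* 2.5, 5, 10, 20 ms */
    static const toc_bandwidth silk_bw[3] = { BW_NB, BW_MB, BW_WB };
    static const toc_bandwidth celt_bw[4] = { BW_NB, BW_WB, BW_SWB, BW_FB };

    if (config < 0 || config > 31)
        return -1;
    if (config < 12) {            /* 0-11: SILK-only, four configs per bandwidth */
        out->mode = MODE_SILK;
        out->bandwidth = silk_bw[config / 4];
        out->frame_samples_48k = silk_sizes[config % 4];
    } else if (config < 16) {     /* 12-15: Hybrid, two configs per bandwidth */
        out->mode = MODE_HYBRID;
        out->bandwidth = (config < 14) ? BW_SWB : BW_FB;
        out->frame_samples_48k = hybrid_sizes[config % 2];
    } else {                      /* 16-31: CELT-only, four configs per bandwidth */
        out->mode = MODE_CELT;
        out->bandwidth = celt_bw[(config - 16) / 4];
        out->frame_samples_48k = celt_sizes[config % 4];
    }
    return 0;
}
```

Calling lookup_config(toc >> 3, &info) on a packet's first byte would yield its mode, bandwidth, and frame size; config 13, for instance, maps to Hybrid, super-wideband, 960 samples (20 ms).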

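The channel count needs no table lookup: the decoder tests the stereo flag with a single bitmask (0x04, which is bit 5 in the RFC's numbering), and libopus exposes the same check as opus_packet_get_nb_channels(). Because the flag travels in every packet, an encoder may switch between mono and stereo from one packet to the next. The decoder's output channel count, by contrast, is fixed when the decoder is created with opus_decoder_create(); if a packet's coded channel count differs, libopus downmixes stereo packets for a mono decoder and plays mono packets on both channels of a stereo decoder.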
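A decoder's output channel count is chosen at creation, so it can differ from a packet's coded count. The C sketch below reads the coded count with the same 0x04 test that libopus's opus_packet_get_nb_channels() performs, adding a length check because that libopus function takes only a pointer. The conversion_for() helper and its enum labels are hypothetical: they only report which case applies and do not reproduce how libopus actually mixes samples.

```c
#include <stddef.h>

/* Coded channel count of a packet: 2 if the stereo flag (mask 0x04) is
 * set, otherwise 1; -1 for an empty packet. Same test as libopus's
 * opus_packet_get_nb_channels(), plus a length check. */
static int packet_channels(const unsigned char *packet, size_t len)
{
    if (len < 1)
        return -1;
    return (packet[0] & 0x04) ? 2 : 1;
}

/* Hypothetical labels for which case applies; they do not describe how
 * libopus actually mixes the samples. */
typedef enum {
    CONV_NONE,      /* coded channels match the decoder's output channels */
    CONV_DOWNMIX,   /* stereo packet, mono decoder */
    CONV_UPMIX,     /* mono packet, stereo decoder */
    CONV_INVALID    /* empty packet or unsupported decoder channel count */
} channel_conversion;

static channel_conversion conversion_for(const unsigned char *packet, size_t len,
                                         int decoder_channels)
{
    int coded = packet_channels(packet, len);
    if (coded < 0 || (decoder_channels != 1 && decoder_channels != 2))
        return CONV_INVALID;
    if (coded == decoder_channels)
        return CONV_NONE;
    return (coded == 2) ? CONV_DOWNMIX : CONV_UPMIX;
}
```

For a stereo decoder, a packet beginning with 0xF8 (stereo flag clear) is classified as CONV_UPMIX. An application normally never needs this check, since libopus performs the conversion internally; it is mainly useful for inspecting streams.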

Multistream Decoding (More than 2 Channels)

Standard Opus packets support only mono or stereo audio. To decode surround or other multi-channel audio (such as 5.1 or 7.1), libopus provides the multistream API (the OpusMSDecoder type and its opus_multistream_decoder_* functions), a wrapper that combines several ordinary mono or stereo decoders.

In a multistream setup, the total channel count cannot be determined from the TOC byte of any single packet, because each TOC byte describes only one mono or stereo elementary stream. Instead, the channel layout is fixed when the decoder is initialized:

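  1. Ogg Container Headers: In encapsulated formats like Ogg, an identification header (ID header) precedes the audio packets; in Ogg Opus (RFC 7845) it begins with the signature "OpusHead". It explicitly defines the output channel count and a channel mapping family, and for any family other than 0 it also carries a channel mapping table: the stream count, the coupled (stereo) stream count, and one mapping entry per output channel.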
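  2. Mapping API: The application passes the channel count, stream count, coupled stream count, and mapping table from that header to opus_multistream_decoder_create(). The mapping family is not a parameter of this function; the application uses it only to interpret the header (family 0, for example, implies a single mono or stereo stream with an implicit mapping).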
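  3. Stream Demuxing: The multistream decoder splits each incoming packet into one sub-packet per elementary stream (every sub-packet except the last uses the self-delimiting framing from RFC 6716). It decodes the first coupled-stream-count streams with stereo decoders and the remaining streams with mono decoders; each sub-packet still begins with its own TOC byte, parsed as described above. Finally, it uses the mapping table to route the decoded channels to the requested output layout, where a mapping entry of 255 produces a silent output channel.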
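The three initialization steps can be sketched in C under a few assumptions: the identification header has already been extracted from its Ogg page, only the fields that determine the channel layout are read (field offsets follow the OpusHead layout in RFC 7845), and the names ms_layout, parse_opus_head, validate_layout, and resolve_channel are hypothetical. The range checks are modeled on the ones libopus applies to the arguments of its multistream decoder, not copied from it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout type holding what opus_multistream_decoder_create()
 * needs: output channels, stream counts, and one mapping entry per channel. */
typedef struct {
    int channels;          /* output channel count C */
    int streams;           /* total elementary streams N */
    int coupled_streams;   /* stereo streams M (the first M of the N) */
    unsigned char mapping[255];
} ms_layout;

/* Step 2: range checks modeled on those libopus applies to its arguments.
 * Family-specific limits (e.g. family 1 allows at most 8 channels) are not checked. */
static int validate_layout(const ms_layout *l)
{
    int i, decoded;
    if (l->channels < 1 || l->streams < 1 || l->coupled_streams < 0 ||
        l->coupled_streams > l->streams || l->streams + l->coupled_streams > 255)
        return 0;
    decoded = l->streams + l->coupled_streams; /* total decoded channels */
    for (i = 0; i < l->channels; i++)
        if (l->mapping[i] != 255 && l->mapping[i] >= decoded)
            return 0;
    return 1;
}

/* Step 1: read the layout from an OpusHead packet (RFC 7845 Section 5.1).
 * Version, pre-skip, sample rate, and gain are not examined here.
 * Returns 0 on success, -1 on a malformed or unsupported header. */
static int parse_opus_head(const unsigned char *data, size_t len, ms_layout *l)
{
    int family;
    if (len < 19 || memcmp(data, "OpusHead", 8) != 0)
        return -1;
    l->channels = data[9];
    family = data[18];
    if (family == 0) {
        /* Family 0: a single mono or stereo stream, implicit mapping. */
        if (l->channels < 1 || l->channels > 2)
            return -1;
        l->streams = 1;
        l->coupled_streams = l->channels - 1;
        l->mapping[0] = 0;
        l->mapping[1] = 1;
    } else {
        /* Other families: stream count, coupled count, then C mapping bytes. */
        if (len < 21 + (size_t)l->channels)
            return -1;
        l->streams = data[19];
        l->coupled_streams = data[20];
        memcpy(l->mapping, data + 21, (size_t)l->channels);
    }
    return validate_layout(l) ? 0 : -1;
}

/* Step 3: find which stream (and which channel of it) feeds output channel
 * out_ch. Returns 0 on success, 1 for a silent channel (entry 255), -1 on error. */
static int resolve_channel(const ms_layout *l, int out_ch, int *stream, int *sub_ch)
{
    int m;
    if (out_ch < 0 || out_ch >= l->channels)
        return -1;
    m = l->mapping[out_ch];
    if (m == 255)
        return 1;
    if (m < 2 * l->coupled_streams) {   /* coupled streams: 2 channels each */
        *stream = m / 2;
        *sub_ch = m % 2;
    } else {                            /* uncoupled streams: 1 channel each */
        *stream = m - l->coupled_streams;
        *sub_ch = 0;
    }
    return 0;
}
```

In a real application, the parsed values would go straight to opus_multistream_decoder_create(Fs, channels, streams, coupled_streams, mapping, &error), which performs the routing itself; resolve_channel() only makes that routing visible. For example, a 5.1 header with 4 streams, 2 of them coupled, and the mapping {0, 4, 1, 2, 3, 5} routes output channel 1 (front center in mapping family 1) to stream 2, a mono stream.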