How Ogg Encapsulation Maps to Libopus Packets
This article gives a technical overview of how the Ogg container format encapsulates the compressed audio packets produced by libopus. It covers the mapping between Ogg packets and libopus packets, the structure of Ogg pages, and the mandatory header sequence defined by RFC 7845 that lets decoders reconstruct and play back the audio stream.
The One-to-One Packet Mapping
In an Ogg Opus stream, the mapping between Ogg packets and libopus packets is one-to-one for audio data: every Ogg packet after the two mandatory header packets carries exactly one libopus packet.
A libopus packet is a self-contained unit of compressed audio data that contains a Table of Contents (TOC) byte followed by one or more audio frames. When encapsulated inside an Ogg stream, the raw bytes of this libopus packet are stored directly as the payload of a single Ogg packet. There is no additional padding, stuffing, or codec-specific framing added to the audio payload itself.
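As an illustration of what that payload starts with, here is a minimal sketch that decodes the TOC byte. The bit layout and the frame-duration table are taken from RFC 6716 (section 3.1), not from this article, and the function name `describe_toc` is hypothetical.

```python
def describe_toc(packet: bytes) -> dict:
    """Decode the TOC byte of one Opus packet (bit layout assumed from RFC 6716, section 3.1)."""
    if len(packet) < 1:
        raise ValueError("an Opus packet needs at least the TOC byte")
    toc = packet[0]
    config = toc >> 3          # top 5 bits: mode, bandwidth, and frame size
    stereo = bool(toc & 0x04)  # next bit: 0 = mono, 1 = stereo
    code = toc & 0x03          # low 2 bits: frame-count code

    # Frame duration in microseconds, selected by the config number.
    if config < 12:            # SILK-only: 10, 20, 40, 60 ms
        frame_us = (10000, 20000, 40000, 60000)[config % 4]
    elif config < 16:          # Hybrid: 10, 20 ms
        frame_us = (10000, 20000)[config % 2]
    else:                      # CELT-only: 2.5, 5, 10, 20 ms
        frame_us = (2500, 5000, 10000, 20000)[config % 4]

    if code == 0:
        frames = 1
    elif code in (1, 2):
        frames = 2
    else:
        # Code 3: the frame count sits in the low 6 bits of the next byte.
        if len(packet) < 2:
            raise ValueError("code 3 packet is missing its frame-count byte")
        frames = packet[1] & 0x3F

    return {
        "config": config,
        "stereo": stereo,
        "frames": frames,
        # Duration of the whole packet expressed in 48 kHz samples.
        "samples_48k": frames * frame_us * 48 // 1000,
    }
```

For example, a packet starting with the byte 0xFC decodes as config 31, stereo, one frame, which is 960 samples at 48 kHz (20 ms).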
Ogg Page Framing and Lacing Values
The Ogg container manages packet boundaries using a physical framing layer called “Ogg pages.” Because libopus packets are variable in size, Ogg uses a “lacing values” system in the page header to define where one packet ends and the next begins.
- Lacing Values: Each Ogg page header contains a segment table. Each byte in this table (a lacing value) represents the length of a data segment (up to 255 bytes).
- Packet Reassembly: A libopus packet of 255 bytes or more is split across multiple segments. A lacing value of 255 indicates that the packet continues into the next segment. A lacing value of less than 255 terminates the packet, so a packet whose length is an exact multiple of 255 ends with a lacing value of 0.
- Multi-packet Pages: A single Ogg page can contain multiple libopus packets, as long as the total number of segments on the page does not exceed 255.
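The lacing rules above can be sketched in a few lines. This is an illustrative sketch only; the helper names `lacing_values` and `packet_lengths` are hypothetical, and the code ignores the rest of the page header.

```python
def lacing_values(packet_len: int) -> list:
    """Return the lacing values that describe one packet of packet_len bytes."""
    full, rest = divmod(packet_len, 255)
    # One 255 per full segment, then a final value below 255 (possibly 0)
    # that terminates the packet.
    return [255] * full + [rest]


def packet_lengths(segment_table: bytes) -> tuple:
    """Split a page's segment table into packet lengths.

    Returns (completed, unfinished): the byte counts of packets that end on
    this page, and the bytes of a trailing packet that continues on the next
    page (0 if none). If the page begins with a continued packet, the first
    completed length covers only the part stored on this page.
    """
    completed = []
    running = 0
    for lacing in segment_table:
        running += lacing
        if lacing < 255:       # a value below 255 ends the current packet
            completed.append(running)
            running = 0
    return completed, running
```

A 600-byte packet therefore needs the lacing values 255, 255, 90, while a packet of exactly 255 bytes needs 255, 0.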
Mandatory Header Packets
Before any libopus audio packets can be transmitted in an Ogg stream, the Ogg encapsulation specification (RFC 7845) requires exactly two metadata packets to initialize the decoder. These are mapped as the first two packets of the logical stream.
1. The Identification Header (ID Header)
The very first packet in the Ogg logical stream must be the ID Header. It contains essential setup parameters for the libopus decoder:
- Magic Signature: The 8-octet string OpusHead.
- Version: The specification version.
- Channel Count: Number of audio channels.
- Pre-skip: The number of samples (counted at 48 kHz) to discard from the decoder output at the start of the stream (used to discard encoder delay).
- Original Sample Rate: The sample rate of the source audio. It is informational only and is not the rate used for decoding or playback.
- Output Gain and Channel Map: Volume adjustment parameters and mapping of channels to physical speakers.
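The fields listed above can be read with a short parser. This is a sketch, not a full validator: the byte offsets, the little-endian integer widths, and the Q7.8 gain format are taken from RFC 7845 (section 5.1) rather than from this article, and `parse_opus_head` is a hypothetical name.

```python
import struct


def parse_opus_head(packet: bytes) -> dict:
    """Parse an ID Header packet (field layout assumed from RFC 7845, section 5.1)."""
    if len(packet) < 19 or packet[:8] != b"OpusHead":
        raise ValueError("not an OpusHead packet")
    # After the 8-byte magic: version (u8), channel count (u8), pre-skip (u16),
    # original sample rate (u32), output gain (s16), mapping family (u8).
    version, channels, pre_skip, input_rate, gain_q8, family = struct.unpack_from(
        "<BBHIhB", packet, 8
    )
    head = {
        "version": version,
        "channels": channels,
        "pre_skip": pre_skip,              # samples at 48 kHz
        "input_sample_rate": input_rate,   # informational only
        "output_gain_db": gain_q8 / 256,   # stored as Q7.8 fixed point
        "mapping_family": family,
    }
    if family != 0:
        # A channel mapping table follows only for non-zero mapping families.
        if len(packet) < 21 + channels:
            raise ValueError("channel mapping table is truncated")
        head["stream_count"] = packet[19]
        head["coupled_count"] = packet[20]
        head["channel_mapping"] = list(packet[21:21 + channels])
    return head
```

With mapping family 0 the header is exactly 19 bytes long, which is why the sketch rejects anything shorter.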
2. The Comment Header
The second packet in the stream must be the Comment Header. It follows the layout of a Vorbis comment header and contains:
- Magic Signature: The 8-octet string OpusTags.
- Vendor String: Information about the encoder library used (e.g., libopus 1.3).
- User Comments: Tag-value pairs (e.g., TITLE=Example Track) formatted in UTF-8.
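A minimal sketch of reading this packet follows. It assumes the length-prefixed layout from RFC 7845 (section 5.2): each string is preceded by a 32-bit little-endian length, and the comment list is preceded by a 32-bit count. The name `parse_opus_tags` is hypothetical, and any data after the last comment is ignored.

```python
import struct


def parse_opus_tags(packet: bytes) -> tuple:
    """Return (vendor, comments) from a Comment Header packet."""
    if packet[:8] != b"OpusTags":
        raise ValueError("not an OpusTags packet")
    pos = 8

    def read_u32() -> int:
        nonlocal pos
        (value,) = struct.unpack_from("<I", packet, pos)  # raises struct.error if truncated
        pos += 4
        return value

    def read_string() -> str:
        nonlocal pos
        length = read_u32()
        if pos + length > len(packet):
            raise ValueError("string runs past the end of the packet")
        text = packet[pos:pos + length].decode("utf-8")
        pos += length
        return text

    vendor = read_string()
    count = read_u32()
    comments = [read_string() for _ in range(count)]
    return vendor, comments
```

Each returned comment is a single "TAG=value" string; splitting on the first "=" yields the tag name and its value.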
Granule Position and Timing Mapping
Every Ogg page header contains a field called the granulepos (granule position). In an Ogg Opus stream, the granulepos is used to map packets to absolute timeline positions.
The granulepos value counts PCM audio samples at a fixed rate of 48,000 Hz (48 kHz), regardless of the original sampling rate of the input audio. The granulepos of an Ogg page is the total number of samples in the stream up to and including the last sample of the last packet completed in that page. This count includes the pre-skip samples, so the playback time at the end of a page is (granulepos - pre-skip) / 48,000 seconds. This allows media players to seek to precise timestamps within the encapsulated libopus stream.
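The timing arithmetic can be sketched as follows. The sketch assumes, per RFC 7845 (section 4), that the granulepos count includes the pre-skip samples declared in the ID Header, so they are subtracted before converting to seconds. Both function names are hypothetical.

```python
OPUS_GRANULE_RATE = 48000  # granulepos always counts 48 kHz samples


def playback_seconds(granulepos: int, pre_skip: int) -> float:
    """Playback time at the end of a page, in seconds."""
    return (granulepos - pre_skip) / OPUS_GRANULE_RATE


def page_granulepos(previous_granulepos: int, packet_samples: list) -> int:
    """Granulepos of a page, given the previous page's value and the 48 kHz
    sample count of every packet that completes on this page."""
    return previous_granulepos + sum(packet_samples)
```

For instance, with a pre-skip of 312, a page whose granulepos is 48,312 ends exactly one second into playback.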