Libopus Maximum Audio Packet Duration Limits

This article provides an overview of the maximum duration limits that the libopus library enforces on a single encoded audio packet. It explains the technical constraints dictated by the Opus specification, how frame sizes combine to reach this limit, and the underlying reasons for these design choices in real-time communication.

The Absolute Maximum Duration: 120 ms

The libopus library, which implements the IETF Opus Audio Codec (RFC 6716), limits a single encoded audio packet to at most 120 milliseconds (ms) of audio. The limit comes from the specification itself: RFC 6716 states that the total audio duration contained in a packet MUST NOT exceed 120 ms, so a packet carrying more audio than that is not a valid Opus packet.

How the 120 ms Limit is Structured

An Opus packet is composed of a Table of Contents (TOC) header followed by one or more audio frames. The codec allows for several specific frame durations:

2.5 ms
5 ms
10 ms
20 ms
40 ms
60 ms
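
The frame duration is not stored as a number. It is implied by the configuration number in the top five bits of the TOC byte (RFC 6716, Section 3.1): configurations 0-11 are SILK-only (10, 20, 40 or 60 ms), 12-15 are Hybrid (10 or 20 ms), and 16-31 are CELT-only (2.5, 5, 10 or 20 ms). As a result, 2.5 ms and 5 ms frames exist only in CELT-only mode, and 40 ms and 60 ms frames only in SILK-only mode. The Python sketch below models that mapping. The function and constant names are illustrative and are not part of libopus, which exposes the equivalent query through its C function opus_packet_get_samples_per_frame().

```python
# Sketch: map the 5-bit configuration number of an Opus TOC byte to the
# frame duration it signals, following RFC 6716, Section 3.1.
# Durations are in tenths of a millisecond so 2.5 ms stays an exact integer.
# Names here are hypothetical, not libopus identifiers.

SILK_DURATIONS = (100, 200, 400, 600)   # configs 0-11: 10, 20, 40, 60 ms
HYBRID_DURATIONS = (100, 200)           # configs 12-15: 10, 20 ms
CELT_DURATIONS = (25, 50, 100, 200)     # configs 16-31: 2.5, 5, 10, 20 ms


def frame_duration_tenths_ms(toc: int) -> int:
    """Return the frame duration (in 0.1 ms units) signalled by a TOC byte."""
    if not 0 <= toc <= 0xFF:
        raise ValueError("TOC must be a single byte")
    config = toc >> 3  # top five bits select mode, bandwidth and frame size
    if config < 12:
        return SILK_DURATIONS[config % 4]
    if config < 16:
        return HYBRID_DURATIONS[config % 2]
    return CELT_DURATIONS[config % 4]


if __name__ == "__main__":
    for config in range(32):
        tenths = frame_duration_tenths_ms(config << 3)
        print(f"config {config:2d}: {tenths / 10:g} ms")
```

The low three bits of the TOC byte (the stereo flag and the frame-packing code) do not affect the frame duration, which is why the sketch shifts them away.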

To achieve the maximum packet duration of 120 ms, libopus packages multiple frames of the same duration into a single packet. The packet configurations are constrained by the following rules:

Frame Count Limits: A single packet can contain at most 48 frames. That maximum is only reachable with 2.5 ms frames (48 × 2.5 ms = 120 ms); longer frames allow proportionally fewer.
Configurations That Reach 120 ms: The limit is reached exactly by any of the following:
- Two 60 ms frames
- Three 40 ms frames
- Six 20 ms frames
- Twelve 10 ms frames
- Twenty-four 5 ms frames
- Forty-eight 2.5 ms frames
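
Each count in the list above is simply 120 ms divided by the frame duration, and every supported duration divides 120 ms evenly. A minimal Python sketch of that arithmetic follows; the names are illustrative only.

```python
# Sketch: for each Opus frame duration, compute how many frames fit in one
# packet without exceeding the 120 ms limit of RFC 6716.
# Durations are in tenths of a millisecond to keep the arithmetic exact.

MAX_PACKET_TENTHS_MS = 1200                      # 120 ms
FRAME_DURATIONS = (25, 50, 100, 200, 400, 600)   # 2.5 ms ... 60 ms


def max_frames_per_packet(frame_tenths_ms: int) -> int:
    """Largest frame count whose total duration stays within 120 ms."""
    return MAX_PACKET_TENTHS_MS // frame_tenths_ms


if __name__ == "__main__":
    for d in FRAME_DURATIONS:
        n = max_frames_per_packet(d)
        print(f"{d / 10:g} ms frames: up to {n} per packet ({n * d / 10:g} ms)")
```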

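The 120 ms ceiling is a normative rule rather than a limit of the header layout. The two lowest bits of the TOC byte select how frames are packed: one frame (code 0), two frames (codes 1 and 2), or an arbitrary number of frames (code 3), in which case a frame count byte follows the TOC byte. That byte has six bits for the frame count, so it can express up to 63 frames, but RFC 6716 requires a code 3 packet to contain at least one frame and no packet to exceed 120 ms in total. A packet whose frame count multiplied by its frame duration exceeds 120 ms is therefore invalid, and libopus rejects it as malformed (opus_decode() returns OPUS_INVALID_PACKET).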
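
The validity check can be modeled in a few lines. The Python sketch below is not libopus code and its names are hypothetical: it reads the packing code from the TOC byte, takes the frame count from the six-bit field of the frame count byte for code 3 packets, and rejects packets longer than 120 ms. It checks only these header fields and ignores frame lengths and padding, which a real parser must also validate.

```python
# Sketch (not libopus code): derive an Opus packet's frame count from its
# TOC byte and, for code 3 packets, its frame count byte, then apply the
# RFC 6716 rule that a packet may hold at most 120 ms of audio.

MAX_PACKET_TENTHS_MS = 1200  # 120 ms, in tenths of a millisecond


def frame_duration_tenths_ms(toc: int) -> int:
    config = toc >> 3
    if config < 12:                        # SILK-only: 10/20/40/60 ms
        return (100, 200, 400, 600)[config % 4]
    if config < 16:                        # Hybrid: 10/20 ms
        return (100, 200)[config % 2]
    return (25, 50, 100, 200)[config % 4]  # CELT-only: 2.5/5/10/20 ms


def packet_duration_tenths_ms(packet: bytes) -> int:
    """Return the packet's audio duration, or raise ValueError if invalid."""
    if not packet:
        raise ValueError("empty packet")
    toc = packet[0]
    code = toc & 0x03                      # bottom two bits: frame packing
    if code == 0:
        count = 1
    elif code in (1, 2):
        count = 2
    else:
        if len(packet) < 2:
            raise ValueError("code 3 packet is missing its frame count byte")
        count = packet[1] & 0x3F           # six-bit field, so 0-63 is encodable
        if count == 0:
            raise ValueError("code 3 packet must contain at least one frame")
    duration = count * frame_duration_tenths_ms(toc)
    if duration > MAX_PACKET_TENTHS_MS:
        raise ValueError(f"{duration / 10:g} ms exceeds the 120 ms limit")
    return duration
```

With 2.5 ms CELT frames, a frame count of 48 yields exactly 120 ms and is accepted, while counts of 49 through 63 fit in the six-bit field but are rejected.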

Minimum Packet Duration

While the maximum is 120 ms, the shortest possible Opus packet carries a single 2.5 ms frame. Short frames reduce the delay the encoder adds while collecting audio, which suits latency-critical applications such as live musical performance over a network or fast-paced gaming voice chat, at the cost of lower coding efficiency.
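
libopus's opus_encode() takes the frame size as a number of samples per channel rather than in milliseconds, so each duration has to be converted at the stream's sampling rate. The Python sketch below does that conversion for the sampling rates libopus supports (8, 12, 16, 24 and 48 kHz); the helper name is hypothetical. At 48 kHz, the shortest 2.5 ms frame is 120 samples per channel.

```python
# Sketch: convert Opus frame durations into per-channel sample counts at the
# sampling rates libopus supports. The helper name is illustrative only.

SAMPLE_RATES_HZ = (8000, 12000, 16000, 24000, 48000)
FRAME_DURATIONS = (25, 50, 100, 200, 400, 600)  # tenths of a millisecond


def samples_per_frame(rate_hz: int, frame_tenths_ms: int) -> int:
    """Samples per channel in one frame (exact for every Opus rate/duration)."""
    return rate_hz * frame_tenths_ms // 10000


if __name__ == "__main__":
    for rate in SAMPLE_RATES_HZ:
        sizes = [samples_per_frame(rate, d) for d in FRAME_DURATIONS]
        print(f"{rate} Hz: {sizes}")
```

Every supported rate is a multiple of 4 kHz, so the integer division never truncates.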

Why Libopus Imposes the 120 ms Limit

The 120 ms limitation is a deliberate design choice aimed at balancing compression efficiency, network reliability, and latency:

Packet Loss Impact: Losing a single 120 ms packet removes more than a tenth of a second of audio, which packet loss concealment (PLC) must then fill in and which is clearly audible to the listener. Allowing longer packets would make each loss larger and PLC correspondingly harder and more disruptive.
Latency: A longer packet forces the encoder to buffer more audio before it can send anything. A 120 ms packet adds at least 120 ms of packetization delay on top of the codec's own algorithmic delay, which is already near the upper limit of what is acceptable for interactive, two-way communication.
Memory and Complexity: Capping the packet duration bounds how much audio a decoder can be asked to produce from a single packet, so output buffers can be allocated once at a fixed, modest size. This helps Opus run efficiently on low-power embedded devices.
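
The memory bound is easy to quantify. At 48 kHz, 120 ms is 5760 samples per channel, and the libopus documentation for opus_decode() notes that an output buffer with room for less than the maximum packet duration (120 ms; 5760 samples at 48 kHz) will not be able to decode some packets. The Python sketch below computes the worst-case buffer for a given rate and channel count, assuming interleaved 16-bit PCM output; the function names are hypothetical.

```python
# Sketch: size a decoder output buffer for the worst-case 120 ms packet,
# assuming interleaved 16-bit PCM. Function names are illustrative only.

MAX_PACKET_MS = 120


def max_decoded_samples(rate_hz: int, channels: int) -> int:
    """Interleaved samples needed to hold the audio of one 120 ms packet."""
    return rate_hz * MAX_PACKET_MS // 1000 * channels


def pcm16_buffer_bytes(rate_hz: int, channels: int) -> int:
    """Bytes for that buffer when each sample is a 16-bit integer."""
    return max_decoded_samples(rate_hz, channels) * 2


if __name__ == "__main__":
    for channels in (1, 2):
        print(channels, max_decoded_samples(48000, channels),
              pcm16_buffer_bytes(48000, channels))
```

For stereo at 48 kHz this comes to 11520 samples, or 23040 bytes, which is small enough to allocate statically even on constrained hardware.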