Supported Libopus Audio Frame Sizes in Milliseconds

This article lists the audio frame sizes, measured in milliseconds, that the libopus library supports. It covers the frame durations available in the Opus codec, how frames are grouped into packets, and the trade-offs between latency and bandwidth efficiency.

The libopus library is the reference implementation of the IETF Opus audio codec (RFC 6716) and supports a fixed set of audio frame sizes. A frame size is the duration of audio encoded into a single Opus frame; a packet carries one or more such frames.

Exact Supported Frame Sizes

The native audio frame sizes supported by libopus are:

2.5 ms
5 ms
10 ms
20 ms
40 ms
60 ms
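
Encoders work in samples rather than milliseconds, so each duration above corresponds to a sample count that depends on the sample rate. The following is a minimal sketch of that conversion, assuming the five sample rates Opus operates at (8, 12, 16, 24 and 48 kHz). The function and constant names are illustrative and are not part of the libopus API.

```python
# Illustrative helper (not a libopus function): convert a frame duration
# in milliseconds to a per-channel sample count.
OPUS_SAMPLE_RATES_HZ = (8000, 12000, 16000, 24000, 48000)
FRAME_DURATIONS_MS = (2.5, 5, 10, 20, 40, 60)


def samples_per_frame(sample_rate_hz, duration_ms):
    """Return the number of samples per channel in one frame."""
    if sample_rate_hz not in OPUS_SAMPLE_RATES_HZ:
        raise ValueError(f"unsupported sample rate: {sample_rate_hz}")
    if duration_ms not in FRAME_DURATIONS_MS:
        raise ValueError(f"unsupported frame duration: {duration_ms}")
    # Every supported rate/duration pair yields a whole number of samples.
    return int(sample_rate_hz * duration_ms / 1000)


if __name__ == "__main__":
    for ms in FRAME_DURATIONS_MS:
        print(f"{ms:>4} ms at 48 kHz = {samples_per_frame(48000, ms)} samples")
```

At 48 kHz, for example, a 20 ms frame is 960 samples per channel and a 2.5 ms frame is 120.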

Maximum Packet Duration

While each frame is restricted to one of the durations listed above, an Opus packet can carry several frames of the same duration. RFC 6716 defines two-frame packets (codes 1 and 2) and "code 3" packets, which carry an arbitrary number of frames. The total duration of a single encoded Opus packet cannot exceed 120 ms.

This means a single packet can contain:

Two 60 ms frames (120 ms)
Three 40 ms frames (120 ms)
Six 20 ms frames (120 ms)
Twelve 10 ms frames (120 ms)
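
The duration of a packet can be read from its first bytes without decoding any audio. The sketch below follows the table-of-contents (TOC) byte layout described in RFC 6716: the top five bits select a configuration, which fixes the frame duration, and the low two bits give the frame-count code. The function names are illustrative, and the parser deliberately ignores everything except duration (padding, frame lengths and stereo are not examined).

```python
# Illustrative sketch: derive the duration of an Opus packet from its
# TOC byte, following the layout in RFC 6716. Not a full packet parser.
MAX_PACKET_MS = 120


def frame_duration_ms(toc):
    """Frame duration implied by the configuration number (TOC bits 3-7)."""
    config = toc >> 3
    if config < 12:                        # SILK-only configurations
        return (10, 20, 40, 60)[config & 3]
    if config < 16:                        # hybrid configurations
        return (10, 20)[config & 1]
    return (2.5, 5, 10, 20)[config & 3]    # CELT-only configurations


def packet_duration_ms(packet):
    """Total audio duration of one packet, in milliseconds."""
    if len(packet) < 1:
        raise ValueError("empty packet")
    toc = packet[0]
    code = toc & 0x3
    if code == 0:
        frames = 1                         # one frame
    elif code in (1, 2):
        frames = 2                         # two frames
    else:
        if len(packet) < 2:
            raise ValueError("code 3 packet is missing its frame count byte")
        frames = packet[1] & 0x3F          # low six bits hold the count
        if frames == 0:
            raise ValueError("code 3 packet with zero frames")
    total = frames * frame_duration_ms(toc)
    if total > MAX_PACKET_MS:
        raise ValueError(f"packet duration {total} ms exceeds 120 ms")
    return total
```

For instance, a code 3 packet whose count byte is 6 and whose configuration selects 20 ms frames reports 120 ms, while the same packet with a count of 7 is rejected.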

Frame Size Trade-offs

Choosing the right frame size involves balancing latency and overhead:

Lower Frame Sizes (2.5 ms, 5 ms, 10 ms): These sizes suit ultra-low-latency applications such as real-time interactive communication, networked musical performance, or gaming. The cost is higher packet overhead: every packet carries its own IP/UDP/RTP headers, and shorter frames mean more packets per second of audio. Frames shorter than 10 ms are also available only in the codec's CELT-only mode.
Standard Frame Size (20 ms): This is the most commonly used frame size and the usual default for VoIP and WebRTC. It offers a good balance: 20 ms of framing delay is small enough to go largely unnoticed in conversation, while the frame is long enough for the encoder to compress efficiently.
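Higher Frame Sizes (40 ms, 60 ms): These sizes are best for high-efficiency streaming or broadcasting where latency is not a critical factor. Larger frames reduce packet overhead, which leaves more of the available bandwidth for the audio itself at low bitrates. Note that RFC 6716 defines single 40 ms and 60 ms frames only for the SILK-only mode; hybrid and CELT-only frames are at most 20 ms long.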
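
The overhead side of this trade-off can be estimated with simple arithmetic. The sketch below assumes one frame per packet and uncompressed IPv4 (20 bytes), UDP (8 bytes) and RTP (12 bytes) headers with no extensions. Real deployments may differ (IPv6, SRTP, header extensions), so treat the figures as rough estimates; the names are illustrative.

```python
# Rough estimate of header overhead per frame size, assuming one frame
# per packet and plain IPv4 + UDP + RTP headers (an assumption, not a
# property of libopus).
HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP


def packets_per_second(frame_ms):
    """Number of packets needed to carry one second of audio."""
    return 1000 / frame_ms


def header_overhead_bps(frame_ms, header_bytes=HEADER_BYTES):
    """Bits per second spent on packet headers alone."""
    return packets_per_second(frame_ms) * header_bytes * 8


if __name__ == "__main__":
    for ms in (2.5, 5, 10, 20, 40, 60):
        kbps = header_overhead_bps(ms) / 1000
        print(f"{ms:>4} ms frames: {packets_per_second(ms):6.1f} packets/s, "
              f"{kbps:6.1f} kbit/s of headers")
```

Under these assumptions, 20 ms frames cost 16 kbit/s in headers, while 2.5 ms frames cost 128 kbit/s, which can exceed the bitrate of the audio payload itself.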