Supported Libopus Audio Frame Sizes in Milliseconds
This article lists the audio frame sizes, measured in milliseconds,
that the libopus library supports. It covers the frame durations
available in the Opus codec, how they affect audio transmission, and
the trade-off between latency and bandwidth efficiency.
The libopus library, the reference implementation of the IETF Opus
audio codec (RFC 6716), supports a fixed set of audio frame sizes.
The frame size determines how much audio, by duration, is encoded into
a single frame.
Exact Supported Frame Sizes
The native audio frame sizes supported by libopus
are:
- 2.5 ms
- 5 ms
- 10 ms
- 20 ms
- 40 ms
- 60 ms
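Because opus_encode() takes its frame_size argument as a sample count per channel rather than as a duration, it can help to convert the durations above into samples. The following is a minimal sketch of that arithmetic, not part of libopus; the function and constant names are illustrative assumptions, and durations are kept in microseconds so that 2.5 ms stays an exact integer.

```python
# Sample rates (Hz) at which Opus operates, per RFC 6716.
OPUS_SAMPLE_RATES = (8000, 12000, 16000, 24000, 48000)

# Frame durations in microseconds: 2.5, 5, 10, 20, 40 and 60 ms.
FRAME_DURATIONS_US = (2500, 5000, 10000, 20000, 40000, 60000)


def samples_per_frame(sample_rate_hz: int, duration_us: int) -> int:
    """Return the per-channel sample count for one frame (hypothetical helper)."""
    if sample_rate_hz not in OPUS_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate_hz}")
    if duration_us not in FRAME_DURATIONS_US:
        raise ValueError(f"unsupported frame duration: {duration_us} us")
    # Every supported rate/duration pair divides evenly, so this is exact.
    return sample_rate_hz * duration_us // 1_000_000


if __name__ == "__main__":
    for duration_us in FRAME_DURATIONS_US:
        samples = samples_per_frame(48000, duration_us)
        print(f"{duration_us / 1000} ms -> {samples} samples at 48 kHz")
```

At 48 kHz, for instance, a 20 ms frame works out to 960 samples per channel and a 2.5 ms frame to 120.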
Maximum Packet Duration
While individual frames are restricted to the sizes listed above,
libopus can combine multiple frames of the same size into a
single packet (using the multi-frame packet codes defined in RFC 6716,
such as code 3 packets). The maximum total duration of a single
encoded Opus packet is 120 ms.
This means a single packet can contain:
- Two 60 ms frames (120 ms)
- Three 40 ms frames (120 ms)
- Six 20 ms frames (120 ms)
- Twelve 10 ms frames (120 ms)
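The frame counts above follow from dividing the 120 ms limit by the frame duration. A small sketch of that calculation is shown here; the helper name is an assumption made for illustration, not a libopus function. The same arithmetic gives 24 frames at 5 ms and 48 frames at 2.5 ms.

```python
# Maximum duration of one Opus packet, in microseconds (120 ms).
MAX_PACKET_DURATION_US = 120_000

# Frame durations in microseconds: 2.5, 5, 10, 20, 40 and 60 ms.
FRAME_DURATIONS_US = (2500, 5000, 10000, 20000, 40000, 60000)


def max_frames_per_packet(duration_us: int) -> int:
    """Return how many frames of this duration fit in a 120 ms packet."""
    if duration_us not in FRAME_DURATIONS_US:
        raise ValueError(f"unsupported frame duration: {duration_us} us")
    return MAX_PACKET_DURATION_US // duration_us


if __name__ == "__main__":
    for duration_us in FRAME_DURATIONS_US:
        count = max_frames_per_packet(duration_us)
        print(f"{duration_us / 1000} ms frames: up to {count} per packet")
```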
Frame Size Trade-offs
Choosing the right frame size involves balancing latency and overhead:
- Small Frame Sizes (2.5 ms, 5 ms, 10 ms): These sizes suit ultra-low-latency applications such as real-time interactive communication, networked musical performance, or gaming. However, smaller frames increase packet overhead, because more IP/UDP/RTP headers must be sent per second of audio.
- Standard Frame Size (20 ms): This is the most commonly used frame size for VoIP and WebRTC, and the usual default in applications. It offers a good balance between latency (20 ms of framing delay is barely perceptible in conversation) and compression efficiency.
- Large Frame Sizes (40 ms, 60 ms): These sizes are best for high-efficiency streaming or broadcasting where latency is not a critical factor. Larger frames reduce packet overhead, leaving more of a given bitrate budget for the audio itself.
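To put rough numbers on the header overhead described above, the sketch below estimates the bitrate spent on headers alone for each frame size. It assumes one frame per packet and a 40-byte header (20-byte IPv4 header without options, 8-byte UDP header, 12-byte RTP header without extensions); real deployments using IPv6, SRTP, or header extensions will differ. The names are illustrative, not taken from any library.

```python
# Assumed header size: IPv4 (20) + UDP (8) + RTP (12) bytes, no options or extensions.
IPV4_UDP_RTP_HEADER_BYTES = 20 + 8 + 12

FRAME_SIZES_MS = (2.5, 5, 10, 20, 40, 60)


def header_overhead_bps(frame_ms: float,
                        header_bytes: int = IPV4_UDP_RTP_HEADER_BYTES) -> float:
    """Return header bits per second, assuming one frame per packet."""
    if frame_ms <= 0:
        raise ValueError("frame duration must be positive")
    packets_per_second = 1000 / frame_ms
    return packets_per_second * header_bytes * 8


if __name__ == "__main__":
    for frame_ms in FRAME_SIZES_MS:
        kbps = header_overhead_bps(frame_ms) / 1000
        print(f"{frame_ms} ms frames: {kbps:.1f} kbit/s of header overhead")
```

Under these assumptions, 20 ms frames cost 16 kbit/s in headers, while 2.5 ms frames cost 128 kbit/s, which can exceed the audio bitrate itself.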