Libopus Multistream Channel Mapping Matrix Math

This article explains the mathematical purpose and mechanics of the custom channel mapping matrix within the libopus multistream API. It details how this matrix functions as a linear transformation to project audio signals between the input channel space and the encoded stream space. By understanding this mathematical relationship, developers can efficiently compress, downmix, and reconstruct arbitrary multi-channel and spatial audio layouts.

The Linear Algebra of Channel Mapping

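At its mathematical core, the custom channel mapping matrix performs a linear transformation that maps a vector of input audio channels to a vector of encoded stream channels, and vice versa. In Opus, a matrix of this kind belongs to channel mapping family 3 (RFC 8486), which libopus exposes through its projection API. Mapping family 255, by contrast, carries only a channel-to-stream index table with no gain coefficients.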

In a standard multi-channel setup, we represent the input audio at any given discrete time sample as an \(N\)-dimensional vector:

\[\mathbf{x} = [x_1, x_2, \dots, x_N]^T\]

where \(N\) is the number of input channels (e.g., 6 for 5.1 surround sound).

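To compress this audio efficiently, libopus groups the audio into \(M\) encoded streams (where \(M \le N\)). Each stream is either coupled (stereo, carrying two coded channels) or uncoupled (mono, carrying one). Let \(K\) be the total number of coded channels across all streams, so \(K\) equals \(M\) plus the number of coupled streams. Note that the scalar \(M\) counts streams and is distinct from the bold matrices \(\mathbf{M}_{\text{enc}}\) and \(\mathbf{M}_{\text{dec}}\). The mapping matrix acts as the operator that transforms the input vector \(\mathbf{x}\) into the stream vector \(\mathbf{s}\):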

\[\mathbf{s} = \mathbf{M}_{\text{enc}} \mathbf{x}\]

where:

* \(\mathbf{s}\) is a \(K \times 1\) vector of the channels to be encoded.
* \(\mathbf{M}_{\text{enc}}\) is a \(K \times N\) encoding matrix (or downmix matrix) containing gain coefficients.
* Each element \(m_{i,j}\) in \(\mathbf{M}_{\text{enc}}\) dictates the contribution of input channel \(j\) to encoded stream channel \(i\).
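The product \(\mathbf{s} = \mathbf{M}_{\text{enc}} \mathbf{x}\) can be sketched in a few lines of plain Python. The helper name `apply_matrix`, the channel order, and the 5.1-to-stereo coefficients below are illustrative assumptions, not values taken from libopus.

```python
import math


def apply_matrix(matrix, vector):
    """Return matrix * vector for a list-of-rows matrix and a list vector."""
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("each row needs one coefficient per input channel")
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


# Hypothetical 5.1 -> stereo downmix (assumed order: L, R, C, LFE, Ls, Rs).
G = math.sqrt(0.5)  # roughly -3 dB
M_ENC = [
    [1.0, 0.0, G, 0.0, G, 0.0],  # coded channel 1: L + C + Ls
    [0.0, 1.0, G, 0.0, 0.0, G],  # coded channel 2: R + C + Rs
]

x = [0.5, -0.25, 0.2, 0.0, 0.1, -0.1]  # one time sample, N = 6
s = apply_matrix(M_ENC, x)             # K = 2 coded channels
print(s)
```

Row \(i\) of the matrix produces coded channel \(i\), so the matrix has one row per coded channel and one column per input channel.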

Reconstructing the Output (The Decoding Matrix)

At the receiver, the decoder recovers \(\mathbf{s}'\) from the bitstream. This is an approximation of the stream vector \(\mathbf{s}\): the original coded channels plus some quantization noise. To render this back to the physical speaker channels, a second linear transformation is applied using a decoding matrix (or upmix matrix) \(\mathbf{M}_{\text{dec}}\):

\[\mathbf{y} = \mathbf{M}_{\text{dec}} \mathbf{s}'\]

where:

* \(\mathbf{y}\) is the \(P \times 1\) reconstructed output channel vector (where \(P\) is the number of playback speakers).
* \(\mathbf{M}_{\text{dec}}\) is a \(P \times K\) decoding matrix.
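The decoding step has the same shape as the encoding step, only with the roles of rows and columns reversed. The following is a minimal sketch, assuming a hypothetical layout with \(K = 2\) coded channels rendered to \(P = 3\) speakers; the matrix values and the small `noise` vector standing in for coding error are made up for illustration.

```python
def apply_matrix(matrix, vector):
    """Return matrix * vector for a list-of-rows matrix and a list vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Hypothetical P = 3, K = 2 upmix: left, right, and a phantom center.
M_DEC = [
    [1.0, 0.0],  # left speaker takes coded channel 1
    [0.0, 1.0],  # right speaker takes coded channel 2
    [0.5, 0.5],  # center speaker takes the average of both
]

s = [0.4, -0.2]          # coded channels before compression
noise = [0.001, -0.002]  # stand-in for quantization error
s_prime = [a + b for a, b in zip(s, noise)]

y = apply_matrix(M_DEC, s_prime)  # P = 3 output channels
print(y)
```

Because the transformation is linear, the error in \(\mathbf{s}'\) passes through the same matrix, so the decoding coefficients also shape how coding noise is distributed across the speakers.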

Mathematical Purposes of the Matrices

The custom mapping matrix serves three primary mathematical purposes in spatial audio processing:

1. Dimensionality Reduction and Redundancy Elimination

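By projecting a high-dimensional input space (dimension \(N\)) onto a lower-dimensional stream space (dimension \(K\)), the matrix keeps only \(K\) linear combinations of the input channels. Anything in the null space of \(\mathbf{M}_{\text{enc}}\) is discarded, which costs little when that content is redundant across channels. For example, in Ambisonics (Opus channel mapping families 2 and 3, of which only family 3 carries a matrix), the matrix operates on the spherical harmonic coefficients (B-format) and re-expresses them as a different set of channels for coding.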

2. Energy Preservation and Normalization

To limit the risk of digital clipping and maintain consistent loudness, the mapping matrices are typically normalized. This is achieved by making the rows or columns of the matrices satisfy specific norm conditions. Peak level is governed by the rows: the magnitude of coded channel \(i\) can never exceed \(\sum_j |m_{i,j}|\) times the largest input magnitude. Energy is governed by the columns: to preserve the energy that each input channel contributes to the downmix, the sum of the squares of the coefficients for any given input channel \(j\) across the coded channels is often constrained to \(1\):

\[\sum_{i=1}^{K} m_{i,j}^2 = 1\]
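The column condition above can be checked and enforced with a short routine. This is a sketch under the assumption that each column is simply rescaled to unit energy; the function names and the example matrix are hypothetical, and a real encoder may use a different normalization.

```python
import math


def column_energies(matrix):
    """Sum of squared coefficients in each column (one per input channel)."""
    return [sum(row[j] ** 2 for row in matrix) for j in range(len(matrix[0]))]


def normalize_columns(matrix):
    """Scale every column to unit energy; all-zero columns are left as zero."""
    scales = [1.0 / math.sqrt(e) if e > 0.0 else 0.0
              for e in column_energies(matrix)]
    return [[m * scales[j] for j, m in enumerate(row)] for row in matrix]


# Illustrative K = 2, N = 3 matrix whose columns are not yet normalized.
M_RAW = [
    [1.0, 1.0, 1.0],
    [0.0, 1.0, -1.0],
]

print(column_energies(M_RAW))   # energies before normalization
M_NORM = normalize_columns(M_RAW)
print(column_energies(M_NORM))  # each entry is now close to 1.0
```

Scaling a column changes the gain of one input channel in every coded channel at once, so the relative balance of that input across the coded channels is kept.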

3. Spatial Decoding and Pseudoinverse Relationships

Ideally, the decoding matrix \(\mathbf{M}_{\text{dec}}\) is designed to undo the encoding matrix \(\mathbf{M}_{\text{enc}}\). When the encoding matrix is non-square (\(K < N\)), no exact inverse exists, so the system relies on the Moore-Penrose pseudoinverse \(\mathbf{M}_{\text{enc}}^+\). Provided that \(\mathbf{M}_{\text{enc}}\) has full row rank (so that \(\mathbf{M}_{\text{enc}} \mathbf{M}_{\text{enc}}^T\) is invertible) and the playback layout matches the input layout (\(P = N\)), the pseudoinverse yields the minimum-norm estimate of the original sound field that is consistent with the coded channels:

\[\mathbf{M}_{\text{dec}} \approx \mathbf{M}_{\text{enc}}^+ = \mathbf{M}_{\text{enc}}^T (\mathbf{M}_{\text{enc}} \mathbf{M}_{\text{enc}}^T)^{-1}\]
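To make the formula concrete, the sketch below evaluates \(\mathbf{M}_{\text{enc}}^T (\mathbf{M}_{\text{enc}} \mathbf{M}_{\text{enc}}^T)^{-1}\) in plain Python. It assumes \(K = 2\), so the inner inverse is a closed-form \(2 \times 2\) inverse, and it assumes full row rank; all function names and the example matrix are illustrative.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse_2x2(a):
    (p, q), (r, t) = a
    det = p * t - q * r
    if abs(det) < 1e-12:
        raise ValueError("singular matrix: M_enc lacks full row rank")
    return [[t / det, -q / det], [-r / det, p / det]]


def right_pseudoinverse(m_enc):
    """M^T (M M^T)^-1 for a 2 x N matrix with full row rank."""
    m_t = transpose(m_enc)
    return matmul(m_t, inverse_2x2(matmul(m_enc, m_t)))


# Illustrative K = 2, N = 3 encoding matrix.
M_ENC = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
]

M_DEC = right_pseudoinverse(M_ENC)  # N x K = 3 x 2
identity = matmul(M_ENC, M_DEC)     # K x K, close to the identity
print(M_DEC)
print(identity)
```

The product \(\mathbf{M}_{\text{enc}} \mathbf{M}_{\text{dec}}\) is the \(K \times K\) identity, but \(\mathbf{M}_{\text{dec}} \mathbf{M}_{\text{enc}}\) is an \(N \times N\) matrix of rank \(K\), so the decoder recovers only the part of \(\mathbf{x}\) that survived the projection.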

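In the libopus API, a custom mapping matrix allows developers to go beyond predefined channel layouts (like stereo or 5.1). On the decoder side, the projection API accepts caller-supplied demixing coefficients when the decoder is created (`opus_projection_decoder_create`). On the encoder side, the projection encoder selects its mixing matrix internally and reports the matching demixing matrix through its CTL interface, so that it can be carried to the decoder. This gives precise mathematical control over how sound fields are rendered from the coded channels across arbitrary speaker arrays.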
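Coefficients have to be serialized before they can be handed to a decoder. The sketch below assumes the layout that RFC 8486 describes for the channel mapping family 3 demixing matrix: column-major order, 16-bit signed little-endian values with 15 fractional bits (Q15). The function names are hypothetical, and the rounding and clamping policy is an assumption rather than something libopus prescribes.

```python
import struct


def pack_q15_column_major(matrix):
    """Serialize a list-of-rows matrix as column-major Q15 little-endian."""
    rows, cols = len(matrix), len(matrix[0])
    out = bytearray()
    for j in range(cols):
        for i in range(rows):
            q = int(round(matrix[i][j] * 32768.0))
            q = max(-32768, min(32767, q))  # clamp: +1.0 is not representable
            out += struct.pack("<h", q)
    return bytes(out)


def unpack_q15_column_major(data, rows, cols):
    """Inverse of pack_q15_column_major, returning a list-of-rows matrix."""
    values = struct.unpack("<%dh" % (rows * cols), data)
    return [[values[j * rows + i] / 32768.0 for j in range(cols)]
            for i in range(rows)]


M_DEC = [
    [0.5, -0.25],
    [0.0, 1.0],
]
packed = pack_q15_column_major(M_DEC)
restored = unpack_q15_column_major(packed, 2, 2)
print(packed.hex())
print(restored)
```

Q15 covers the range \([-1, 1)\), so a coefficient of exactly \(1.0\) is clamped to \(32767/32768\). Matrices with larger coefficients need to be scaled down first, with the scale compensated elsewhere in the signal path.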