
CN118946930A - Parameterized spatial audio coding - Google Patents


Info

Publication number
CN118946930A
Authority
CN
China
Prior art keywords
resolution
entropy encoding
value
direction value
encode
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280093921.XA
Other languages
Chinese (zh)
Inventor
A. Vasilache
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nokia Technologies Oy
Original Assignee
Nokia Technologies Oy
Filing date
Publication date
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Publication of CN118946930A


Abstract

An apparatus comprising means configured to perform the following operations: obtaining a value representing a parameter of an audio signal, the value comprising at least one direction value and at least one energy ratio value for at least one sub-frame of each sub-band of a frame of the audio signal; obtaining an allowable number of bits for encoding at least one direction value and at least one energy ratio value for at least one sub-frame of each sub-band of a frame of the audio signal; encoding at least one direction value of at least one sub-frame for each sub-band of the frame based on at least one resolution entropy encoding, wherein one of the at least one resolution entropy encoding is further to: performing a first resolution entropy encoding on the at least one direction value and determining a number of bits used to encode the at least one direction value based on the first entropy encoding; performing a first resolution entropy encoding of the at least one reduced direction value and determining a number of bits used to encode the at least one reduced direction value based on the first entropy encoding; selecting the first resolution entropy encoding of the at least one direction value when the number of bits used to encode the at least one direction value based on the first resolution entropy encoding is less than or equal to a fraction of the allowed number of bits used to encode the at least one direction value; and selecting the first resolution entropy encoding of the at least one reduced direction value when the number of bits used to encode the at least one direction value based on the first resolution entropy encoding is greater than a portion of the allowed number of bits used to encode the at least one direction value and the number of bits used to encode the at least one direction value based on the first resolution entropy encoding is less than or equal to a portion of the allowed number of bits
used to encode the at least one direction value.

Description

Parameterized spatial audio coding
Technical Field
The present application relates to apparatus and methods for spatial audio representation and coding, though not exclusively to apparatus and methods for audio representation within an audio encoder.
Background
Immersive audio codecs are being implemented to support a large number of operating points ranging from low bit rate operation to transparency. One example of such a codec is the Immersive Voice and Audio Services (IVAS) codec, designed to be suitable for use over communication networks such as 3GPP 4G/5G networks, including use in immersive services such as, for example, immersive voice and audio for Virtual Reality (VR). The audio codec is intended to handle the encoding, decoding and rendering of speech, music and generic audio. It is also expected to support channel-based audio and scene-based audio inputs, including spatial information about sound fields and sound sources. The codec is further expected to operate with low latency to enable conversational services, and to support high error robustness under various transmission conditions.
Metadata Assisted Spatial Audio (MASA) is one input format proposed for IVAS. It uses the audio signal and the corresponding spatial metadata. The spatial metadata comprises parameters defining spatial aspects of the audio signal and may contain, for example, direction and direct-to-total energy ratio (direct-to-total energy ratio) in the frequency band. The MASA stream may be obtained, for example, by capturing spatial audio with a microphone of a suitable capture device. For example, a mobile device including a plurality of microphones may be configured to capture microphone signals, wherein a set of spatial metadata may be estimated based on the captured microphone signals. The MASA stream may also be obtained from other sources such as a specific spatial audio microphone (such as panoramic surround sound (Ambisonics)), studio mix (e.g., 5.1 audio channel mix), or other content by suitable format conversion.
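As a rough illustration of the kind of metadata involved, the sketch below models a MASA-like frame as a grid of time-frequency tiles. The field names, the 4 sub-frame by 24 sub-band layout, and the values are illustrative assumptions for this sketch, not the actual MASA format definition.

```python
from dataclasses import dataclass

@dataclass
class SpatialTile:
    """One time-frequency tile of MASA-style spatial metadata.
    Field names are illustrative; real MASA metadata defines more parameters."""
    azimuth_deg: float       # direction of arrival, horizontal plane
    elevation_deg: float     # direction of arrival, vertical plane
    direct_to_total: float   # direct-to-total energy ratio, in [0, 1]

# A frame as sub-frames x sub-bands of tiles (layout sizes are assumptions).
N_SUBFRAMES, N_SUBBANDS = 4, 24
frame = [[SpatialTile(0.0, 0.0, 0.5) for _ in range(N_SUBBANDS)]
         for _ in range(N_SUBFRAMES)]
```

Each tile then carries exactly the two parameter kinds the aspects below encode: a direction value and an energy ratio value.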
Disclosure of Invention
According to a first aspect, there is provided an apparatus comprising means for: obtaining a value representing a parameter of an audio signal, the value comprising at least one direction value and at least one energy ratio value for at least one sub-frame of each sub-band of a frame of the audio signal; obtaining an allowable number of bits for encoding at least one direction value and at least one energy ratio value for at least one sub-frame of each sub-band of a frame of the audio signal; encoding at least one direction value of at least one sub-frame for each sub-band of the frame based on at least one resolution entropy encoding, wherein one of the at least one resolution entropy encoding is further to: performing a first resolution entropy encoding on the at least one direction value and determining a number of bits used to encode the at least one direction value based on the first entropy encoding; performing a first resolution entropy encoding of the at least one reduced direction value and determining a number of bits used to encode the at least one reduced direction value based on the first entropy encoding; selecting the first resolution entropy encoding of the at least one direction value when the number of bits used to encode the at least one direction value based on the first resolution entropy encoding is less than or equal to a fraction of the allowed number of bits used to encode the at least one direction value; and selecting the first resolution entropy encoding of the at least one reduced direction value when the number of bits used to encode the at least one direction value based on the first resolution entropy encoding is greater than a portion of the allowed number of bits used to encode the at least one direction value and the number of bits used to encode the at least one direction value based on the first resolution entropy encoding is less than or equal to a portion of the allowed number of bits used to encode the at least one direction 
value.
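The budget-driven choice described in the first aspect can be sketched as follows. The bit counts are assumed to come from trial entropy encodings of the original and the reduced direction values, and the value of the fraction is an illustrative assumption, since the text does not fix it.

```python
def select_encoding(bits_full: int, bits_reduced: int, allowed_bits: int,
                    fraction: float = 0.5) -> str:
    """Choose between first-resolution encoding of the original or the
    reduced direction values under a bit budget (fraction is assumed)."""
    budget = fraction * allowed_bits
    if bits_full <= budget:
        return "full"        # original direction values fit the budget
    if bits_reduced <= budget:
        return "reduced"     # fall back to the reduced direction values
    return "neither"         # a lower-resolution scheme is tried instead
```

The "neither" branch corresponds to the lower-resolution encodings introduced in the later aspects.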
The means for first resolution entropy encoding the at least one direction value and determining a number of bits used to encode the at least one direction value based on the first entropy encoding may be for: at least one resolution entropy encoding at least one value determined from a difference of the at least one direction value compared to an average direction value for the frame and determining a number of bits used to encode the at least one value, and means for first resolution entropy encoding the at least one reduced direction value and determining a number of bits used to encode the at least one reduced direction value based on the first entropy encoding may be for: at least one reduced value from a reduced difference based on the difference of the at least one direction value compared to the average direction value for the frame is at least one resolution entropy encoded and a number of bits used to encode the at least one reduced value is determined.
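Assuming the direction values are azimuth angles in degrees, the differences against the frame's average direction might be formed with wraparound as below; the wrapping convention is an assumption of this sketch.

```python
def wrap_deg(a: float) -> float:
    """Wrap an angle into [-180, 180) degrees."""
    return (a + 180.0) % 360.0 - 180.0

def direction_differences(azimuths_deg, frame_avg_deg):
    """Differences of each direction value against the frame average,
    kept small by wrapping (small magnitudes suit entropy coding)."""
    return [wrap_deg(az - frame_avg_deg) for az in azimuths_deg]
```

Keeping the differences small is what makes entropy coding of the values (rather than the raw directions) attractive.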
The means for encoding at least one direction value for at least one sub-frame of each sub-band of the frame based on at least one resolution entropy encoding may be further for: performing a second resolution entropy encoding of at least one value based on the at least one direction value and determining a number of bits used to encode the at least one value based on the second resolution entropy encoding, wherein when the frame includes more than one time-frequency tile within a sub-band, the second resolution entropy encoding may be a lower resolution encoding than the first resolution entropy encoding and exploiting similarities between the time-frequency tiles within the sub-band within the frame; and selecting the second resolution entropy encoding of the at least one value when the number of bits used to encode the at least one value based on the second resolution entropy encoding is less than or equal to a fraction of the allowed number of bits used to encode the at least one direction value.
The at least one value based on the at least one direction value may be at least one difference value from the at least one direction value compared to the average direction value for the frame.
The above components may further be used to: select the second resolution entropy encoding of the at least one value based on the at least one direction value when the number of bits used to encode the at least one value based on the second resolution entropy encoding is greater than a fraction of the allowed number of bits used to encode the at least one direction value but less than the determined relaxed number of bits.
The relaxed number of bits may be a number of bits relative to a fraction of the allowed number of bits for encoding the at least one direction value.
The means for encoding at least one direction value for at least one sub-frame of each sub-band of the frame based on at least one resolution entropy encoding, wherein one of the at least one resolution entropy encoding may be further configured to: performing a third resolution entropy encoding of at least one value based on the at least one direction value and determining a number of bits used to encode the at least one value based on the third resolution entropy encoding, wherein a quantization resolution of the third resolution is lower than the first resolution entropy encoding and the second resolution entropy encoding; and selecting a third resolution entropy encoding of the at least one value based on the at least one direction value when the number of bits used to encode the at least one value based on the first resolution entropy encoding or the second resolution entropy encoding is greater than a fraction of the allowed number of bits used to encode the at least one value.
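Taken together, the first, second, and third resolution encodings form a cascade from higher to lower resolution. The sketch below assumes each scheme's bit cost has already been measured by a trial encoding; the fraction and the relaxed (slack) bit count are illustrative assumptions.

```python
def choose_resolution(bits_by_scheme: dict, allowed_bits: int,
                      fraction: float = 0.5, slack_bits: int = 8) -> str:
    """Pick the highest-resolution entropy coding that fits the budget.
    bits_by_scheme maps scheme name -> bits it would use; fraction and
    slack_bits are assumed values, not taken from the text."""
    budget = fraction * allowed_bits
    for scheme in ("first", "second"):          # descending resolution
        if bits_by_scheme[scheme] <= budget:
            return scheme
    # the second resolution may also be accepted within a relaxed budget
    if bits_by_scheme["second"] <= budget + slack_bits:
        return "second"
    return "third"                              # lowest-resolution fallback
```

The third scheme acts as a guaranteed-fit fallback when even the relaxed budget is exceeded.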
The at least one value based on the at least one direction value may be at least one difference value from the at least one direction value compared to the average direction value for the frame.
The above components may further be used to: at least one energy ratio value for at least one sub-frame of each sub-band of a frame of an audio signal is encoded.
The means for encoding at least one energy ratio value for at least one sub-frame of each sub-band of a frame of the audio signal may be for: generating a weighted average of the at least one energy ratio value; and encoding a weighted average of the at least one energy ratio value.
The means for encoding a weighted average of the at least one energy ratio value may be further for: scalar non-uniform quantization is performed on at least one weighted average of at least one energy ratio value.
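The energy-ratio steps above might look as follows. Using sub-frame energies as the weights and this particular codebook are assumptions of the sketch; the text only requires a weighted average followed by scalar non-uniform quantization.

```python
def weighted_average(ratios, weights):
    """Weighted average of energy-ratio values across sub-frames."""
    total = sum(weights)
    if total == 0.0:
        return sum(ratios) / len(ratios)   # degenerate case: plain mean
    return sum(r * w for r, w in zip(ratios, weights)) / total

# Non-uniform codebook, denser near 1.0 where ratios cluster (levels
# are illustrative, not taken from the text).
RATIO_LEVELS = [0.0, 0.25, 0.45, 0.6, 0.75, 0.85, 0.92, 1.0]

def quantize_ratio(x: float) -> int:
    """Nearest-neighbour scalar quantization to the non-uniform levels."""
    return min(range(len(RATIO_LEVELS)),
               key=lambda i: abs(RATIO_LEVELS[i] - x))
```

Only the codebook index needs to be transmitted per sub-band, which is what makes averaging over sub-frames a bit-rate saving.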
The at least one entropy encoding may be a Golomb Rice encoding.
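The aspects name Golomb-Rice coding as one option for the entropy encoding. Below is a minimal sketch of such a code; the zigzag mapping of signed direction differences to non-negative integers is an added assumption, since the text does not specify how signs are handled.

```python
def zigzag(d: int) -> int:
    """Map signed values 0, -1, 1, -2, ... to 0, 1, 2, 3, ... so that
    small-magnitude differences get short codes (an assumed mapping)."""
    return 2 * d if d >= 0 else -2 * d - 1

def golomb_rice_encode(value: int, p: int) -> str:
    """Golomb-Rice code with parameter p: the quotient value >> p in
    unary (q ones, then a terminating zero), followed by the p-bit
    binary remainder."""
    q = value >> p
    bits = "1" * q + "0"
    if p > 0:
        bits += format(value & ((1 << p) - 1), f"0{p}b")
    return bits
```

Because the code length grows with the encoded value, the trial encodings in the earlier aspects can simply sum these lengths to obtain the bit counts compared against the budget.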
The above components are further used for: the encoded at least one direction value is stored and/or transmitted.
According to a second aspect, there is provided a method comprising: obtaining a value representing a parameter of an audio signal, the value comprising at least one direction value and at least one energy ratio value for at least one sub-frame of each sub-band of a frame of the audio signal; obtaining an allowable number of bits for encoding at least one direction value and at least one energy ratio value for at least one sub-frame of each sub-band of a frame of the audio signal; encoding at least one direction value of at least one sub-frame for each sub-band of the frame based on at least one resolution entropy encoding, wherein one of the at least one resolution entropy encoding is further to: performing a first resolution entropy encoding on the at least one direction value and determining a number of bits used to encode the at least one direction value based on the first entropy encoding; performing a first resolution entropy encoding of the at least one reduced direction value and determining a number of bits used to encode the at least one reduced direction value based on the first entropy encoding; selecting the first resolution entropy encoding of the at least one direction value when the number of bits used to encode the at least one direction value based on the first resolution entropy encoding is less than or equal to a fraction of the allowed number of bits used to encode the at least one direction value; and selecting the first resolution entropy encoding of the at least one reduced direction value when the number of bits used to encode the at least one direction value based on the first resolution entropy encoding is greater than a portion of the allowed number of bits used to encode the at least one direction value and the number of bits used to encode the at least one direction value based on the first resolution entropy encoding is less than or equal to a portion of the allowed number of bits used to encode the at least one direction value.
The first resolution entropy encoding of the at least one direction value and determining a number of bits used to encode the at least one direction value based on the first entropy encoding may include: at least one resolution entropy encoding at least one value determined from a difference of the at least one direction value compared to an average direction value for the frame and determining a number of bits used to encode the at least one value, and the first resolution entropy encoding of the at least one reduced direction value and determining a number of bits used to encode the at least one reduced direction value based on the first entropy encoding may include: at least one reduced value from a reduced difference based on the difference of the at least one direction value compared to the average direction value for the frame is at least one resolution entropy encoded and a number of bits used to encode the at least one reduced value is determined.
Encoding at least one direction value of at least one subframe for each subband of the frame based on the at least one resolution entropy encoding may comprise: performing a second resolution entropy encoding of at least one value based on the at least one direction value and determining a number of bits used to encode the at least one value based on the second resolution entropy encoding, wherein when the frame includes more than one time-frequency tile within a sub-band, the second resolution entropy encoding may be a lower resolution encoding than the first resolution entropy encoding and exploiting similarities between the time-frequency tiles within the sub-band within the frame; and selecting the second resolution entropy encoding of the at least one value when the number of bits used to encode the at least one value based on the second resolution entropy encoding is less than or equal to a fraction of the allowed number of bits used to encode the at least one direction value.
The at least one value based on the at least one direction value may be at least one difference value from the at least one direction value compared to the average direction value for the frame.
The method may further comprise: the second resolution entropy encoding of the at least one value based on the at least one direction value is selected when the number of bits used to encode the at least one value based on the second resolution entropy encoding is greater than a fraction of the allowed number of bits used to encode the at least one direction value but less than the determined relaxed number of bits.
The relaxed number of bits may be a number of bits relative to a fraction of the allowed number of bits for encoding the at least one direction value.
Encoding at least one direction value of at least one subframe for each subband of the frame based on at least one resolution entropy encoding, wherein one of the at least one resolution entropy encoding may further comprise: performing a third resolution entropy encoding of at least one value based on the at least one direction value and determining a number of bits used to encode the at least one value based on the third resolution entropy encoding, wherein a quantization resolution of the third resolution is lower than the first resolution entropy encoding and the second resolution entropy encoding; and selecting a third resolution entropy encoding of the at least one value based on the at least one direction value when the number of bits used to encode the at least one value based on the first resolution entropy encoding or the second resolution entropy encoding is greater than a fraction of the allowed number of bits used to encode the at least one value.
The at least one value based on the at least one direction value may be at least one difference value from the at least one direction value compared to the average direction value for the frame.
The method may further comprise: at least one energy ratio value for at least one sub-frame of each sub-band of a frame of an audio signal is encoded.
Encoding at least one energy ratio value for at least one sub-frame of each sub-band of a frame of an audio signal may comprise: generating a weighted average of the at least one energy ratio value; and encoding a weighted average of the at least one energy ratio value.
Encoding the weighted average of the at least one energy ratio value may further comprise: scalar non-uniform quantization is performed on at least one weighted average of at least one energy ratio value.
The at least one entropy encoding may be a Golomb Rice encoding.
The method may further comprise: the encoded at least one direction value is stored and/or transmitted.
According to a third aspect, there is provided an apparatus comprising: at least one processor and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to: obtaining a value representing a parameter of an audio signal, the value comprising at least one direction value and at least one energy ratio value for at least one sub-frame of each sub-band of a frame of the audio signal; obtaining an allowable number of bits for encoding at least one direction value and at least one energy ratio value for at least one sub-frame of each sub-band of a frame of the audio signal; encoding at least one direction value of at least one sub-frame for each sub-band of the frame based on at least one resolution entropy encoding, wherein one of the at least one resolution entropy encoding is further to: performing a first resolution entropy encoding on the at least one direction value and determining a number of bits used to encode the at least one direction value based on the first entropy encoding; performing a first resolution entropy encoding of the at least one reduced direction value and determining a number of bits used to encode the at least one reduced direction value based on the first entropy encoding; selecting the first resolution entropy encoding of the at least one direction value when the number of bits used to encode the at least one direction value based on the first resolution entropy encoding is less than or equal to a fraction of the allowed number of bits used to encode the at least one direction value; and selecting the first resolution entropy encoding of the at least one reduced direction value when the number of bits used to encode the at least one direction value based on the first resolution entropy encoding is greater than a portion of the allowed number of bits used to encode the at least one direction value and the number of 
bits used to encode the at least one direction value based on the first resolution entropy encoding is less than or equal to a portion of the allowed number of bits used to encode the at least one direction value.
The apparatus being caused to first resolution entropy encode the at least one direction value and determine a number of bits used to encode the at least one direction value based on the first entropy encoding may be caused to: at least one resolution entropy encoding at least one value determined from a difference of the at least one direction value compared to an average direction value for the frame and determining a number of bits used to encode the at least one value, and the apparatus being caused to first resolution entropy encode the at least one reduced direction value and determine a number of bits used to encode the at least one reduced direction value based on the first entropy encoding may be caused to: at least one reduced value from a reduced difference based on the difference of the at least one direction value compared to the average direction value for the frame is at least one resolution entropy encoded and a number of bits used to encode the at least one reduced value is determined.
The apparatus being caused to encode at least one direction value for at least one subframe of each subband of a frame based on at least one resolution entropy encoding may be caused to: performing a second resolution entropy encoding of at least one value based on the at least one direction value and determining a number of bits used to encode the at least one value based on the second resolution entropy encoding, wherein when the frame includes more than one time-frequency tile within a sub-band, the second resolution entropy encoding may be a lower resolution encoding than the first resolution entropy encoding and exploiting similarities between the time-frequency tiles within the sub-band within the frame; and selecting the second resolution entropy encoding of the at least one value when the number of bits used to encode the at least one value based on the second resolution entropy encoding is less than or equal to a fraction of the allowed number of bits used to encode the at least one direction value.
The at least one value based on the at least one direction value may be at least one difference value from the at least one direction value compared to the average direction value for the frame.
The apparatus may be further caused to: the second resolution entropy encoding of the at least one value based on the at least one direction value is selected when the number of bits used to encode the at least one value based on the second resolution entropy encoding is greater than a fraction of the allowed number of bits used to encode the at least one direction value but less than the determined relaxed number of bits.
The relaxed number of bits may be a number of bits relative to a fraction of the allowed number of bits for encoding the at least one direction value.
The apparatus may be caused to: encoding at least one direction value of at least one sub-frame for each sub-band of the frame based on at least one resolution entropy encoding, wherein one of the at least one resolution entropy encoding may be such that: performing a third resolution entropy encoding of at least one value based on the at least one direction value and determining a number of bits used to encode the at least one value based on the third resolution entropy encoding, wherein a quantization resolution of the third resolution is lower than the first resolution entropy encoding and the second resolution entropy encoding; and selecting a third resolution entropy encoding of the at least one value based on the at least one direction value when the number of bits used to encode the at least one value based on the first resolution entropy encoding or the second resolution entropy encoding is greater than a fraction of the allowed number of bits used to encode the at least one value.
The at least one value based on the at least one direction value may be at least one difference value from the at least one direction value compared to the average direction value for the frame.
The apparatus may be further caused to: at least one energy ratio value for at least one sub-frame of each sub-band of a frame of an audio signal is encoded.
The apparatus caused to encode at least one energy ratio value for at least one sub-frame of each sub-band of a frame of an audio signal may be caused to: generating a weighted average of the at least one energy ratio value; and encoding a weighted average of the at least one energy ratio value.
The apparatus being caused to encode a weighted average of at least one energy ratio value may be further caused to: scalar non-uniform quantization is performed on at least one weighted average of at least one energy ratio value.
The at least one entropy encoding may be a Golomb Rice encoding.
The apparatus may be further caused to: the encoded at least one direction value is stored and/or transmitted.
According to a fourth aspect, there is provided an apparatus comprising: an obtaining circuit configured to obtain values of parameters representative of an audio signal, the values comprising at least one direction value and at least one energy ratio value for at least one subframe of each subband of a frame of the audio signal; an obtaining circuit configured to obtain an allowable number of bits for encoding at least one direction value and at least one energy ratio value for at least one sub-frame of each sub-band of a frame of the audio signal; an encoding circuit configured to encode at least one direction value of at least one subframe for each subband of the frame based on at least one resolution entropy encoding, wherein one of the at least one resolution entropy encoding is: performing a first resolution entropy encoding on the at least one direction value and determining a number of bits used to encode the at least one direction value based on the first entropy encoding; performing a first resolution entropy encoding of the at least one reduced direction value and determining a number of bits used to encode the at least one reduced direction value based on the first entropy encoding; selecting the first resolution entropy encoding of the at least one direction value when the number of bits used to encode the at least one direction value based on the first resolution entropy encoding is less than or equal to a fraction of the allowed number of bits used to encode the at least one direction value; and selecting the first resolution entropy encoding of the at least one reduced direction value when the number of bits used to encode the at least one direction value based on the first resolution entropy encoding is greater than a portion of the allowed number of bits used to encode the at least one direction value and the number of bits used to encode the at least one direction value based on the first resolution entropy encoding is less than or equal to a 
portion of the allowed number of bits used to encode the at least one direction value.
According to a fifth aspect, there is provided a computer program [ or a computer readable medium comprising program instructions ] comprising instructions for causing an apparatus to at least: obtaining a value representing a parameter of an audio signal, the value comprising at least one direction value and at least one energy ratio value for at least one sub-frame of each sub-band of a frame of the audio signal; obtaining an allowable number of bits for encoding at least one direction value and at least one energy ratio value for at least one sub-frame of each sub-band of a frame of the audio signal; encoding at least one direction value of at least one sub-frame for each sub-band of the frame based on at least one resolution entropy encoding, wherein one of the at least one resolution entropy encoding is further to: performing a first resolution entropy encoding on the at least one direction value and determining a number of bits used to encode the at least one direction value based on the first entropy encoding; performing a first resolution entropy encoding of the at least one reduced direction value and determining a number of bits used to encode the at least one reduced direction value based on the first entropy encoding; selecting the first resolution entropy encoding of the at least one direction value when the number of bits used to encode the at least one direction value based on the first resolution entropy encoding is less than or equal to a fraction of the allowed number of bits used to encode the at least one direction value; and selecting the first resolution entropy encoding of the at least one reduced direction value when the number of bits used to encode the at least one direction value based on the first resolution entropy encoding is greater than a portion of the allowed number of bits used to encode the at least one direction value and the number of bits used to encode the at least one direction value based on the first resolution entropy 
encoding is less than or equal to a portion of the allowed number of bits used to encode the at least one direction value.
According to a sixth aspect, there is provided a non-transitory computer readable medium comprising program instructions for causing an apparatus to at least: obtaining a value representing a parameter of an audio signal, the value comprising at least one direction value and at least one energy ratio value for at least one sub-frame of each sub-band of a frame of the audio signal; obtaining an allowable number of bits for encoding at least one direction value and at least one energy ratio value for at least one sub-frame of each sub-band of a frame of the audio signal; encoding at least one direction value of at least one sub-frame for each sub-band of the frame based on at least one resolution entropy encoding, wherein one of the at least one resolution entropy encoding is further to: performing a first resolution entropy encoding on the at least one direction value and determining a number of bits used to encode the at least one direction value based on the first entropy encoding; performing a first resolution entropy encoding of the at least one reduced direction value and determining a number of bits used to encode the at least one reduced direction value based on the first entropy encoding; selecting the first resolution entropy encoding of the at least one direction value when the number of bits used to encode the at least one direction value based on the first resolution entropy encoding is less than or equal to a fraction of the allowed number of bits used to encode the at least one direction value; and selecting the first resolution entropy encoding of the at least one reduced direction value when the number of bits used to encode the at least one direction value based on the first resolution entropy encoding is greater than a portion of the allowed number of bits used to encode the at least one direction value and the number of bits used to encode the at least one direction value based on the first resolution entropy encoding is less than or equal to a 
portion of the allowed number of bits used to encode the at least one direction value.
According to a seventh aspect, there is provided an apparatus comprising: means for obtaining a value representing a parameter of an audio signal, wherein the value comprises at least one direction value and at least one energy ratio value for at least one sub-frame of each sub-band of a frame of the audio signal; means for obtaining an allowable number of bits for encoding at least one direction value and at least one energy ratio value for at least one sub-frame of each sub-band of a frame of the audio signal; means for encoding at least one direction value of at least one sub-frame for each sub-band of the frame based on at least one resolution entropy encoding, wherein one of the at least one resolution entropy encoding is further to: performing a first resolution entropy encoding on the at least one direction value and determining a number of bits used to encode the at least one direction value based on the first entropy encoding; performing a first resolution entropy encoding of the at least one reduced direction value and determining a number of bits used to encode the at least one reduced direction value based on the first entropy encoding; selecting the first resolution entropy encoding of the at least one direction value when the number of bits used to encode the at least one direction value based on the first resolution entropy encoding is less than or equal to a fraction of the allowed number of bits used to encode the at least one direction value; and selecting the first resolution entropy encoding of the at least one reduced direction value when the number of bits used to encode the at least one direction value based on the first resolution entropy encoding is greater than a portion of the allowed number of bits used to encode the at least one direction value and the number of bits used to encode the at least one direction value based on the first resolution entropy encoding is less than or equal to a portion of the allowed number of bits used to 
encode the at least one direction value.
According to an eighth aspect, there is provided a computer readable medium comprising program instructions for causing an apparatus to at least: obtaining a value representing a parameter of an audio signal, the value comprising at least one direction value and at least one energy ratio value for at least one sub-frame of each sub-band of a frame of the audio signal; obtaining an allowable number of bits for encoding at least one direction value and at least one energy ratio value for at least one sub-frame of each sub-band of a frame of the audio signal; encoding at least one direction value of at least one sub-frame for each sub-band of the frame based on at least one resolution entropy encoding, wherein one of the at least one resolution entropy encoding is further to: performing a first resolution entropy encoding on the at least one direction value and determining a number of bits used to encode the at least one direction value based on the first entropy encoding; performing a first resolution entropy encoding of the at least one reduced direction value and determining a number of bits used to encode the at least one reduced direction value based on the first entropy encoding; selecting the first resolution entropy encoding of the at least one direction value when the number of bits used to encode the at least one direction value based on the first resolution entropy encoding is less than or equal to a fraction of the allowed number of bits used to encode the at least one direction value; and selecting the first resolution entropy encoding of the at least one reduced direction value when the number of bits used to encode the at least one direction value based on the first resolution entropy encoding is greater than a portion of the allowed number of bits used to encode the at least one direction value and the number of bits used to encode the at least one direction value based on the first resolution entropy encoding is less than or equal to a portion of the 
allowed number of bits used to encode the at least one direction value.
An apparatus comprising means for performing the actions of the method as described above.
An apparatus configured to perform the actions of the method as described above.
A computer program comprising program instructions for causing a computer to perform the method as described above.
A computer program product stored on a medium may cause an apparatus to perform the methods described herein.
An electronic device may comprise an apparatus as described herein.
A chipset may comprise an apparatus as described herein.
Embodiments of the present application aim to address the problems associated with the prior art.
Drawings
For a better understanding of the present application, reference will now be made, by way of example, to the accompanying drawings in which:
FIG. 1 schematically illustrates a system of devices suitable for implementing some embodiments;
FIG. 2 schematically illustrates a decoder shown in a system of devices as shown in FIG. 1, in accordance with some embodiments;
FIG. 3 illustrates a flow chart of the operation of an example decoder, as shown in FIG. 2, in accordance with some embodiments;
FIG. 4 schematically illustrates an example composition processor as shown in FIG. 2, in accordance with some embodiments;
FIG. 5 illustrates a flowchart of the operation of an example composition processor as shown in FIG. 4, in accordance with some embodiments; and
Fig. 6 shows an example apparatus suitable for implementing the apparatus shown in the previous figures.
Detailed Description
Suitable means and possible mechanisms for encoding a parameterized spatial audio stream comprising a transmitted audio signal and spatial metadata are described in more detail below.
As described above, metadata Assisted Spatial Audio (MASA) is an example of a parameterized spatial audio format and representation suitable as an input format for IVAS.
It can be regarded as an audio representation consisting of "N channels + spatial metadata". It is a scene-based audio format particularly suited to spatial audio capture on practical devices such as smartphones. The idea is to describe the sound scene in terms of time- and frequency-varying sound source directions and, e.g., energy ratios. Sound energy that is not defined (described) by a direction is described as diffuse (arriving from all directions).
As discussed above, the spatial metadata associated with the audio signal may include multiple parameters per time-frequency block, such as a number of directions and, associated with each direction (or direction value), a direct-to-total ratio, a spread coherence, a distance, and so on. The spatial metadata may also include, or be associated with, other parameters that are considered non-directional, such as surround coherence, diffuse-to-total energy ratio, and remainder-to-total energy ratio, but which, when combined with the directional parameters, can be used to define the characteristics of the audio scene. For example, a reasonable design choice for producing good quality output is to determine that the spatial metadata includes one or more directions for each time-frequency subframe (and, associated with each direction, a direct-to-total ratio, spread coherence, distance value, etc.).
As described above, the parametric spatial metadata representation can use multiple concurrent spatial directions. With MASA, the proposed maximum number of concurrent directions is two. For each concurrent direction there may be associated parameters such as: a direction index; a direct-to-total ratio; a spread coherence; and a distance. In some embodiments, other parameters are defined, such as a diffuse-to-total energy ratio; a surround coherence; and a remainder-to-total energy ratio.
At very low bit rates (e.g., about 13.2-16.4 kbps), the number of bits available to encode metadata is very small. For example, only about 3 kbps may be available for encoding the metadata, in order to leave a sufficient bit rate for the audio signal codec.
In order to have sufficient frequency and time resolution (e.g. with 5 frequency bands and with a time resolution of 20 milliseconds), in many cases only a few bits per value (e.g. direction parameter) can be used. In practice this means that the quantization step size is relatively large. Thus, for example, for a certain time-frequency tile, the quantization points are at 0, ±45, ±90, ±135, and 180 degrees azimuth.
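The coarseness of such a quantization can be illustrated with a small sketch (the function and the step value are illustrative only, not taken from the specification); a 45-degree uniform azimuth quantizer reproduces exactly the points listed above:

```c
/* Illustrative coarse azimuth quantizer (not from the specification):
 * with a 45-degree step the reconstruction points are
 * 0, +/-45, +/-90, +/-135 and 180 degrees, as in the example above. */
static int quantize_azimuth_deg(double azimuth, int step)
{
    /* round to the nearest multiple of the step */
    int q = (int)((azimuth >= 0.0 ? azimuth + step / 2.0
                                  : azimuth - step / 2.0) / step) * step;
    if (q == -180)
        q = 180;   /* fold -180 onto the single 180-degree point */
    return q;
}
```

For example, an azimuth of 30 degrees quantizes to 45 degrees and -170 degrees to 180 degrees, illustrating the large jumps possible at this resolution.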
Dynamic resolution (or dynamic quantization resolution) may be implemented in an attempt to improve the resulting encoder output. For example, as described in GB1811071.8, an entropy encoder is implemented in which the angular resolution is determined by the energy ratio of each subband. If the resulting number of bits is higher than the maximum allowed number of bits, the quantization resolution is reduced and an entropy encoder such as that described in EP3861548 is used. However, for some frames the reduction in quantization resolution may be too great: the directional resolution of human hearing is about 1-2 degrees in azimuth, so an azimuth jump from, for example, 0 degrees to 45 degrees is easily perceived, significantly reducing the audio quality and making the reproduction unnatural.
The concepts discussed in the embodiments herein attempt to offset this loss of angular resolution. In some embodiments, the maximum allowed number of bits is relaxed. Furthermore, in some embodiments, a check is made as to whether a slightly less accurate quantization of the angles can be achieved within the entropy encoder by implementing a pseudo-embedded bitstream. In some embodiments, the quantization is further modified in cases where the input spatial metadata has only one subframe per subband.
Embodiments will be described with reference to an example capture (or encoder/analyzer) and playback (or decoder/synthesizer) device or system 100 as shown in fig. 1. In the following examples the audio input is an audio signal from a microphone array, but it will be appreciated that the audio input may be in any suitable audio input format; differences in processing that may occur when different input formats are used are described below.
The system 100 is shown with a capture portion and a playback (decoder/synthesizer) portion.
In some embodiments, the capture portion includes a microphone array audio signal input 102. The input audio signal may be from any suitable source, such as: two or more microphones mounted on a mobile phone, or other microphone arrays, e.g., a B-format microphone or an EIGENMIKE. In some embodiments, as described above, the input may be any suitable audio signal input, such as an Ambisonic signal, for example First Order Ambisonics (FOA) or Higher Order Ambisonics (HOA), or a speaker surround mix and/or objects.
The microphone array audio signal input 102 may be provided to a microphone array front end 103. In some embodiments, the microphone array front end is configured to implement an analysis processor function configured to generate or determine suitable (spatial) metadata associated with the audio signal, and to implement a suitable transmission signal generator function to generate the transmission audio signal.
Thus, the analysis processor function is configured to perform a spatial analysis on the input audio signal, thereby generating suitable spatial metadata 106 in frequency bands. For all of the above input types there are known methods to generate suitable spatial metadata in frequency bands, e.g. directions and direct-to-total energy ratios (or similar parameters such as diffuseness, i.e. the ambient-to-total ratio). These methods are not described in detail herein; however, some examples may include performing a suitable time-frequency transform on the input signal and, when the input is a mobile phone microphone array, estimating in each frequency band the delay value between a microphone pair that maximizes the inter-microphone correlation, formulating a corresponding direction value for that delay (as described in GB patent application number 1619573.7 and PCT patent application number PCT/FI2017/050778), and formulating a ratio parameter based on the correlation value. The direct-to-total energy ratio parameter for a multichannel capture microphone array signal may be estimated based on a normalized cross-correlation parameter cor'(k, n) between the microphone pairs of band k, the cross-correlation parameter having a value between -1 and 1. The direct-to-total energy ratio parameter r(k, n) may be determined by comparing the normalized cross-correlation parameter with the diffuse-field normalized cross-correlation parameter cor'_D(k, n), for example as r(k, n) = (cor'(k, n) - cor'_D(k, n)) / (1 - cor'_D(k, n)). The direct-to-total energy ratio is further described in PCT publication WO2017/005978, which is incorporated herein by reference.
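The comparison described above can be sketched as follows. This is an illustrative reconstruction based on WO2017/005978; the clamping to the valid range [0, 1] is an assumption for out-of-range inputs.

```c
/* Illustrative direct-to-total energy ratio: cor is the normalized
 * cross-correlation cor'(k, n) and cor_d the diffuse-field normalized
 * cross-correlation cor'_D(k, n). The clamp to [0, 1] is an assumption. */
static double direct_to_total_ratio(double cor, double cor_d)
{
    double r = (cor - cor_d) / (1.0 - cor_d);
    if (r < 0.0) r = 0.0;
    if (r > 1.0) r = 1.0;
    return r;
}
```

A fully correlated pair (cor = 1) yields a ratio of 1 (fully directional), while a correlation equal to the diffuse-field correlation yields 0 (fully ambient).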
The metadata may take various forms and in some embodiments includes spatial metadata and other metadata. A typical parameterization for the spatial metadata is one direction parameter in each frequency band, characterized by an azimuth value φ(k, n) and an elevation value θ(k, n), and an associated direct-to-total energy ratio r(k, n) in each frequency band, where k is the frequency band index and n is the time frame index.
In some embodiments, the parameters generated vary from frequency band to frequency band. Thus, for example, in band X, all parameters are generated and transmitted, whereas in band Y, only one parameter is generated and transmitted, and in band Z, no parameter is generated or transmitted. A practical example may be that for some frequency bands, such as the highest frequency band, some of the parameters are not needed for perceptual reasons.
In some embodiments, when the audio input is a FOA signal or a B-format microphone, the analysis processor function may be configured to determine parameters such as intensity vectors, obtain direction parameters based on the intensity vector parameters, and compare the intensity vector length to the total sound field energy estimate to determine ratio parameters. This method is referred to in the literature as directional audio coding (DirAC).
In some embodiments, when the input is a HOA signal, the analysis processor function may take a FOA subset of the signal and use the above method, or divide the HOA signal into multiple parts/sectors (sectors), with the above method used in each part/sector. This part/sector based approach is referred to in the literature as higher order DirAC (HO-DirAC). In this case, there is more than one simultaneous direction parameter per band.
In some embodiments, when the input format is speaker surround mix and/or object, the analysis processor function may be configured to convert the signal to a FOA signal (by using spherical harmonic coding gain) and analyze the above-described direction and ratio parameters.
Thus, the output of the analysis processor function is the (spatial) metadata 106 determined in the frequency band. The (spatial) metadata 106 may relate to direction and energy ratio in the frequency band, but may also have any of the metadata types listed above. The (spatial) metadata 106 may vary with time and frequency.
In some embodiments, the analysis functionality is implemented external to the system 100. For example, in some embodiments, spatial metadata associated with the input audio signal may be provided to encoder 107 as a separate bitstream. In some embodiments, the spatial metadata may be provided as a set of spatial (direction) index values.
As described above, the microphone array front end 103 is further configured to implement a transmission signal generator function in order to generate the appropriate transmission audio signal 104. The transmission signal generator function 113 is configured to receive an input audio signal, which may be, for example, a microphone array audio signal 103, and to generate a transmission audio signal 104. The transmission audio signal may be a multi-channel, stereo, binaural or mono audio signal. The generation of the transmitted audio signal 104 may be accomplished using any suitable method, as outlined below.
When the input is a microphone array audio signal, the transmit signal generator function may select left and right microphone pairs and apply appropriate processing to the signal such as automatic gain control, microphone noise removal, wind noise removal, and equalization.
When the input is a FOA/HOA signal or a B-format microphone, the transmission signal 104 may be directional beam signals, such as two opposing cardioid signals pointing in the left and right directions.
When the input is a speaker surround mix and/or objects, the transmission signal 104 may be a downmix signal that combines the left-side channels into a left downmix channel and the right-side channels into a right downmix channel, and adds the center channel to both transmission channels with a suitable gain.
In some embodiments, the transmission signal 104 is an input audio signal, for example, a microphone array audio signal. For example, in some cases, analysis and synthesis occur at the same device at a single processing step without intermediate encoding. The number of transmission channels may also be any suitable number (instead of one or two as discussed in the examples).
In some embodiments, the capture portion may include an encoder 107. The encoder 107 may be configured to receive the transmitted audio signal 104 and the spatial metadata 106. The encoder 107 may also be configured to generate a bitstream 108 comprising metadata information in encoded or compressed form and the transmitted audio signal.
For example, the encoder 107 may be implemented as an IVAS encoder or any other suitable encoder. In such an embodiment, the encoder 107 is configured to encode the audio signal and metadata and form an IVAS bitstream.
Further, the bit stream 108 may be transmitted/stored as shown by the dashed line.
The system 100 may also include a decoder 109 portion. The decoder 109 is configured to receive, retrieve, or otherwise obtain the bitstream 108 and to generate from it a suitable spatial audio signal 110 for presentation to a listener via a playback device.
Accordingly, the decoder 109 is configured to receive the bit stream 108 and to de-multiplex the encoded stream, thereby decoding the audio signal to obtain the transmission signal and the metadata.
The decoder 109 may also be configured to generate a spatial audio signal output 110 from the transmission audio signal and the spatial metadata, e.g. a binaural audio signal that may be reproduced by headphones.
Referring to fig. 2, a schematic example of encoder 107 is shown in further detail.
In fig. 2, an encoder 107 is shown, wherein the transmission signal 104 is input to a transmission signal encoder 201. The transmission signal encoder 201 may be any suitable audio signal encoder. For example, an Enhanced Voice Service (EVS) or Immersive Voice and Audio Services (IVAS) stereo core encoder implementation may be applied to the transmission (audio) signal to generate a suitable encoded transmission audio signal 204, which may be passed to the bitstream generator 207 or output as a bitstream separate from the spatial metadata parameters.
In some embodiments, encoder 107 is configured to receive spatial metadata 106 or spatial parameters and pass these to parameter quantizer 203. For example, the determined direction parameters (azimuth and elevation or other coordinate systems) may be quantized by the parameter quantizer 203, and an index identifying the quantized values is passed to the quantization parameter entropy encoder 205.
In some embodiments, the encoder 107 further comprises a quantization parameter entropy encoder 205 configured to obtain or receive quantization parameters and encode them to generate encoded spatial metadata 202 that may be passed to a bitstream generator 207.
In some embodiments, the encoder 107 further comprises a bitstream generator 207 configured to obtain or receive the encoded transmission audio signal 204 and the encoded spatial metadata 202 (including the spread and surround coherence parameters) and to generate the bitstream 108 or separate bitstreams.
In the following example, the encoder 107 is configured to encode the spatial audio (MASA) parameters, in other words the spatial metadata 106. For example, the direction values (azimuth and elevation values φ(k, n) and θ(k, n)) may first be quantized according to a spherical quantization scheme. Such a scheme can be found in patent publication EP 3707706. As described above, each type of spatial audio parameter is first quantized to obtain quantization indices.
Further, the resulting quantization indices for spatial audio parameters (e.g., MASA parameters) may be entropy encoded at different encoding rates in response to factors specifying the number of bits allocated for the task. In addition, the codec may use a plurality of different coding rates and apply them to coding of indexes of spatial audio parameters.
Returning to the examples below, the audio metadata thus includes azimuth, elevation, and energy ratio data for each sub-band. The audio metadata may also include spread and surround coherence values; these are encoded first, and the remaining number of available bits is then calculated by subtracting the coherence bits from the total number of bits.
In the MASA format, the direction data is represented with 16 bits, such that the azimuth is represented with approximately 9 bits and the elevation with 7 bits. The energy ratio is represented with 8 bits. For each frame there are N = 5 subbands and M = 4 time blocks, so (16+8) × M × N bits are required to store the uncompressed metadata of each frame. At higher frequency resolutions there may be 20 or 24 frequency subbands.
In the following examples the encoder and decoder operate with N = 5 subbands and M = 4 time blocks, but in other examples the number of time blocks, the bits used, and the subbands and parameters may differ.
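The raw-bit arithmetic above can be checked with a small helper (illustrative only):

```c
/* Uncompressed spatial metadata size per frame, as described above:
 * 16 bits per direction plus 8 bits per energy ratio for each of the
 * M time blocks and N subbands of the frame. */
static int raw_metadata_bits(int m_time_blocks, int n_subbands)
{
    return (16 + 8) * m_time_blocks * n_subbands;
}
```

With M = 4 and N = 5 this gives 480 bits per 20 ms frame, i.e. 24 kbps of uncompressed metadata, an order of magnitude above the roughly 3 kbps budget mentioned earlier; at 24 subbands it grows to 2304 bits per frame.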
With respect to fig. 3, an example quantization parameter entropy encoder 205 is shown. In some embodiments, the quantization parameter entropy encoder 205 includes an energy ratio value encoder 301. The energy ratio value encoder 301 is configured to receive quantized energy ratio values 300 and to generate encoded energy ratio values 302. In some examples, each energy ratio value is encoded using 3 bits. Furthermore, instead of transmitting the energy ratio values of all TF tiles, only one weighted average per subband is transmitted. The average is calculated by taking into account the total energy of each time block, thus favoring the values of the time blocks with more energy.
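The energy-weighted subband average can be sketched as follows (a minimal sketch: the weighting by per-time-block energy follows the description above, while the function itself is illustrative):

```c
/* Energy-weighted average of one subband's energy ratios across the M time
 * blocks of a frame: each time block is weighted by its energy, so time
 * blocks with more energy dominate the transmitted average. */
static double weighted_ratio_average(const double *ratio,
                                     const double *energy, int m)
{
    double num = 0.0, den = 0.0;
    for (int i = 0; i < m; i++) {
        num += ratio[i] * energy[i];
        den += energy[i];
    }
    return den > 0.0 ? num / den : 0.0;
}
```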
In some embodiments, the example quantized entropy encoder includes more than one entropy encoder configured to receive the quantization direction value 302. In this example, a first direction (average/difference) entropy encoder 303, a second lower resolution entropy encoder 305, and a third lowest resolution average/difference entropy encoder 307 are shown. The first direction (average/difference) entropy encoder 303 (EC 1), the second lower resolution entropy encoder 305 (EC 2), and the third lowest resolution entropy encoder 307 (EC 3) are configured to receive the quantization direction values and generate encoded values that are passed to the encoding selector 309.
In some embodiments, the example quantized entropy encoder includes an encoding selector 309 that receives the outputs of the first, second, and third entropy encoders and selects one of them as the encoded direction value 310 output. In some embodiments, the selection may be based on the number of bits generated by each encoder and the number of allowed bits, as described below for the encoding of the index values of the direction parameters of all TF tiles in a frame.
In the arrangement shown in fig. 3, the first, second, and third entropy encoders are linked or connected in series such that, as described in further detail below, the first entropy encoder 303 operates first, and when the first entropy encoder 303 cannot encode the parameters in an acceptable manner, the second entropy encoder 305 is operated or activated. Similarly, when the second entropy encoder 305 cannot encode the parameters in an acceptable manner, the third entropy encoder 307 is operated or activated. The encoding selector is then configured to select the output of the last-operated encoder, or the output of an encoder selected in a sequence such as third encoder / second encoder / first encoder.
However, in some embodiments, all three encoders operate in parallel or substantially in parallel, and one of the three encoder outputs is selected by the encoding selector based on which output is acceptable. In such an embodiment, the first encoder output is checked and selected if acceptable; otherwise, the second encoder output is checked and selected if acceptable; otherwise, the third encoder output is used.
For example, the encoder selector or encoder operation may implement or perform the encoding selection based on the following pseudo code.
Input: index of quantized direction parameters (azimuth and elevation) and number of allowed bits B allowed
In the above, the first direction (average/difference) entropy encoder 303 (EC1) corresponds to a first entropy encoding scheme in which the azimuth and elevation indices may be encoded separately. In some embodiments, the scheme uses an optimized fixed average index that is subtracted from each index to yield a difference index for each direction index. Each of the resulting difference indices may then be converted to a positive value and entropy encoded using a Golomb-Rice scheme. The optimized average index may also be entropy encoded for transmission to the decoder. Furthermore, in some embodiments, the direction entropy encoder is further based on a time-averaged direction value and the difference from that time-averaged direction value. For example, the difference index value is the difference between the current azimuth or elevation value and the azimuth or elevation value averaged over the previous frames or subframes, or in some embodiments the difference from a reference azimuth or elevation value. In some embodiments, the average (the value subtracted) is chosen such that the resulting number of encoding bits is minimized. Thus, in some examples, the apparatus tests the value given by the average and also tests variants around the average (e.g., the average and ±1 or more) and selects the value that yields the smallest number of encoded bits to send to the decoder as the "average".
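The benefit of subtracting a well-chosen average follows from the Golomb-Rice code length, which grows with the encoded value. A minimal sketch of the length computation (illustrative; the actual codec details are in the cited publications):

```c
/* Number of bits used by a Golomb-Rice code of order k for a non-negative
 * value v: a unary-coded quotient (v >> k) terminated by one stop bit,
 * followed by k binary remainder bits. */
static int gr_code_length(unsigned v, unsigned k)
{
    return (int)(v >> k) + 1 + (int)k;
}
```

Small values yield short codes (e.g. length 1 for v = 0 at order 0), so choosing the average that minimizes the magnitude of the difference indices directly minimizes the total bit count.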
The second entropy encoder 305 (EC 2) corresponds to a second entropy encoding scheme that encodes the difference index at a resolution less or lower than EC 1. Details of a suitable second entropy coding scheme can be found in patent publication WO 2021/048468.
The third entropy encoder 307 (EC 3) corresponds to a third entropy encoding scheme that encodes the difference index with a resolution less than EC 2. In this regard, EC3 may constitute the lowest resolution quantization scheme in the above-described general framework. Details of a scheme suitable for use can be found in patent publication EP 3861548.
As can be seen from the general framework described above, the selection of the coding rate (and coding scheme) may be determined in part by the parameter B_allowed, which indicates the number of bits allowed for encoding the direction indices of the frame. B_allowed may be a parameter determined by the encoding system from the overall operating point/bit rate of the encoder for a particular time frame.
As can be seen from the above, the parameter B_allowed can be used to determine the entropy coding scheme essentially by checking whether the number of bits required by an entropy coding scheme is smaller than B_allowed. The check is performed in descending order of the number of bits required by the entropy coding schemes. The result of this checking procedure is the selection of the highest-resolution entropy coding scheme (in terms of the number of coded bits) that satisfies the B_allowed constraint.
For example, if the number of bits required by the first entropy coding scheme EC1 (bits_EC1) is less than B_allowed, the first entropy coding scheme is used. However, if it is determined that the bits required by EC1 exceed the constraint B_allowed, the number of bits required by the second entropy coding scheme EC2 (bits_EC2) is checked against B_allowed. Furthermore, in some embodiments, the second entropy encoding scheme EC2 is tested only for the non-2D case (i.e., when the elevation angles of the tiles in the frame are not all zero). If this second check indicates that the bits required by EC2 are less than B_allowed, the direction indices of the frame are entropy encoded using the second entropy encoding scheme EC2. However, if the second check indicates that the bits required by EC2 are greater than (or equal to) B_allowed, the third entropy coding scheme EC3 is selected to encode the direction indices. The above general framework may be extended to any number of coding rates, with each entropy coding scheme being selected according to its required number of bits (bits_ECn) and the allowed number of bits B_allowed.
With respect to fig. 4, a method or operation implemented by the first entropy encoder (EC 1) is shown, according to some embodiments. In this example, the first entropy encoder is configured to perform entropy encoding (EC 1) on the direction being encoded in a pseudo-embedded manner.
Thus, for example, as shown by step 401 in fig. 4, the average direction over all time-frequency tiles whose energy ratio is above a threshold is calculated.
Further, as shown by step 403 in fig. 4, the remaining TF tiles are jointly encoded in elevation and azimuth, with a spherical index per tile.
The average direction is then encoded by transmitting its elevation and azimuth separately. This uses the number of bits given by the largest elevation and azimuth alphabets, respectively, over the TF tiles under consideration, as shown by step 405 in fig. 4.
The elevation and azimuth differences from the average are encoded separately. In these embodiments, one stream is used for the azimuth difference values and one stream for the elevation difference values, as shown by step 407 in fig. 4.
Further, as shown by step 409 in fig. 4, for each angle value the difference from the average is calculated with respect to the average projected to the resolution of the corresponding tile.
The difference from the average in the index domain is then transformed into the positive domain, as shown by step 411 in fig. 4, using a function that interleaves positive and negative difference indices onto non-negative values.
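The transformation function itself is shown in a figure that is not reproduced in this text; a standard signed-to-unsigned interleaving consistent with the description (a hypothetical reconstruction) is:

```c
/* Hypothetical reconstruction of the index-folding step: interleave signed
 * difference indices 0, -1, 1, -2, 2, ... onto non-negative values
 * 0, 1, 2, 3, 4, ... so that they can be Golomb-Rice encoded. */
static unsigned fold_to_positive(int d)
{
    return d >= 0 ? (unsigned)(2 * d) : (unsigned)(-2 * d - 1);
}
```

With this mapping, reducing a folded index by two units keeps the sign of the underlying difference while shrinking its magnitude by one step.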
Further, the positive difference indices are encoded with a GR code of the determined (optimal) order. The determined (optimal) GR orders are calculated for each data set: one GR order value for the azimuth difference indices and one for the elevation difference indices.
Since the GR code is longer for larger values and shorter for smaller values, this means that if the encoder uses a value smaller by two units for a difference index, the difference from the average will have the same sign but a smaller magnitude, and the number of bits needed for encoding will also be smaller.
Thus, in some embodiments, a reduced/decremented difference index code is generated and, in addition, it is determined how many bits can be saved by reducing/decrementing some of the difference indices. This is shown by step 415 in fig. 4.
Further, as shown by step 417 in fig. 4, either the original coding indexes or the reduced/decreased difference coding indexes are selected based on the number of bits saved. For example, if the number of bits resulting from the encoding produced by the first entropy encoder (EC1) is higher than the allowed number of bits by a value NB, and the maximum number of bits that can be saved is higher than NB, then the reduced difference index values are selected. If neither the encoding by the first entropy encoder nor the reduced/decreased encoding can reach the required number of bits, the second entropy encoder (EC2) or third entropy encoder (EC3) method is used.
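The selection in step 417 can be sketched as a simple decision helper. This is a hypothetical illustration; the function and mode names are assumed, not taken from the patent.

```c
#include <assert.h>

/* Hypothetical decision helper (all names illustrative):
 * ec1_bits      - bits produced by the first entropy encoder (EC1)
 * saveable_bits - maximum bits recoverable by reducing difference indexes
 * allowed_bits  - bit budget for the direction metadata                  */
enum coding_choice { USE_EC1, USE_EC1_REDUCED, USE_FALLBACK };

static enum coding_choice choose_coding(int ec1_bits, int saveable_bits,
                                        int allowed_bits)
{
    if (ec1_bits <= allowed_bits)
        return USE_EC1;                  /* EC1 fits as-is              */
    if (ec1_bits - saveable_bits <= allowed_bits)
        return USE_EC1_REDUCED;         /* reduced indexes fit         */
    return USE_FALLBACK;                /* fall back to EC2 or EC3     */
}
```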
The condition for checking whether a difference index can be reduced/decremented is that the difference must be greater than 0 and the angular resolution must be above a given threshold. In some embodiments, the angular resolution may be given by the length of the alphabet of angular values. In an example, 20 degrees may be used as the minimum threshold for the elevation alphabet and 40 degrees as the minimum threshold for the azimuth alphabet. In some embodiments, the part corresponding to elevation may be applied only when the azimuth alphabet is adjusted based on the modified elevation value. In some implementations, no modification of the elevation is used. However, if modified elevation values are used, the azimuth alphabet should be updated and the original values re-quantized when checking the azimuth.
An example implementation in the C language for determining the number of bits available may be as follows:
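The original listing does not appear in the text. A hypothetical sketch of what such a routine could look like, under the stated conditions (difference index strictly positive, and the angular step, taken as 360 degrees divided by the alphabet length, at or above the threshold); all function and parameter names are assumptions:

```c
#include <assert.h>

/* Length in bits of a Golomb-Rice codeword of order k for index v. */
static int gr_length(unsigned v, int k)
{
    return (int)(v >> k) + 1 + k;   /* unary quotient + stop bit + remainder */
}

/* Count the bits that could be saved by decrementing every eligible
 * difference index by one.  idx[i] is the folded difference index of
 * tile i, alphabet[i] the length of its angle alphabet, k the GR order
 * and min_res_deg the minimum angular step in degrees. */
static int bits_available(const unsigned *idx, const int *alphabet, int n,
                          int k, int min_res_deg)
{
    int saved = 0;
    for (int i = 0; i < n; i++) {
        if (idx[i] > 0 && 360 / alphabet[i] >= min_res_deg)
            saved += gr_length(idx[i], k) - gr_length(idx[i] - 1, k);
    }
    return saved;
}
```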
In some embodiments, the maximum bit number limit when reducing quantization resolution may be relaxed. For example, the number of bits that need to be reduced before implementing the third entropy encoder (EC 3) method is limited at most to the number of TF tiles for which encoding is completed. As a result of implementing this relaxed bit constraint, the bit consumption may exceed the maximum bits allowed for metadata for some frames. However, encoding is configured to handle this situation, as the encoder typically operates below the required bit limit, and thus on average, the number of bits used may be relaxed without exceeding the total number of bits over a reasonable period of time.
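The relaxed constraint above can be expressed as a simple predicate. This is an illustrative helper under the one-bit-per-encoded-tile reading of the limit stated in the text; the name and exact form are assumptions.

```c
#include <assert.h>

/* Fall back to the third entropy encoder (EC3) only when the bit
 * shortfall exceeds the number of TF tiles already encoded, i.e. the
 * hard budget is relaxed by at most one bit per encoded tile. */
static int must_use_ec3(int deficit_bits, int tiles_encoded)
{
    return deficit_bits > tiles_encoded;
}
```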
In some embodiments, when the encoder determines that there is only one subframe per subband in the input spatial audio data, the second entropy encoder is disabled or deactivated and the second entropy encoding (EC2) method is neither considered nor signaled. The second entropy encoder is disabled/deactivated because the method it uses checks for and exploits similarity between TF tiles within a subband, but in this case there is only one tile per subband and therefore there can be no such similarity.
These embodiments may be implemented for lower bit rates.
In some embodiments, the decoder 109 comprises a demultiplexer (not shown) configured to accept and demultiplex the bitstream to obtain an encoded transmission audio signal and encoded spatial audio parameter metadata (MASA metadata) comprising an encoding energy ratio value 302 and an encoding direction value 310.
In some embodiments, the decoder 109 further comprises a transmission audio signal decoder (not shown) configured to decode the encoded transmission audio signal, thereby producing a transmission audio signal stream that is passed to the spatial synthesizer. The decoding process performed by the transmission audio signal decoder may be a suitable audio signal decoding scheme for encoding the transmission audio signal, such as an EVS decoder when EVS encoding is used.
Fig. 5 shows in further detail a metadata decoder 509 configured to accept encoded spatial metadata (the encoded energy ratio value 302 and the encoded direction value 310) and to decode the metadata to produce decoded spatial metadata (the energy ratio value 502 and the direction value or index 504).
In some embodiments, metadata decoder 509 includes an energy ratio value decoder 501 configured to receive encoded energy ratio value 302 and determine an energy ratio value based on the value.
In addition, the metadata decoder 509 further comprises an entropy decoder 503 configured to obtain the encoding direction value 310 and to output a direction value 504.
In the above, the differences are determined with respect to other direction values within the subframe or frame. However, in some embodiments, the difference may be determined relative to past subframes. In other words, the average may be determined within the current subframe, within the current frame, or over several time frames.
With respect to fig. 6, an example electronic device is shown that may be used as any of the apparatus portions of the system described above. The device may be any suitable electronic device or apparatus. For example, in some embodiments, the device 1400 is a mobile device, a user device, a tablet, a computer, an audio playback apparatus, or the like. The device may be configured, for example, to implement an encoder/analyzer section and/or a decoder section as shown in fig. 1, or any of the functional blocks described above.
In some embodiments, the device 1400 includes at least one processor or central processing unit 1407. The processor 1407 may be configured to execute various program code, such as the methods described herein.
In some embodiments, device 1400 includes at least one memory 1411. In some embodiments, at least one processor 1407 is coupled to memory 1411. The memory 1411 may be any suitable storage component. In some embodiments, memory 1411 includes program code portions for storing program code that may be implemented on processor 1407. Further, in some embodiments, memory 1411 may also include a portion of stored data for storing data (e.g., data that has been processed or is to be processed according to embodiments described herein). The implemented program code stored in the program code portion and the data stored in the stored data portion may be retrieved by the processor 1407 via a memory-processor coupling when needed.
In some embodiments, the device 1400 includes a user interface 1405. In some embodiments, the user interface 1405 may be coupled to the processor 1407. In some embodiments, the processor 1407 may control the operation of the user interface 1405 and receive input from the user interface 1405. In some embodiments, the user interface 1405 may enable a user to input commands to the device 1400, for example, via a keyboard. In some embodiments, the user interface 1405 may enable a user to obtain information from the device 1400. For example, the user interface 1405 may include a display configured to display information from the device 1400 to a user. In some embodiments, the user interface 1405 may include a touch screen or touch interface that enables information to be entered into the device 1400 and also displays information to a user of the device 1400. In some embodiments, the user interface 1405 may be a user interface for communications.
In some embodiments, device 1400 includes input/output ports 1409. In some embodiments, the input/output port 1409 includes a transceiver. In such embodiments, the transceiver may be coupled to the processor 1407 and configured to enable communication with other devices or electronic devices, for example, via a wireless communication network. In some embodiments, the transceiver or any suitable transceiver or transmitter and/or receiver component may be configured to communicate with other electronic devices or apparatuses via a wired or wireless coupling.
The transceiver may communicate with other devices via any suitable known communication protocol. For example, in some embodiments, the transceiver may use a suitable radio access architecture based on: Long Term Evolution Advanced (LTE Advanced, LTE-A) or New Radio (NR) (alternatively referred to as 5G), Universal Mobile Telecommunications System (UMTS) radio access network (UTRAN or E-UTRAN), Long Term Evolution (LTE, the same as E-UTRA), 2G networks (legacy network technology), wireless local area network (WLAN or Wi-Fi), Worldwide Interoperability for Microwave Access (WiMAX), Personal Communication Services (PCS), Wideband Code Division Multiple Access (WCDMA), systems using Ultra Wideband (UWB) technology, sensor networks, mobile ad hoc networks (MANETs), cellular Internet of Things (IoT) RANs and Internet Protocol multimedia subsystems (IMS), any other suitable option, and/or any combination thereof.
The transceiver input/output port 1409 may be configured to receive signals.
In some embodiments, the device 1400 may be used as at least a portion of a synthetic device. The input/output port 1409 may be coupled to headphones (which may be a head-tracking or non-tracking headphone) or the like and speakers.
In general, the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well known that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
Embodiments of the invention may be implemented by computer software executable by a data processor of a mobile device, such as in a processor entity, or by hardware, or by a combination of software and hardware. Further in this regard, it should be noted that any blocks of logic flows as in the figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions. The software may be stored on a physical medium such as a memory chip or memory block implemented within a processor, a magnetic medium such as a hard disk or floppy disk, and an optical medium such as a DVD and its data variants, CD.
The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory, and removable memory. The data processor may be of any type suitable to the local technical environment and may include, as non-limiting examples, one or more of a general purpose computer, a special purpose computer, a microprocessor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a gate level circuit based on a multi-core processor architecture, and a processor.
Embodiments of the invention may be practiced in various components such as integrated circuit modules. The design of integrated circuits is generally a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
Programs such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California, use sophisticated design rules and libraries of pre-stored design modules to automatically route conductors and locate components on a semiconductor chip. Once the design of the semiconductor circuit is completed, the resulting design, in a standardized electronic format (e.g., Opus, GDSII, or the like), may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.
The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of exemplary embodiments of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention as defined in the appended claims.

Claims (20)

1. An apparatus comprising means configured to:
obtain values representing parameters of an audio signal, the values comprising at least one direction value and at least one energy ratio value for at least one subframe of each subband of a frame of the audio signal;
obtain an allowed number of bits for encoding the at least one direction value and the at least one energy ratio value for the at least one subframe of each subband of the frame of the audio signal; and
encode the at least one direction value for the at least one subframe of each subband of the frame based on at least one resolution entropy encoding, wherein one of the at least one resolution entropy encoding is further configured to:
perform a first resolution entropy encoding of the at least one direction value and determine a number of bits used to encode the at least one direction value based on the first entropy encoding;
perform a first resolution entropy encoding of at least one reduced direction value and determine a number of bits used to encode the at least one reduced direction value based on the first entropy encoding;
select the first resolution entropy encoding of the at least one direction value when the number of bits used to encode the at least one direction value based on the first resolution entropy encoding is less than or equal to a portion of the allowed number of bits used to encode the at least one direction value; and
select the first resolution entropy encoding of the at least one reduced direction value when the number of bits used to encode the at least one direction value based on the first resolution entropy encoding is greater than the portion of the allowed number of bits used to encode the at least one direction value and the number of bits used to encode the at least one reduced direction value based on the first resolution entropy encoding is less than or equal to the portion of the allowed number of bits used to encode the at least one direction value.
2. The apparatus of claim 1, wherein
the means for first resolution entropy encoding the at least one direction value and determining the number of bits used to encode the at least one direction value based on the first entropy encoding is configured to: at least one resolution entropy encode at least one value determined from a difference of the at least one direction value compared to an average direction value for the frame and determine the number of bits used to encode the at least one value; and
the means for first resolution entropy encoding the at least one reduced direction value and determining the number of bits used to encode the at least one reduced direction value based on the first entropy encoding is configured to: at least one resolution entropy encode at least one reduced value from a reduced difference based on the at least one direction value compared to the average direction value for the frame and determine the number of bits used to encode the at least one reduced value.
3. The apparatus of claim 1 or 2, wherein the means for encoding the at least one direction value for the at least one subframe of each subband of the frame based on at least one resolution entropy encoding is further configured to:
perform a second resolution entropy encoding of at least one value based on the at least one direction value and determine a number of bits used to encode the at least one value based on the second resolution entropy encoding, wherein the second resolution entropy encoding is a lower resolution encoding than the first resolution entropy encoding and exploits similarities between time-frequency tiles within subbands within the frame when the frame includes more than one time-frequency tile within a subband; and
select the second resolution entropy encoding of the at least one value when the number of bits used to encode the at least one value based on the second resolution entropy encoding is less than or equal to the portion of the allowed number of bits used to encode the at least one direction value.
4. The apparatus of claim 3, wherein the at least one value based on the at least one direction value is at least one difference value from the at least one direction value compared to an average direction value for the frame.
5. The apparatus of any of claims 3 or 4, wherein the means is further for: selecting the second resolution entropy encoding of the at least one value based on the at least one direction value when the number of bits used to encode the at least one value based on the second resolution entropy encoding is greater than the portion of the allowed number of bits used to encode the at least one direction value but less than the determined relaxed number of bits.
6. The apparatus of claim 5, wherein the relaxed number of bits is a number of bits relative to the portion of the allowed number of bits used to encode the at least one direction value.
7. The apparatus of claim 5 or 6, wherein the means is configured to encode the at least one direction value for the at least one subframe of each subband of the frame based on at least one resolution entropy encoding, wherein one of the at least one resolution entropy encoding is further configured to:
perform a third resolution entropy encoding of at least one value based on the at least one direction value and determine a number of bits used to encode the at least one value based on the third resolution entropy encoding, wherein a quantization resolution of the third resolution entropy encoding is lower than that of the first resolution entropy encoding and the second resolution entropy encoding; and
select the third resolution entropy encoding of the at least one value based on the at least one direction value when the number of bits used to encode the at least one value based on the first resolution entropy encoding or the second resolution entropy encoding is greater than the portion of the allowed number of bits used to encode the at least one value.
8. The apparatus of claim 7, wherein the at least one value based on the at least one direction value is at least one difference value from the at least one direction value compared to an average direction value for the frame.
9. The apparatus of any of claims 1-8, wherein the means is further for: the at least one energy ratio value for the at least one sub-frame of each sub-band of the frame of the audio signal is encoded.
10. The apparatus of claim 9, wherein the means for encoding the at least one energy ratio value for the at least one subframe of each subband of the frame of the audio signal is configured to:
generate a weighted average of the at least one energy ratio value; and
encode the weighted average of the at least one energy ratio value.
11. The apparatus of claim 10, wherein the means for encoding the weighted average of the at least one energy ratio value is further to: scalar non-uniform quantization is performed on at least one weighted average of the at least one energy ratio value.
12. The apparatus of any one of claims 1 to 11, wherein the at least one entropy encoding is Golomb Rice encoding.
13. The apparatus of any of claims 1 to 12, wherein the means is further for: the encoded at least one direction value is stored and/or transmitted.
14. A method comprising:
obtaining values representing parameters of an audio signal, the values comprising at least one direction value and at least one energy ratio value for at least one subframe of each subband of a frame of the audio signal;
obtaining an allowed number of bits for encoding the at least one direction value and the at least one energy ratio value for the at least one subframe of each subband of the frame of the audio signal; and
encoding the at least one direction value for the at least one subframe of each subband of the frame based on at least one resolution entropy encoding, wherein one of the at least one resolution entropy encoding comprises:
performing a first resolution entropy encoding of the at least one direction value and determining a number of bits used to encode the at least one direction value based on the first entropy encoding;
performing a first resolution entropy encoding of at least one reduced direction value and determining a number of bits used to encode the at least one reduced direction value based on the first entropy encoding;
selecting the first resolution entropy encoding of the at least one direction value when the number of bits used to encode the at least one direction value based on the first resolution entropy encoding is less than or equal to a portion of the allowed number of bits used to encode the at least one direction value; and
selecting the first resolution entropy encoding of the at least one reduced direction value when the number of bits used to encode the at least one direction value based on the first resolution entropy encoding is greater than the portion of the allowed number of bits used to encode the at least one direction value and the number of bits used to encode the at least one reduced direction value based on the first resolution entropy encoding is less than or equal to the portion of the allowed number of bits used to encode the at least one direction value.
15. The method of claim 14, wherein first resolution entropy encoding the at least one direction value and determining the number of bits used to encode the at least one direction value based on the first entropy encoding comprises: at least one resolution entropy encoding at least one value determined from a difference of the at least one direction value compared to an average direction value for the frame and determining the number of bits used to encode the at least one value; and
wherein first resolution entropy encoding the at least one reduced direction value and determining the number of bits used to encode the at least one reduced direction value based on the first entropy encoding comprises: at least one resolution entropy encoding at least one reduced value from a reduced difference based on the at least one direction value compared to the average direction value for the frame and determining the number of bits used to encode the at least one reduced value.
16. The method of claim 14 or 15, wherein encoding the at least one direction value for the at least one subframe of each subband of the frame based on at least one resolution entropy encoding further comprises:
performing a second resolution entropy encoding of at least one value based on the at least one direction value and determining a number of bits used to encode the at least one value based on the second resolution entropy encoding, wherein the second resolution entropy encoding is a lower resolution encoding than the first resolution entropy encoding and exploits similarities between time-frequency tiles within subbands within the frame when the frame includes more than one time-frequency tile within a subband; and
selecting the second resolution entropy encoding of the at least one value when the number of bits used to encode the at least one value based on the second resolution entropy encoding is less than or equal to the portion of the allowed number of bits used to encode the at least one direction value.
17. The method of claim 16, wherein the at least one value based on the at least one direction value is at least one difference value from the at least one direction value compared to an average direction value for the frame.
18. The method of any one of claims 16 or 17, wherein the method further comprises:
selecting the second resolution entropy encoding of the at least one value based on the at least one direction value when the number of bits used to encode the at least one value based on the second resolution entropy encoding is greater than the portion of the allowed number of bits used to encode the at least one direction value but less than the determined relaxed number of bits.
19. The method of claim 18, wherein the relaxed number of bits is a number of bits relative to the portion of the allowed number of bits used to encode the at least one direction value.
20. The method of claim 18 or 19, wherein the at least one direction value for the at least one subframe of each subband of the frame is encoded based on at least one resolution entropy encoding, wherein one of the at least one resolution entropy encoding further comprises:
performing a third resolution entropy encoding of at least one value based on the at least one direction value and determining a number of bits used to encode the at least one value based on the third resolution entropy encoding, wherein a quantization resolution of the third resolution entropy encoding is lower than that of the first resolution entropy encoding and the second resolution entropy encoding; and
selecting the third resolution entropy encoding of the at least one value based on the at least one direction value when the number of bits used to encode the at least one value based on the first resolution entropy encoding or the second resolution entropy encoding is greater than the portion of the allowed number of bits used to encode the at least one value.
CN202280093921.XA 2022-03-22 Parameterized spatial audio coding Pending CN118946930A (en)

Publications (1)

Publication Number Publication Date
CN118946930A true CN118946930A (en) 2024-11-12


Similar Documents

Publication Publication Date Title
JP7405962B2 (en) Spatial audio parameter encoding and related decoding decisions
US20150371643A1 (en) Stereo audio signal encoder
US11096002B2 (en) Energy-ratio signalling and synthesis
US20230047237A1 (en) Spatial audio parameter encoding and associated decoding
CN111316353A (en) Determining spatial audio parameter encoding and associated decoding
EP4315324A1 (en) Combining spatial audio streams
CN112567765B (en) Spatial audio capture, transmission and reproduction
US20210319799A1 (en) Spatial parameter signalling
CN116762127A (en) Quantizing spatial audio parameters
US20230410823A1 (en) Spatial audio parameter encoding and associated decoding
WO2022223133A1 (en) Spatial audio parameter encoding and associated decoding
CN118946930A (en) Parameterized spatial audio coding
CN116547749A (en) Quantization of audio parameters
WO2023179846A1 (en) Parametric spatial audio encoding
US20230197087A1 (en) Spatial audio parameter encoding and associated decoding
KR20240152893A (en) Parametric spatial audio rendering
EP4430603A1 (en) Spatial audio parameter decoding
CN116982109A (en) Audio codec with adaptive gain control of downmix signals
WO2020193865A1 (en) Determination of the significance of spatial audio parameters and associated encoding

Legal Events

Date Code Title Description
PB01 Publication