WO2022257824A1 - Method and device for processing three-dimensional audio signals - Google Patents
- Publication number
- WO2022257824A1 (application PCT/CN2022/096546)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- signal group
- bit allocation
- virtual
- virtual loudspeaker
- proportion
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/002—Dynamic bit allocation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/167—Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/302—Electronic adaptation of stereophonic sound system to listener position or orientation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/15—Aspects of sound capture and related signal processing for recording or reproduction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/03—Application of parametric coding in stereophonic audio systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/11—Application of ambisonics in stereophonic audio systems
Definitions
- The present application relates to the technical field of audio processing, and in particular to a method and device for processing three-dimensional audio signals.
- Three-dimensional audio technology has been widely used in wireless communication voice, virtual reality/augmented reality and media audio.
- Three-dimensional audio technology is an audio technology for acquiring, processing, transmitting and rendering playback of sound events and three-dimensional sound field information in the real world.
- Three-dimensional audio technology gives sound a strong sense of space, envelopment and immersion, providing listeners with an "immersive sound" experience.
- Higher order ambisonics (HOA) technology is independent of the speaker layout in the recording, encoding and playback stages, and HOA format data can be rotated at playback. This gives HOA greater flexibility in three-dimensional audio playback, so it has received extensive attention and research.
- The acquisition device (such as a microphone) collects a large amount of data to record the three-dimensional sound field information and transmits the three-dimensional audio signal to the playback device (such as a speaker or earphone), which plays it. Because the three-dimensional sound field information is large, storing it requires a large amount of storage space and transmitting the three-dimensional audio signal requires relatively high bandwidth. To address this, the three-dimensional audio signal can be compressed, and the compressed data can be stored or transmitted.
- The encoder can use multiple pre-configured virtual speakers to encode the three-dimensional audio signal, but how to allocate bits to the resulting signals after encoding is still an unresolved problem.
- Embodiments of the present application provide a method and device for processing a three-dimensional audio signal, which are used to implement bit allocation for the signal.
- An embodiment of the present application provides a method for processing a three-dimensional audio signal, including: spatially encoding the three-dimensional audio signal to be encoded to obtain a transmission channel signal and transmission channel attribute information, wherein the transmission channel signal includes at least one virtual speaker signal group and at least one residual signal group; and determining the bit allocation ratio of the virtual speaker signal group and the bit allocation ratio of the residual signal group according to the transmission channel attribute information.
- The transmission channel signal and the transmission channel attribute information are obtained by encoding the three-dimensional audio signal. The transmission channel signal may include at least one virtual speaker signal group and at least one residual signal group. The transmission channel attribute information can be used to determine the bit allocation proportion of the virtual speaker signal group and the bit allocation proportion of the residual signal group respectively, which solves the problem that the bit allocation of the signal cannot be determined.
- The transmission channel attribute information includes the coding efficiency of a virtual speaker. Performing spatial encoding on the three-dimensional audio signal to be encoded to obtain the transmission channel attribute information includes: performing signal reconstruction on the three-dimensional audio signal to be encoded using a virtual speaker to obtain a reconstructed three-dimensional audio signal; obtaining an energy representative value of the reconstructed three-dimensional audio signal and an energy representative value of the three-dimensional audio signal to be encoded; and obtaining the coding efficiency of the virtual speaker according to these two energy representative values.
- the encoder first performs signal reconstruction using a virtual speaker to obtain a reconstructed three-dimensional audio signal.
- The encoding end can calculate the energy representative value of the signal of each transmission channel; for example, it can obtain the energy representative value of the reconstructed three-dimensional audio signal and the energy representative value of the three-dimensional audio signal to be encoded.
- The energy representative value of the three-dimensional audio signal differs before and after signal reconstruction, so the coding efficiency of the virtual speaker can be calculated from the change in the energy representative value before and after reconstruction.
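The relationship described above can be sketched as follows. This is only an illustration under stated assumptions: the source does not give the formula, so the energy representative value is assumed to be the sum of squared samples and the coding efficiency is assumed to be the ratio of reconstructed energy to original energy. The function names are hypothetical.

```python
from typing import Sequence


def energy_value(channels: Sequence[Sequence[float]]) -> float:
    """Assumed energy representative value: sum of squared samples over all channels."""
    return sum(sample * sample for channel in channels for sample in channel)


def virtual_speaker_coding_efficiency(
    original: Sequence[Sequence[float]],
    reconstructed: Sequence[Sequence[float]],
) -> float:
    """Assumed definition: energy of the reconstructed signal divided by
    the energy of the signal to be encoded (0.0 for a silent input)."""
    original_energy = energy_value(original)
    if original_energy == 0.0:
        return 0.0
    return energy_value(reconstructed) / original_energy


# Toy frame: two channels, two samples each.
original_frame = [[1.0, 2.0], [2.0, 0.0]]
reconstructed_frame = [[1.5, 1.5], [0.0, 0.0]]
efficiency = virtual_speaker_coding_efficiency(original_frame, reconstructed_frame)
```

Under this assumed definition, a value close to 1 would mean that the virtual speakers capture most of the energy of the original sound field.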
- The transmission channel attribute information includes the energy ratio of the virtual speaker signal group. The method further includes: obtaining the energy representative value of the virtual speaker signal group according to the energy representative value of each virtual speaker signal in the virtual speaker signal group; obtaining the energy representative value of the residual signal group according to the energy representative value of each residual signal in the residual signal group; and obtaining the energy ratio of the virtual speaker signal group according to the energy representative value of the virtual speaker signal group and the energy representative value of the residual signal group.
- The encoding end first obtains the energy representative value of each virtual speaker signal in the virtual speaker signal group, and then adds the energy representative values of all virtual speaker signals in the same group to obtain the energy representative value of the virtual speaker signal group.
- When there are several virtual speaker signal groups, the energy representative value of each group can be calculated in the above manner.
- the encoding end can obtain the energy representative value of the residual signal group according to the energy representative value of each residual signal in the residual signal group.
- the encoding end can obtain the energy ratio of the virtual loudspeaker signal group according to the energy representative value of the virtual loudspeaker signal group and the energy representative value of the residual signal group.
- the energy proportion of the virtual loudspeaker signal group can indicate the proportion of the virtual loudspeaker signal group in the total transmission channel signal energy. If the energy ratio of the virtual loudspeaker signal group is relatively low, it means that the virtual loudspeaker signal group is not dominant (that is, weaker) in the total transmission channel signal energy.
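A minimal sketch of this computation is given below. The summation for the virtual speaker signal group follows the text; summing the residual signals in the same way, and taking the ratio as the group energy divided by the total energy of both groups, are assumptions. All names are hypothetical.

```python
from typing import Sequence


def group_energy(signal_energies: Sequence[float]) -> float:
    """Energy representative value of a group: sum over its member signals."""
    return sum(signal_energies)


def virtual_speaker_energy_ratio(
    virtual_speaker_energies: Sequence[float],
    residual_energies: Sequence[float],
) -> float:
    """Assumed ratio: virtual speaker group energy over total transmission channel energy."""
    vs_energy = group_energy(virtual_speaker_energies)
    total_energy = vs_energy + group_energy(residual_energies)
    if total_energy == 0.0:
        return 0.0
    return vs_energy / total_energy


# Two virtual speaker signals and one residual signal (per-signal energies).
directional_ratio = virtual_speaker_energy_ratio([3.0, 3.0], [2.0])
```

A low value of `directional_ratio` corresponds to the case in the text where the virtual speaker signal group is not dominant in the total transmission channel signal energy.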
- The transmission channel attribute information includes a virtual speaker coding identifier, which indicates whether the bit allocation of the virtual speaker signal group is dominant. Performing spatial encoding on the three-dimensional audio signal to be encoded to obtain the transmission channel attribute information includes: performing spatial encoding on the three-dimensional audio signal to be encoded to obtain the number of heterogeneous sound sources of the transmission channel signal and the virtual speaker coding efficiency; and obtaining the virtual speaker coding identifier according to the number of heterogeneous sound sources of the transmission channel signal and the virtual speaker coding efficiency.
- After obtaining the number of heterogeneous sound sources of the transmission channel signal and the virtual speaker coding efficiency, the encoding end obtains the specific value of the virtual speaker coding identifier according to the judgment condition that these two quantities satisfy.
- Obtaining the virtual speaker coding identifier according to the number of heterogeneous sound sources of the transmission channel signal and the virtual speaker coding efficiency includes: when the number of heterogeneous sound sources of the transmission channel signal is less than or equal to a preset threshold for the number of heterogeneous sound sources, and the virtual speaker coding efficiency is greater than or equal to a preset first virtual speaker coding efficiency threshold, determining that the virtual speaker coding identifier is dominant; or, when the number of heterogeneous sound sources of the transmission channel signal is greater than the preset threshold for the number of heterogeneous sound sources, or the virtual speaker coding efficiency is less than the preset first virtual speaker coding efficiency threshold, determining that the virtual speaker coding identifier is not dominant.
- The encoding end can determine the virtual speaker coding identifier by checking the number of heterogeneous sound sources and the virtual speaker coding efficiency against the above judgment conditions, so that the identifier can be used to determine the bit allocation ratio of the virtual speaker signal group and the bit allocation ratio of the residual signal group.
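The judgment condition above amounts to a two-threshold test, sketched here. The function and parameter names are hypothetical, and the threshold values used in the example are illustrative only; the source does not give concrete values.

```python
def is_virtual_speaker_dominant(
    num_heterogeneous_sources: int,
    coding_efficiency: float,
    max_sources_threshold: int,
    min_efficiency_threshold: float,
) -> bool:
    """True ("dominant") when there are few enough heterogeneous sound sources
    AND the virtual speaker coding efficiency is high enough; otherwise
    False ("not dominant"), matching the two branches described in the text."""
    return (
        num_heterogeneous_sources <= max_sources_threshold
        and coding_efficiency >= min_efficiency_threshold
    )


# Illustrative thresholds (assumptions): at most 2 sources, efficiency >= 0.5.
flag_few_sources = is_virtual_speaker_dominant(2, 0.8, 2, 0.5)
flag_many_sources = is_virtual_speaker_dominant(5, 0.8, 2, 0.5)
```

Because the "not dominant" branch is the logical negation of the "dominant" branch, a single boolean expression covers both cases.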
- The encoding end can further subdivide the case where the virtual speaker coding identifier is dominant into two cases: sub-dominant and strongly dominant. If the identifier is strongly dominant, more bits need to be allocated to the virtual speaker signal group; for example, after the initial bit ratio of the virtual speaker signal group is determined, the bit ratio can be increased. If the identifier is sub-dominant, the virtual speaker signal group needs fewer bits than in the strongly dominant case, but its bit allocation is still increased; for example, after the initial bit ratio of the virtual speaker signal group is determined, the bit ratio can be increased. The increase in the strongly dominant case is greater than the increase in the sub-dominant case.
- The transmission channel attribute information includes the energy proportion of the virtual speaker signal group and/or a virtual speaker coding identifier. Determining the bit allocation proportion of the virtual speaker signal group and the bit allocation proportion of the residual signal group according to the transmission channel attribute information includes: when the energy proportion of the virtual speaker signal group is greater than or equal to a preset first energy proportion threshold, and/or the virtual speaker coding identifier is strongly dominant, determining the two bit allocation proportions according to a preset first signal group bit allocation algorithm; when the energy proportion of the virtual speaker signal group is greater than or equal to a preset second energy proportion threshold and less than the preset first energy proportion threshold, and/or the virtual speaker coding identifier is sub-dominant, determining the two bit allocation proportions according to a preset second signal group bit allocation algorithm; wherein the second energy proportion threshold is smaller than the first energy proportion threshold.
- The encoder can preset multiple signal group bit allocation algorithms and use a different one depending on which condition the transmission channel attribute information meets. The virtual speaker signal group and the residual signal group are thus assigned bit allocation ratios suited to that condition, which improves the coding efficiency of the three-dimensional audio signal at the encoding end.
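The selection between preset algorithms can be pictured as a threshold dispatch. The sketch below covers only the variant in which the decision is based on the energy proportion alone (the text also allows the coding identifier to be used in addition or instead). The function name, the return convention and the threshold values are assumptions for illustration.

```python
from typing import Optional


def select_bit_allocation_algorithm(
    energy_ratio: float,
    first_threshold: float,
    second_threshold: float,
) -> Optional[int]:
    """Return 1 or 2 for the first or second signal group bit allocation
    algorithm, or None when neither condition holds and some other preset
    algorithm would apply. Requires second_threshold < first_threshold."""
    if not second_threshold < first_threshold:
        raise ValueError("second threshold must be smaller than the first")
    if energy_ratio >= first_threshold:
        return 1
    if energy_ratio >= second_threshold:
        return 2
    return None


# Illustrative thresholds (assumptions): 0.8 and 0.5.
choice_high = select_bit_allocation_algorithm(0.9, 0.8, 0.5)
choice_mid = select_bit_allocation_algorithm(0.6, 0.8, 0.5)
choice_low = select_bit_allocation_algorithm(0.3, 0.8, 0.5)
```

Checking the higher threshold first keeps the two ranges disjoint, so each energy proportion maps to at most one algorithm.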
- S is the number of heterogeneous sound sources
- η represents the coding efficiency of the virtual speaker
- maxdirectionalNrgRatio is the preset maximum bit allocation ratio of the virtual speaker signal group
- the transmission channel signal includes a virtual speaker signal group and a residual signal group. After obtaining the bit allocation ratio Ratio1_1 of the virtual speaker signal group, the bit allocation ratio of the residual signal group can be obtained through the calculation formula of Ratio2 above.
- the proportion of the bit allocation of each residual signal group in all residual signal groups can be determined according to the number of transmission channels of each residual signal group.
- R_i/C represents the transmission channel ratio of the i-th residual signal group to all residual signal groups
- the bit allocation ratio of the i-th residual signal group can be obtained through (R_i/C) and Ratio2.
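The split among residual signal groups can be sketched as below. Two points are assumptions because the formulas are not reproduced in the text: Ratio2 is taken to be the remainder `1 - Ratio1_1`, and C is taken to be the total number of transmission channels over all residual signal groups. The function name is hypothetical.

```python
from typing import List, Sequence


def residual_group_ratios(
    ratio1: float,
    channels_per_residual_group: Sequence[int],
) -> List[float]:
    """Bit allocation ratio of each residual signal group, computed as
    (R_i / C) * Ratio2 with the assumed Ratio2 = 1 - Ratio1."""
    ratio2 = 1.0 - ratio1  # assumption: residual groups share what is left
    total_channels = sum(channels_per_residual_group)  # assumed meaning of C
    if total_channels == 0:
        return [0.0 for _ in channels_per_residual_group]
    return [ratio2 * r_i / total_channels for r_i in channels_per_residual_group]


# Virtual speaker group gets half the bits; two residual groups with 1 and 3 channels.
ratios = residual_group_ratios(0.5, [1, 3])
```

With these inputs the residual groups receive one quarter and three quarters of Ratio2 respectively, so the per-group ratios sum back to Ratio2.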
- Determining the bit allocation ratio of the virtual speaker signal group and the bit allocation ratio of the residual signal group according to the third signal group bit allocation algorithm includes: when directionalNrgRatio < TH3 is satisfied, or S > TH0 is satisfied, or η < TH4 is satisfied, … wherein TH3 is the second energy proportion threshold, TH4 is the first virtual speaker coding efficiency threshold, S is the number of heterogeneous sound sources, and η represents the coding efficiency of the virtual speaker.
- The method further includes: determining the number of bits of the virtual speaker signal group and the number of bits of the residual signal group according to the bit allocation ratio of the virtual speaker signal group, the bit allocation ratio of the residual signal group, and the total number of transmission channel bits; and performing bit allocation on the virtual speaker signal group according to the number of bits of the virtual speaker signal group, and on the residual signal group according to the number of bits of the residual signal group.
- The encoding end allocates bits to the virtual speaker signal group according to the number of bits of the virtual speaker signal group, and allocates bits to the residual signal group according to the number of bits of the residual signal group, which solves the problem that the encoding end cannot allocate bits to the virtual speaker signal and the residual signal.
- The number of bits of the virtual speaker signal group is determined according to the bit allocation ratio of the virtual speaker signal group and the total number of transmission channel bits, where Ratio1 is the bit allocation proportion of the virtual speaker signal group and C_bitnum is the total number of transmission channel bits.
- The encoding end can predetermine the total number of transmission channel bits, whose value is not limited. Using the above calculation, the encoding end obtains the number of bits of the virtual speaker signal group and the number of bits of the residual signal group, and thereby allocates bits to the virtual speaker signal and the residual signal.
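Converting ratios to bit counts can be sketched as a multiplication by the total bit budget. The product form `Ratio * C_bitnum` is inferred from the symbol definitions, and rounding down to whole bits is an assumption; the function name is hypothetical.

```python
from typing import Tuple


def group_bit_counts(ratio1: float, ratio2: float, c_bitnum: int) -> Tuple[int, int]:
    """Bits for the virtual speaker signal group and for the residual signal
    group, assuming bits = ratio * C_bitnum rounded down to an integer."""
    if ratio1 < 0.0 or ratio2 < 0.0 or ratio1 + ratio2 > 1.0:
        raise ValueError("ratios must be non-negative and sum to at most 1")
    virtual_speaker_bits = int(ratio1 * c_bitnum)
    residual_bits = int(ratio2 * c_bitnum)
    return virtual_speaker_bits, residual_bits


# Illustrative budget of 1000 bits split 75% / 25%.
vs_bits, res_bits = group_bit_counts(0.75, 0.25, 1000)
```

Rounding down guarantees that the two counts never exceed the total budget; any leftover bits would need a policy that the text does not specify.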
- The method further includes: encoding the transmission channel signal, the bit allocation ratio of the virtual speaker signal group, and the bit allocation ratio of the residual signal group, and writing them into a code stream.
- The bit allocation proportion of the virtual speaker signal group and the bit allocation proportion of the residual signal group can be encoded into the code stream, which the encoding end sends to the decoding end. By parsing the code stream, the decoding end obtains the two bit allocation proportions, derives from them the number of bits allocated to the virtual speaker signal group and to the residual signal group, and can then decode the code stream to obtain the three-dimensional audio signal.
- An embodiment of the present application also provides a method for processing a three-dimensional audio signal, including: receiving a code stream; decoding the code stream to obtain the bit allocation proportion of the virtual speaker signal group and the bit allocation proportion of the residual signal group; and decoding the virtual speaker signal and the residual signal in the code stream according to the bit allocation proportion of the virtual speaker signal group and the bit allocation proportion of the residual signal group, to obtain a decoded three-dimensional audio signal.
- Decoding the virtual speaker signal and the residual signal in the code stream according to the bit allocation ratio of the virtual speaker signal group and the bit allocation ratio of the residual signal group includes: determining the number of available bits according to the code stream; determining the number of bits of the virtual speaker signal group according to the number of available bits and the bit allocation ratio of the virtual speaker signal group; decoding the virtual speaker signal in the code stream according to the number of bits of the virtual speaker signal group; determining the number of bits of the residual signal group according to the number of available bits and the bit allocation ratio of the residual signal group; and decoding the residual signal in the code stream according to the number of bits of the residual signal group.
- An embodiment of the present application further provides a device for processing a three-dimensional audio signal, including: an encoding module, configured to perform spatial encoding on the three-dimensional audio signal to be encoded to obtain a transmission channel signal and transmission channel attribute information, wherein the transmission channel signal includes at least one virtual speaker signal group and at least one residual signal group; and a bit allocation proportion determining module, configured to determine the bit allocation proportion of the virtual speaker signal group and the bit allocation proportion of the residual signal group according to the transmission channel attribute information.
- The constituent modules of the three-dimensional audio signal processing device can also perform the steps described in the aforementioned first aspect and its various possible implementations. For details, see the description of the first aspect and its possible implementations.
- An embodiment of the present application also provides a three-dimensional audio signal processing device, including: a receiving module, configured to receive a code stream; a decoding module, configured to decode the code stream to obtain the bit allocation proportion of the virtual speaker signal group and the bit allocation proportion of the residual signal group; and a signal generation module, configured to decode the virtual speaker signal and the residual signal in the code stream according to the bit allocation proportion of the virtual speaker signal group and the bit allocation proportion of the residual signal group, to obtain a decoded three-dimensional audio signal.
- the components of the three-dimensional audio signal processing device can also perform the steps described in the aforementioned second aspect and various possible implementations.
- An embodiment of the present application provides a computer-readable storage medium that stores instructions which, when run on a computer, cause the computer to execute the method described in the first aspect or the second aspect above.
- the embodiment of the present application provides a computer program product containing instructions, which, when run on a computer, causes the computer to execute the method described in the first aspect or the second aspect above.
- the embodiment of the present application provides a computer-readable storage medium, including the code stream generated by the method described in the foregoing first aspect.
- An embodiment of the present application provides a communication device, which may include entities such as terminal equipment or chips. The communication device includes a processor and a memory; the memory is used to store instructions, and the processor is used to execute the instructions in the memory so that the communication device executes the method described in any one of the aforementioned first aspect or second aspect.
- The present application provides a chip system, which includes a processor configured to support an audio encoder or an audio decoder in implementing the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods.
- the chip system further includes a memory, and the memory is used for storing necessary program instructions and data of the audio encoder or audio decoder.
- the system-on-a-chip may consist of chips, or may include chips and other discrete devices.
- Spatial encoding is first performed on the three-dimensional audio signal to be encoded to obtain a transmission channel signal and transmission channel attribute information, wherein the transmission channel signal includes at least one virtual speaker signal group and at least one residual signal group. The bit allocation ratio of the virtual speaker signal group and the bit allocation ratio of the residual signal group are then determined according to the transmission channel attribute information. Because the attribute information obtained during encoding can be used to determine both ratios, the problem that the bit allocation of the signal cannot be determined is solved.
- FIG. 1 is a schematic diagram of the composition and structure of an audio processing system provided by an embodiment of the present application;
- FIG. 2a is a schematic diagram of an audio encoder and an audio decoder provided in an embodiment of the present application applied to a terminal device;
- FIG. 2b is a schematic diagram of an audio encoder provided by an embodiment of the present application applied to a wireless device or a core network device;
- FIG. 2c is a schematic diagram of an audio decoder provided by an embodiment of the present application applied to a wireless device or a core network device;
- FIG. 3a is a schematic diagram of a multi-channel encoder and a multi-channel decoder provided in an embodiment of the present application applied to a terminal device;
- FIG. 3b is a schematic diagram of a multi-channel encoder provided by an embodiment of the present application applied to a wireless device or a core network device;
- FIG. 3c is a schematic diagram of a multi-channel decoder provided in an embodiment of the present application applied to a wireless device or a core network device;
- FIG. 4 is a schematic diagram of a method for processing a three-dimensional audio signal provided in an embodiment of the present application;
- FIG. 5 is a schematic diagram of a method for processing a three-dimensional audio signal provided in an embodiment of the present application;
- FIG. 6 is a schematic diagram of an application scenario of a three-dimensional audio signal provided by an embodiment of the present application.
- FIG. 7 is a schematic diagram of the composition and structure of an audio encoding device provided by an embodiment of the present application.
- FIG. 8 is a schematic diagram of the composition and structure of an audio decoding device provided by an embodiment of the present application.
- FIG. 9 is a schematic diagram of the composition and structure of another audio encoding device provided by the embodiment of the present application.
- FIG. 10 is a schematic diagram of the composition and structure of another audio decoding device provided by an embodiment of the present application.
- Sound is a continuous wave produced by the vibration of an object. Objects that vibrate to emit sound waves are called sound sources. When sound waves propagate through a medium (such as air, solid or liquid), the auditory organs of humans or animals can perceive sound.
- Characteristics of sound waves include pitch, intensity, and timbre.
- Pitch indicates how high or low a sound is.
- Sound intensity indicates the volume of a sound.
- Sound intensity can also be called loudness or volume.
- the unit of sound intensity is the decibel (dB). Timbre is also called tone quality.
- the frequency of sound waves determines the pitch of the sound. The higher the frequency, the higher the pitch.
- the number of times an object vibrates within one second is called frequency, and the unit of frequency is hertz (Hz).
- the frequency of sound that can be recognized by the human ear is between 20Hz and 20000Hz.
- the amplitude of the sound wave determines the intensity of the sound. The greater the amplitude, the greater the sound intensity. The closer the distance to the sound source, the greater the sound intensity.
- the waveform of the sound wave determines the timbre.
- the waveforms of sound waves include square waves, sawtooth waves, sine waves, and pulse waves.
- sounds can be divided into regular sounds and irregular sounds.
- An irregular sound refers to a sound produced by a sound source vibrating irregularly. Irregular sounds are, for example, noises that affect people's work, study, and rest.
- a regular sound refers to a sound produced by a sound source vibrating regularly. Regular sounds include speech and musical tones.
- regular sound is an analog signal that changes continuously in the time-frequency domain. The analog signals may be referred to as audio signals (acoustic signals).
- An audio signal is an information carrier that carries speech, music and sound effects.
- Because the human sense of hearing can distinguish the location and distribution of sound sources in space, a listener who hears a sound in space perceives not only its pitch, intensity, and timbre, but also its direction.
- Three-dimensional audio technology assumes that the space outside the human ear is a system, and that the signal received at the eardrum is a three-dimensional audio signal output by filtering the sound emitted by the sound source through that system.
- the system outside the human ear can be defined by a system impulse response h(n)
- any sound source can be defined as x(n)
- the signal received at the eardrum is the convolution of x(n) and h(n).
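The relationship between x(n), h(n) and the signal at the eardrum can be illustrated with a minimal sketch of discrete convolution. The sample values of `x` and `h` below are hypothetical and chosen only for illustration; the function name `convolve` is likewise an assumption, not part of the application.

```python
def convolve(x, h):
    """Full discrete convolution: y(n) = sum over k of x(k) * h(n - k)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


# Hypothetical sound source x(n) and system impulse response h(n).
x = [1.0, 0.5, 0.25]
h = [1.0, -1.0]

# Signal received at the eardrum under the model described above.
eardrum = convolve(x, h)
print(eardrum)
```

The output has len(x) + len(h) - 1 samples, which is the usual length of a full linear convolution.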
- the three-dimensional audio signal described in the embodiment of the present application may refer to a higher order ambisonics (HOA) signal or a first order ambisonics (FOA) signal.
- Three-dimensional audio can also be called 3D sound effects, spatial audio, three-dimensional sound field reconstruction, virtual 3D audio, or binaural audio.
- the sound pressure p satisfies formula (1), where $\nabla^2$ is the Laplacian operator and k is the wave number:

$$\nabla^2 p + k^2 p = 0 \qquad (1)$$
- assuming that the space system outside the human ear is a sphere with the listener at its center, sound arriving from outside the sphere has a projection on the sphere, and the sound outside the sphere is filtered out.
- assuming that the sound sources are distributed on the sphere, the sound field generated by the sound sources on the sphere is used to fit the sound field generated by the original sound source; that is, three-dimensional audio technology is a method of fitting the sound field.
- formula (1) is solved in the spherical coordinate system; in a source-free spherical region, the solution of formula (1) is the following formula (2).
- r represents the radius of the ball
- $\theta$ represents the horizontal angle
- k represents the wave number
- s represents the amplitude of the ideal plane wave
- m represents the order number of the three-dimensional audio signal (or the order number of the HOA signal).
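- $Y_{m,n}^{\sigma}(\theta,\varphi)$ represents the spherical harmonics of the direction $(\theta,\varphi)$, and $Y_{m,n}^{\sigma}(\theta_s,\varphi_s)$ represents the spherical harmonics of the direction of the sound source.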
- the three-dimensional audio signal coefficients satisfy formula (3).
- formula (3) can be transformed into formula (4).
- N is an integer greater than or equal to 1.
- the value of N is an integer ranging from 2 to 6.
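To give a sense of scale for the order N, the following sketch uses the standard ambisonics relationship that a full-sphere signal of order N has (N + 1)^2 coefficient channels. This relationship is general ambisonics background and is not stated in the text above; the function name `hoa_channel_count` is hypothetical.

```python
def hoa_channel_count(order):
    """Number of coefficient channels of a full-sphere ambisonic signal of the given order."""
    if order < 1:
        raise ValueError("order must be an integer greater than or equal to 1")
    return (order + 1) ** 2


# Channel counts for the example range of N from 2 to 6.
counts = {n: hoa_channel_count(n) for n in range(2, 7)}
for n, channels in counts.items():
    print(f"N = {n}: {channels} channels")
```

The quadratic growth in channel count is one reason a higher-order signal carries a large amount of data.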
- the coefficients of the 3D audio signal described in the embodiments of the present application may refer to HOA coefficients or ambisonic coefficients.
- the three-dimensional audio signal is an information carrier carrying the spatial position information of the sound source in the sound field, and describes the sound field of the listener in the space.
- Formula (4) shows that the sound field can be expanded on the spherical surface according to the spherical harmonic function, that is, the sound field can be decomposed into the superposition of multiple plane waves. Therefore, the sound field described by the three-dimensional audio signal can be expressed by the superposition of multiple plane waves, and the sound field can be reconstructed through the coefficients of the three-dimensional audio signal.
- the HOA signal includes a large amount of data for describing the spatial information of the sound field. If the acquisition device (such as a microphone) transmits the three-dimensional audio signal to a playback device (such as a speaker), a large bandwidth needs to be consumed.
- the encoder can use the spatial squeezed surround audio coding (S3AC) method, the directional audio coding (DirAC) method, or a coding method based on virtual speaker selection to compress and encode the three-dimensional audio signal into a code stream, and transmit the code stream to a playback device. The coding method based on virtual speaker selection may also be referred to as a match projection (MP) coding method, and it is used as the example in the following description.
- the playback device decodes the code stream, reconstructs the three-dimensional audio signal, and plays the reconstructed three-dimensional audio signal. Therefore, the amount of data transmitted to the playback device and the bandwidth occupation of the three-dimensional audio signal are reduced.
- the sound field classification of the 3D audio signal can be realized through the linear decomposition of the 3D audio signal, so that the sound field classification of the 3D audio signal can be accurately realized, and the sound field classification result of the current frame can be obtained.
- the embodiment of the present application provides an audio coding technology, especially a three-dimensional audio coding technology for three-dimensional audio signals, and specifically provides a coding technology that uses fewer channels to represent three-dimensional audio signals, so as to improve traditional audio coding systems.
- Audio coding (commonly referred to simply as coding) includes two parts: audio encoding and audio decoding. Audio encoding is performed on the source side and involves processing (e.g., compressing) raw audio to reduce the amount of data needed to represent the audio for more efficient storage and/or transmission. Audio decoding is performed at the destination and includes inverse processing relative to the encoder to reconstruct the original audio. The encoding part and the decoding part are also collectively referred to as coding.
- the implementation of the embodiment of the present application will be described in detail below with reference to the accompanying drawings.
- the technical solution of the embodiment of the present application can be applied to various audio processing systems, as shown in FIG. 1 , which is a schematic diagram of the composition and structure of the audio processing system provided by the embodiment of the present application.
- the audio processing system 100 may include: an audio encoding device 101 and an audio decoding device 102 .
- the audio encoding device 101 can be used to generate a code stream, which can then be transmitted to the audio decoding device 102 through an audio transmission channel; the audio decoding device 102 receives the code stream, performs its audio decoding function, and finally obtains the reconstructed signal.
- the audio coding device can be applied to various terminal devices that require audio communication, wireless devices that require transcoding, and core network devices.
- the audio encoding device can be the audio encoder of the above-mentioned terminal device, wireless device, or core network device.
- the audio decoding device can be applied to various terminal devices that require audio communication, wireless devices that require transcoding, and core network devices; for example, the audio decoding device can be the audio decoder of the above-mentioned terminal device, wireless device, or core network device.
- the audio encoding device may include a radio access network device, a media gateway of the core network, a transcoding device, a media resource server, a mobile terminal, a fixed network terminal, etc.; it may also be an audio encoder applied in virtual reality (VR) streaming services.
- the end-to-end audio signal processing flow includes: after audio signal A passes through the acquisition module (acquisition), a preprocessing operation (audio preprocessing) is performed. The preprocessing operation includes filtering out the low-frequency part of the signal, with 20 Hz or 50 Hz as the dividing point, and extracting the orientation information in the signal. The signal is then encoded (audio encoding), packaged (file/segment encapsulation), and sent (delivery) to the decoding end. The decoding end first unpacks (file/segment decapsulation), then decodes (audio decoding), performs binaural rendering (audio rendering) on the decoded signal, and maps the rendered signal onto the listener's headphones, which may be standalone headphones or headphones on a glasses device.
- FIG. 2a is a schematic diagram of an audio encoder and an audio decoder provided in the embodiment of the present application applied to a terminal device.
- Each terminal device may include: an audio encoder, a channel encoder, an audio decoder, and a channel decoder.
- the channel encoder is used for channel coding the audio signal
- the channel decoder is used for channel decoding the audio signal.
- the first terminal device 20 may include: a first audio encoder 201 , a first channel encoder 202 , a first audio decoder 203 , and a first channel decoder 204 .
- the second terminal device 21 may include: a second audio decoder 211 , a second channel decoder 212 , a second audio encoder 213 , and a second channel encoder 214 .
- the first terminal device 20 is connected to a wireless or wired first network communication device 22, the first network communication device 22 is connected to a wireless or wired second network communication device 23 through a digital channel, and the second terminal device 21 is connected to the wireless or wired second network communication device 23.
- the foregoing wireless or wired network communication devices may generally refer to signal transmission equipment, such as communication base stations, data exchange equipment, and the like.
- the terminal device as the sending end first collects audio, performs audio coding on the collected audio signal, and then performs channel coding, and then transmits in a digital channel through a wireless network or a core network.
- the terminal device as the receiving end performs channel decoding on the received signal to obtain the code stream, and then recovers the audio signal through audio decoding; the terminal device at the receiving end then plays back the audio.
- the wireless device or the core network device 25 includes: a channel decoder 251, another audio decoder 252, the audio encoder 253 provided in the embodiment of the present application, and a channel encoder 254, wherein the other audio decoder 252 refers to an audio decoder other than the audio decoder provided in the embodiment of the present application.
- the channel decoder 251 first performs channel decoding on the signal entering the device, the other audio decoder 252 then performs audio decoding, and the audio encoder 253 provided by the embodiment of the present application then performs audio encoding.
- finally, the channel encoder 254 performs channel coding on the audio signal, and the signal is transmitted after the channel coding is completed.
- the other audio decoder 252 performs audio decoding on the code stream decoded by the channel decoder 251 .
- FIG. 2c is a schematic diagram of an audio decoder provided by the embodiment of the present application applied to a wireless device or a core network device.
- the wireless device or the core network device 25 includes: a channel decoder 251, the audio decoder 255 provided in the embodiment of the present application, another audio encoder 256, and a channel encoder 254, wherein the other audio encoder 256 refers to an audio encoder other than the audio encoder provided in the embodiment of the present application.
- the signal entering the device is first channel-decoded by the channel decoder 251; the received audio coded stream is then decoded by the audio decoder 255; the other audio encoder 256 then performs audio encoding; finally, the channel encoder 254 performs channel encoding on the audio signal, and the signal is transmitted after the channel encoding is completed.
- the wireless device refers to equipment related to radio frequency in communication
- the core network device refers to equipment related to core network in communication.
- the audio coding device can be applied to various terminal devices that require audio communication, wireless devices that require transcoding, and core network devices.
- the audio encoding device can be a multi-channel encoder of the above-mentioned terminal device, wireless device, or core network device.
- the audio decoding device can be applied to various terminal devices that require audio communication, wireless devices that require transcoding, and core network devices.
- the audio decoding device can be a multi-channel decoder of the above-mentioned terminal device, wireless device, or core network device.
- as shown in FIG. 3a, which is a schematic diagram of the multi-channel encoder and multi-channel decoder provided by the embodiment of the present application applied to a terminal device, each terminal device may include: a multi-channel encoder, a channel encoder, a multi-channel decoder, and a channel decoder.
- the multi-channel encoder may execute the audio encoding method provided in the embodiment of the present application
- the multi-channel decoder may execute the audio decoding method provided in the embodiment of the present application.
- the channel encoder is used to perform channel coding on the multi-channel signal
- the channel decoder is used to perform channel decoding on the multi-channel signal.
- the first terminal device 30 may include: a first multi-channel encoder 301 , a first channel encoder 302 , a first multi-channel decoder 303 , and a first channel decoder 304 .
- the second terminal device 31 may include: a second multi-channel decoder 311 , a second channel decoder 312 , a second multi-channel encoder 313 , and a second channel encoder 314 .
- the first terminal device 30 is connected to a wireless or wired first network communication device 32, the first network communication device 32 is connected to a wireless or wired second network communication device 33 through a digital channel, and the second terminal device 31 is connected to the wireless or wired second network communication device 33.
- the foregoing wireless or wired network communication equipment may generally refer to signal transmission equipment, such as communication base stations, data exchange equipment, and the like.
- the terminal device as the sending end performs multi-channel coding on the collected multi-channel signal, and then performs channel coding, and then transmits it in a digital channel through a wireless network or a core network.
- the terminal device as the receiving end performs channel decoding according to the received signal to obtain the coded stream of the multi-channel signal, and then restores the multi-channel signal through multi-channel decoding, and the terminal device as the receiving end plays it back.
- FIG. 3b is a schematic diagram of a multi-channel encoder provided by the embodiment of the present application applied to a wireless device or a core network device, wherein the wireless device or the core network device 35 includes: a channel decoder 351, another audio decoder 352, a multi-channel encoder 353, and a channel encoder 354, which are similar to those in FIG. 2b and will not be repeated here.
- FIG. 3c is a schematic diagram of a multi-channel decoder provided by the embodiment of the present application applied to a wireless device or a core network device, wherein the wireless device or the core network device 35 includes: a channel decoder 351, a multi-channel decoder 355, another audio encoder 356, and a channel encoder 354, which are similar to those in FIG. 2c and will not be repeated here.
- the audio encoding process can be a part of the multi-channel encoder, and the audio decoding process can be a part of the multi-channel decoder.
- for example, performing multi-channel encoding on the collected multi-channel signal may consist of processing the collected multi-channel signal to obtain an audio signal, and then encoding the obtained audio signal according to the method provided in the embodiment of the present application; the decoding end decodes the coded stream of the multi-channel signal to obtain the audio signal, and recovers the multi-channel signal after up-mixing. Therefore, the embodiments of the present application may also be applied to multi-channel encoders and multi-channel decoders in terminal devices, wireless devices, and core network devices. In wireless or core network equipment, if transcoding needs to be implemented, corresponding multi-channel encoding processing needs to be performed.
- the method can be executed by a terminal device, for example, the terminal device can be an audio encoding device (hereinafter referred to as an encoding terminal or an encoder).
- the terminal device may also be a three-dimensional audio signal processing device.
- the processing method of the three-dimensional audio signal mainly includes the following:
- the encoding end may acquire a three-dimensional audio signal
- the three-dimensional audio signal may be a scene audio signal.
- the three-dimensional audio signal may be a time-domain signal or a frequency-domain signal.
- the 3D audio signal may also be a down-sampled signal.
- the virtual speaker signals corresponding to these virtual speakers can be obtained, and these virtual speaker signals are then grouped to obtain the at least one virtual speaker signal group; alternatively, after the virtual speakers that encode the three-dimensional audio signal are determined from the set of candidate virtual speakers, these virtual speakers can be grouped to obtain at least one virtual speaker group, and the virtual speaker signals corresponding to the virtual speakers in each of the at least one virtual speaker group can then be obtained, so as to obtain the at least one virtual speaker signal group.
- the three-dimensional audio signal includes: a higher order ambisonics (HOA) signal, or a first order ambisonics (FOA) signal.
- the three-dimensional audio signal may also be other types of signals, and this is only an example of the present application, and is not intended to limit the embodiment of the present application.
- the 3D audio signal may be a time-domain HOA signal or a frequency-domain HOA signal.
- the 3D audio signal may include all channels of the HOA signal, or may include some HOA channels (for example, FOA channels).
- the three-dimensional audio signal may be all sample points of the HOA signal, or the sample points obtained by 1/Q down-sampling of the HOA signal to be analyzed, where Q is the down-sampling interval and 1/Q is the down-sampling rate.
- the 3D audio signal includes multiple frames. The processing of one frame of the 3D audio signal is taken as an example below. For example, if this frame is the current frame, there is a previous frame before the current frame and a next frame after the current frame.
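The down-sampling with interval Q described above can be sketched as keeping every Q-th sample of each channel. This is a minimal illustration only: the function name `downsample` and the sample data are hypothetical, and the sketch applies no anti-aliasing filter, which the text neither requires nor excludes.

```python
def downsample(channels, q):
    """Keep every q-th sample of each channel (down-sampling rate 1/q)."""
    if q < 1:
        raise ValueError("q must be a positive integer")
    return [channel[::q] for channel in channels]


# Hypothetical two-channel frame with 8 samples per channel.
hoa_frame = [list(range(8)), list(range(10, 18))]

# With Q = 4, one sample in four is kept for analysis.
analysis_frame = downsample(hoa_frame, 4)
print(analysis_frame)
```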
- the processing method of other frames of the 3D audio signal except the current frame in the embodiment of the present application is similar to the processing method of the current frame, and the processing of the current frame will be used as an example in the following.
- the three-dimensional audio signal is spatially encoded to obtain the transmission channel signal and transmission channel attribute information.
- the specific process of spatial encoding, including the process of outputting the virtual loudspeaker signal and the residual signal after spatial encoding, is not described further here.
- after the encoding end acquires the 3D audio signal to be encoded, it can perform spatial encoding on the 3D audio signal and output the transmission channel signal and the transmission channel attribute information.
- the transmission channel signal includes the virtual speaker signal and the residual signal; for example, grouping the virtual speaker signals yields at least one virtual speaker signal group.
- the residual signals are grouped to obtain at least one residual signal group.
- the transmission channel attribute information corresponding to the transmission channel signal can also be output through spatial coding.
- the transmission channel attribute information is used to indicate the attribute of the transmission channel signal.
- the transmission channel attribute information includes: the coding efficiency of the virtual speaker; the coding efficiency of the virtual speaker indicates the efficiency with which the virtual speaker reconstructs the 3D audio signal.
- the transmission channel attribute information output by the encoder (which may also be the encoding end) through spatial encoding includes the coding efficiency of the virtual loudspeaker, and the calculation method of the coding efficiency of the virtual loudspeaker is described next.
- In step 401, performing spatial coding on the 3D audio signal to be coded to obtain the transmission channel attribute information includes:
- the virtual speaker used for signal reconstruction of the 3D audio signal to be encoded may be the virtual speaker, determined from the set of candidate virtual speakers as described above, that encodes the 3D audio signal.
- the coding efficiency of the virtual speaker is obtained.
- the encoding end first performs signal reconstruction using a virtual speaker, and obtains a reconstructed three-dimensional audio signal.
- the encoding end can calculate the energy characterization value of the signal of each transmission channel, for example, the energy characterization value of the reconstructed 3D audio signal and the energy characterization value of the 3D audio signal to be encoded can be obtained.
- the energy characterization value of the 3D audio signal differs before and after signal reconstruction, so the coding efficiency of the virtual loudspeaker can be calculated from the change in the energy characterization value before and after signal reconstruction.
- the encoding end calculates the energy characterization value of each transmission channel of the reconstructed HOA signal, which can be expressed as R1, R2,...,Rt, and calculates the energy characterization value of each transmission channel of the original HOA signal, which can be expressed as N1, N2,...,Nt.
- the virtual loudspeaker coding efficiency $\eta$ is given by $\eta = \mathrm{sum}(R)/\mathrm{sum}(N)$, where sum(R) represents the summation of R1-Rt, and sum(N) represents the summation of N1-Nt.
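The ratio sum(R)/sum(N) can be sketched as follows. The text does not define how an energy characterization value is computed, so this sketch assumes the sum of squared samples per channel; the function names and the sample data are hypothetical.

```python
def channel_energy(samples):
    """Assumed energy characterization value: sum of squared samples."""
    return sum(s * s for s in samples)


def coding_efficiency(reconstructed, original):
    """Ratio of total reconstructed energy sum(R) to total original energy sum(N)."""
    r = [channel_energy(ch) for ch in reconstructed]  # R1..Rt
    n = [channel_energy(ch) for ch in original]       # N1..Nt
    total_n = sum(n)
    if total_n == 0:
        return 0.0  # convention chosen for this sketch when the input is silent
    return sum(r) / total_n


# Hypothetical two-channel original and reconstructed signals.
original = [[1.0, -1.0], [2.0, 0.0]]        # energies 2 and 4
reconstructed = [[1.0, -1.0], [1.0, 0.0]]   # energies 2 and 1

efficiency = coding_efficiency(reconstructed, original)
print(efficiency)
```

A value close to 1 would suggest that the virtual speakers capture most of the energy of the original signal.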
- the transmission channel attribute information includes: the energy ratio of the virtual speaker signal group; the energy ratio of the virtual speaker signal group refers to the proportion of the energy of all virtual speaker signals in the virtual speaker signal group in the total energy of all transmission channel signals.
- the methods performed by the encoding side also include:
- the energy proportion of the virtual loudspeaker signal group is obtained.
- the encoding end first obtains the energy representation value of each virtual speaker signal in the virtual speaker signal group, and then adds the energy representation values of all virtual speaker signals in the same group to obtain the energy representation value of the virtual speaker signal group. If there are multiple virtual loudspeaker signal groups, the energy representation value of each group can be calculated in the above manner.
- the encoding end can obtain the energy representative value of the residual signal group according to the energy representative value of each residual signal in the residual signal group.
- the encoding end can obtain the energy ratio of the virtual loudspeaker signal group according to the energy representative value of the virtual loudspeaker signal group and the energy representative value of the residual signal group.
- the energy proportion of the virtual loudspeaker signal group can indicate the proportion of the virtual loudspeaker signal group in the total transmission channel signal energy. If the energy ratio of the virtual loudspeaker signal group is relatively low, it means that the virtual loudspeaker signal group is not dominant (that is, weaker) in the total transmission channel signal energy.
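The energy proportion described above can be sketched as the energy of each virtual loudspeaker signal group divided by the total energy of all transmission channel signals. As before, the energy representation value is assumed to be the sum of squared samples, and all names and data are hypothetical.

```python
def group_energy(group):
    """Sum of the assumed energy values (squared samples) of all signals in a group."""
    return sum(sum(s * s for s in signal) for signal in group)


def virtual_speaker_energy_proportions(speaker_groups, residual_groups):
    """Energy of each virtual speaker signal group over the total transmission channel energy."""
    speaker_energy = [group_energy(g) for g in speaker_groups]
    residual_energy = [group_energy(g) for g in residual_groups]
    total = sum(speaker_energy) + sum(residual_energy)
    if total == 0:
        return [0.0] * len(speaker_groups)
    return [e / total for e in speaker_energy]


# One hypothetical virtual speaker signal group (two signals) and one residual group.
speaker_groups = [[[1.0, 1.0], [1.0, 0.0]]]   # group energy 3
residual_groups = [[[1.0, 0.0]]]              # group energy 1

proportions = virtual_speaker_energy_proportions(speaker_groups, residual_groups)
print(proportions)
```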
- the transmission channel attribute information includes: a virtual speaker code identifier, where the virtual speaker code identifier is used to indicate whether the bit allocation of the virtual speaker signal group is dominant.
- the virtual speaker code identifier is used to indicate whether the bit allocation of at least one virtual speaker signal group is dominant. For example, the virtual speaker code identifier can be expressed as a flag, and different values of the flag indicate that the bit allocation of the virtual loudspeaker signal group is dominant or not dominant.
- the dominance can also be divided into strong dominance and sub-dominance (i.e., slight dominance).
- Perform spatial encoding on the 3D audio signal to be encoded to obtain transmission channel attribute information including:
- Spatial encoding is performed on the 3D audio signal to be encoded to obtain the number of heterogeneous sound sources of the transmission channel signal and the coding efficiency of the virtual speaker;
- the coding identifier of the virtual loudspeaker is obtained according to the number of heterogeneous sound sources of the transmission channel signal and the coding efficiency of the virtual loudspeaker.
- the encoding end can classify the sound field of the transmission channel signal through spatial coding, and generate the sound field classification result.
- the sound field classification result can include the number of heterogeneous sound sources.
- the specific calculation process for the number of heterogeneous sound sources is not limited here.
- after obtaining the number of heterogeneous sound sources of the transmission channel signal and the coding efficiency of the virtual speaker, the encoding end obtains the specific value of the virtual speaker coding identifier according to the judgment conditions that these two values satisfy. In the embodiment of the present application, there are many ways to determine the virtual loudspeaker coding identifier; refer to the examples in the subsequent embodiments for details.
- obtaining the coding identifier of the virtual speaker includes:
- when the virtual speaker coding efficiency is smaller than a preset first virtual speaker coding efficiency threshold, it is determined that the virtual speaker coding identifier is not dominant.
- the threshold of the number of heterogeneous sound sources and the first virtual loudspeaker coding efficiency threshold may be determined in combination with application scenarios, and are not limited here.
- the threshold of the number of heterogeneous sound sources may be represented as TH0
- the threshold of coding efficiency of the first virtual loudspeaker may be represented as TH4.
- if the virtual loudspeaker coding identifier is dominant, the virtual loudspeaker signal group is dominant in the total transmission channel signal, so more bits need to be allocated to the virtual loudspeaker signal group; for example, after the initial bit ratio of the virtual loudspeaker signal group is determined, the bit ratio can be increased.
- if the virtual loudspeaker coding identifier is not dominant, the virtual loudspeaker signal group is not dominant in the total transmission channel signal, and fewer bits may be allocated to the virtual loudspeaker signal group. For example, after the initial bit ratio of the virtual loudspeaker signal group is determined, the bit ratio may be reduced.
- the encoding end can determine the virtual loudspeaker coding identifier by comparing the number of heterogeneous sound sources and the coding efficiency of the virtual loudspeaker against the above-mentioned judgment conditions, so that the virtual loudspeaker coding identifier can be used to determine the bit allocation ratio of the virtual loudspeaker signal group and the bit allocation ratio of the residual signal group.
- dominance includes sub-dominance and strong dominance; determining that the virtual speaker coding flag is dominant includes:
- when the virtual speaker coding efficiency is greater than or equal to the first virtual speaker coding efficiency threshold and less than or equal to the preset second virtual speaker coding efficiency threshold, it is determined that the virtual speaker coding flag is sub-dominant; or,
- the second virtual speaker coding efficiency threshold is greater than the first virtual speaker coding efficiency threshold.
- the encoding end can further subdivide the dominant case into two cases: the virtual speaker coding flag is sub-dominant, or it is strongly dominant. If the flag is strongly dominant, more bits need to be allocated to the virtual speaker signal group; for example, after the initial bit allocation ratio of the virtual speaker signal group is determined, that ratio can be increased.
- if the flag is sub-dominant, the virtual speaker signal group needs fewer bits than in the strongly dominant case, but it still needs to be allocated additional bits. For example, after the initial bit allocation ratio of the virtual speaker signal group is determined, that ratio can be increased, by a smaller amount than in the strongly dominant case.
- the second virtual loudspeaker coding efficiency threshold may be denoted as TH2.
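The three-way flag decision described above can be sketched as follows. This is a minimal illustration, not the claimed method: the text states only that an efficiency below TH4 means "not dominant" and that an efficiency between TH4 and TH2 means "sub-dominant". Treating a source count above TH0 as "not dominant" and an efficiency above TH2 as "strongly dominant" are assumptions, as are the names `SpeakerFlag` and `classify_flag` and the value chosen for TH4.

```python
from enum import Enum


class SpeakerFlag(Enum):
    NOT_DOMINANT = "not dominant"
    SUB_DOMINANT = "sub-dominant"
    STRONGLY_DOMINANT = "strongly dominant"


def classify_flag(num_sources, efficiency, th0, th4, th2):
    """Map (number of heterogeneous sources, coding efficiency) to a flag."""
    if th2 <= th4:
        raise ValueError("TH2 must be greater than TH4")
    # Assumption: more heterogeneous sources than TH0 also counts as not dominant.
    if num_sources > th0 or efficiency < th4:
        return SpeakerFlag.NOT_DOMINANT
    # Stated in the text: TH4 <= efficiency <= TH2 is sub-dominant.
    if efficiency <= th2:
        return SpeakerFlag.SUB_DOMINANT
    # Assumption: anything above TH2 is strongly dominant.
    return SpeakerFlag.STRONGLY_DOMINANT


# TH0 = 2 and TH2 = 0.875 appear as examples in the text; TH4 = 0.5 is hypothetical.
TH0, TH4, TH2 = 2, 0.5, 0.875
for s, eta in [(1, 0.95), (2, 0.7), (2, 0.3), (5, 0.95)]:
    print(s, eta, classify_flag(s, eta, TH0, TH4, TH2).value)
```

Keeping the flag as an enumeration makes it easy to select one of the preset signal group bit allocation algorithms afterwards.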
- the transmission channel attribute information can be used to perform bit allocation for the virtual loudspeaker signal group and, in addition, for the residual signal group.
- the encoding end determines the bit allocation proportion of the virtual loudspeaker signal group and the bit allocation proportion of the residual signal group according to the attribute information of the transmission channel.
- the bit allocation proportion is the ratio of the number of bits allocated to a signal group to the total number of bits of the transmission channel signal; it may also be referred to as the "bit allocation ratio".
- the transmission channel signal includes at least one virtual speaker signal group and at least one residual signal group, so the bit allocation ratio of the virtual speaker signal group and the bit allocation ratio of the residual signal group can be obtained.
- the process of determining the bit allocation proportions of one virtual loudspeaker signal group and two residual signal groups is taken as an example for illustration.
- the spatial encoder can output the transmission channel signal and the transmission channel attribute information
- the core encoder obtains the transmission channel signal and the transmission channel attribute information
- from the transmission channel signal and the transmission channel attribute information, the core encoder can then obtain the bit allocation ratio of the virtual loudspeaker signal group and the bit allocation ratio of the residual signal group.
- the transmission channel attribute information includes: the energy ratio of the virtual loudspeaker signal group, and/or the code identifier of the virtual loudspeaker;
- the bit allocation proportion of the virtual loudspeaker signal group and the bit allocation proportion of the residual signal group are determined according to the preset first signal group bit allocation algorithm;
- the bit allocation proportion of the virtual loudspeaker signal group and the bit allocation proportion of the residual signal group are determined according to the preset second signal group bit allocation algorithm, wherein the second energy proportion threshold is smaller than the first energy proportion threshold;
- when the energy proportion of the virtual loudspeaker signal group is less than the preset first energy proportion threshold, or the virtual speaker coding flag is not dominant, the bit allocation proportion of the virtual loudspeaker signal group and the bit allocation proportion of the residual signal group are determined according to the preset third signal group bit allocation algorithm.
- the encoding end can preset multiple signal group bit allocation algorithms and use a different one depending on which conditions the transmission channel attribute information satisfies,
- so that the virtual loudspeaker signal group and the residual signal group are assigned bit allocation ratios suited to those conditions, which can improve the coding efficiency of the three-dimensional audio signal at the encoding end.
- the first energy proportion threshold may be denoted as TH1
- the second energy proportion threshold may be denoted as TH3.
- determining the bit allocation ratio of the virtual loudspeaker signal group and the bit allocation ratio of the residual signal group according to the signal group bit allocation algorithm includes:
- the bit allocation ratio of the virtual loudspeaker signal group is calculated as follows:
- Ratio1_1 = FAC1*directionalNrgRatio + (1 - FAC1)*maxdirectionalNrgRatio;
- directionalNrgRatio represents the energy ratio of the virtual speaker signal group
- S is the number of heterogeneous sound sources
- η represents the coding efficiency of the virtual speaker
- maxdirectionalNrgRatio is the preset maximum virtual speaker signal group bit allocation ratio
- FAC1 is the preset first adjustment factor
- Ratio1_1 is the bit allocation ratio of the virtual speaker signal group
- * means multiplication operation
- TH1 is the first energy ratio threshold
- TH0 is the threshold of the number of different sound sources
- TH2 is the second virtual speaker coding efficiency threshold
- the bit allocation ratio of the residual signal group is calculated as follows:
- Ratio2 = 1 - Ratio1_1;
- Ratio1_1 is the bit allocation ratio of the virtual loudspeaker signal group
- Ratio2 is the bit allocation ratio of the residual signal group
- the transmission channel signal includes a virtual speaker signal group and a residual signal group. After obtaining the bit allocation ratio Ratio1_1 of the virtual speaker signal group, the bit allocation ratio of the residual signal group can be obtained through the calculation formula of Ratio2 above.
- FAC1 may be flexibly determined according to a specific application scenario, and is not limited here.
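The Ratio1_1 and Ratio2 formulas above amount to a weighted blend between the measured energy ratio and the preset maximum, with the remainder going to the residual signal group. A minimal sketch, in which the function name `blend_ratio` and all numeric values are illustrative assumptions:

```python
def blend_ratio(directional_nrg_ratio, max_directional_nrg_ratio, fac):
    """Ratio1_1 = FAC*directionalNrgRatio + (1 - FAC)*maxdirectionalNrgRatio."""
    if not 0.0 <= fac <= 1.0:
        raise ValueError("adjustment factor must lie in [0, 1]")
    return fac * directional_nrg_ratio + (1.0 - fac) * max_directional_nrg_ratio


# Hypothetical inputs: energy ratio 0.9, preset maximum 0.95, FAC1 = 0.25.
ratio1_1 = blend_ratio(0.9, 0.95, 0.25)
ratio2 = 1.0 - ratio1_1  # everything not given to the virtual speaker group
print(ratio1_1, ratio2)
```

A smaller adjustment factor pulls Ratio1_1 closer to the preset maximum; a larger one keeps it closer to the measured energy ratio.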
- the method performed by the encoding end further includes:
- the bit allocation ratio of the virtual loudspeaker signal group is updated in the following manner:
- Ratio1_2 = min(Ratio1_1, maxdirectionalNrgRatio + FAC2*Ratio1_1)
- Ratio1_2 represents the updated bit allocation ratio of the virtual speaker signal group
- FAC2 is the preset second adjustment factor
- maxdirectionalNrgRatio is the preset maximum virtual speaker signal group bit allocation ratio
- Ratio1_1 is the bit allocation ratio of the virtual speaker signal group before the update
- * indicates the multiplication operation
- min is the minimum value operation.
- FAC2 may be flexibly determined according to a specific application scenario, which is not limited here.
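The update above acts as an upper limit on the ratio. A short sketch (the function name `cap_ratio` and the numbers are assumptions) shows when the limit actually takes effect: the min() picks the second argument only when Ratio1_1*(1 - FAC2) exceeds maxdirectionalNrgRatio.

```python
def cap_ratio(ratio1_1, max_directional_nrg_ratio, fac2):
    """Ratio1_2 = min(Ratio1_1, maxdirectionalNrgRatio + FAC2*Ratio1_1)."""
    return min(ratio1_1, max_directional_nrg_ratio + fac2 * ratio1_1)


# Hypothetical values. In the first call the limit is inactive, in the second it binds.
print(cap_ratio(0.9375, 0.95, 0.25))  # ratio already below the limit
print(cap_ratio(0.98, 0.6, 0.25))     # limited to 0.6 + 0.25*0.98
```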
- determining the bit allocation ratio of the virtual loudspeaker signal group and the bit allocation ratio of the residual signal group according to the preset second signal group bit allocation algorithm, wherein the second energy ratio threshold is smaller than the first energy ratio threshold, includes:
- Ratio1_1 is calculated as follows:
- Ratio1_1 = FAC3*directionalNrgRatio + (1 - FAC3)*maxdirectionalNrgRatio;
- maxdirectionalNrgRatio is the preset maximum virtual speaker signal group bit allocation ratio
- FAC3 is the third preset adjustment factor
- directionalNrgRatio represents the energy ratio of the virtual speaker signal group
- S is the number of heterogeneous sound sources
- η represents the coding efficiency of the virtual speaker
- Ratio1_1 is the bit allocation ratio of the virtual speaker signal group
- * means the multiplication operation
- TH0 is the threshold of the number of heterogeneous sound sources
- TH1 is the first energy ratio threshold
- TH2 is the second virtual speaker coding efficiency threshold
- TH3 is the second energy ratio threshold
- TH4 is the first virtual speaker coding efficiency threshold
- the bit allocation ratio of the residual signal group is calculated as follows:
- Ratio2 = 1 - Ratio1_1;
- Ratio1_1 is the bit allocation ratio of the virtual loudspeaker signal group
- Ratio2 is the bit allocation ratio of the residual signal group
- FAC3 may be flexibly determined according to a specific application scenario, and is not limited here. For example, FAC3 lies between 0 and 0.5, and FAC3 > FAC1.
- the transmission channel signal includes a virtual speaker signal group and a residual signal group. After obtaining the bit allocation ratio Ratio1_1 of the virtual speaker signal group, the bit allocation ratio of the residual signal group can be obtained through the calculation formula of Ratio2 above.
- the method provided in the embodiment of the present application further includes:
- the bit allocation ratio of the virtual loudspeaker signal group is updated in the following manner:
- Ratio1_2 = min(Ratio1_1, maxdirectionalNrgRatio + FAC4*Ratio1_1).
- Ratio1_2 represents the updated bit allocation ratio of the virtual speaker signal group
- FAC4 is the preset fourth adjustment factor
- maxdirectionalNrgRatio is the preset maximum virtual speaker signal group bit allocation ratio
- Ratio1_1 is the bit allocation ratio of the virtual speaker signal group before the update
- * indicates the multiplication operation
- min is the minimum value operation.
- FAC4 may be flexibly determined according to a specific application scenario, and is not limited here.
- the method provided in the embodiment of the present application further includes:
- Ratio2_i = Ratio2*(R_i/C);
- R_i represents the number of transmission channels included in the i-th residual signal group
- C is the total number of transmission channels of all residual signal groups
- Ratio2_i is the bit allocation ratio of the i-th residual signal group
- * represents the multiplication operation
- Ratio2 is the bit allocation ratio of all residual signal groups together.
- the proportion of the bit allocation of each residual signal group in all residual signal groups may be determined according to the number of transmission channels of each residual signal group.
- R_i/C represents the transmission channel ratio of the i-th residual signal group to all residual signal groups, and the bit allocation ratio of the i-th residual signal group can be obtained through (R_i/C) and Ratio2.
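The per-group split Ratio2_i = Ratio2*(R_i/C) can be sketched as below. The function name `split_residual_ratio` and the value of Ratio2 are assumptions; the channel counts 4 and 5 follow the grouping example used later in the description.

```python
def split_residual_ratio(ratio2, channels_per_group):
    """Distribute Ratio2 over residual groups in proportion to channel count."""
    total_channels = sum(channels_per_group)  # C
    if total_channels <= 0:
        raise ValueError("at least one residual channel is required")
    return [ratio2 * r_i / total_channels for r_i in channels_per_group]


# Hypothetical Ratio2 = 0.18 shared by two residual groups with 4 and 5 channels.
per_group = split_residual_ratio(0.18, [4, 5])
print(per_group, sum(per_group))
```

Because the weights R_i/C sum to one, the per-group ratios always add back up to Ratio2.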
- determining the bit allocation proportion of the virtual speaker signal group and the bit allocation proportion of the residual signal group according to the preset third signal group bit allocation algorithm includes:
- Ratio1_1 = directionalNrgRatio
- directionalNrgRatio represents the energy ratio of the virtual speaker signal group
- Ratio1_1 is the bit allocation ratio of the virtual speaker signal group
- TH3 is the second energy ratio threshold
- TH4 is the first virtual speaker coding efficiency threshold
- S is the number of heterogeneous sound sources
- η represents the coding efficiency of the virtual loudspeaker
- TH0 is the threshold value of the number of heterogeneous sound sources
- the bit allocation ratio of the residual signal group is calculated as follows:
- Ratio2_1 = D/(F+D);
- Ratio2_1 is the bit allocation ratio of the residual signal group
- F represents the energy representative value of the virtual loudspeaker signal group
- D is the energy representative value of the residual signal group.
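In this third algorithm the ratios follow the group energies directly. A minimal sketch, in which the function name is hypothetical and computing directionalNrgRatio as F/(F+D) is an assumption (the text only calls it the energy ratio of the virtual speaker signal group):

```python
def energy_based_ratios(f_energy, d_energy):
    """Return (Ratio1_1, Ratio2_1) from the group energy representative values."""
    total = f_energy + d_energy
    if total <= 0:
        raise ValueError("total energy must be positive")
    # Assumption: directionalNrgRatio is the virtual speaker share of total energy.
    directional_nrg_ratio = f_energy / total
    ratio1_1 = directional_nrg_ratio   # Ratio1_1 = directionalNrgRatio
    ratio2_1 = d_energy / total        # Ratio2_1 = D/(F+D)
    return ratio1_1, ratio2_1


# Hypothetical energy representative values F = 30, D = 10.
print(energy_based_ratios(30.0, 10.0))
```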
- the method provided in the embodiment of the present application further includes:
- Ratio1_1 groupBitsRatio1
- Ratio1_2 = groupBitsRatio1
- Ratio1_2 = FAC5*groupBitsRatio1 + (1 - FAC5)*Ratio1_1;
- Ratio1_2 represents the bit allocation ratio of the updated virtual speaker signal group
- FAC5 is the preset fifth adjustment factor
- Ratio1_1 is the bit distribution ratio of the virtual speaker signal group before the update
- * represents the multiplication operation
- groupBitsRatio1 is the preset virtual loudspeaker signal group bit allocation ratio
- Ratio2_1 groupBitsRatio2
- Ratio2_2 = groupBitsRatio2
- Ratio2_2 indicates the bit allocation ratio of the updated residual signal group
- FAC6 is the preset sixth adjustment factor
- Ratio2_1 is the bit allocation ratio of the residual signal group before the update
- * indicates the multiplication operation
- groupBitsRatio2 is the preset residual signal group bit allocation ratio.
- FAC5 may be flexibly determined according to a specific application scenario, which is not limited here.
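The weighted update toward a preset ratio can be sketched as follows. Only the blend formula is taken from the text; the function name, the numbers, and applying the same form to the residual group with FAC6 are assumptions, and the condition that selects between the direct assignment and the blend is not reproduced here.

```python
def pull_toward_preset(ratio, preset_ratio, fac):
    """Updated ratio = FAC*preset + (1 - FAC)*ratio."""
    if not 0.0 <= fac <= 1.0:
        raise ValueError("adjustment factor must lie in [0, 1]")
    return fac * preset_ratio + (1.0 - fac) * ratio


# Hypothetical values: FAC5 = FAC6 = 0.75, presets 0.6 and 0.4.
ratio1_2 = pull_toward_preset(0.4, 0.6, 0.75)  # virtual speaker signal group
ratio2_2 = pull_toward_preset(0.6, 0.4, 0.75)  # residual signal group
print(ratio1_2, ratio2_2)
```

With an adjustment factor above 0.5 the preset dominates, so the result never strays far from the preset allocation.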
- the method provided in the embodiment of the present application further includes the following steps:
- according to the bit allocation proportion of the virtual loudspeaker signal group, the bit allocation proportion of the residual signal group and the total number of transmission channel bits, respectively determine the number of bits of the virtual loudspeaker signal group and the number of bits of the residual signal group;
- Bit allocation is performed on the virtual loudspeaker signal group according to the bit number of the virtual loudspeaker signal group, and bit allocation is performed on the residual signal group according to the bit number of the residual signal group.
- the encoding end can perform bit allocation for the virtual speaker signal group and the residual signal group respectively, so as to determine the bit allocation result for each group. For example, the encoding end obtains the bit allocation proportion of the virtual speaker signal group and the bit allocation proportion of the residual signal group, and then combines them with the total number of transmission channel bits to determine the number of bits of each group.
- the number of bits of the virtual loudspeaker signal group indicates the actual number of bits that the encoder can allocate to the virtual speaker signal group
- the number of bits of the residual signal group indicates the actual number of bits that the encoder can allocate to the residual signal group.
- determining the number of bits of the virtual loudspeaker signal group and the number of bits of the residual signal group includes:
- the number of bits of the virtual loudspeaker signal group is calculated as follows:
- F_bitnum = Ratio1*C_bitnum
- F_bitnum is the number of bits of the virtual speaker signal group
- Ratio1 is the bit allocation ratio of the virtual speaker signal group
- C_bitnum is the total number of transmission channel bits
- the number of bits of the residual signal group is calculated as follows:
- D_bitnum = Ratio2*C_bitnum
- D_bitnum is the number of bits of the residual signal group
- Ratio2 is the bit allocation ratio of the residual signal group
- C_bitnum is the total number of transmission channel bits.
- the encoding end can predetermine the total number of transmission channel bits, and there is no limit to the value of the total transmission channel bit number.
- the encoding end can calculate the number of bits of the virtual loudspeaker signal group and the number of bits of the residual signal group through the above calculation formulas, which solves the bit allocation problem for the virtual speaker signal and the residual signal at the encoding end.
- the above calculation formula is only an achievable way and is not a limitation to the embodiment of the present application.
- the number of bits of the virtual loudspeaker signal group and the number of bits of the residual signal group can be calculated by the above formulas; alternatively,
- a preset adjustment factor can be applied to these two values to obtain the final values. The calculation process is not limited.
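Turning the two ratios into bit counts is a single multiplication each. The sketch below uses a hypothetical function name and hypothetical values, and rounding down to whole bits is an assumption, since the text does not say how fractional bits are handled.

```python
import math


def group_bit_counts(ratio1, ratio2, c_bitnum):
    """F_bitnum = Ratio1*C_bitnum and D_bitnum = Ratio2*C_bitnum, in whole bits."""
    f_bitnum = math.floor(ratio1 * c_bitnum)  # virtual speaker signal group
    d_bitnum = math.floor(ratio2 * c_bitnum)  # residual signal group
    return f_bitnum, d_bitnum


# Hypothetical values: 7000 transmission channel bits split 75% / 25%.
print(group_bit_counts(0.75, 0.25, 7000))
```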
- the method performed at the encoding end may also include the following steps:
- the bit allocation proportion of the virtual loudspeaker signal group and the bit allocation proportion of the residual signal group can be encoded into the code stream, which the encoding end sends to the decoding end;
- by parsing the code stream, the decoding end can obtain the bit allocation proportion of the virtual speaker signal group and the bit allocation proportion of the residual signal group,
- from which the number of bits allocated to the virtual loudspeaker signal group and the number of bits allocated to the residual signal group are obtained, so that the code stream can be decoded to obtain the three-dimensional audio signal.
- encoding the transmission channel signal, the bit allocation proportion of the virtual loudspeaker signal group and the bit allocation proportion of the residual signal group may specifically include directly encoding the transmission channel signal, or
- first processing the transmission channel signal to obtain the virtual speaker signal and the residual signal, and then encoding the virtual speaker signal and the residual signal.
- the encoding end can be a core encoder, which encodes the transmission channel signal together with the bit allocation ratio of the virtual loudspeaker signal group and the bit allocation ratio of the residual signal group to obtain a code stream.
- the code stream may also be referred to as an audio signal coded code stream.
- the processing method of the three-dimensional audio signal provided by the embodiment of the present application may include an audio encoding method and an audio decoding method. The audio encoding method is performed by an audio encoding device, the audio decoding method is performed by an audio decoding device, and the audio encoding device and the audio decoding device can communicate with each other.
- the method shown in the aforementioned FIG. 4 is executed by the audio encoding device.
- the processing method of the three-dimensional audio signal performed by the audio decoding device (hereinafter referred to as the decoding end) is described next.
- as shown in FIG. 5, it mainly includes the following steps:
- the decoding end receives the code stream from the encoding end.
- the bit stream carries the bit allocation ratio of the virtual loudspeaker signal group and the bit allocation ratio of the residual signal group.
- the decoding end parses the code stream and obtains from it the bit allocation proportion of the virtual speaker signal group and the bit allocation proportion of the residual signal group;
- these proportions are obtained by the encoding end according to the embodiment shown in FIG. 4 above.
- the decoding end uses the bit allocation proportion of the virtual speaker signal group and the bit allocation proportion of the residual signal group to decode
- the code stream and obtain the decoded three-dimensional audio signal.
- the decoding end can determine the number of bits allocated to the virtual speaker signal and the number of bits allocated to the residual signal through the bit allocation ratio of the virtual speaker signal group and the bit allocation ratio of the residual signal group.
- decoding then uses the decoding method corresponding to the coding method of the encoding end, so as to obtain the 3D audio signal sent by the encoding end and complete the transmission of the 3D audio signal from the encoding end to the decoding end.
- the decoding end can determine the number of bits allocated to the virtual speaker signal and the number of bits allocated to the residual signal according to the bit allocation ratio of the virtual speaker signal group and the bit allocation ratio of the residual signal group transmitted in the code stream, which solves the problem that the decoding end cannot determine the bits allocated to each signal.
- step 503 decodes the virtual speaker signal and the residual signal in the code stream according to the bit allocation ratio of the virtual speaker signal group and the bit allocation ratio of the residual signal group, including:
- the number of bits in the residual signal group is determined according to the number of available bits and the bit allocation ratio of the residual signal group; and the residual signal in the code stream is decoded according to the number of bits in the residual signal group.
- the decoding end first determines the number of available bits, which is the total number of bits that can be allocated to the transmission channel.
- the decoding end can obtain the bit allocation ratio of the virtual speaker signal group by parsing the code stream, so the number of bits of the virtual speaker signal group can be determined from the number of available bits and that ratio.
- this number is the number of bits used by the encoding end to encode the virtual speaker signal group, so the decoding end can use it to decode the virtual speaker signal from the code stream.
- likewise, the decoding end can obtain the bit allocation ratio of the residual signal group by parsing the code stream, so the number of bits of the residual signal group can be determined from the number of available bits and that ratio.
- this number is the number of bits used by the encoding end to encode the residual signal group, so the decoding end can use it to decode the residual signal from the code stream.
- the bit allocation ratio parameters between groups include the bit allocation ratio of the virtual loudspeaker signal group and the bit allocation ratio of the residual signal group. bitsRatio occupies 4 bits and indicates the parameter of the bit allocation ratio within the group.
- the parameters of the bit allocation ratio within the group include the bit allocation ratio of each virtual speaker signal group within all virtual speaker signal groups, and the bit allocation ratio of each residual signal group within all residual signal groups.
- the decoding end may include a bit allocation module.
- the main function of the bit allocation module is to allocate the available bits that remain after removing other side information to each transmission channel, according to the bit allocation ratio parameters decoded from the code stream. The encoding of the other side information also takes up bits.
- the number of available bits remaining in the current frame after deducting other side information is recorded as availableBits.
- the general algorithm for calculating availableBits is expressed as follows:
- bitsPerFrame is the initial number of bits per frame
- bitsUsed is the number of bits occupied before the bit allocation.
- It may represent the bit distribution ratio of the virtual loudspeaker signal group in all transmission channel signals, or may represent the bit distribution ratio of the residual signal group in all transmission channel signals.
- groupBytes represents the total allocated bits of the virtual speaker signal group.
- groupBytes represents the total number of allocated bits of the residual signal group.
- the number of bits of each group of channels can be calculated.
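The decoder-side bookkeeping described above can be sketched as follows. The formula for availableBits is not spelled out in the text, so computing it as bitsPerFrame minus bitsUsed is an assumption based on the two definitions given. The function names, the side-information size, and handing bits lost to rounding to the first group are likewise assumptions.

```python
def available_bits(bits_per_frame, bits_used):
    """Assumed relation: availableBits = bitsPerFrame - bitsUsed."""
    if bits_used > bits_per_frame:
        raise ValueError("side information exceeds the frame budget")
    return bits_per_frame - bits_used


def allocate_groups(available, group_ratios):
    """Split the available bits across groups according to decoded ratios."""
    if abs(sum(group_ratios) - 1.0) > 1e-9:
        raise ValueError("group ratios must sum to 1")
    bits = [int(available * r) for r in group_ratios]
    # Assumption: bits lost to truncation go to the first (virtual speaker) group.
    bits[0] += available - sum(bits)
    return bits


# Hypothetical frame: 7680 bits, 200 of them already used by side information.
avail = available_bits(7680, 200)
print(avail, allocate_groups(avail, [0.5, 0.25, 0.25]))
```

Returning the truncation remainder to one group keeps the total equal to availableBits, so encoder and decoder agree on the budget as long as they apply the same rule.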
- the decoding end can also calculate the bit allocation ratio of the virtual loudspeaker signal group and the bit allocation ratio of the residual signal group in a manner similar to the encoding end, for example, using the aforementioned calculation process of Ratio1 and Ratio2, which is not repeated here.
- the following takes an HOA signal as an example of the three-dimensional audio signal.
- the embodiment of the present application provides a bit allocation method for the virtual speaker signal and the residual signal. First, the virtual speaker signals and the residual signals are grouped; then the bit allocation ratio between groups is obtained according to the signal characteristics and the sound field characteristics; finally, the channel bit allocation is carried out.
- the purpose of the embodiment of the present application is to obtain the bit allocation result of the transmission channel signal, and the transmission channel signal is composed of a virtual loudspeaker signal and a residual signal.
- the transmission channel signals are divided into groups of virtual loudspeaker signals and residual signal groups.
- the bit allocation ratio between groups is obtained, and then the bit number of the virtual loudspeaker signal group and the bit number of the residual signal group are obtained through the total number of bits.
- when the encoder encodes at a given rate, the total number of bits allocated to each frame is fixed.
- bits are then allocated within the number of available bits in the frame. For example, in constant bitrate (CBR) encoding mode at 384 kbps, each frame has about 7680 bits; the number of bits actually available is less than 7680, and it is these available bits that are allocated.
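The CBR figure above can be checked with a line of arithmetic. The 20 ms frame length used here is not stated in the text; it is inferred from the two numbers given (7680 bits at 384 kbps), and the side-information size is purely hypothetical.

```python
def bits_per_frame(bitrate_bps, frame_ms):
    """Frame bit budget at constant bitrate, in whole bits."""
    return bitrate_bps * frame_ms // 1000


frame_bits = bits_per_frame(384_000, 20)  # assumed 20 ms frames
side_info_bits = 200                      # hypothetical side-information cost
print(frame_bits, frame_bits - side_info_bits)
```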
- when the coding efficiency of the virtual loudspeaker is high, for example, when the number of heterogeneous sound sources is less than or equal to the number of transmission channels of the virtual loudspeaker signal, the number of coding bits of the virtual loudspeaker signal is increased by increasing the bit allocation ratio of the virtual loudspeaker signal group;
- in this way the inter-group bit allocation ratio is obtained.
- the number of encoded bits of the virtual loudspeaker signal and the number of encoded bits of the residual signal can then match the sound field classification of the current frame, which solves the problem of determining, when encoding the current frame, the number of encoded bits of the virtual loudspeaker signal and
- the number of encoded bits of the residual signal.
- the embodiment of the present application is implemented in the core codec; the execution flow of the core codec is described next.
- the HOA signal to be encoded is subjected to HOA space encoding to obtain the transmission channel signal and attribute information.
- the transmission channel signal includes: a virtual loudspeaker signal and a residual signal
- the attribute information is the aforementioned transmission channel attribute information, including the sound field classification result and the virtual speaker coding efficiency η.
- the sound field classification result includes the number of heterogeneous sound sources, or the number of heterogeneous sound sources and the type of sound field;
- the virtual speaker coding efficiency η represents the efficiency of reconstructing the HOA signal using the virtual speakers in the current frame.
- norm() is a norm operation
- SNt is the MDCT coefficient of the t-th channel of the original HOA signal
- t runs up to (HOA order + 1)^2, the number of HOA channels.
- the virtual loudspeaker coding efficiency η = sum(R)/sum(N), where sum(R) is the sum of R1 to Rt and sum(N) is the sum of N1 to Nt.
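The efficiency measure sum(R)/sum(N) can be sketched as below. Several details are assumptions because the definitions are incomplete in the text: R_t is taken to be the norm of the t-th channel of the HOA signal reconstructed from the virtual speakers, N_t the norm of the t-th channel of the original HOA signal, and norm() is taken to be the Euclidean norm over MDCT coefficients. Function names and data are hypothetical.

```python
import math


def l2_norm(coeffs):
    """Euclidean norm of one channel's coefficient vector (assumed norm)."""
    return math.sqrt(sum(c * c for c in coeffs))


def coding_efficiency(reconstructed_channels, original_channels):
    """Efficiency = sum of reconstructed-channel norms / sum of original-channel norms."""
    if len(reconstructed_channels) != len(original_channels):
        raise ValueError("channel counts must match")
    sum_r = sum(l2_norm(ch) for ch in reconstructed_channels)
    sum_n = sum(l2_norm(ch) for ch in original_channels)
    if sum_n == 0:
        raise ValueError("original signal has zero energy")
    return sum_r / sum_n


# Toy data with two channels; a real first-order HOA signal would have 4.
original = [[3.0, 4.0], [6.0, 8.0]]       # norms 5 and 10
reconstructed = [[3.0, 4.0], [3.0, 4.0]]  # norms 5 and 5
print(coding_efficiency(reconstructed, original))
```

Under these assumptions a value near 1 means the virtual speakers capture most of the sound field, which is the situation in which the virtual speaker signal group is given a larger share of the bits.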
- the transmission channel signals are grouped, assuming that the transmission channel signals are composed of M virtual speaker signals and N residual signals. Further, the N residual signals may be divided into K groups. If the M virtual speaker signals are divided into one group, the transmission channels are divided into K+1 groups. The number of channels in each group may be the same or different, and the grouping of each frame may be the same or different, which will not affect the subsequent process of the embodiment of the present application.
- K equal to 2 is taken as an example; K may also be 3 or another value, which is not limited here.
- the number of virtual speakers included in the virtual speaker signal group is equal to 2
- the number of residual signals included in residual signal group 1 is equal to 4
- the number of residual signals included in residual signal group 2 is equal to 5.
- step S2 includes the following steps S21 to S23.
- the method in S1 can be used to calculate the energy characterization value of each channel; the channel energy characterization values in each group are then added to obtain the energy characterization value of the group. For example, the energy characterization value of the virtual speaker signal group is F, that of residual signal group 1 is D1, and that of residual signal group 2 is D2.
- the bit allocation ratio between the transmission channel groups is determined. Assume that the bit allocation ratio of the virtual loudspeaker signal group is Ratio1,
- the bit allocation ratio of residual signal group 1 is Ratio2, and
- the bit allocation ratio of residual signal group 2 is Ratio3.
- when the virtual speaker signal group energy ratio directionalNrgRatio and/or the virtual speaker coding efficiency η indicate that the virtual speaker signal group dominates bit allocation in the current frame, the bit allocation ratio of the virtual speaker signal group needs to be increased and that of the residual signal groups reduced. Different adjustment methods can be selected to increase the bit allocation ratio of the virtual speaker signal group under different preset conditions.
- the judging condition includes the virtual speaker signal group energy ratio directionalNrgRatio, and/or the virtual speaker coding flag Flag.
- the virtual speaker encoding flag is obtained by the following method:
- Flag = strongly dominant (High).
- the judging conditions may include the following conditions 1 to 6.
- Ratio1 = FAC1*directionalNrgRatio + (1 - FAC1)*maxdirectionalNrgRatio.
- maxdirectionalNrgRatio is a preset maximum virtual loudspeaker signal group bit allocation ratio
- FAC1 is a preset first adjustment factor between 0 and 0.5.
- a safety limit is applied to Ratio1, for example:
- Ratio1 = min(Ratio1, maxdirectionalNrgRatio + FAC2*Ratio1).
- FAC2 is a preset second adjustment factor between 0 and 0.5.
- Ratio2 = (1 - Ratio1)*(number of channels in residual signal group 1)/(number of channels in residual signal group 1 + number of channels in residual signal group 2);
- Ratio3 = (1 - Ratio1)*(number of channels in residual signal group 2)/(number of channels in residual signal group 1 + number of channels in residual signal group 2).
- TH0 is the number of virtual speakers matched by the codec, or the number of virtual speaker signals handled by the codec.
- for example, TH0 = 2.
- TH1 lies between 0.8 and 1; for example, TH2 = 0.875. In this case the bit allocation of the virtual loudspeaker signal group can be considered strongly dominant, and the bit allocation ratio between the transmission channel groups is adjusted as follows:
- Ratio1 = FAC3*directionalNrgRatio + (1-FAC3)*maxdirectionalNrgRatio.
- maxdirectionalNrgRatio is the preset virtual loudspeaker signal group bit allocation ratio.
- FAC3 is a preset third adjustment factor with a value between 0 and 0.5; FAC3 > FAC1.
- a safety limit is applied to Ratio1, for example:
- Ratio1 = min(Ratio1, maxdirectionalNrgRatio + FAC4*Ratio1).
- FAC4 is a preset fourth adjustment factor, 0 ⁇ FAC4 ⁇ 0.5, FAC4 ⁇ FAC2;
- Ratio2 = (1-Ratio1) * (number of channels in residual signal group 1) / (number of channels in residual signal group 1 + number of channels in residual signal group 2);
- Ratio3 = (1-Ratio1) * (number of channels in residual signal group 2) / (number of channels in residual signal group 1 + number of channels in residual signal group 2).
- Ratio1 = directionalNrgRatio.
- Ratio2 = D1/(F+D1+D2).
- Ratio3 = D2/(F+D1+D2).
- a safety limit is applied to Ratio1, Ratio2 and Ratio3, for example:
- Ratio1 = FAC5*groupBitsRatio1 + (1-FAC5)*Ratio1;
- Ratio2 = FAC6*groupBitsRatio2 + (1-FAC6)*Ratio2;
- Ratio3 = FAC7*groupBitsRatio3 + (1-FAC7)*Ratio3;
- groupBitsRatio1, groupBitsRatio2 and groupBitsRatio3 are, respectively, the preset bit allocation ratios of the virtual speaker signal group, residual signal group 1 and residual signal group 2. FAC5, FAC6 and FAC7 are the preset fifth, sixth and seventh adjustment factors, each with a value between 0.5 and 1; they may or may not be equal to one another.
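- The non-dominant case can be sketched the same way. In this Python sketch F, D1 and D2 are read as the energy characterization values of the virtual speaker signal group and the two residual signal groups, by analogy with claim 13; that reading, the function name and the example numbers are assumptions.

```python
def non_dominant_ratios(directional_nrg_ratio, F, D1, D2, presets, facs):
    """Energy-driven group ratios, pulled toward preset ratios.

    presets : (groupBitsRatio1, groupBitsRatio2, groupBitsRatio3)
    facs    : (FAC5, FAC6, FAC7)
    """
    total = F + D1 + D2
    if total <= 0:
        raise ValueError("group energies must sum to a positive value")
    # Raw ratios: Ratio1 is the energy ratio itself, Ratio2 and Ratio3
    # are the residual groups' shares of the total energy.
    raw = (directional_nrg_ratio, D1 / total, D2 / total)
    # Safety limit: weighted average of each raw ratio and its preset.
    return tuple(
        fac * preset + (1 - fac) * ratio
        for fac, preset, ratio in zip(facs, presets, raw)
    )


# Hypothetical example values.
example = non_dominant_ratios(0.5, 4.0, 2.0, 2.0,
                              (0.6, 0.2, 0.2), (0.75, 0.75, 0.75))
```

- With adjustment factors between 0.5 and 1, each final ratio sits closer to its preset than to the raw energy-based value, which limits frame-to-frame swings in the allocation.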
- after Ratio1, Ratio2 and Ratio3 are obtained as described above, they can be quantized and written into the code stream.
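- The text does not say how the ratios are quantized. Purely as an assumption, the Python sketch below uses uniform scalar quantization to a 6-bit index; the bit width, the function names and the rounding rule are all hypothetical and are not taken from this document.

```python
def quantize_ratio(ratio, bits=6):
    """Map a ratio in [0, 1] to an integer index (hypothetical scheme)."""
    levels = (1 << bits) - 1
    clamped = min(max(ratio, 0.0), 1.0)
    return round(clamped * levels)


def dequantize_ratio(index, bits=6):
    """Inverse mapping used by a decoder under the same assumption."""
    return index / ((1 << bits) - 1)


# Round trip: the reconstruction error is at most half a quantization step.
index = quantize_ratio(0.825)
restored = dequantize_ratio(index)
```

- Whatever scheme is used, the encoder and decoder need to work from the same dequantized ratios, otherwise the two ends would derive different per-group bit budgets.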
- step S3 is an optional step, and the execution sequence of step S3 may be before step S2 or after step S2.
- the number of bits in each group is determined by the proportion of bit allocation among groups in step S2 and the total number of available bits, for example:
- number of bits of the virtual loudspeaker signal group = Ratio1 * total number of available bits.
- number of bits of residual signal group 1 = Ratio2 * total number of available bits.
- number of bits of residual signal group 2 = Ratio3 * total number of available bits.
- determining the number of bits of each channel can be implemented in various ways, such as performing bit allocation according to the energy ratio of each channel.
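- A minimal Python sketch of these two steps follows: per-group bit counts from the ratios, then per-channel bits in proportion to channel energy. Rounding down and handing leftover bits to the highest-energy channel are assumptions, since the text does not specify how rounding is handled; the function names are also hypothetical.

```python
def group_bits(ratios, total_bits):
    """Bits per group = ratio * total available bits, rounded down."""
    return [int(r * total_bits) for r in ratios]


def channel_bits(group_bit_count, channel_energies):
    """Split a group's bits across its channels by energy share.

    channel_energies must be a non-empty list of non-negative values.
    """
    total = sum(channel_energies)
    n = len(channel_energies)
    if total <= 0:
        # No energy information: fall back to an equal split.
        shares = [group_bit_count // n] * n
    else:
        shares = [int(group_bit_count * e / total) for e in channel_energies]
    # Bits lost to rounding down go to the highest-energy channel.
    loudest = max(range(n), key=channel_energies.__getitem__)
    shares[loudest] += group_bit_count - sum(shares)
    return shares


# Hypothetical example: 1000 bits over three groups, then two channels.
bits = group_bits((0.5, 0.25, 0.25), 1000)
per_channel = channel_bits(bits[0], [3.0, 1.0])
```

- Reassigning the rounding remainder keeps the per-channel counts summing exactly to the group budget, so no bits are silently dropped.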
- the decoding end receives the bit stream sent by the encoding end, parses Ratio1, Ratio2 and Ratio3 from it, and can then allocate bits to the transmission channel signals, for example in the manner of step S4 above.
- The method for determining the number of bits of each channel is described in detail below.
- the encoding end of the embodiment of the present application can group the transmission channels, and determine the group bit allocation ratio according to the energy of the virtual loudspeaker signal group, the number of different sound sources, and the reconstructed HOA signal.
- the adjustment of the allocation ratio between groups can be realized through the above-mentioned various conditions. Therefore, in the embodiment of the present application, the bit allocation efficiency of the transmission channel can be effectively improved.
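- How the conditions select one of the three adjustment branches can be sketched in Python using the energy ratio alone, following the threshold form given in claims 8, 10 and 13 (TH1 is the first energy ratio threshold, TH3 the second). The claims also allow the coding flag to be used ("and/or"); that path is left out here. The default threshold values and the returned labels are hypothetical.

```python
def select_algorithm(directional_nrg_ratio, TH1=0.9, TH3=0.6):
    """Pick a group bit allocation algorithm from the energy ratio.

    TH1 and TH3 are hypothetical values with TH3 < TH1.
    """
    if directional_nrg_ratio >= TH1:
        return "first"   # strongly dominant branch (FAC1 / FAC2)
    if directional_nrg_ratio >= TH3:
        return "second"  # sub-dominant branch (FAC3 / FAC4)
    return "third"       # not dominant branch (energy-based ratios)
```

- Checking the thresholds in descending order means each frame falls into exactly one branch.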
- a processing device for a three-dimensional audio signal is specifically an audio coding device 700, which may include: a coding module 701, a bit allocation ratio determination module 702, wherein,
- An encoding module configured to spatially encode the three-dimensional audio signal to be encoded to obtain transmission channel signals and transmission channel attribute information, wherein the transmission channel signals include: at least one virtual speaker signal group and at least one residual signal group;
- a bit allocation proportion determining module configured to determine the bit allocation proportion of the virtual loudspeaker signal group and the bit allocation proportion of the residual signal group according to the transmission channel attribute information.
- an embodiment of the present application provides a processing device for a three-dimensional audio signal; for example, the processing device is an audio decoding device 800, which may include a receiving module 801, a decoding module 802 and a signal generating module 803, wherein:
- a receiving module configured to receive a code stream;
- a decoding module configured to decode the code stream to obtain the bit allocation proportion of the virtual loudspeaker signal group and the bit allocation proportion of the residual signal group;
- a signal generating module configured to decode the virtual speaker signal and the residual signal in the code stream according to the bit allocation proportion of the virtual loudspeaker signal group and the bit allocation proportion of the residual signal group, and obtain the decoded 3D audio signal.
- the embodiment of the present application also provides a computer storage medium, wherein the computer storage medium stores a program, and the program executes some or all of the steps described in the above method embodiments.
- the audio coding device 900 includes:
- a receiver 901, a transmitter 902, a processor 903 and a memory 904 (the audio encoding device 900 may have one or more processors 903; FIG. 9 takes one processor as an example).
- the receiver 901, the transmitter 902, the processor 903 and the memory 904 may be connected through a bus or in other ways; FIG. 9 takes connection through a bus as an example.
- the memory 904 may include read-only memory and random-access memory, and provides instructions and data to the processor 903.
- a part of the memory 904 may also include a non-volatile random access memory (NVRAM).
- the memory 904 stores operating systems and operating instructions, executable modules or data structures, or their subsets, or their extended sets, wherein the operating instructions may include various operating instructions for implementing various operations.
- the operating system may include various system programs for implementing various basic services and processing hardware-based tasks.
- the processor 903 controls the operation of the audio encoding device, and the processor 903 may also be called a central processing unit (CPU).
- various components of the audio encoding device are coupled together through a bus system, wherein the bus system may include a power bus, a control bus, and a status signal bus, etc. in addition to a data bus.
- the various buses are referred to as bus systems in the figures.
- the methods disclosed in the foregoing embodiments of the present application may be applied to the processor 903 or implemented by the processor 903.
- the processor 903 may be an integrated circuit chip, which has a signal processing capability. In the implementation process, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 903 or instructions in the form of software.
- the above-mentioned processor 903 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
- a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
- the steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor.
- the software module can be located in a storage medium well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register.
- the storage medium is located in the memory 904, and the processor 903 reads the information in the memory 904, and completes the steps of the above method in combination with its hardware.
- the receiver 901 can be used to receive input digital or character information, and generate signal input related to the relevant settings and function control of the audio encoding device.
- the transmitter 902 can include a display device such as a display screen, and can be used to output numeric or character information through an external interface.
- the processor 903 is configured to execute the method performed by the audio encoding device shown in FIG. 4 of the foregoing embodiment.
- the audio decoding device 1000 includes:
- a receiver 1001, a transmitter 1002, a processor 1003 and a memory 1004 (the audio decoding device 1000 may have one or more processors 1003; FIG. 10 takes one processor as an example).
- the receiver 1001, the transmitter 1002, the processor 1003 and the memory 1004 may be connected through a bus or in other ways; FIG. 10 takes connection through a bus as an example.
- the memory 1004 may include read-only memory and random-access memory, and provides instructions and data to the processor 1003. A portion of the memory 1004 may also include NVRAM.
- the memory 1004 stores operating systems and operating instructions, executable modules or data structures, or their subsets, or their extended sets, wherein the operating instructions may include various operating instructions for implementing various operations.
- the operating system may include various system programs for implementing various basic services and processing hardware-based tasks.
- the processor 1003 controls the operation of the audio decoding device, and the processor 1003 may also be referred to as a CPU.
- various components of the audio decoding device are coupled together through a bus system, wherein the bus system may include a power bus, a control bus, and a status signal bus, etc. in addition to a data bus.
- the various buses are referred to as bus systems in the figures.
- the methods disclosed in the foregoing embodiments of the present application may be applied to the processor 1003 or implemented by the processor 1003.
- the processor 1003 may be an integrated circuit chip, which has a signal processing capability. In the implementation process, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 1003 or instructions in the form of software.
- the aforementioned processor 1003 may be a general processor, DSP, ASIC, FPGA or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.
- a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
- the steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor.
- the software module can be located in a storage medium well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register.
- the storage medium is located in the memory 1004, and the processor 1003 reads the information in the memory 1004, and completes the steps of the above method in combination with its hardware.
- the processor 1003 is configured to execute the method performed by the audio decoding device shown in FIG. 5 of the foregoing embodiment.
- when the audio encoding device or the audio decoding device is a chip in the terminal, the chip includes a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin or a circuit.
- the processing unit may execute the computer-executable instructions stored in the storage unit, so that the chip in the terminal executes the audio encoding method of any one of the above-mentioned first aspect, or the audio decoding method of any one of the second aspect.
- the storage unit is a storage unit in the chip, such as a register, a cache, etc.
- the storage unit may also be a storage unit in the terminal located outside the chip, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
- the processor mentioned above can be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling program execution of the method of the first aspect or the second aspect.
- the device embodiments described above are only illustrative. The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; they can be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- the connection relationship between the modules indicates that they have communication connections, which can be specifically implemented as one or more communication buses or signal lines.
- the essence of the technical solution of this application, or the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk or optical disk, and includes several instructions that cause a computer device (which can be a personal computer, a server, a network device, etc.) to execute the method described in each embodiment of the present application.
- all or part of them may be implemented by software, hardware, firmware or any combination thereof.
- when implemented using software, it may be implemented in whole or in part in the form of a computer program product.
- the computer program product includes one or more computer instructions.
- the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
- the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave).
- the computer-readable storage medium may be any available medium that can be stored by a computer, or a data storage device such as a server or a data center integrated with one or more available media.
- the available medium may be a magnetic medium (such as a floppy disk, a hard disk, or a magnetic tape), an optical medium (such as a DVD), or a semiconductor medium (such as a solid state disk (SSD)), etc.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Mathematical Physics (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Claims (27)
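- A method for processing a three-dimensional audio signal, comprising: spatially encoding a three-dimensional audio signal to be encoded to obtain transmission channel signals and transmission channel attribute information, wherein the transmission channel signals comprise at least one virtual loudspeaker signal group and at least one residual signal group; and determining a bit allocation ratio of the virtual loudspeaker signal group and a bit allocation ratio of the residual signal group according to the transmission channel attribute information.
- The method according to claim 1, wherein the transmission channel attribute information comprises a virtual loudspeaker coding efficiency; and the spatially encoding the three-dimensional audio signal to be encoded to obtain the transmission channel attribute information comprises: performing signal reconstruction on the three-dimensional audio signal to be encoded by using virtual loudspeakers, to obtain a reconstructed three-dimensional audio signal; obtaining an energy characterization value of the reconstructed three-dimensional audio signal and an energy characterization value of the three-dimensional audio signal to be encoded; and obtaining the virtual loudspeaker coding efficiency according to the energy characterization value of the reconstructed three-dimensional audio signal and the energy characterization value of the three-dimensional audio signal to be encoded.
- The method according to claim 1 or 2, wherein the transmission channel attribute information comprises an energy ratio of the virtual loudspeaker signal group; and the method further comprises: obtaining an energy characterization value of the virtual loudspeaker signal group according to an energy characterization value of each virtual loudspeaker signal in the virtual loudspeaker signal group; obtaining an energy characterization value of the residual signal group according to an energy characterization value of each residual signal in the residual signal group; and obtaining the energy ratio of the virtual loudspeaker signal group according to the energy characterization value of the virtual loudspeaker signal group and the energy characterization value of the residual signal group.
- The method according to claim 1, wherein the transmission channel attribute information comprises a virtual loudspeaker coding flag, the virtual loudspeaker coding flag indicating whether bit allocation to the virtual loudspeaker signal group is dominant; and the spatially encoding the three-dimensional audio signal to be encoded to obtain the transmission channel attribute information comprises: spatially encoding the three-dimensional audio signal to be encoded to obtain the number of dissimilar sound sources of the transmission channel signals and a virtual loudspeaker coding efficiency; and obtaining the virtual loudspeaker coding flag according to the number of dissimilar sound sources of the transmission channel signals and the virtual loudspeaker coding efficiency.
- The method according to claim 4, wherein the obtaining the virtual loudspeaker coding flag according to the number of dissimilar sound sources of the transmission channel signals and the virtual loudspeaker coding efficiency comprises: when the number of dissimilar sound sources of the transmission channel signals is less than or equal to a preset dissimilar sound source number threshold, and the virtual loudspeaker coding efficiency is greater than or equal to a preset first virtual loudspeaker coding efficiency threshold, determining that the virtual loudspeaker coding flag is dominant; or when the number of dissimilar sound sources of the transmission channel signals is greater than the preset dissimilar sound source number threshold, or the virtual loudspeaker coding efficiency is less than the preset first virtual loudspeaker coding efficiency threshold, determining that the virtual loudspeaker coding flag is not dominant.
- The method according to claim 5, wherein dominant comprises sub-dominant or strongly dominant; and the determining that the virtual loudspeaker coding flag is dominant comprises: when the virtual loudspeaker coding efficiency is greater than or equal to the first virtual loudspeaker coding efficiency threshold and less than or equal to a preset second virtual loudspeaker coding efficiency threshold, determining that the virtual loudspeaker coding flag is sub-dominant; or when the virtual loudspeaker coding efficiency is greater than or equal to the first virtual loudspeaker coding efficiency threshold and greater than the preset second virtual loudspeaker coding efficiency threshold, determining that the virtual loudspeaker coding flag is strongly dominant; wherein the second virtual loudspeaker coding efficiency threshold is greater than the first virtual loudspeaker coding efficiency threshold.
- The method according to any one of claims 1 to 6, wherein the transmission channel attribute information comprises the energy ratio of the virtual loudspeaker signal group and/or a virtual loudspeaker coding flag; and the determining the bit allocation ratio of the virtual loudspeaker signal group and the bit allocation ratio of the residual signal group according to the transmission channel attribute information comprises: when the energy ratio of the virtual loudspeaker signal group is greater than or equal to a preset first energy ratio threshold, and/or the virtual loudspeaker coding flag is strongly dominant, determining the bit allocation ratio of the virtual loudspeaker signal group and the bit allocation ratio of the residual signal group according to a preset first signal group bit allocation algorithm; when the energy ratio of the virtual loudspeaker signal group is greater than or equal to a preset second energy ratio threshold and less than the preset first energy ratio threshold, and/or the virtual loudspeaker coding flag is sub-dominant, determining the bit allocation ratio of the virtual loudspeaker signal group and the bit allocation ratio of the residual signal group according to a preset second signal group bit allocation algorithm, wherein the second energy ratio threshold is less than the first energy ratio threshold; or when the energy ratio of the virtual loudspeaker signal group is less than the preset first energy ratio threshold, or the virtual loudspeaker coding flag is not dominant, determining the bit allocation ratio of the virtual loudspeaker signal group and the bit allocation ratio of the residual signal group according to a preset third signal group bit allocation algorithm.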
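- The method according to claim 7, wherein the determining, when the energy ratio of the virtual loudspeaker signal group is greater than or equal to the preset first energy ratio threshold, and/or the virtual loudspeaker coding flag is strongly dominant, the bit allocation ratio of the virtual loudspeaker signal group and the bit allocation ratio of the residual signal group according to the preset first signal group bit allocation algorithm comprises: when directionalNrgRatio ≥ TH1 is satisfied, and/or S ≤ TH0 and η > TH2 are satisfied, calculating the bit allocation ratio of the virtual loudspeaker signal group as follows: Ratio1_1 = FAC1*directionalNrgRatio + (1-FAC1)*maxdirectionalNrgRatio; wherein directionalNrgRatio represents the energy ratio of the virtual loudspeaker signal group, S is the number of dissimilar sound sources, η represents the virtual loudspeaker coding efficiency, maxdirectionalNrgRatio is a preset maximum virtual loudspeaker signal group bit allocation ratio, FAC1 is a preset first adjustment factor, Ratio1_1 is the bit allocation ratio of the virtual loudspeaker signal group, * denotes multiplication, TH1 is the first energy ratio threshold, TH0 is the dissimilar sound source number threshold, and TH2 is the second virtual loudspeaker coding efficiency threshold; and calculating the bit allocation ratio of the residual signal group as follows: Ratio2 = 1-Ratio1_1; wherein Ratio1_1 is the bit allocation ratio of the virtual loudspeaker signal group, and Ratio2 is the bit allocation ratio of the residual signal group.
- The method according to claim 8, wherein after the bit allocation ratio of the virtual loudspeaker signal group is obtained, the method further comprises: updating the bit allocation ratio of the virtual loudspeaker signal group as follows: Ratio1_2 = min(Ratio1_1, maxdirectionalNrgRatio + FAC2*Ratio1_1); wherein Ratio1_2 represents the updated bit allocation ratio of the virtual loudspeaker signal group, FAC2 is a preset second adjustment factor, maxdirectionalNrgRatio is the preset maximum virtual loudspeaker signal group bit allocation ratio, Ratio1_1 is the bit allocation ratio of the virtual loudspeaker signal group before the update, * denotes multiplication, and min denotes taking the minimum value.
- The method according to claim 7, wherein the determining, when the energy ratio of the virtual loudspeaker signal group is greater than or equal to the preset second energy ratio threshold and less than the preset first energy ratio threshold, and/or the virtual loudspeaker coding flag is sub-dominant, the bit allocation ratio of the virtual loudspeaker signal group and the bit allocation ratio of the residual signal group according to the preset second signal group bit allocation algorithm, wherein the second energy ratio threshold is less than the first energy ratio threshold, comprises: when TH3 ≤ directionalNrgRatio < TH1 is satisfied, and/or S ≤ TH0 and TH4 ≤ η ≤ TH2 are satisfied, calculating Ratio1_1 as follows: Ratio1_1 = FAC3*directionalNrgRatio + (1-FAC3)*maxdirectionalNrgRatio; wherein maxdirectionalNrgRatio is a preset virtual loudspeaker signal group bit allocation ratio, FAC3 is a preset third adjustment factor, directionalNrgRatio represents the energy ratio of the virtual loudspeaker signal group, S is the number of dissimilar sound sources, η represents the virtual loudspeaker coding efficiency, Ratio1_1 is the bit allocation ratio of the virtual loudspeaker signal group, * denotes multiplication, TH0 is the dissimilar sound source number threshold, TH1 is the first energy ratio threshold, TH2 is the second virtual loudspeaker coding efficiency threshold, TH3 is the second energy ratio threshold, and TH4 is the first virtual loudspeaker coding efficiency threshold; and calculating the bit allocation ratio of the residual signal group as follows: Ratio2 = 1-Ratio1_1; wherein Ratio1_1 is the bit allocation ratio of the virtual loudspeaker signal group, and Ratio2 is the bit allocation ratio of the residual signal group.
- The method according to claim 10, wherein after the bit allocation ratio of the virtual loudspeaker signal group is obtained, the method further comprises: updating the bit allocation ratio of the virtual loudspeaker signal group as follows: Ratio1_2 = min(Ratio1_1, maxdirectionalNrgRatio + FAC4*Ratio1_1); wherein Ratio1_2 represents the updated bit allocation ratio of the virtual loudspeaker signal group, FAC4 is a preset fourth adjustment factor, maxdirectionalNrgRatio is the preset maximum virtual loudspeaker signal group bit allocation ratio, Ratio1_1 is the bit allocation ratio of the virtual loudspeaker signal group before the update, * denotes multiplication, and min denotes taking the minimum value.
- The method according to any one of claims 8 to 11, wherein the method further comprises: when there are a plurality of residual signal groups, calculating the bit allocation ratio of the i-th residual signal group as follows: Ratio2_i = Ratio2*(R_i/C); wherein R_i represents the number of transmission channels included in the i-th residual signal group, C is the total number of transmission channels of all residual signal groups, Ratio2_i is the bit allocation ratio of the i-th residual signal group, * denotes multiplication, and Ratio2 is the bit allocation ratio of all residual signal groups.
- The method according to claim 7, wherein the determining, when the energy ratio of the virtual loudspeaker signal group is less than the preset first energy ratio threshold, or the virtual loudspeaker coding flag is not dominant, the bit allocation ratio of the virtual loudspeaker signal group and the bit allocation ratio of the residual signal group according to the preset third signal group bit allocation algorithm comprises: when directionalNrgRatio < TH3 is satisfied, or S > TH0 is satisfied, or η < TH4, calculating the bit allocation ratio of the virtual loudspeaker signal group as follows: Ratio1_1 = directionalNrgRatio; wherein directionalNrgRatio represents the energy ratio of the virtual loudspeaker signal group, Ratio1_1 is the bit allocation ratio of the virtual loudspeaker signal group, TH3 is the second energy ratio threshold, TH4 is the first virtual loudspeaker coding efficiency threshold, S is the number of dissimilar sound sources, η represents the virtual loudspeaker coding efficiency, and TH0 is the dissimilar sound source number threshold; and calculating the bit allocation ratio of the residual signal group as follows: Ratio2_1 = D/(F+D); wherein Ratio2_1 is the bit allocation ratio of the residual signal group, F represents the energy characterization value of the virtual loudspeaker signal group, and D is the energy characterization value of the residual signal group.
- The method according to claim 13, wherein the method further comprises: after the bit allocation ratio of the virtual loudspeaker signal group is obtained, updating the bit allocation ratio of the virtual loudspeaker signal group as follows: when Ratio1_1 < groupBitsRatio1, Ratio1_2 = groupBitsRatio1; when Ratio1_1 ≥ groupBitsRatio1, Ratio1_2 = FAC5*groupBitsRatio1 + (1-FAC5)*Ratio1_1; wherein Ratio1_2 represents the updated bit allocation ratio of the virtual loudspeaker signal group, FAC5 is a preset fifth adjustment factor, Ratio1_1 is the bit allocation ratio of the virtual loudspeaker signal group before the update, * denotes multiplication, and groupBitsRatio1 is a preset virtual loudspeaker signal group bit allocation ratio; and after the bit allocation ratio of the residual signal group is obtained, updating the bit allocation ratio of the residual signal group as follows: when Ratio2_1 < groupBitsRatio2, Ratio2_2 = groupBitsRatio2; when Ratio2_1 ≥ groupBitsRatio2, Ratio2_2 = FAC6*groupBitsRatio2 + (1-FAC6)*Ratio2_1; wherein Ratio2_2 represents the updated bit allocation ratio of the residual signal group, FAC6 is a preset sixth adjustment factor, Ratio2_1 is the bit allocation ratio of the residual signal group before the update, * denotes multiplication, and groupBitsRatio2 is a preset residual signal group bit allocation ratio.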
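- The method according to any one of claims 1 to 14, wherein the method further comprises: determining the number of bits of the virtual loudspeaker signal group and the number of bits of the residual signal group according to the bit allocation ratio of the virtual loudspeaker signal group, the bit allocation ratio of the residual signal group and the total number of transmission channel bits; and performing bit allocation on the virtual loudspeaker signal group according to the number of bits of the virtual loudspeaker signal group, and performing bit allocation on the residual signal group according to the number of bits of the residual signal group.
- The method according to claim 15, wherein the determining the number of bits of the virtual loudspeaker signal group and the number of bits of the residual signal group according to the bit allocation ratio of the virtual loudspeaker signal group, the bit allocation ratio of the residual signal group and the total number of transmission channel bits comprises: calculating the number of bits of the virtual loudspeaker signal group as follows: F_bitnum = Ratio1*C_bitnum; wherein F_bitnum is the number of bits of the virtual loudspeaker signal group, Ratio1 is the bit allocation ratio of the virtual loudspeaker signal group, and C_bitnum is the total number of transmission channel bits; and calculating the number of bits of the residual signal group as follows: D_bitnum = Ratio2*C_bitnum; wherein D_bitnum is the number of bits of the residual signal group, Ratio2 is the bit allocation ratio of the residual signal group, and C_bitnum is the total number of transmission channel bits.
- The method according to any one of claims 1 to 16, wherein the method further comprises: encoding the transmission channel signals, the bit allocation ratio of the virtual loudspeaker signal group and the bit allocation ratio of the residual signal group, and writing them into a code stream.
- A method for processing a three-dimensional audio signal, comprising: receiving a code stream; decoding the code stream to obtain a bit allocation ratio of a virtual loudspeaker signal group and a bit allocation ratio of a residual signal group; and decoding a virtual loudspeaker signal and a residual signal in the code stream according to the bit allocation ratio of the virtual loudspeaker signal group and the bit allocation ratio of the residual signal group, to obtain a decoded three-dimensional audio signal.
- The method according to claim 18, wherein the decoding the virtual loudspeaker signal and the residual signal in the code stream according to the bit allocation ratio of the virtual loudspeaker signal group and the bit allocation ratio of the residual signal group comprises: determining a number of available bits according to the code stream; determining the number of bits of the virtual loudspeaker signal group according to the number of available bits and the bit allocation ratio of the virtual loudspeaker signal group; decoding the virtual loudspeaker signal in the code stream according to the number of bits of the virtual loudspeaker signal group; determining the number of bits of the residual signal group according to the number of available bits and the bit allocation ratio of the residual signal group; and decoding the residual signal in the code stream according to the number of bits of the residual signal group.
- An apparatus for processing a three-dimensional audio signal, comprising: an encoding module, configured to spatially encode a three-dimensional audio signal to be encoded to obtain transmission channel signals and transmission channel attribute information, wherein the transmission channel signals comprise at least one virtual loudspeaker signal group and at least one residual signal group; and a bit allocation ratio determining module, configured to determine a bit allocation ratio of the virtual loudspeaker signal group and a bit allocation ratio of the residual signal group according to the transmission channel attribute information.
- An apparatus for processing a three-dimensional audio signal, comprising: a receiving module, configured to receive a code stream; a decoding module, configured to decode the code stream to obtain a bit allocation ratio of a virtual loudspeaker signal group and a bit allocation ratio of a residual signal group; and a signal generating module, configured to decode a virtual loudspeaker signal and a residual signal in the code stream according to the bit allocation ratio of the virtual loudspeaker signal group and the bit allocation ratio of the residual signal group, to obtain a decoded three-dimensional audio signal.
- An apparatus for processing a three-dimensional audio signal, wherein the apparatus comprises at least one processor, the at least one processor being configured to be coupled to a memory and to read and execute instructions in the memory, so as to implement the method according to any one of claims 1 to 17.
- The apparatus for processing a three-dimensional audio signal according to claim 22, wherein the apparatus further comprises the memory.
- An apparatus for processing a three-dimensional audio signal, wherein the apparatus comprises at least one processor, the at least one processor being configured to be coupled to a memory and to read and execute instructions in the memory, so as to implement the method according to any one of claims 18 to 19.
- The apparatus for processing a three-dimensional audio signal according to claim 24, wherein the audio decoding apparatus further comprises the memory.
- A computer-readable storage medium, comprising instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 17 or 18 to 19.
- A computer-readable storage medium, comprising a code stream generated by the method according to any one of claims 1 to 17.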
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020237044825A KR20240013221A (ko) | 2021-06-11 | 2022-06-01 | Three-dimensional audio signal processing method and apparatus |
EP22819422.1A EP4354430A4 (en) | 2021-06-11 | 2022-06-01 | METHOD AND DEVICE FOR PROCESSING THREE-DIMENSIONAL AUDIO SIGNALS |
US18/532,085 US20240112684A1 (en) | 2021-06-11 | 2023-12-07 | Three-dimensional audio signal processing method and apparatus |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110657283 | 2021-06-11 | ||
CN202110657283.7 | 2021-06-11 | ||
CN202110700570.1A CN115472170A (zh) | 2021-06-11 | 2021-06-23 | Three-dimensional audio signal processing method and apparatus |
CN202110700570.1 | 2021-06-23 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/532,085 Continuation US20240112684A1 (en) | 2021-06-11 | 2023-12-07 | Three-dimensional audio signal processing method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022257824A1 (zh) | 2022-12-15 |
Family
ID=84363426
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/096546 WO2022257824A1 (zh) | 2021-06-11 | 2022-06-01 | Three-dimensional audio signal processing method and apparatus |
Country Status (5)
Country | Link |
---|---|
US (1) | US20240112684A1 (zh) |
EP (1) | EP4354430A4 (zh) |
KR (1) | KR20240013221A (zh) |
CN (1) | CN115472170A (zh) |
WO (1) | WO2022257824A1 (zh) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118800257A (zh) * | 2023-04-13 | 2024-10-18 | 华为技术有限公司 | Scene audio decoding method and electronic device |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
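CN1264533A (zh) * | 1997-07-16 | 2000-08-23 | 多尔拜实验特许公司 | Multi-channel low-bit-rate encoding and decoding method and apparatus |
CN101030379A (zh) * | 2007-03-26 | 2007-09-05 | 北京中星微电子有限公司 | Method and apparatus for bit allocation of digital audio signals |
CN102859584A (zh) * | 2009-12-17 | 2013-01-02 | 弗劳恩霍弗实用研究促进协会 | Apparatus and method for converting a first parametric spatial audio signal into a second parametric spatial audio signal |
CN103489450A (zh) * | 2013-04-07 | 2014-01-01 | 杭州微纳科技有限公司 | Wireless audio compression and decompression method based on time-domain aliasing cancellation, and device therefor |
CN105637582A (zh) * | 2013-10-17 | 2016-06-01 | 株式会社索思未来 | Audio encoding device and audio decoding device |
CN107493542A (zh) * | 2012-08-31 | 2017-12-19 | 杜比实验室特许公司 | Loudspeaker system for playing audio content in a listening environment |
CN110728986A (zh) * | 2018-06-29 | 2020-01-24 | 华为技术有限公司 | Stereo signal encoding method, decoding method, encoding apparatus and decoding apparatus |
CN112513980A (zh) * | 2018-05-31 | 2021-03-16 | 诺基亚技术有限公司 | Spatial audio parameter signalling |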
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20140128565A (ko) * | 2013-04-27 | 2014-11-06 | 인텔렉추얼디스커버리 주식회사 | 오디오 신호 처리 방법 및 장치 |
-
2021
- 2021-06-23 CN CN202110700570.1A patent/CN115472170A/zh active Pending
-
2022
- 2022-06-01 KR KR1020237044825A patent/KR20240013221A/ko unknown
- 2022-06-01 EP EP22819422.1A patent/EP4354430A4/en active Pending
- 2022-06-01 WO PCT/CN2022/096546 patent/WO2022257824A1/zh active Application Filing
-
2023
- 2023-12-07 US US18/532,085 patent/US20240112684A1/en active Pending
Non-Patent Citations (1)
Title |
---|
See also references of EP4354430A4 |
Also Published As
Publication number | Publication date |
---|---|
EP4354430A4 (en) | 2024-07-24 |
KR20240013221A (ko) | 2024-01-30 |
CN115472170A (zh) | 2022-12-13 |
EP4354430A1 (en) | 2024-04-17 |
US20240112684A1 (en) | 2024-04-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
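US20230298600A1 (en) | Audio encoding and decoding method and apparatus | |
WO2022262576A1 (zh) | Three-dimensional audio signal encoding method and apparatus, encoder, and system | |
WO2022237851A1 (zh) | Audio encoding and decoding method and apparatus | |
WO2022257824A1 (zh) | Three-dimensional audio signal processing method and apparatus | |
US20240087580A1 (en) | Three-dimensional audio signal coding method and apparatus, and encoder | |
US20240105187A1 (en) | Three-dimensional audio signal processing method and apparatus | |
WO2024146408A1 (zh) | Scene audio decoding method and electronic device | |
CN115376529B (zh) | Three-dimensional audio signal encoding method and apparatus, and encoder | |
WO2024212895A1 (zh) | Scene audio signal decoding method and apparatus | |
WO2024212898A1 (zh) | Scene audio signal encoding method and apparatus | |
WO2022242481A1 (zh) | Three-dimensional audio signal encoding method and apparatus, and encoder | |
WO2022242479A1 (zh) | Three-dimensional audio signal encoding method and apparatus, and encoder | |
WO2024212894A1 (zh) | Scene audio signal decoding method and apparatus | |
WO2024212638A1 (zh) | Scene audio decoding method and electronic device | |
WO2024212896A1 (zh) | Scene audio signal decoding method and apparatus | |
WO2024212897A1 (zh) | Scene audio signal decoding method and apparatus |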
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22819422 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 202337083725 Country of ref document: IN |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2022819422 Country of ref document: EP |
|
ENP | Entry into the national phase |
Ref document number: 20237044825 Country of ref document: KR Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 1020237044825 Country of ref document: KR |
|
ENP | Entry into the national phase |
Ref document number: 2022819422 Country of ref document: EP Effective date: 20231220 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |