Disclosure of Invention
An object of the embodiments of the present invention is to provide a method, an apparatus and an electronic device for processing audio data, so that the latest immersive audio systems are compatible with traditional 5.1-channel audio while, ideally, exploiting the advantages of additional channels to render a better immersive effect.
In order to achieve the above object, the embodiments of the present invention mainly provide the following technical solutions:
In a first aspect, an embodiment of the present invention provides a method for processing audio data, including:
acquiring a target audio in a target format;
extracting, from the target audio, a first audio unit representing sound emitted by a target object, wherein the target object is an object having altitude angle information;
acquiring the altitude angle and the horizontal angle of the target object;
performing HOA coding on the first audio unit according to the altitude angle and the horizontal angle to obtain HOA coding information of the first audio unit;
generating an audio signal of the first audio unit from the HOA coding information of the first audio unit.
According to an embodiment of the present invention, after extracting the first audio unit, the method further includes:
removing the first audio unit from the target audio to obtain a second audio unit;
performing HOA coding on the second audio unit to obtain HOA coding information of the second audio unit;
generating an audio signal of the second audio unit from the HOA coding information of the second audio unit;
and performing audio signal synthesis on the audio signal of the first audio unit and the audio signal of the second audio unit.
According to an embodiment of the present invention, obtaining the altitude angle and the horizontal angle of the target object includes:
obtaining the horizontal angle of the target object according to the audio content and volume of each channel when the first audio unit is played on the 5.1 sound system;
and obtaining the altitude angle of the target object according to the type of the target object and a preset correspondence between altitude angles and object types.
According to one embodiment of the invention, the target object comprises thunder and rain.
In a second aspect, an embodiment of the present invention further provides an apparatus for processing audio data, including:
the acquisition module is used for acquiring a target audio in a target format;
the control processing module is used for extracting, from the target audio, a first audio unit representing sound emitted by a target object, wherein the target object is an object having altitude angle information; the control processing module is further configured to obtain the altitude angle and the horizontal angle of the target object, perform HOA encoding on the first audio unit according to the altitude angle and the horizontal angle to obtain HOA encoding information of the first audio unit, and generate an audio signal of the first audio unit according to the HOA encoding information of the first audio unit.
According to an embodiment of the present invention, the control processing module is further configured to remove the first audio unit from the target audio to obtain a second audio unit, then perform HOA encoding on the second audio unit to obtain HOA encoding information of the second audio unit, then generate an audio signal of the second audio unit according to the HOA encoding information of the second audio unit, and finally perform audio signal synthesis on the audio signal of the first audio unit and the audio signal of the second audio unit.
According to an embodiment of the present invention, the control processing module is specifically configured to obtain the horizontal angle of the target object according to the audio content and volume of each channel when the first audio unit is played on the 5.1 sound system, and to obtain the altitude angle of the target object according to the type of the target object and a preset correspondence between altitude angles and object types.
According to one embodiment of the invention, the target object includes thunder and rain.
In a third aspect, an embodiment of the present invention further provides an electronic device, including: at least one processor and at least one memory; the memory is configured to store one or more program instructions; and the processor is configured to execute the one or more program instructions to perform the method according to the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium containing one or more program instructions which, when executed, perform the method according to the first aspect.
The technical solutions provided by the embodiments of the present invention have at least the following advantages:
the audio data processing method, apparatus and electronic device provided by the embodiments of the present invention make the latest immersive audio systems compatible with traditional 5.1-channel audio while exploiting the advantages of additional channels to render a better immersive effect.
Detailed Description
The following description of the embodiments of the present invention is provided for illustrative purposes, and other advantages and effects of the present invention will become apparent to those skilled in the art from the present disclosure.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular system structures, interfaces, techniques, etc. in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "up", "down", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are used only for convenience in describing the present invention and for simplicity in description, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and thus, are not to be construed as limiting the present invention. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it is to be noted that, unless otherwise explicitly specified or limited, the terms "connected" and "coupled" are to be interpreted broadly, e.g., as meaning directly connected or indirectly connected through an intermediary. Those skilled in the art can understand the specific meanings of these terms in the present invention according to the specific circumstances.
Fig. 1 is a flowchart of an audio data processing method according to an embodiment of the present invention. As shown in fig. 1, a method for processing audio data according to an embodiment of the present invention includes:
S1: Acquire the target audio in the target format. The target format is the audio format of a 5.1-channel system, and the target audio may be, for example, the audio of a film.
S2: Extract, from the target audio, a first audio unit representing sound emitted by a target object, where the target object is an object having altitude angle information.
Specifically, a theater auditorium (e.g., IMAX) can give the audience an immersive sensory experience by providing more channels than the 5.1-channel system, so that sound can also be reproduced from positions with height. Therefore, this embodiment extracts the audio corresponding to objects that have height information.
In this embodiment, a model for identifying object audio, such as a neural network model, may be trained in advance. When the target audio is input into the trained model, the model identifies which portions of the audio represent sound emitted by a target object, for example which portions are thunder and which are rain. This embodiment extracts those portions as the first audio unit.
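The extraction in S2 can be sketched as follows, assuming the trained model outputs one true/false label per fixed-length frame. The model itself is not shown, and the function name extract_first_unit, the frame-level labelling and the one-row-per-channel array layout are assumptions for illustration. Frames labelled as the target object are kept and all other samples are zeroed; a practical system might instead use soft or time-frequency masks.

```python
import numpy as np


def extract_first_unit(target: np.ndarray, frame_is_target, frame_len: int) -> np.ndarray:
    """Return the first audio unit: frames labelled as the target object, others zeroed.

    target: array of shape (channels, samples), e.g. the six 5.1 channels.
    frame_is_target: one boolean per frame, assumed to come from a trained
        classifier (e.g. a neural network) that is not shown here.
    """
    n_samples = target.shape[1]
    # Expand the per-frame labels into a per-sample 0/1 mask.
    mask = np.repeat(np.asarray(frame_is_target, dtype=float), frame_len)
    if mask.size < n_samples:
        # Samples beyond the last labelled frame are treated as non-target.
        mask = np.pad(mask, (0, n_samples - mask.size))
    return target * mask[:n_samples]  # the mask broadcasts over all channels
```

For example, with a frame length of 4 samples and labels [True, False, True], samples 0 to 3 and 8 onward are kept in every channel.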
S3: Acquire the altitude angle and the horizontal angle of the target object.
In one embodiment of the present invention, step S3 includes:
S3-1: Derive the horizontal angle of the target object from the volume of each channel when the first audio unit is played on the 5.1 sound system.
Fig. 2 is a top-view layout diagram of a 5.1-channel system in one example of the invention. As shown in Fig. 2, the horizontal angles of the left channel L, right channel R, center channel C, left surround LS, and right surround RS are -30 degrees, 30 degrees, 0 degrees, -110 degrees, and 110 degrees, respectively. This embodiment obtains the horizontal angle of the target object from the volume of each channel when the first audio unit is played.
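One possible way to implement S3-1 is an energy-weighted direction estimate: each of the five directional channels contributes a unit vector at its Fig. 2 angle, weighted by its mean-square level, and the angle of the resulting vector is taken as the horizontal angle. The text does not specify the estimator, so this weighting (similar to an energy vector), the helper name estimate_azimuth and the exclusion of the LFE channel are assumptions.

```python
import math

import numpy as np

# Horizontal angles (degrees) of the five directional 5.1 channels, following
# the convention of Fig. 2 (left negative, right positive). LFE is ignored.
CHANNEL_AZIMUTHS = {"L": -30.0, "R": 30.0, "C": 0.0, "LS": -110.0, "RS": 110.0}


def estimate_azimuth(unit: dict) -> float:
    """Energy-weighted horizontal angle (degrees) of a multichannel audio unit.

    unit maps channel names to 1-D sample arrays. The weighting scheme is an
    assumed implementation of "derive the angle from each channel's volume".
    """
    x = y = 0.0
    for name, azimuth in CHANNEL_AZIMUTHS.items():
        energy = float(np.mean(np.square(unit[name])))  # mean-square level
        x += energy * math.cos(math.radians(azimuth))
        y += energy * math.sin(math.radians(azimuth))
    if x == 0.0 and y == 0.0:
        raise ValueError("energy vector is zero; horizontal angle is undefined")
    return math.degrees(math.atan2(y, x))
```

For example, a sound present only in the right surround channel yields 110 degrees, and equal levels in L and R yield 0 degrees.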
S3-2: Obtain the altitude angle of the target object according to its type and the preset correspondence between altitude angles and object types. For example, if the target object is thunder or rain and the preset correspondence assigns an altitude angle of 90 degrees to both thunder and rain, the altitude angle of the target object is 90 degrees.
S4: Perform HOA encoding on the first audio unit according to the altitude angle and the horizontal angle to obtain the HOA encoding information of the first audio unit.
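The preset correspondence in S3-2 amounts to a lookup table. A minimal sketch follows; the 90-degree entries come from the example above, while the default of 0 degrees for unlisted types and the names ALTITUDE_BY_TYPE and altitude_for are hypothetical.

```python
# Preset altitude-angle / object-type correspondence (degrees). The thunder and
# rain entries follow the example in the text; other entries are hypothetical.
ALTITUDE_BY_TYPE = {"thunder": 90.0, "rain": 90.0}


def altitude_for(object_type: str, default: float = 0.0) -> float:
    """Look up the altitude angle for a recognized object type.

    Types missing from the table fall back to `default` (assumed: ear level).
    """
    return ALTITUDE_BY_TYPE.get(object_type.strip().lower(), default)
```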
Fig. 3 is a schematic diagram of a WXYZ coordinate system in one example of the invention. As shown in Fig. 3, in this embodiment the five channels are treated as audio objects located at the positions given above and are converted into scene-based audio by ambisonic encoding.
where S_i is the single-channel (mono) audio signal, φ is the horizontal angle (azimuth), and θ is the vertical angle (elevation).
With this formula, an audio object, i.e., a single-channel audio signal, can be conveniently converted into an ambisonic signal.
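As a hedged illustration of such a conversion, the sketch below uses the traditional first-order B-format encoding equations W = S/√2, X = S cos φ cos θ, Y = S sin φ cos θ and Z = S sin θ. The scaling of W is an assumption, since ambisonic conventions differ. The five bed channels are each treated as an object at its Fig. 2 position with zero elevation, and their B-format signals are summed; the names encode_foa and bed_to_foa are illustrative.

```python
import math

import numpy as np


def encode_foa(signal, azimuth_deg: float, elevation_deg: float) -> np.ndarray:
    """Encode a mono signal S_i into first-order B-format rows (W, X, Y, Z).

    Uses W = S/sqrt(2), X = S cos(phi) cos(theta), Y = S sin(phi) cos(theta),
    Z = S sin(theta); the W scaling is an assumed (traditional) convention.
    """
    phi = math.radians(azimuth_deg)
    theta = math.radians(elevation_deg)
    gains = np.array([
        1.0 / math.sqrt(2.0),             # W: omnidirectional component
        math.cos(phi) * math.cos(theta),  # X: front/back
        math.sin(phi) * math.cos(theta),  # Y: lateral (sign follows the azimuth convention)
        math.sin(theta),                  # Z: up/down
    ])
    return np.outer(gains, np.asarray(signal, dtype=float))  # shape (4, samples)


# Fig. 2 positions of the five directional channels, all at zero elevation.
BED_AZIMUTHS = {"L": -30.0, "R": 30.0, "C": 0.0, "LS": -110.0, "RS": 110.0}


def bed_to_foa(channels: dict) -> np.ndarray:
    """Treat each bed channel as an audio object and sum the B-format signals."""
    return sum(encode_foa(channels[name], az, 0.0) for name, az in BED_AZIMUTHS.items())
```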
A first-order ambisonic (FOA) sound field provides only limited spatial resolution, whereas higher-order ambisonics (HOA) can provide high-quality sound field capture and rendering based on spherical harmonic functions.
where j is the spherical Bessel function, a denotes the spherical harmonic (expansion) coefficients, Y is the spherical harmonic function, and P_nm represents the associated Legendre function.
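A minimal sketch of the HOA encoding in S4, under the common ACN channel ordering and SN3D normalization (both assumptions; the text does not fix a convention): the real spherical harmonics Y_nm are evaluated at the target object's horizontal and altitude angles, using associated Legendre functions P_nm without the Condon-Shortley phase, and the mono signal is multiplied by each coefficient to give (N+1)^2 HOA channels for order N. The function names are hypothetical.

```python
import math

import numpy as np


def legendre_no_cs(n: int, m: int, x: float) -> float:
    """Associated Legendre function P_n^m(x) for 0 <= m <= n, without Condon-Shortley phase."""
    pmm = 1.0
    if m > 0:
        root = math.sqrt(max(0.0, 1.0 - x * x))
        for i in range(1, m + 1):
            pmm *= (2 * i - 1) * root  # builds (2m-1)!! * (1 - x^2)^(m/2)
    if n == m:
        return pmm
    pmm1 = x * (2 * m + 1) * pmm  # P_{m+1}^m
    for k in range(m + 2, n + 1):
        # (k - m) P_k^m = (2k - 1) x P_{k-1}^m - (k + m - 1) P_{k-2}^m
        pmm, pmm1 = pmm1, ((2 * k - 1) * x * pmm1 - (k + m - 1) * pmm) / (k - m)
    return pmm1


def sh_real_sn3d(order: int, azimuth_deg: float, altitude_deg: float) -> np.ndarray:
    """Real spherical harmonics Y_nm up to `order`, ACN ordering, SN3D normalization."""
    phi = math.radians(azimuth_deg)
    sin_alt = math.sin(math.radians(altitude_deg))
    coeffs = np.zeros((order + 1) ** 2)
    for n in range(order + 1):
        for m in range(-n, n + 1):
            am = abs(m)
            weight = 1.0 if m == 0 else 2.0
            norm = math.sqrt(weight * math.factorial(n - am) / math.factorial(n + am))
            trig = math.cos(am * phi) if m >= 0 else math.sin(am * phi)
            coeffs[n * n + n + m] = norm * legendre_no_cs(n, am, sin_alt) * trig  # ACN index
    return coeffs


def encode_hoa(signal, order: int, azimuth_deg: float, altitude_deg: float) -> np.ndarray:
    """HOA encoding information of a mono unit: shape ((order + 1)^2, samples)."""
    return np.outer(sh_real_sn3d(order, azimuth_deg, altitude_deg),
                    np.asarray(signal, dtype=float))
```

With ACN/SN3D, the order-1 coefficients are 1, cos θ sin φ, sin θ and cos θ cos φ (the W, Y, Z, X channels), and for every order n the squared coefficients of that order sum to 1.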
S5: Generate the audio signal of the first audio unit from the HOA encoding information of the first audio unit.
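How the audio signal is generated from the HOA encoding information is not detailed above. One common option is a mode-matching decoder, sketched here at first order: the loudspeaker directions of the playback system are encoded into a matrix of spherical harmonics, and its Moore-Penrose pseudo-inverse (numpy.linalg.pinv) maps HOA channels to loudspeaker feeds. The 8-loudspeaker cube layout, the ACN/SN3D convention and the function names are assumptions for illustration.

```python
import math

import numpy as np


def sh_first_order(azimuth_deg: float, elevation_deg: float) -> np.ndarray:
    """First-order real spherical harmonics, ACN order (W, Y, Z, X), SN3D."""
    phi, theta = math.radians(azimuth_deg), math.radians(elevation_deg)
    return np.array([
        1.0,
        math.cos(theta) * math.sin(phi),
        math.sin(theta),
        math.cos(theta) * math.cos(phi),
    ])


def mode_matching_decoder(speaker_dirs) -> np.ndarray:
    """Decoding matrix D (speakers x 4) such that Y_spk @ D is close to the identity."""
    y_spk = np.column_stack([sh_first_order(az, el) for az, el in speaker_dirs])  # 4 x L
    return np.linalg.pinv(y_spk)  # L x 4


def decode(hoa: np.ndarray, speaker_dirs) -> np.ndarray:
    """Turn HOA channels (4 x samples) into loudspeaker feeds (L x samples)."""
    return mode_matching_decoder(speaker_dirs) @ hoa


# Hypothetical immersive layout: 8 loudspeakers at the corners of a cube,
# given as (azimuth, elevation) in degrees.
CUBE_LAYOUT = [(az, el) for el in (45.0, -45.0) for az in (45.0, 135.0, -135.0, -45.0)]
```

With at least as many well-spread loudspeakers as HOA channels, re-encoding the feeds reproduces the HOA signal; practical decoders often add energy-preserving or max-rE weighting, which is omitted here.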
In an embodiment of the present invention, after step S2, the method further includes: removing the first audio unit from the target audio to obtain a second audio unit; performing HOA encoding on the second audio unit to obtain HOA encoding information of the second audio unit; generating an audio signal of the second audio unit from the HOA encoding information of the second audio unit; and performing audio signal synthesis on the audio signal of the first audio unit and the audio signal of the second audio unit. In this way, not only the audio units that have an altitude angle but also those without one are converted into HOA encoding.
The audio data processing method provided by the embodiment of the present invention makes the latest immersive audio systems compatible with traditional 5.1-channel audio while, ideally, exploiting the advantages of additional channels to render a better immersive effect.
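The flow of this embodiment can be sketched end to end under several assumptions: a first-order ACN/SN3D representation, 6 x N arrays in the channel order L, R, C, LFE, LS, RS, a mono downmix (channel sum) of the first audio unit before it is encoded at its horizontal and altitude angles, and any decoding matrix with four columns. The second audio unit is obtained by subtraction, its five directional channels are encoded at their Fig. 2 positions, both parts are decoded, and the two audio signals are summed. The LFE channel is left out of the HOA encoding in this sketch, and all names are hypothetical.

```python
import math

import numpy as np

# Row order assumed for the 6 x N arrays: L, R, C, LFE, LS, RS.
# Horizontal angles follow Fig. 2; None marks the non-directional LFE channel.
BED_AZIMUTHS = [-30.0, 30.0, 0.0, None, -110.0, 110.0]


def sh_first_order(azimuth_deg: float, elevation_deg: float) -> np.ndarray:
    """First-order real spherical harmonics, ACN order (W, Y, Z, X), SN3D."""
    phi, theta = math.radians(azimuth_deg), math.radians(elevation_deg)
    return np.array([1.0, math.cos(theta) * math.sin(phi), math.sin(theta),
                     math.cos(theta) * math.cos(phi)])


def encode_bed(bed: np.ndarray) -> np.ndarray:
    """Encode each directional channel as an object at its Fig. 2 position (elevation 0)."""
    hoa = np.zeros((4, bed.shape[1]))
    for row, azimuth in enumerate(BED_AZIMUTHS):
        if azimuth is not None:  # the LFE channel is not encoded in this sketch
            hoa += np.outer(sh_first_order(azimuth, 0.0), bed[row])
    return hoa


def process(target, first_unit, azimuth_deg, altitude_deg, decoder):
    """Render the target audio with its height object placed via HOA."""
    second_unit = target - first_unit              # remove the first audio unit
    mono = first_unit.sum(axis=0)                  # assumed mono downmix of the object
    first_hoa = np.outer(sh_first_order(azimuth_deg, altitude_deg), mono)
    second_hoa = encode_bed(second_unit)
    first_signal = decoder @ first_hoa             # audio signal of the first unit
    second_signal = decoder @ second_hoa           # audio signal of the second unit
    return first_signal + second_signal            # audio signal synthesis
```

Because encoding and decoding are both linear here, summing the two HOA signals before decoding would give the same result.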
Fig. 4 is a block diagram of an audio data processing apparatus according to an embodiment of the present invention. As shown in fig. 4, the apparatus for processing audio data according to the embodiment of the present invention includes: an acquisition module 100 and a control processing module 200.
The acquisition module 100 is configured to acquire a target audio in a target format. The control processing module 200 is configured to extract, from the target audio, a first audio unit representing sound emitted by a target object, where the target object is an object having altitude angle information. The control processing module 200 is further configured to obtain the altitude angle and the horizontal angle of the target object, perform HOA encoding on the first audio unit according to the altitude angle and the horizontal angle to obtain HOA encoding information of the first audio unit, and generate an audio signal of the first audio unit according to the HOA encoding information of the first audio unit.
In an embodiment of the present invention, the control processing module 200 is further configured to remove the first audio unit from the target audio to obtain a second audio unit, perform HOA encoding on the second audio unit to obtain HOA encoding information of the second audio unit, generate an audio signal of the second audio unit according to the HOA encoding information of the second audio unit, and perform audio signal synthesis on the audio signal of the first audio unit and the audio signal of the second audio unit.
In an embodiment of the present invention, the control processing module 200 is specifically configured to obtain the horizontal angle of the target object according to the audio content and volume of each channel when the first audio unit is played on the 5.1 sound system, and to obtain the altitude angle of the target object according to the type of the target object and a preset correspondence between altitude angles and object types.
In one embodiment of the invention, the target object includes thunder and rain.
It should be noted that the specific implementation of the apparatus for processing audio data in the embodiment of the present invention is similar to that of the method for processing audio data; reference is made to the description of the method, and the details are not repeated here to reduce redundancy.
In addition, other configurations and functions of the audio data processing apparatus according to the embodiment of the present invention are known to those skilled in the art and are not described in detail here to reduce redundancy.
An embodiment of the present invention further provides an electronic device, including: at least one processor and at least one memory; the memory is configured to store one or more program instructions; and the processor is configured to execute the one or more program instructions to perform the method for processing audio data according to the first aspect.
An embodiment of the present invention further provides a computer-readable storage medium storing computer program instructions which, when run on a computer, cause the computer to execute the audio data processing method described above.
In an embodiment of the invention, the processor may be an integrated circuit chip having signal processing capability. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Such a processor may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The processor reads the information in the storage medium and completes the steps of the method in combination with its hardware.
The storage medium may be, for example, a memory, which may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory.
The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory.
The volatile memory may be a random access memory (RAM), which serves as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM).
The storage media described in connection with the embodiments of the invention are intended to comprise, without being limited to, these and any other suitable types of memory.
Those skilled in the art will appreciate that, in one or more of the examples described above, the functionality described in the present invention may be implemented by a combination of hardware and software. When implemented in software, the corresponding functions may be stored on, or transmitted as, one or more instructions or code on a computer-readable medium. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
The specific embodiments described above further explain the objects, technical solutions and advantages of the present invention in detail. It should be understood that they are merely exemplary embodiments of the present invention and are not intended to limit its scope; any modification, equivalent substitution, improvement and the like made on the basis of the technical solutions of the present invention shall fall within the scope of the present invention.