CN112584297B

CN112584297B - Audio data processing method and device and electronic equipment

Info

Publication number: CN112584297B
Application number: CN202011387031.9A
Authority: CN
Inventors: Xu Tao (徐涛); Sun Xuejing (孙学京); Dong Qiangguo (董强国); Zhou Lingfei (周令非); Liu Zhiyi (刘知一); Zhang Hui (张辉)
Original assignee: Beijing Tuoling Inc; CHINA FILM SCIENCE AND TECHNOLOGY INST
Current assignee: Beijing Tuoling Inc; China Film Science And Technology Research Institute Film Technology Quality Inspection Institute Of Central Propaganda Department
Priority date: 2020-12-01
Filing date: 2020-12-01
Publication date: 2022-04-08
Anticipated expiration: 2040-12-01
Also published as: CN112584297A

Abstract

The embodiments of the invention disclose an audio data processing method, an audio data processing device, and electronic equipment. The processing method comprises: acquiring a target audio in a target format; extracting, from the target audio, a first audio unit representing the sound emitted by a target object, wherein the target object is an object having elevation-angle information; acquiring the elevation angle and the horizontal angle of the target object; performing HOA encoding on the first audio unit according to the elevation angle and the horizontal angle to obtain HOA encoding information of the first audio unit; and generating an audio signal of the first audio unit from the HOA encoding information of the first audio unit. The invention makes the latest immersive audio systems compatible with traditional 5.1-channel content while using the advantages of additional channels to render a better immersive effect.

Description

Audio data processing method and device and electronic equipment

Technical Field

The embodiment of the invention relates to the technical field of audio processing, in particular to a method and a device for processing audio data and electronic equipment.

Background

In recent years, with the development of high-definition video from 2K to 4K and even 8K, and with the development of virtual reality (VR) and augmented reality (AR), expectations for audio have risen. Audiences are no longer satisfied with the stereo, 5.1, 7.1, and similar sound formats that have been popular for many years, and instead seek 3D or immersive sound with a stronger sense of immersion and realism. At present, immersive audio processing is mainly based on channel-based audio (CBA), object-based audio (OBA), and scene-based audio (SBA) using Ambisonics, covering audio production, coding, packaging, and rendering.

Ambisonics uses spherical harmonic functions to record the sound field and to drive the loudspeakers, and can reconstruct the original sound field with high quality at the center of the loudspeaker array. Compared with channel-based surround technologies such as traditional 5.1, it produces a smoother, more continuous listening impression, supports full-sphere (360°×360°) immersive audio including height, and provides a stronger sense of presence and more accurate spatial localization. Ambisonics also supports interactive sound-field operations such as head tracking, rotation, and translation, and provides a good technical framework for next-generation audio systems.
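To illustrate the rotation operation mentioned above, the following minimal Python sketch (not part of the patent) rotates a first-order Ambisonic sound field about the vertical axis. It assumes azimuth measured counterclockwise from the front (left positive), elevation measured upward from the horizontal plane, and a W weight of 1; the function names are hypothetical. Only X and Y change under a yaw rotation, which is why head tracking can be applied to the encoded scene without knowing where the individual sources are.

```python
import math
from typing import Tuple

FOA = Tuple[float, float, float, float]  # (W, X, Y, Z)


def encode_foa(az_deg: float, el_deg: float) -> FOA:
    """First-order gains of a unit-amplitude source at (azimuth, elevation)."""
    phi, theta = math.radians(az_deg), math.radians(el_deg)
    return (1.0,
            math.cos(phi) * math.cos(theta),
            math.sin(phi) * math.cos(theta),
            math.sin(theta))


def rotate_yaw(field: FOA, angle_deg: float) -> FOA:
    """Rotate the whole sound field by angle_deg about the vertical (Z) axis."""
    w, x, y, z = field
    a = math.radians(angle_deg)
    return (w,
            x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a),
            z)


# Head tracking: if the listener turns the head by +30 degrees,
# the scene is rotated by -30 degrees so sources stay fixed in the room.
compensated = rotate_yaw(encode_foa(45.0, 20.0), -30.0)
```

Rotating a field encoded at azimuth φ by α gives exactly the field encoded at φ + α, so the source above ends up at 15 degrees relative to the turned head.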

A large amount of conventional 5.1-channel content exists on the market today. Making the latest immersive audio systems compatible with this content is therefore an urgent problem, and it is further desirable to use the advantages of additional channels to render a better immersive effect.

Disclosure of Invention

An object of the embodiments of the present invention is to provide an audio data processing method, an audio data processing apparatus, and an electronic device that make the latest immersive audio systems compatible with traditional 5.1-channel content and, ideally, use the advantages of additional channels to render a better immersive effect.

In order to achieve the above object, the embodiments of the present invention mainly provide the following technical solutions:

In a first aspect, an embodiment of the present invention provides a method for processing audio data, including:

acquiring a target audio in a target format;

extracting, from the target audio, a first audio unit representing the sound emitted by a target object, wherein the target object is an object having elevation-angle information;

acquiring the elevation angle and the horizontal angle of the target object;

performing HOA encoding on the first audio unit according to the elevation angle and the horizontal angle to obtain HOA encoding information of the first audio unit;

generating an audio signal of the first audio unit from the HOA coding information of the first audio unit.

According to an embodiment of the present invention, after extracting the first audio unit, the method further includes:

removing the first audio unit from the target audio to obtain a second audio unit;

performing HOA coding on the second audio unit to obtain HOA coding information of the second audio unit;

generating an audio signal of the second audio unit from the HOA coding information of the second audio unit;

and synthesizing the audio signal of the first audio unit with the audio signal of the second audio unit.

According to an embodiment of the present invention, acquiring the elevation angle and the horizontal angle of the target object includes:

obtaining the horizontal angle of the target object according to the audio content and volume of each channel when the first audio unit is played on a 5.1 sound system;

and obtaining the elevation angle of the target object according to the type of the target object and a preset correspondence between elevation angle and object type.

According to one embodiment of the invention, the target object comprises thunder and rain.

In a second aspect, an embodiment of the present invention further provides an apparatus for processing audio data, including:

an acquisition module configured to acquire a target audio in a target format;

a control processing module configured to extract, from the target audio, a first audio unit representing the sound emitted by a target object, the target object being an object having elevation-angle information; the control processing module is further configured to obtain the elevation angle and the horizontal angle of the target object, perform HOA encoding on the first audio unit according to the elevation angle and the horizontal angle to obtain HOA encoding information of the first audio unit, and generate an audio signal of the first audio unit according to the HOA encoding information of the first audio unit.

According to an embodiment of the present invention, the control processing module is further configured to remove the first audio unit from the target audio to obtain a second audio unit, then perform HOA encoding on the second audio unit to obtain HOA encoding information of the second audio unit, then generate an audio signal of the second audio unit according to the HOA encoding information of the second audio unit, and finally perform audio signal synthesis on the audio signal of the first audio unit and the audio signal of the second audio unit.

According to an embodiment of the present invention, the control processing module is specifically configured to obtain the horizontal angle of the target object according to the audio content and volume of each channel when the first audio unit is played on a 5.1 sound system, and to obtain the elevation angle of the target object according to the type of the target object and a preset correspondence between elevation angle and object type.

According to one embodiment of the invention, the target object includes thunder and rain.

In a third aspect, an embodiment of the present invention further provides an electronic device, including: at least one processor and at least one memory; the memory is to store one or more program instructions; the processor is configured to execute one or more program instructions to perform the method according to the first aspect.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium containing one or more program instructions to be executed to perform the method according to the first aspect.

The technical solutions provided by the embodiments of the invention have at least the following advantages:

the audio data processing method, apparatus, and electronic device provided by the embodiments of the invention make the latest immersive audio systems compatible with traditional 5.1-channel content while using the advantages of additional channels to render a better immersive effect.

Drawings

Fig. 1 is a flowchart of an audio data processing method according to an embodiment of the present invention.

Fig. 2 is a top view block diagram of a 5.1 channel system in one example of the invention.

Fig. 3 is a schematic diagram of a WXYZ coordinate system in one example of the invention.

Fig. 4 is a block diagram of an audio data processing apparatus according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention is provided for illustrative purposes, and other advantages and effects of the present invention will become apparent to those skilled in the art from the present disclosure.

In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular system structures, interfaces, techniques, etc. in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.

In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "up", "down", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are used only for convenience in describing the present invention and for simplicity in description, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and thus, are not to be construed as limiting the present invention. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

In the description of the present invention, it is to be noted that, unless otherwise explicitly specified or limited, the terms "connected" and "coupled" are to be interpreted broadly, e.g., as meaning directly connected or indirectly connected through an intermediate medium. Those skilled in the art can understand the specific meanings of these terms in the present invention according to the specific circumstances.

Fig. 1 is a flowchart of an audio data processing method according to an embodiment of the present invention. As shown in fig. 1, a method for processing audio data according to an embodiment of the present invention includes:

S1: Acquire the target audio in the target format. The target format is the audio format of a 5.1-channel system, and the target audio may be, for example, the audio of a film.
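As an illustration of S1 only, the following Python sketch reads 5.1 target audio from a 16-bit PCM WAV file using the standard `wave` module. The file format, the channel order L, R, C, LFE, LS, RS (the usual WAV 5.1 order), and the function name `acquire_51_audio` are assumptions; the patent does not specify how the target audio is stored.

```python
import struct
import wave
from typing import BinaryIO, Dict, List, Union

# Assumed interleaving order of a 6-channel WAV file (common WAV 5.1 order).
CHANNEL_ORDER = ("L", "R", "C", "LFE", "LS", "RS")


def acquire_51_audio(source: Union[str, BinaryIO]) -> Dict[str, List[float]]:
    """Read 16-bit PCM 5.1 audio; return per-channel samples scaled to [-1, 1)."""
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 6 or wav.getsampwidth() != 2:
            raise ValueError("expected 6-channel, 16-bit PCM (5.1) audio")
        raw = wav.readframes(wav.getnframes())
    count = len(raw) // 2  # number of 16-bit samples across all channels
    samples = struct.unpack("<%dh" % count, raw)  # WAV data is little-endian
    # De-interleave: channel i holds every sixth sample starting at index i.
    return {name: [s / 32768.0 for s in samples[i::6]]
            for i, name in enumerate(CHANNEL_ORDER)}
```

Passing a path or an open binary file object both work, since `wave.open` accepts either.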

S2: Extract, from the target audio, a first audio unit representing the sound emitted by the target object. The target object is an object having elevation-angle information.

Specifically, a theater auditorium (e.g., IMAX) can give the audience an immersive experience by providing additional channels beyond those of a 5.1-channel system. The present embodiment therefore extracts the audio corresponding to objects that have height information.

In this embodiment, a model for identifying object audio, such as a neural network model, may be trained in advance. When the target audio is input into the trained model, the model identifies which parts of the audio represent the sound emitted by the target object, for example which parts are thunder and which are rain. The present embodiment extracts these parts of the audio as the first audio unit.
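The patent leaves the model's form open. The following Python sketch assumes the trained model has already been wrapped as a hypothetical per-frame classifier `is_target(frame) -> bool` and builds the first audio unit as a hard, frame-wise mask over all channels. This is a simplification: a real separation model would more likely output soft time-frequency masks, and the toy energy rule below only stands in for the trained model.

```python
from typing import Callable, Dict, List

Channels = Dict[str, List[float]]             # channel name -> samples
FrameClassifier = Callable[[Channels], bool]  # hypothetical wrapper around the trained model


def extract_first_unit(audio: Channels, frame_len: int,
                       is_target: FrameClassifier) -> Channels:
    """Copy the frames labeled as the target object; zero everything else."""
    n = len(next(iter(audio.values())))
    first = {ch: [0.0] * n for ch in audio}
    for start in range(0, n, frame_len):
        end = min(start + frame_len, n)
        frame = {ch: x[start:end] for ch, x in audio.items()}
        if is_target(frame):
            for ch, x in audio.items():
                first[ch][start:end] = x[start:end]
    return first


# Toy stand-in for the trained model: "thunder" = high energy in the L channel.
def toy_thunder_detector(frame: Channels) -> bool:
    return sum(v * v for v in frame["L"]) > 0.1


unit = extract_first_unit({"L": [0.0, 0.0, 0.9, 0.8], "R": [0.1] * 4},
                          frame_len=2, is_target=toy_thunder_detector)
```

Here only the second frame is classified as thunder, so `unit` keeps samples 2 and 3 of both channels and zeros the rest.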

S3: Acquire the elevation angle and the horizontal angle of the target object.

In one embodiment of the present invention, step S3 includes:

S3-1: Obtain the horizontal angle of the target object from the audio content and volume of each channel when the first audio unit is played on the 5.1 sound system.

Fig. 2 is a top view block diagram of a 5.1-channel system in one example of the invention. As shown in Fig. 2, the horizontal angles of the left channel L, right channel R, center channel C, left surround LS, and right surround RS are -30 degrees, 30 degrees, 0 degrees, -110 degrees, and 110 degrees, respectively. The present embodiment obtains the horizontal angle of the target object from the volume of each channel when the first audio unit is played.
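The patent does not give the formula that maps channel volumes to a horizontal angle. One plausible choice, sketched below in Python as an assumption, is the energy-weighted vector sum of the loudspeaker directions (the direction of Gerzon's energy vector), using the Fig. 2 angles (with R assumed at 30 degrees, mirroring L) and each channel's mean-square level as its "volume". The LFE channel is ignored because it carries no direction, and the function names are hypothetical.

```python
import math
from typing import Dict, List

# Loudspeaker horizontal angles in degrees, following Fig. 2 (left negative).
SPEAKER_AZIMUTH_DEG = {"L": -30.0, "R": 30.0, "C": 0.0, "LS": -110.0, "RS": 110.0}


def channel_energy(samples: List[float]) -> float:
    """Mean-square level of one channel, used as a proxy for its volume."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


def estimate_azimuth(unit: Dict[str, List[float]]) -> float:
    """Energy-weighted direction of the first audio unit, in degrees."""
    x = y = 0.0
    for ch, az in SPEAKER_AZIMUTH_DEG.items():
        e = channel_energy(unit.get(ch, []))  # missing channels count as silent
        x += e * math.cos(math.radians(az))
        y += e * math.sin(math.radians(az))
    if x == 0.0 and y == 0.0:
        raise ValueError("no directional energy in the unit")
    return math.degrees(math.atan2(y, x))
```

For instance, equal energy in L and C yields -15 degrees, halfway between the two loudspeakers, and energy only in RS yields 110 degrees.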

S3-2: Obtain the elevation angle of the target object according to its type and the preset correspondence between elevation angle and object type. For example, if the preset correspondence assigns an elevation angle of 90 degrees to both thunder and rain, then a target object identified as thunder or rain is given an elevation angle of 90 degrees.
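The preset correspondence can be as simple as a lookup table. The Python sketch below uses a hypothetical table holding the example values from the text (90 degrees for thunder and rain). It also shows a consequence of that example: at 90 degrees elevation the source points straight up, so the horizontal angle from S3-1 no longer affects its direction.

```python
import math
from typing import Dict, Tuple

# Hypothetical preset elevation angle-object type table (degrees),
# filled with the example values from the text.
ELEVATION_BY_TYPE_DEG: Dict[str, float] = {"thunder": 90.0, "rain": 90.0}


def elevation_for(object_type: str) -> float:
    """Look up the preset elevation angle for a recognized object type."""
    key = object_type.lower()
    if key not in ELEVATION_BY_TYPE_DEG:
        raise KeyError(f"no preset elevation for object type {object_type!r}")
    return ELEVATION_BY_TYPE_DEG[key]


def direction_vector(az_deg: float, el_deg: float) -> Tuple[float, float, float]:
    """Unit vector (front, lateral, up) for an azimuth/elevation pair."""
    phi, theta = math.radians(az_deg), math.radians(el_deg)
    return (math.cos(phi) * math.cos(theta),
            math.sin(phi) * math.cos(theta),
            math.sin(theta))
```

For example, `direction_vector(-30.0, elevation_for("rain"))` is (0, 0, 1) up to rounding, whatever horizontal angle is passed in.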

S4: Perform HOA encoding on the first audio unit according to the elevation angle and the horizontal angle to obtain the HOA encoding information of the first audio unit.

Fig. 3 is a schematic diagram of a WXYZ coordinate system in one example of the invention. As shown in Fig. 3, in this embodiment the five main channels are treated as audio objects at the positions given above and converted into scene-based audio by Ambisonic transcoding.
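The transcoding formula itself is not reproduced in this text. Assuming it is the traditional first-order B-format encoding, with each of the five main channels i treated as a mono source S_i at azimuth φ_i and elevation θ_i and the results summed, it would read as follows (the 1/√2 weight on W is the traditional B-format convention; SN3D-normalized systems use 1 instead):

```latex
W = \frac{1}{\sqrt{2}} \sum_{i} S_i, \qquad
X = \sum_{i} S_i \cos\varphi_i \cos\theta_i, \qquad
Y = \sum_{i} S_i \sin\varphi_i \cos\theta_i, \qquad
Z = \sum_{i} S_i \sin\theta_i
```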

where S_i is the single-channel audio signal, φ is the horizontal angle (azimuth), and θ is the vertical angle (elevation).

With this formula, an audio object or a single-channel audio signal can be conveniently converted into an Ambisonic signal.
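A minimal Python sketch of this transcoding step, assuming the traditional first-order B-format convention (W = S/√2, X = S cos φ cos θ, Y = S sin φ cos θ, Z = S sin θ) and elevation 0 for all five ear-level loudspeakers. Fig. 2 gives the left channels negative angles (R is assumed at 30 degrees), whereas Ambisonic formulas conventionally measure azimuth counterclockwise (left positive), so the sketch flips the sign before encoding; this convention mapping and the function names are assumptions.

```python
import math
from typing import Dict, List

Channels = Dict[str, List[float]]

# Loudspeaker azimuths from Fig. 2 (left channels negative); LFE carries no direction.
SPEAKER_AZIMUTH_DEG = {"L": -30.0, "R": 30.0, "C": 0.0, "LS": -110.0, "RS": 110.0}


def encode_bformat(signal: List[float], az_deg: float, el_deg: float) -> Channels:
    """Traditional first-order B-format encoding of one mono signal."""
    phi, theta = math.radians(az_deg), math.radians(el_deg)
    gains = {"W": 1.0 / math.sqrt(2.0),
             "X": math.cos(phi) * math.cos(theta),
             "Y": math.sin(phi) * math.cos(theta),
             "Z": math.sin(theta)}
    return {k: [g * s for s in signal] for k, g in gains.items()}


def transcode_51_to_foa(channels: Channels) -> Channels:
    """Treat each main 5.1 channel as an object at its loudspeaker position and sum."""
    n = len(next(iter(channels.values())))
    out = {k: [0.0] * n for k in "WXYZ"}
    for ch, az in SPEAKER_AZIMUTH_DEG.items():
        if ch not in channels:
            continue
        # Flip Fig. 2's sign so that left is positive, as Ambisonics expects.
        enc = encode_bformat(channels[ch], -az, 0.0)
        for k in out:
            out[k] = [a + b for a, b in zip(out[k], enc[k])]
    return out
```

A signal only in C ends up in W and X, while a signal only in L produces a positive Y component.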

A first-order Ambisonics (FOA) sound field provides only limited spatial resolution, whereas Higher Order Ambisonics (HOA) can provide high-quality sound-field capture and rendering based on spherical harmonic functions.
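The expansion to which the symbols j, A, Y and P belong is not reproduced in this text. A common way to write it, given here as an assumption, is the spherical Fourier-Bessel expansion of the sound pressure around the listening point, with θ the elevation and φ the azimuth. Constant factors such as 4π or (2n+1) depend on the normalization convention (e.g., SN3D or N3D) and are absorbed into A_nm and N_n|m| here. Truncating the outer sum at order N leaves (N+1)^2 coefficients; for a plane wave of amplitude S from direction (θ_s, φ_s), A_nm is proportional to S·Y_nm(θ_s, φ_s), which is what HOA encoding computes.

```latex
p(r,\theta,\varphi,k) = \sum_{n=0}^{\infty} i^{n}\, j_n(kr) \sum_{m=-n}^{n} A_{nm}(k)\, Y_{nm}(\theta,\varphi),
\qquad
Y_{nm}(\theta,\varphi) = N_{n|m|}\, P_{n|m|}(\sin\theta)
\begin{cases} \cos(m\varphi), & m \ge 0 \\ \sin(|m|\varphi), & m < 0 \end{cases}
```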

where j_n is the spherical Bessel function, A_nm are the spherical harmonic (Ambisonic) coefficients, Y_nm is the spherical harmonic function, and P_nm denotes the associated Legendre function.
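A minimal Python sketch of HOA encoding for the first audio unit, assuming the widely used AmbiX conventions: channels in ACN order (index n^2 + n + m), SN3D normalization N_n|m| = sqrt((2 - δ_m0)(n - |m|)!/(n + |m|)!), no Condon-Shortley phase, azimuth counterclockwise and elevation upward. The patent does not state which conventions it uses, and the function names are hypothetical.

```python
import math
from typing import List


def assoc_legendre(n: int, m: int, x: float) -> float:
    """Associated Legendre P_n^m(x), 0 <= m <= n, without Condon-Shortley phase."""
    somx2 = math.sqrt(max(0.0, 1.0 - x * x))
    pmm = 1.0
    for k in range(1, m + 1):
        pmm *= (2 * k - 1) * somx2  # P_m^m = (2m-1)!! (1 - x^2)^(m/2)
    if n == m:
        return pmm
    pmm1 = x * (2 * m + 1) * pmm    # P_{m+1}^m
    for deg in range(m + 2, n + 1):  # upward recursion in degree
        pmm, pmm1 = pmm1, ((2 * deg - 1) * x * pmm1 - (deg + m - 1) * pmm) / (deg - m)
    return pmm1


def sh_gains(order: int, az_deg: float, el_deg: float) -> List[float]:
    """Real spherical harmonics Y_nm in ACN order with SN3D normalization."""
    phi, theta = math.radians(az_deg), math.radians(el_deg)
    gains = []
    for n in range(order + 1):
        for m in range(-n, n + 1):
            am = abs(m)
            norm = math.sqrt((1.0 if m == 0 else 2.0)
                             * math.factorial(n - am) / math.factorial(n + am))
            trig = math.cos(am * phi) if m >= 0 else math.sin(am * phi)
            gains.append(norm * assoc_legendre(n, am, math.sin(theta)) * trig)
    return gains


def hoa_encode(signal: List[float], order: int,
               az_deg: float, el_deg: float) -> List[List[float]]:
    """Encode a mono signal into (order + 1)^2 HOA channels."""
    return [[g * s for s in signal] for g in sh_gains(order, az_deg, el_deg)]
```

For example, `hoa_encode(unit_mono, 3, az, el)` would turn a mono first audio unit into 16 HOA channels. With SN3D, the squared gains within each order sum to 1, which is a convenient sanity check.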

S5: Generate the audio signal of the first audio unit from the HOA encoding information of the first audio unit.
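The patent does not say how the HOA encoding information is turned back into loudspeaker signals. The Python sketch below assumes ACN/SN3D coefficients and uses a basic sampling (projection) decoder, chosen here only for brevity: each loudspeaker receives the scene evaluated in its own direction. Because the decoder is linear, multiplying the resulting gains by the unit's samples gives the loudspeaker signals. On irregular theater layouts this decoder is not energy-preserving, and practical systems often use more elaborate designs such as AllRAD. The speaker layout and function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def assoc_legendre(n: int, m: int, x: float) -> float:
    """P_n^m(x) for 0 <= m <= n, without the Condon-Shortley phase."""
    somx2 = math.sqrt(max(0.0, 1.0 - x * x))
    pmm = 1.0
    for k in range(1, m + 1):
        pmm *= (2 * k - 1) * somx2
    if n == m:
        return pmm
    pmm1 = x * (2 * m + 1) * pmm
    for deg in range(m + 2, n + 1):
        pmm, pmm1 = pmm1, ((2 * deg - 1) * x * pmm1 - (deg + m - 1) * pmm) / (deg - m)
    return pmm1


def sh_gains(order: int, az_deg: float, el_deg: float) -> List[float]:
    """Real spherical harmonics, ACN order, SN3D normalization."""
    phi, theta = math.radians(az_deg), math.radians(el_deg)
    out = []
    for n in range(order + 1):
        for m in range(-n, n + 1):
            am = abs(m)
            norm = math.sqrt((1.0 if m == 0 else 2.0)
                             * math.factorial(n - am) / math.factorial(n + am))
            trig = math.cos(am * phi) if m >= 0 else math.sin(am * phi)
            out.append(norm * assoc_legendre(n, am, math.sin(theta)) * trig)
    return out


def sampling_decode(coeffs: Sequence[float], order: int,
                    speakers: Sequence[Tuple[float, float]]) -> List[float]:
    """Loudspeaker gains from ACN/SN3D coefficients (basic sampling decoder)."""
    feeds = []
    for az, el in speakers:
        y = sh_gains(order, az, el)
        total = 0.0
        for n in range(order + 1):
            for m in range(-n, n + 1):
                k = n * n + n + m  # ACN channel index
                # (2n + 1) turns the SN3D product into the N3D-normalized kernel.
                total += (2 * n + 1) * y[k] * coeffs[k]
        feeds.append(total / len(speakers))
    return feeds
```

For a first-order source straight ahead and six loudspeakers on the axes, the front loudspeaker receives (1 + 3)/6 of the signal and the rear one -1/3.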

In an embodiment of the present invention, after step S2, the method further includes: removing the first audio unit from the target audio to obtain a second audio unit; performing HOA encoding on the second audio unit to obtain HOA encoding information of the second audio unit; generating an audio signal of the second audio unit from the HOA encoding information of the second audio unit; and synthesizing the audio signal of the first audio unit with the audio signal of the second audio unit. In this way, the embodiment converts into HOA encoding not only the audio units that have an elevation angle but also those that do not.
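A minimal first-order Python sketch of this residual path, under several assumptions not stated in the patent: the second audio unit is obtained by sample-wise subtraction of the first audio unit from the target audio; each of its five main channels is encoded at its Fig. 2 loudspeaker position (R assumed at 30 degrees) with elevation 0; the first audio unit is downmixed to mono by summing its channels and encoded as one object; coefficients are ACN/SN3D; and the LFE channel is routed separately. Because Ambisonic decoding is linear, summing the two units in the HOA domain and decoding once gives the same loudspeaker signals as decoding each unit and then synthesizing the two results. The function names are hypothetical.

```python
import math
from typing import Dict, List

Channels = Dict[str, List[float]]

# Fig. 2 azimuths (left channels negative). LFE is assumed to be routed separately.
SPEAKER_AZIMUTH_DEG = {"L": -30.0, "R": 30.0, "C": 0.0, "LS": -110.0, "RS": 110.0}


def foa_gains(az_fig2_deg: float, el_deg: float) -> List[float]:
    """ACN/SN3D first-order gains (W, Y, Z, X); Fig. 2 azimuth flipped to left-positive."""
    phi, theta = math.radians(-az_fig2_deg), math.radians(el_deg)
    return [1.0,
            math.sin(phi) * math.cos(theta),
            math.sin(theta),
            math.cos(phi) * math.cos(theta)]


def accumulate(hoa: List[List[float]], signal: List[float], gains: List[float]) -> None:
    """Add gain * signal into each HOA channel in place."""
    for k, g in enumerate(gains):
        hoa[k] = [a + g * s for a, s in zip(hoa[k], signal)]


def render_scene(target: Channels, first: Channels,
                 obj_az: float, obj_el: float) -> List[List[float]]:
    n = len(next(iter(target.values())))
    hoa = [[0.0] * n for _ in range(4)]
    # Second audio unit = target minus first unit, encoded channel by channel
    # at its loudspeaker position (elevation 0).
    for ch, az in SPEAKER_AZIMUTH_DEG.items():
        if ch in target:
            residual = [t - f for t, f in zip(target[ch], first.get(ch, [0.0] * n))]
            accumulate(hoa, residual, foa_gains(az, 0.0))
    # First audio unit: mono downmix (an assumption) encoded as one object
    # at the estimated horizontal angle and the preset elevation angle.
    mono = [sum(first[ch][i] for ch in SPEAKER_AZIMUTH_DEG if ch in first)
            for i in range(n)]
    accumulate(hoa, mono, foa_gains(obj_az, obj_el))
    return hoa


# Thunder found in L, placed at the L horizontal angle and 90 degrees elevation.
hoa = render_scene({"C": [1.0, 1.0], "L": [0.5, 0.5]}, {"L": [0.5, 0.5]},
                   obj_az=-30.0, obj_el=90.0)
```

In this example the thunder leaves the horizontal L channel entirely and reappears only in W and Z (overhead), while the center-channel content stays in W and X.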

The audio data processing method provided by the embodiment of the invention makes the latest immersive audio systems compatible with traditional 5.1-channel content while using the advantages of additional channels to render a better immersive effect.

Fig. 4 is a block diagram of an audio data processing apparatus according to an embodiment of the present invention. As shown in fig. 4, the apparatus for processing audio data according to the embodiment of the present invention includes: an acquisition module 100 and a control processing module 200.

The acquisition module 100 is configured to acquire a target audio in a target format. The control processing module 200 is configured to extract, from the target audio, a first audio unit representing the sound emitted by a target object, where the target object is an object having elevation-angle information. The control processing module 200 is further configured to obtain the elevation angle and the horizontal angle of the target object, perform HOA encoding on the first audio unit according to the elevation angle and the horizontal angle to obtain HOA encoding information of the first audio unit, and generate an audio signal of the first audio unit according to the HOA encoding information of the first audio unit.

In an embodiment of the present invention, the control processing module 200 is further configured to remove the first audio unit from the target audio to obtain a second audio unit, perform HOA encoding on the second audio unit to obtain HOA encoding information of the second audio unit, generate an audio signal of the second audio unit according to the HOA encoding information of the second audio unit, and perform audio signal synthesis on the audio signal of the first audio unit and the audio signal of the second audio unit.

In an embodiment of the present invention, the control processing module 200 is specifically configured to obtain the horizontal angle of the target object according to the audio content and volume of each channel when the first audio unit is played on a 5.1 sound system, and to obtain the elevation angle of the target object according to the type of the target object and a preset correspondence between elevation angle and object type.

In one embodiment of the invention, the target object includes thunder and rain.

It should be noted that the specific implementation of the apparatus for processing audio data in the embodiment of the present invention is similar to that of the method for processing audio data; for details, refer to the description of the method, which is not repeated here to avoid redundancy.

In addition, other configurations and functions of the audio data processing apparatus according to the embodiment of the present invention are known to those skilled in the art and are not described in detail here to avoid redundancy.

An embodiment of the present invention further provides an electronic device, including: at least one processor and at least one memory; the memory is to store one or more program instructions; the processor is configured to execute one or more program instructions to perform the method for processing audio data according to the first aspect.

The disclosed embodiments of the present invention provide a computer-readable storage medium having stored therein computer program instructions, which, when run on a computer, cause the computer to execute the above-described audio data processing method.

In an embodiment of the invention, the processor may be an integrated circuit chip having signal processing capability. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

The processor may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in connection with the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may reside in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The processor reads the information in the storage medium and completes the steps of the method in combination with its hardware.

The storage medium may be a memory, for example, which may be volatile memory or nonvolatile memory, or which may include both volatile and nonvolatile memory.

The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory.

The volatile memory may be a random access memory (RAM), which serves as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM).

The storage media described in connection with the embodiments of the invention are intended to comprise, without being limited to, these and any other suitable types of memory.

Those skilled in the art will appreciate that, in one or more of the examples described above, the functions described in the present invention may be implemented by a combination of hardware and software. When implemented in software, the functions may be stored on, or transmitted as, one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.

The specific embodiments described above further explain the objects, technical solutions, and advantages of the present invention in detail. It should be understood that they are only exemplary embodiments of the present invention and are not intended to limit its scope; any modification, equivalent substitution, improvement, or the like made on the basis of the technical solutions of the present invention shall fall within the scope of the present invention.

Claims

1. A method of processing audio data, comprising:

acquiring a target audio in a target format; the target format is audio of a 5.1 channel system;

extracting a first audio unit which represents that a target object emits sound in the target audio, wherein the target object is an object with elevation angle information;

acquiring the altitude angle and the horizontal angle of the target object; specifically, the horizontal angle of the target object is obtained according to the audio content and volume of each channel when the first audio unit is played by using a 5.1 sound system;

obtaining the altitude angle of the target object according to the type of the target object and the preset correspondence between altitude angle and object type; performing HOA coding on the first audio unit according to the altitude angle and the horizontal angle to obtain HOA coding information of the first audio unit; and generating an audio signal of the first audio unit according to the HOA coding information of the first audio unit.

2. The method according to claim 1, wherein after extracting the first audio unit representing the target object emitting sound in the target audio, the method further comprises:

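removing the first audio unit from the target audio to obtain a second audio unit;

performing HOA coding on the second audio unit to obtain HOA coding information of the second audio unit;

generating an audio signal of the second audio unit according to the HOA coding information of the second audio unit;

and carrying out audio signal synthesis on the audio signal of the first audio unit and the audio signal of the second audio unit.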

3. The audio data processing method according to claim 1, wherein the target object includes thunder and rain.

4. An apparatus for processing audio data, comprising:

the acquisition module is used for acquiring a target audio in a target format; the target format is audio of a 5.1 channel system;

the control processing module is used for extracting a first audio unit which represents that a target object emits sound in the target audio, and the target object is an object with altitude angle information; the control processing module is further configured to obtain an altitude angle and a horizontal angle of the target object, perform HOA encoding on the first audio unit according to the altitude angle and the horizontal angle, obtain HOA encoding information of the first audio unit, and generate an audio signal of the first audio unit according to the HOA encoding information of the first audio unit;

specifically, the horizontal angle of the target object is obtained according to the audio content and volume of each channel when the first audio unit is played by using a 5.1 sound system;

and obtaining the altitude angle of the target object according to the type of the target object and the preset corresponding relation between the altitude angle and the object type.

5. The apparatus as claimed in claim 4, wherein the control processing module is further configured to remove the first audio unit from the target audio to obtain a second audio unit, then perform HOA encoding on the second audio unit to obtain HOA encoding information of the second audio unit, then generate an audio signal of the second audio unit according to the HOA encoding information of the second audio unit, and finally perform audio signal synthesis on the audio signal of the first audio unit and the audio signal of the second audio unit.

6. The apparatus according to claim 4, wherein the control processing module is specifically configured to obtain the horizontal angle of the target object according to the audio content and volume of each channel when the first audio unit is played using a 5.1 sound system, and to obtain the elevation angle of the target object according to the type of the target object and a preset correspondence between elevation angle and object type.

7. The apparatus for processing audio data according to claim 4, wherein the target object includes thunder and rain.

8. An electronic device, characterized in that the electronic device comprises: at least one processor and at least one memory;

the memory is to store one or more program instructions;

the processor, operable to execute one or more program instructions to perform the method of processing audio data according to any one of claims 1 to 3.

9. A computer-readable storage medium containing one or more program instructions for executing the method of processing audio data according to any one of claims 1 to 3.