
CN118264971B - Speaker-based spatial audio system, audio processor, vehicle, virtual surround sound conversion method, and audio rendering method - Google Patents


Info

Publication number
CN118264971B
CN118264971B (application CN202410694626.0A)
Authority
CN
China
Prior art keywords
audio
sound
virtual
ear
received
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410694626.0A
Other languages
Chinese (zh)
Other versions
CN118264971A (en)
Inventor
谭波 (Tan Bo)
刘少鹏 (Liu Shaopeng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Lingjing Acoustic Technology Co., Ltd.
Suzhou Lingjing AV Technology Co., Ltd.
Original Assignee
Shanghai Lingjing Acoustic Technology Co., Ltd.
Suzhou Lingjing AV Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Lingjing Acoustic Technology Co., Ltd. and Suzhou Lingjing AV Technology Co., Ltd.
Priority to CN202410694626.0A
Publication of CN118264971A
Application granted
Publication of CN118264971B
Legal status: Active (current)
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/301 Automatic calibration of stereophonic sound system, e.g. with test microphone
    • H04S7/302 Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S7/303 Tracking of listener position or orientation
    • H04S7/305 Electronic adaptation of stereophonic audio signals to reverberation of the listening space
    • H04S7/307 Frequency adjustment, e.g. tone control
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R5/00 Stereophonic arrangements
    • H04R5/02 Spatial or constructional arrangements of loudspeakers
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/01 Multi-channel, i.e. more than two input channels, sound reproduction with two speakers wherein the multi-channel information is substantially preserved
    • H04S2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/03 Application of parametric coding in stereophonic audio systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)
  • Fittings On The Vehicle Exterior For Carrying Loads, And Devices For Holding Or Mounting Articles (AREA)

Abstract

The invention discloses a speaker-based spatial audio system, an audio processor, a vehicle, a virtual surround sound conversion method, and an audio rendering method. The method converts spatial audio into virtual surround sound without increasing the number of channels and comprises the following steps: receiving initial audio stream data; performing a spatial rendering operation, comprising: presetting a virtual space and setting the positions of virtual ears within it; simulating, with an HRTF algorithm, the propagation of the audio through the virtual space and the sound received by the virtual ears; analyzing the differences between the sounds received by the left and right ears; and estimating the position of a virtual sound source in the virtual space from those differences. After spatial rendering is complete, the audio is re-encoded into virtual surround sound audio containing spatial location information. By simulating the sound field and the environmental response, the user perceives the localization and ambience of the audio source in the virtual space, enhancing the immersion and realism of the audio experience.

Description

Speaker-based spatial audio system, audio processor, vehicle, virtual surround sound conversion method, and audio rendering method
Technical Field
The present invention relates to the field of audio signal processing, and in particular, to a speaker-based spatial audio system, an audio processor, a vehicle, a virtual surround sound conversion method, and an audio rendering method.
Background
At present, mobile terminals such as mobile phones, tablet computers, and notebook computers mostly use stereo playback devices, that is, devices that produce left- and right-channel sound and deliver each channel to the corresponding ear to create a stereo effect. Such devices typically include an audio source (e.g., music or video files), an audio processor (e.g., a decoder or processing chip), and speakers.
Conventional stereo playback devices typically employ a two-speaker arrangement, one speaker producing the left channel and the other the right channel. This arrangement lets the user perceive the distribution of sound sources along the left-right axis, achieving a limited stereo effect. Its advantages are simplicity, low cost, and suitability for most consumer electronics devices.
Although stereo playback devices have improved significantly, several shortcomings remain:
Spatial limitations: a traditional two-speaker layout can only approximate a stereo image. With just two speakers it cannot produce a true three-dimensional sound field, and within the layout and acoustic constraints of small devices (smartphones, notebook computers, etc.) it cannot deliver a broad, immersive sound field. Users therefore cannot perceive genuine source localization or envelopment, which limits the audio experience;
Wearing comfort: on mobile devices, content in formats such as Apple Spatial Audio and Dolby Atmos currently requires headphones to convey a sense of envelopment, and wearing headphones for long periods is uncomfortable;
Subjective differences: every listener's hearing characteristics and preferences differ, so the perception of and satisfaction with a stereo effect vary from person to person; designing a general stereo system that satisfies all users is a challenge;
Limitations of mobile devices: mobile devices (e.g., cell phones and tablet computers) are small and their speaker layouts are constrained, which limits their stereo capability. The built-in speakers of small devices are restricted in size, power, and design, so their acoustic performance is limited; sound may lack clarity and resolution, and the details and dynamic range of the audio signal cannot be reproduced accurately;
Environmental interference: noise and acoustic reflections in the surrounding environment can disturb the stereo effect and degrade the audio experience;
Multichannel requirements: conventional surround systems typically need multiple speaker channels to achieve immersion and surround effects, forcing users to buy expensive surround sound equipment and to face considerable complexity in installation and configuration.
The background material above is provided only to aid understanding of the inventive concept and technical solution of the present application; it does not necessarily belong to the prior art of this application, nor does it necessarily provide a technical teaching. Absent clear evidence that the above content was disclosed before the filing date of this application, the background should not be used to assess its novelty or inventiveness.
Disclosure of Invention
It is an object of the present invention to provide a spatial audio system that converts spatial audio into virtual surround sound, creating a broader and more immersive sound field without hardware modification of the device.
To achieve this, the invention adopts the following technical solution:
A spatial audio system based on a speaker apparatus, comprising the speaker apparatus and an audio processor integrated with an HRTF algorithm. The audio processor receives, in real time, initial audio stream data in a stereo or surround sound format from the speaker apparatus and converts it into virtual surround sound audio without adding channels, by:
preprocessing the initial audio stream data to unify the audio loudness across channels;
performing a spatial rendering operation on the preprocessed audio, comprising:
presetting a virtual space and setting the positions of virtual ears within it, the virtual ears comprising a left ear and a right ear;
simulating, with an HRTF algorithm, the propagation of the preprocessed audio through the virtual space and the sound received by the virtual ears;
analyzing the differences between the sounds received by the left and right ears, the differences comprising one or more of time-delay differences, phase differences, and loudness attenuation differences;
estimating the position of a virtual sound source in the virtual space from those differences, the virtual sound source being defined as the starting point of the sound propagating in the virtual space;
after spatial rendering is complete, re-encoding the audio into virtual surround sound audio containing spatial location information and returning it to the speaker apparatus.
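To make the claimed flow concrete, here is a minimal end-to-end sketch in Python. Every function name, constant, and the crude stand-ins for HRTF rendering are hypothetical illustrations, not the patent's implementation; the individual stages are sketched in more detail in the embodiments below.

```python
import numpy as np
from scipy.signal import correlate

def normalize_loudness(audio, target_rms=0.1):
    """Unify loudness: scale each channel to a common RMS level (assumed metric)."""
    rms = np.sqrt(np.mean(audio**2, axis=1, keepdims=True)) + 1e-12
    return audio * (target_rms / rms)

def simulate_ears(audio, itd_samples=20):
    """Crude propagation stand-in: the far ear hears a delayed, quieter copy."""
    mono = audio.mean(axis=0)
    return mono, 0.8 * np.roll(mono, itd_samples)

def estimate_source(left, right, fs):
    """Locate the virtual source from the inter-ear time difference."""
    lag = np.argmax(correlate(left, right)) - (len(right) - 1)
    return {"itd_s": lag / fs}

def enhance_and_reencode(audio, source):
    """Re-encode with a width boost driven by the estimated lateral offset.
    Channel count is unchanged, as the claim requires."""
    mid = audio.mean(axis=0)
    width = 1.0 + min(abs(source["itd_s"]) * 2000.0, 1.0)  # invented mapping
    return mid + width * (audio - mid)

fs = 48_000
stereo = np.random.randn(2, fs)            # one second of noise as test input
pre = normalize_loudness(stereo)
out = enhance_and_reencode(pre, estimate_source(*simulate_ears(pre), fs))
assert out.shape == stereo.shape           # channels preserved
```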
Further, in any of the foregoing aspects or combinations thereof, the audio processor separates the left and right channel signals of the preprocessed stereo audio, or the individual channel signals of the preprocessed surround audio, to obtain the mono audio corresponding to each channel;
the mono audio of each channel is then propagated through the virtual space in simulation, yielding the mono audio received by the left and right ears;
analyzing the differences between the sounds received by the two ears comprises: time-domain analysis of the mono audio to determine the time difference between the sounds received by the left and right ears; and/or phase analysis of the mono audio to determine the phase difference between the sounds received by the left and right ears; and/or frequency-domain analysis of the mono audio to determine the frequency distribution of the sounds received by the left and right ears.
Further, in any of the foregoing aspects or combinations thereof, the position of the virtual sound source in the virtual space is estimated as follows:
presetting an initialization source point on the perpendicular bisector of the line connecting the left and right ears;
after determining the time difference between the sounds received by the two ears, determining a first offset vector based on the acoustic transmission characteristics stored in the HRTF database; and/or, after determining the phase difference, determining a second offset vector based on those characteristics; and/or, after determining the frequency distribution, determining a third offset vector based on those characteristics;
and determining a comprehensive offset vector and offsetting the initialization source point accordingly to obtain the position of the virtual sound source in the virtual space.
Further, in any of the foregoing aspects or combinations thereof, re-encoding the audio into virtual surround sound audio containing spatial location information comprises enhancing the preprocessed audio according to the estimated position of the virtual sound source in the virtual space, the enhancement comprising:
increasing the audio width;
and/or simulating, via the HRTF algorithm, the audio signal being received by the left and right ears from different directions, and evaluating the resulting audio effects to determine a direction adjustment of the audio;
and/or simulating, via the HRTF algorithm, the audio signal being received by the two ears with different volumes and filtering effects, and evaluating the resulting audio effects to determine frequency and phase adjustments of the audio;
and/or simulating, via the HRTF algorithm, the audio signal being received by the two ears with different reverberation and echo effects, and evaluating the resulting audio effects to determine a reflection adjustment of the audio.
Further, in any of the foregoing aspects or combinations thereof, the speaker apparatus is configured with two or more speakers, the distance between two speakers being 15 cm or more;
the audio processor is either integrated into the speaker device, or the speaker device is configured with a client application and the audio processor resides on the server or cloud side corresponding to that client application.
Further, in any of the foregoing aspects or combinations thereof, the virtual space is designed in advance with the HRTF algorithm according to the spacing of the speakers of the speaker apparatus: the larger the spacing between the speakers, the larger the virtual space;
the speaker device comprises one or more of a mobile phone, tablet computer, notebook computer, PC, audio playback terminal, television, and vehicle-mounted sound system.
Further, in any of the foregoing aspects or combinations thereof, the system also comprises a human-computer interaction module electrically connected to the audio processor and configured with one or more of the following adjusting units (a configuration sketch follows this list):
a loudness unification adjusting unit for adjusting the loudness value to which the audio loudness of each channel of the initial audio stream data is unified;
a virtual space adjusting unit for adjusting the size of the virtual space and/or the positions of the virtual ears within it, and for restoring the initial virtual-space parameters with one key;
a spatial reverberation setting unit for selecting a desired reverberation mode from a plurality of preset spatial reverberation modes, the audio processor being configured to match the associated channel equalization parameters and mixing parameters to the selected mode, the preset modes including a plate reverb mode, a room reverb mode, and a hall reverb mode.
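These adjusting units map naturally onto a small settings object. The sketch below is a hypothetical illustration of such a configuration; all names, defaults, and preset values are invented for the example.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class SpatialAudioSettings:
    """Hypothetical user-facing settings exposed by the HMI module."""
    target_loudness_db: float = -23.0               # loudness-unification unit
    virtual_space_size_m: tuple = (5.0, 4.0, 3.0)   # virtual-space unit
    ear_position_m: tuple = (2.5, 2.0, 1.2)
    reverb_mode: str = "room"                       # "plate" | "room" | "hall"

DEFAULTS = SpatialAudioSettings()

def one_key_reset(_current: SpatialAudioSettings) -> SpatialAudioSettings:
    """The claimed one-key reset simply restores the initial parameters."""
    return DEFAULTS

# The reverb unit looks up equalization and mixing parameters per mode
# (placeholder values, not the patent's associated parameters).
REVERB_PRESETS = {
    "plate": {"eq_tilt_db": 1.5, "wet_mix": 0.18},
    "room":  {"eq_tilt_db": 0.0, "wet_mix": 0.25},
    "hall":  {"eq_tilt_db": -1.0, "wet_mix": 0.40},
}

user = replace(DEFAULTS, reverb_mode="hall")
print(REVERB_PRESETS[user.reverb_mode])
```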
Further, in any of the foregoing aspects or combinations thereof, the associated channel equalization parameters and mixing parameters are preset for each spatial reverberation mode as follows:
running the audio system of the speaker device multiple times for each spatial reverberation mode;
evaluating each run against preset evaluation factors to compute a sound-quality score for that run;
after a preset number of runs, selecting the channel equalization parameters and mixing parameters of the run with the highest sound-quality score as the associated optimized parameters; or stopping as soon as the sound-quality score reaches a preset optimization threshold and taking the channel equalization parameters and mixing parameters of the last run as the associated optimized parameters.
According to another aspect of the present invention, an audio processor is provided that processes audio into virtual surround sound audio by:
acquiring initial audio stream data in a stereo or surround sound format;
preprocessing the initial audio stream data to unify the audio loudness across channels;
performing a spatial rendering operation on the preprocessed audio, comprising:
presetting a virtual space and setting the positions of virtual ears within it, the virtual ears comprising a left ear and a right ear;
simulating, with an HRTF algorithm, the propagation of the preprocessed audio through the virtual space and the sound received by the virtual ears;
analyzing the differences between the sounds received by the left and right ears, the differences comprising one or more of time-delay differences, phase differences, and loudness attenuation differences;
estimating the position of a virtual sound source in the virtual space from those differences, the virtual sound source being defined as the starting point of the sound propagating in the virtual space;
after spatial rendering is complete, re-encoding the audio into virtual surround sound audio containing spatial location information, its channels being the same as those of the initial audio stream data.
Further, in any of the foregoing aspects or combinations thereof, the left and right channel signals of the preprocessed stereo audio, or the individual channel signals of the preprocessed surround audio, are separated to obtain the mono audio corresponding to each channel;
the mono audio of each channel is propagated through the virtual space in simulation, yielding the mono audio received by the left and right ears;
analyzing the differences between the sounds received by the two ears comprises: time-domain analysis of the mono audio to determine the time difference; and/or phase analysis to determine the phase difference; and/or frequency-domain analysis to determine the frequency distribution of the sounds received by the left and right ears.
Further, the position of the virtual sound source in the virtual space is estimated by: presetting an initialization source point on the perpendicular bisector of the line connecting the two ears; after determining the time difference, determining a first offset vector based on the acoustic transmission characteristics stored in the HRTF database; and/or, after determining the phase difference, determining a second offset vector based on those characteristics; and/or, after determining the frequency distribution, determining a third offset vector based on those characteristics; then determining a comprehensive offset vector and offsetting the initialization source point accordingly to obtain the position of the virtual sound source in the virtual space;
or, re-encoding the audio into virtual surround sound audio containing spatial location information comprises enhancing the preprocessed audio according to the estimated position of the virtual sound source, the enhancement comprising: increasing the audio width; and/or simulating, via the HRTF algorithm, the audio signal being received by the two ears from different directions, and evaluating the resulting audio effects to determine a direction adjustment; and/or simulating the signal being received with different volumes and filtering effects to determine frequency and phase adjustments; and/or simulating the signal being received with different reverberation and echo effects to determine a reflection adjustment of the audio.
According to still another aspect of the present invention, a virtual surround sound conversion method for spatial audio is provided, which converts spatial audio into virtual surround sound without increasing the number of channels, comprising the steps of:
receiving initial audio stream data in a stereo or surround sound format;
preprocessing the initial audio stream data to unify the audio loudness across channels;
performing a spatial rendering operation on the preprocessed audio, comprising:
presetting a virtual space and setting the positions of virtual ears within it, the virtual ears comprising a left ear and a right ear; simulating, with an HRTF algorithm, the propagation of the preprocessed audio through the virtual space and the sound received by the virtual ears;
analyzing the differences between the sounds received by the left and right ears, the differences comprising one or more of time-delay differences, phase differences, and loudness attenuation differences;
estimating the position of a virtual sound source in the virtual space from those differences, the virtual sound source being defined as the starting point of the sound propagating in the virtual space;
after spatial rendering is complete, re-encoding the audio into virtual surround sound audio containing spatial location information.
Further, in any of the foregoing aspects or combinations thereof, the method also comprises, before analyzing the differences between the sounds received by the two ears: separating the left and right channel signals of the preprocessed stereo audio, or the individual channel signals of the preprocessed surround audio, to obtain the mono audio corresponding to each channel; and propagating the mono audio of each channel through the virtual space in simulation, yielding the mono audio received by the left and right ears;
analyzing the differences between the sounds received by the two ears comprises: time-domain analysis of the mono audio to determine the time difference; and/or phase analysis to determine the phase difference; and/or frequency-domain analysis to determine the frequency distribution of the sounds received by the left and right ears.
Further, in any of the foregoing aspects or combinations thereof, the position of the virtual sound source in the virtual space is estimated as follows:
presetting an initialization source point on the perpendicular bisector of the line connecting the left and right ears;
after determining the time difference between the sounds received by the two ears, determining a first offset vector based on the acoustic transmission characteristics stored in the HRTF database; and/or, after determining the phase difference, determining a second offset vector based on those characteristics; and/or, after determining the frequency distribution, determining a third offset vector based on those characteristics;
and determining a comprehensive offset vector and offsetting the initialization source point accordingly to obtain the position of the virtual sound source in the virtual space.
Further, in any of the foregoing aspects or combinations thereof, the means of re-encoding the audio into virtual surround sound audio containing spatial location information include:
increasing the audio width;
and/or simulating, via the HRTF algorithm, the audio signal being received by the left and right ears from different directions, and evaluating the resulting audio effects to determine a direction adjustment of the audio;
and/or simulating, via the HRTF algorithm, the audio signal being received by the two ears with different volumes and filtering effects, and evaluating the resulting audio effects to determine frequency and phase adjustments of the audio;
and/or simulating, via the HRTF algorithm, the audio signal being received by the two ears with different reverberation and echo effects, and evaluating the resulting audio effects to determine a reflection adjustment of the audio.
According to still another aspect of the present invention, a spatial rendering method for spatial audio is provided, comprising the steps of:
presetting a virtual space and setting the positions of virtual ears within it, the virtual ears comprising a left ear and a right ear;
simulating, with an HRTF algorithm, the propagation of the spatial audio to be rendered through the virtual space and the sound received by the virtual ears, the spatial audio being in a stereo or surround sound format;
analyzing the differences between the sounds received by the left and right ears, the differences comprising one or more of time-delay differences, phase differences, and loudness attenuation differences;
estimating the position of a virtual sound source in the virtual space from those differences, the virtual sound source being defined as the starting point of the sound propagating in the virtual space;
re-encoding the audio into virtual surround sound audio containing spatial location information according to the estimated position of the virtual sound source within the virtual space.
In addition, the invention provides a vehicle characterized by comprising the above spatial audio system, the speaker equipment of the spatial audio system being a vehicle-mounted sound system.
The technical solution provided by the invention has the following beneficial effects:
a. Sound field improvement: the spatial audio technology of this patent breaks the limitations of the traditional sound field. Through algorithms and signal processing alone, without hardware modification of the device, it creates a wider and more immersive sound field, letting the user perceive the localization and envelopment of a sound source in three-dimensional space and providing a more realistic audio experience;
b. No multi-channel configuration required: unlike traditional multi-channel configurations, this spatial audio technique achieves a better sound field without special audio formats or multiple speaker channels, so users can enjoy an immersive, clear, and balanced listening experience without purchasing expensive surround sound equipment;
c. The spatial audio algorithm is embedded in the audio processor, and sound from all media sources on the speaker device, such as applications, is processed by the algorithm and rendered in real time into virtual surround sound with an immersive spatial audio effect.
Drawings
To illustrate the embodiments of the present application or the prior-art solutions more clearly, the drawings required for their description are briefly introduced below. The drawings described below are plainly only some embodiments of the present application; other drawings can be derived from them by those skilled in the art without inventive effort.
FIG. 1 is a schematic diagram of a speaker-based spatial audio system provided in an exemplary embodiment of the present invention;
FIG. 2 is a flow chart of a method for virtual surround sound conversion of spatial audio according to an exemplary embodiment of the present invention;
FIG. 3 is a flow chart of estimating the position of a virtual sound source in a virtual space according to an exemplary embodiment of the present invention;
FIG. 4 is a flow chart of a method for spatial rendering of spatial audio according to an exemplary embodiment of the present invention;
FIG. 5 is a schematic diagram of the sound field width of unprocessed audio;
FIG. 6 is a schematic diagram of the sound field width after the audio of FIG. 5 has been converted into virtual surround sound.
Detailed Description
So that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are plainly only some, not all, of the embodiments of the present invention; all other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description, the claims, and the above figures are used to distinguish similar objects and do not necessarily describe a particular sequence or chronological order. Data so labeled may be interchanged where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion: a process, method, apparatus, article, or device comprising a list of steps or elements is not necessarily limited to the steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to it.
In one embodiment of the present invention, a spatial audio system based on a speaker device is provided, as shown in FIG. 1, specifically a spatial audio system based on a stereo speaker device. The spatial audio system comprises a speaker device and an audio processor integrated with an HRTF algorithm; the speaker device must be configured with two or more speakers, and the distance between two speakers is preferably 15 cm or more.
The audio processor receives, in real time, the speaker device's initial audio stream data in stereo or surround sound format and converts it into virtual surround sound without adding channels. A larger speaker spacing is greatly beneficial to the sound effect after conversion to virtual surround sound. Applicable speaker devices include mobile phones, tablet computers, notebook computers, PCs, audio playback terminals, televisions, and vehicle-mounted sound systems.
The spatial audio system of this embodiment aims to provide an immersive spatial audio effect, so that a user can render and play audio content with greater immersion, in real time, on an existing device. Two implementation schemes are possible:
in the first scheme, the audio processor is integrated into the speaker device and the spatial audio algorithm is embedded in the audio processor;
in the second scheme, for a speaker device that allows third-party applications to be installed, a dedicated client application is developed and the audio processor is deployed on the server or cloud side corresponding to that client application, i.e., the spatial audio algorithm runs on the App's server.
The flow by which the audio processor converts stereo or surround sound into virtual surround sound with a more immersive spatial audio effect is shown in FIG. 2:
the initial audio stream data is preprocessed to unify the audio loudness across channels, ensuring that the audio to be processed conforms to the national standard loudness; the techniques used include digital filters and dynamic range compression.
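A rough illustration of this preprocessing stage is sketched below: each channel is normalized to a common RMS level and passed through a static compressor. The target level, threshold, and ratio are invented for the example; a production system would measure loudness per a broadcast standard such as ITU-R BS.1770 rather than plain RMS, and would add attack/release smoothing.

```python
import numpy as np

def rms_db(x):
    """RMS level in dBFS (simplified loudness proxy)."""
    return 20 * np.log10(np.sqrt(np.mean(x**2)) + 1e-12)

def unify_loudness(audio, target_db=-20.0):
    """Scale every channel so its RMS level matches target_db."""
    out = np.empty_like(audio)
    for ch in range(audio.shape[0]):
        gain_db = target_db - rms_db(audio[ch])
        out[ch] = audio[ch] * 10 ** (gain_db / 20)
    return out

def compress(x, threshold_db=-12.0, ratio=4.0):
    """Static compressor: attenuate samples whose level exceeds the threshold."""
    level_db = 20 * np.log10(np.abs(x) + 1e-12)
    over = np.maximum(level_db - threshold_db, 0.0)
    gain_db = -over * (1.0 - 1.0 / ratio)
    return x * 10 ** (gain_db / 20)

stereo = np.random.randn(2, 48_000) * 0.3
processed = compress(unify_loudness(stereo))
```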
A spatial rendering operation is then performed on the preprocessed audio, comprising:
designing the virtual space in advance with the HRTF algorithm according to the spacing of the speakers of the speaker device (the larger the spacing, the larger the virtual space), and setting the positions of the virtual ears, a left ear and a right ear, within it. The virtual space is a three-dimensional sound environment constructed from the HRTF algorithm and data models, within which sound propagation can be simulated. The space is defined by using HRTF data to model how sound reaches the listener's ears from different directions and distances, creating a stereo or surround effect. Sound propagation within the virtual space mimics acoustic behavior in the real world, including the direction, reflection, and attenuation of sound.
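How a virtual space might be sized from the speaker spacing can be pictured with the toy rule below. The linear scale factor, room proportions, and ear geometry are assumptions made for the example, not values taken from the patent.

```python
import numpy as np

def design_virtual_space(speaker_spacing_m: float, scale: float = 20.0):
    """Hypothetical rule: the virtual room grows linearly with speaker spacing.

    Returns room dimensions (x, y, z) in metres and the positions of the
    virtual left/right ears, placed in the middle of the room.
    """
    width = max(speaker_spacing_m * scale, 2.0)       # never smaller than 2 m
    room = np.array([width, width * 0.8, 2.5])
    head = room / 2.0
    half_ear_gap = 0.09                               # ~18 cm between ears
    l_ear = head + np.array([-half_ear_gap, 0.0, 0.0])
    r_ear = head + np.array([+half_ear_gap, 0.0, 0.0])
    return room, l_ear, r_ear

room, l_ear, r_ear = design_virtual_space(0.15)   # a 15 cm laptop speaker pair
```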
The propagation of the preprocessed audio through the virtual space, and the sound received by the virtual ears, are simulated with the HRTF algorithm;
the differences between the sounds received by the left and right ears are analyzed, the differences comprising one or more of time-delay differences, phase differences, and loudness attenuation differences;
the position of a virtual sound source in the virtual space is estimated from those differences, the virtual sound source being defined as the starting point of the sound propagating in the virtual space;
after spatial rendering is complete, the audio is re-encoded into virtual surround sound audio containing spatial location information and returned to the speaker apparatus. Once the spatial location of the virtual sound source is determined, the algorithm processes and enhances the audio content based on that location, adjusting the direction of the sound, the sense of distance, and environmental reflections to simulate a real three-dimensional sound field. Acoustic models, filters, enhancement algorithms, and the like may be applied to generate the spatial audio effect, i.e., to re-encode the audio into virtual surround sound containing spatial location information.
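At its core, the propagation simulation convolves each channel with a pair of head-related impulse responses (HRIRs). Real systems interpolate measured HRTF data, for example from a SOFA-format dataset; the toy HRIRs below, a bare delay plus attenuation for the far ear, are stand-ins so the sketch runs without any dataset, and the azimuth-to-delay mapping is an assumption.

```python
import numpy as np
from scipy.signal import fftconvolve

def toy_hrir(delay_samples: int, gain: float, length: int = 64):
    """Stand-in HRIR: an attenuated impulse at the given delay."""
    h = np.zeros(length)
    h[delay_samples] = gain
    return h

def render_through_virtual_ears(mono, azimuth_deg, fs=48_000):
    """Simulate what each virtual ear receives for a source at azimuth_deg.

    Positive azimuth = source to the right: the left ear gets a longer
    delay and a lower gain. Real HRTFs also encode pinna/head filtering.
    """
    max_itd = int(0.0007 * fs)                      # ~0.7 ms max inter-ear delay
    frac = np.sin(np.radians(azimuth_deg))          # -1 (left) .. +1 (right)
    left_delay = int(max_itd * max(frac, 0))
    right_delay = int(max_itd * max(-frac, 0))
    left = fftconvolve(mono, toy_hrir(left_delay, 1.0 - 0.3 * max(frac, 0)))
    right = fftconvolve(mono, toy_hrir(right_delay, 1.0 - 0.3 * max(-frac, 0)))
    return left, right

mono = np.random.randn(48_000)
ear_l, ear_r = render_through_virtual_ears(mono, azimuth_deg=40.0)
```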
The purpose of the re-encoding is to enhance the preprocessed audio according to the estimated position of the virtual sound source in the virtual space. One or a combination of the following means may be used (a toy sketch of several of them follows this list):
Means 1: increase the audio width. The sound field width of the processed audio is shown in FIG. 6; compared with the sound field width of the original audio in FIG. 5, it provides a wide and immersive sound field;
Means 2: simulate, via the HRTF algorithm, the audio signal being received by the left and right ears from different directions, and evaluate the resulting audio effects to determine a direction adjustment of the audio; that is, use head-related transfer functions (HRTFs) to model how the ear receives sound from different directions and adjust the sound direction accordingly;
Means 3: simulate, via the HRTF algorithm, the audio signal being received by the two ears with different volumes and filtering effects, and evaluate the resulting audio effects to determine frequency and phase adjustments of the audio. Volume and filtering are adjusted to simulate the distance between source and listener: the farther the source, the lower the volume and the weaker the high-frequency content;
Means 4: simulate, via the HRTF algorithm, the audio signal being received by the two ears with different reverberation and echo effects, and evaluate the resulting audio effects to determine a reflection adjustment of the audio. Reverberation and echo simulate the reflective character of different environments and increase the sense of space.
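The sketch below pictures means 1, 3, and 4 with elementary DSP stand-ins: mid/side widening, a one-pole low-pass plus gain for distance, and a single feedback comb filter for reverberation. Means 2, direction adjustment, would reuse the HRIR rendering sketched earlier. All coefficients are invented for the example.

```python
import numpy as np

def widen(stereo, width=1.6):
    """Means 1: scale the side (L-R) component to broaden the sound field."""
    mid = (stereo[0] + stereo[1]) / 2
    side = (stereo[0] - stereo[1]) / 2
    return np.stack([mid + width * side, mid - width * side])

def distance_cue(x, distance_m, fs=48_000):
    """Means 3: farther sources are quieter and duller (less high frequency)."""
    gain = 1.0 / max(distance_m, 1.0)
    alpha = min(0.05 * distance_m, 0.95)            # stronger smoothing when far
    y = np.empty_like(x)
    acc = 0.0
    for n, s in enumerate(x * gain):                # one-pole low-pass
        acc = alpha * acc + (1 - alpha) * s
        y[n] = acc
    return y

def add_reverb(x, fs=48_000, delay_s=0.03, decay=0.4):
    """Means 4: one feedback comb filter as a stand-in for room reverberation."""
    d = int(delay_s * fs)
    y = x.copy()
    for n in range(d, len(y)):
        y[n] += decay * y[n - d]
    return y

stereo = np.random.randn(2, 24_000) * 0.2
wide = widen(stereo)
far = distance_cue(wide[0], distance_m=6.0)
wet = add_reverb(far)
```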
Specifically, before analyzing the differences between the sounds received by the two ears, the audio processor separates the left and right channel signals of the preprocessed stereo audio, or the individual channel signals of the preprocessed surround audio, to obtain the mono audio corresponding to each channel;
the mono audio of each channel is propagated through the virtual space in simulation, yielding the mono audio received by the left and right ears;
analyzing the differences between the sounds received by the two ears comprises: time-domain analysis of the mono audio to determine the time difference between the sounds received by the left and right ears; and/or phase analysis to determine the phase difference (phase differences also affect perceived localization); and/or frequency-domain analysis, in which the spectrum of each channel is analyzed by FFT and the individual frequency components are identified, to characterize the frequency distribution of the sounds received by the left and right ears.
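In the simplest single-source case, the three analyses above reduce to a cross-correlation, a phase comparison of FFT bins, and a band-energy summary, as sketched below. Real program material would need windowing and per-band processing; the band edges and test signal here are arbitrary choices for the example.

```python
import numpy as np
from scipy.signal import correlate

def interaural_time_difference(left, right, fs):
    """Time-domain analysis: lag of the cross-correlation peak, in seconds."""
    lag = np.argmax(correlate(left, right)) - (len(right) - 1)
    return lag / fs

def interaural_phase_difference(left, right):
    """Phase analysis: phase gap at the dominant frequency bin."""
    L, R = np.fft.rfft(left), np.fft.rfft(right)
    k = np.argmax(np.abs(L) + np.abs(R))            # strongest shared bin
    return float(np.angle(L[k]) - np.angle(R[k]))

def band_energies(x, fs, edges=(0, 500, 2000, 8000, 24000)):
    """Frequency-domain analysis: energy per coarse band."""
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    return [float(spectrum[(freqs >= lo) & (freqs < hi)].sum())
            for lo, hi in zip(edges[:-1], edges[1:])]

fs = 48_000
t = np.arange(fs) / fs
left = np.sin(2 * np.pi * 440 * t)
right = np.sin(2 * np.pi * 440 * (t - 0.0004))      # right ear hears it 0.4 ms late
print(interaural_time_difference(left, right, fs))
print(interaural_phase_difference(left, right))
print(band_energies(left, fs))
```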
The algorithm's inference of the stereo source position generally relies on the characteristics of the head-related transfer function (HRTF). By analyzing the interaction between the audio signal and the HRTF, the algorithm simulates how the human ear and brain localize a source from arrival-time differences, level differences, and frequency-filtering effects (caused by the head, the shape of the ears, and similar factors). These differences are encoded in the audio signal, so the algorithm can infer the position and distance of the source in the virtual space and reproduce a strongly spatial audio scene over the listener's headphones or loudspeakers. Inferring the source location involves several key steps. First, the frequency response and phase information of the signal are analyzed and combined with HRTF data to simulate how sound reaches the listener's ears from various directions. The algorithm accounts for the filtering the sound undergoes before arrival, due to occlusion by the head and ears, which changes its frequency content and phase. It then uses these changes to compute the virtual direction and distance of the source. By precisely analyzing the differences between the sounds received by the left and right ears, including arrival-time and intensity differences, the position of the source in three-dimensional space can be deduced.
In a specific embodiment, after the differences between the sounds received by the two ears have been analyzed, the position of the virtual sound source in the virtual space is estimated as shown in FIG. 3:
an initialization source point is preset on the perpendicular bisector of the line connecting the left and right ears;
after the time difference is determined, a first offset vector is determined based on the acoustic transmission characteristics stored in the HRTF database; it represents the offset of the initialization source point in the direction perpendicular to the bisector;
after the phase difference is determined, a second offset vector is determined based on those characteristics; it represents the offset of the initialization source point along a circle centered on the midpoint between the two ears;
after the frequency distribution is determined, a third offset vector is determined based on those characteristics; it represents the offsets of the initialization source point along the bisector and perpendicular to it;
the first, second, and third offset vectors are combined into a comprehensive offset vector, and the initialization source point is offset accordingly to obtain the position of the virtual sound source in the virtual space.
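Read literally, the estimation applies three displacement vectors to a point that starts on the perpendicular bisector of the ears. The geometric sketch below shows such a combination; in practice each mapping from an inter-ear difference to a vector would come from the HRTF database, so the mapping functions here are invented placeholders.

```python
import numpy as np

def estimate_source_position(itd_s, ipd_rad, band_ratio,
                             speed_of_sound=343.0):
    """Offset an initial point on the ears' perpendicular bisector.

    itd_s      -> first offset, along the inter-ear axis (x).
    ipd_rad    -> second offset, along a circle around the head centre.
    band_ratio -> third offset, along and across the bisector (here the
                  high-frequency share of energy is a crude distance cue).
    All three mappings are illustrative stand-ins for HRTF-database lookups.
    """
    origin = np.array([0.0, 1.0, 0.0])              # start 1 m ahead, centred

    # First offset: ITD converted to a lateral displacement.
    v1 = np.array([itd_s * speed_of_sound, 0.0, 0.0])

    # Second offset: rotate the point around the head by the phase cue.
    theta = 0.5 * ipd_rad                           # arbitrary phase-to-angle map
    rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                    [np.sin(theta),  np.cos(theta), 0.0],
                    [0.0, 0.0, 1.0]])
    v2 = rot @ origin - origin

    # Third offset: spectral balance pushes the point nearer or farther.
    v3 = np.array([0.0, (band_ratio - 0.5) * 2.0, 0.0])

    return origin + v1 + v2 + v3                    # comprehensive offset applied

pos = estimate_source_position(itd_s=0.0003, ipd_rad=0.4, band_ratio=0.7)
```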
In one embodiment, the system further comprises a human-computer interaction module electrically connected to the audio processor, as shown in FIG. 1, configured with one or more of the following adjusting units:
a loudness unification adjusting unit for adjusting the loudness value to which the audio loudness of each channel of the initial audio stream data is unified;
a virtual space adjusting unit for adjusting the size of the virtual space and/or the positions of the virtual ears within it, and for restoring the initial virtual-space parameters with one key;
a spatial reverberation setting unit for selecting a desired reverberation mode from a plurality of preset spatial reverberation modes, the audio processor being configured to match the associated channel equalization parameters and mixing parameters to the selected mode, the preset modes including a plate reverb mode, a room reverb mode, and a hall reverb mode.
For the spatial reverberation setting unit, the associated channel equalization parameters and mixing parameters are preset for each spatial reverberation mode as follows (a runnable sketch follows this list):
the audio system of the speaker device is run multiple times for each spatial reverberation mode;
each run is evaluated against preset evaluation factors to compute a sound-quality score for that run;
after a preset number of runs, the channel equalization parameters and mixing parameters of the run with the highest sound-quality score are selected as the associated optimized parameters; or the runs stop as soon as the sound-quality score reaches a preset optimization threshold, and the channel equalization parameters and mixing parameters of the last run are taken as the associated optimized parameters.
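The preset-tuning procedure is a plain score-driven search with two alternative stopping rules, as the sketch below shows. The scoring function is a placeholder, since the evaluation factors are not specified here, and the parameter ranges are invented.

```python
import random

def sound_quality_score(eq_db, wet_mix):
    """Placeholder evaluator standing in for the preset evaluation factors."""
    return -abs(eq_db - 1.0) - abs(wet_mix - 0.3) + random.gauss(0, 0.05)

def tune_reverb_mode(max_runs=50, score_threshold=None):
    """Run the audio system repeatedly and keep the best-scoring parameters."""
    best = None
    for _ in range(max_runs):
        params = {"eq_db": random.uniform(-6, 6),
                  "wet_mix": random.uniform(0, 1)}
        score = sound_quality_score(**params)
        if best is None or score > best[0]:
            best = (score, params)
        # Second stopping rule: quit as soon as the score is good enough,
        # keeping the parameters of the last run.
        if score_threshold is not None and score >= score_threshold:
            return params
    # First stopping rule: after max_runs, keep the highest-scoring run.
    return best[1]

for mode in ("plate", "room", "hall"):
    print(mode, tune_reverb_mode())
```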
The human-computer interaction module may be integrated into the speaker equipment or kept separate from it. It lets a user customize a dedicated audio scheme according to their own hearing characteristics and preferences, yielding a general-purpose stereo audio system that can meet the needs of different users.
Whether the spatial audio algorithm is embedded in the speaker device's audio processor or delivered by installing an APP on the speaker device, the processed audio content can be rendered and played in real time, meaning the user experiences the immersive spatial audio effect live.
The embodiments of the invention are applicable to any playback equipment, including smartphones, tablet computers, notebook computers, PCs, audio playback terminals, televisions, vehicle-mounted sound systems, and other devices that allow third-party applications (APPs) to be installed, so a user can obtain improved stereo surround audio on existing equipment without buying new dedicated hardware or modifying the device.
Through algorithm embedding or APP installation, the user improves the stereo surround effect on existing equipment, with no additional purchases for multi-channel configurations and no complex setup.
In one embodiment of the present invention, an audio processor is provided that processes audio into virtual surround sound audio by:
acquiring initial audio stream data in a stereo or surround sound format;
preprocessing the initial audio stream data to unify the audio loudness across channels;
performing a spatial rendering operation on the preprocessed audio, comprising:
presetting a virtual space and setting the positions of virtual ears within it, the virtual ears comprising a left ear and a right ear;
simulating, with an HRTF algorithm, the propagation of the preprocessed audio through the virtual space and the sound received by the virtual ears;
analyzing the differences between the sounds received by the left and right ears, the differences comprising one or more of time-delay differences, phase differences, and loudness attenuation differences;
estimating the position of a virtual sound source in the virtual space from those differences, the virtual sound source being defined as the starting point of the sound propagating in the virtual space;
after spatial rendering is complete, re-encoding the audio into virtual surround sound audio containing spatial location information, its channels being the same as those of the initial audio stream data.
Specifically, the left and right channel signals of the preprocessed stereo audio, or the individual channel signals of the preprocessed surround audio, are separated to obtain the mono audio corresponding to each channel;
the mono audio of each channel is propagated through the virtual space in simulation, yielding the mono audio received by the left and right ears;
analyzing the differences between the sounds received by the two ears comprises: time-domain analysis to determine the time difference; and/or phase analysis to determine the phase difference; and/or frequency-domain analysis to determine the frequency distribution of the sounds received by the left and right ears.
The position of the virtual sound source within the virtual space is estimated by: presetting an initialization source point on the perpendicular bisector of the line connecting the two ears; after the time difference is determined, determining, from the acoustic transmission characteristics stored in the HRTF database, the offset vector of the source point perpendicular to the bisector; and/or, after the phase difference is determined, its offset vector along a circle centered on the midpoint between the two ears; and/or, after the frequency distribution is determined, its offset vectors along the bisector and perpendicular to it; then determining a comprehensive offset vector and offsetting the initialization source point accordingly to obtain the position of the virtual sound source in the virtual space.
Re-encoding the audio into virtual surround sound audio containing spatial location information comprises enhancing the preprocessed audio according to the estimated position of the virtual sound source, the enhancement comprising: increasing the audio width; and/or simulating, via the HRTF algorithm, the audio signal being received by the two ears from different directions, and evaluating the resulting audio effects to determine a direction adjustment; and/or simulating the signal being received with different volumes and filtering effects to determine frequency and phase adjustments; and/or simulating the signal being received with different reverberation and echo effects to determine a reflection adjustment of the audio.
The audio processor of this embodiment belongs to the same inventive concept as the spatial audio system of the foregoing embodiment, and the entire content of the spatial audio system embodiment is incorporated into this audio processor embodiment by reference.
In one embodiment of the present invention, a virtual surround sound conversion method for spatial audio is provided, which converts the spatial audio into virtual surround sound without increasing the number of channels, comprising the steps of:
receiving initial audio stream data in a stereo or surround sound format;
preprocessing the initial audio stream data to unify the audio loudness across channels;
presetting a virtual space and setting the positions of virtual ears within it, the virtual ears comprising a left ear and a right ear; simulating, with an HRTF algorithm, the propagation of the preprocessed audio through the virtual space and the sound received by the virtual ears;
analyzing the differences between the sounds received by the left and right ears, the differences comprising one or more of time-delay differences, phase differences, and loudness attenuation differences;
estimating the position of a virtual sound source in the virtual space from those differences, the virtual sound source being defined as the starting point of the sound propagating in the virtual space;
after spatial rendering is complete, re-encoding the audio into virtual surround sound audio containing spatial location information. Specifically, the re-encoding may comprise: increasing the audio width; and/or simulating, via the HRTF algorithm, the audio signal being received by the two ears from different directions, and evaluating the resulting audio effects to determine a direction adjustment; and/or simulating the signal being received with different volumes and filtering effects to determine frequency and phase adjustments; and/or simulating the signal being received with different reverberation and echo effects to determine a reflection adjustment of the audio.
The method of this embodiment belongs to the same inventive concept as the spatial audio system of the foregoing embodiment; that is, before analyzing the differences between the sounds received by the two ears, the method also comprises: separating the left and right channel signals of the preprocessed stereo audio, or the individual channel signals of the preprocessed surround audio, to obtain the mono audio corresponding to each channel; and propagating the mono audio of each channel through the virtual space in simulation, yielding the mono audio received by the left and right ears;
analyzing the differences between the sounds received by the two ears comprises: time-domain analysis to determine the time difference; and/or phase analysis to determine the phase difference; and/or frequency-domain analysis to determine the frequency distribution of the sounds received by the left and right ears.
Estimating the position of a virtual sound source within the virtual space by:
Presetting an initialization source point to be positioned on the middle vertical line of the left ear and the right ear;
After the time difference of sound received by the left ear and the right ear is determined, determining an offset vector of the initialization source point in the direction perpendicular to the perpendicular bisector based on acoustic transmission characteristics stored in an HRTF database; and/or after determining the phase difference of the sound received by the left ear and the right ear, determining an offset vector of the initialization source point on a circumference taking the midpoint between the left ear and the right ear as a circle center based on the acoustic transmission characteristics stored in the HRTF database; and/or after determining the frequency distribution of the sound received by the left ear and the right ear, determining offset vectors of the initialization source point in the direction of the center vertical line and the direction vertical to the center vertical line respectively based on acoustic transmission characteristics stored in an HRTF database;
and determining a composite offset vector and offsetting the initialization source point accordingly to obtain the position of the virtual sound source in the virtual space, as sketched below.
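A minimal sketch of this offset-and-combine step, assuming three-dimensional ear coordinates and per-cue offset vectors already looked up from an HRTF table (the lookup itself is outside the sketch), might read:

```python
import numpy as np

def place_virtual_source(ear_l_pos, ear_r_pos, cue_offsets):
    """Shift an initialization source point by the composite of per-cue offsets."""
    ear_l_pos = np.asarray(ear_l_pos, dtype=float)
    ear_r_pos = np.asarray(ear_r_pos, dtype=float)
    midpoint = 0.5 * (ear_l_pos + ear_r_pos)
    # Initialization source point on the perpendicular bisector of the
    # inter-ear axis; the 1 m forward default is an assumed starting value.
    init_point = midpoint + np.array([0.0, 1.0, 0.0])
    # Composite offset vector: a plain sum of the per-cue offsets
    # (time, phase, frequency); the combination rule is left open here.
    composite = np.sum(np.asarray(cue_offsets, dtype=float), axis=0)
    return init_point + composite
```

For example, with ears at (-0.09, 0, 0) and (0.09, 0, 0) metres and cue offsets [[0.3, 0, 0], [0, 0.1, 0]], the estimated source lands at (0.3, 1.1, 0), to the right of and slightly beyond the initialization point.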
The entire contents of the spatial audio system embodiments are incorporated herein by reference into the embodiments of the virtual surround sound conversion method for spatial audio.
In one embodiment of the present invention, a spatial rendering method of spatial audio is provided; referring to fig. 4, the method includes the following steps:
presetting a virtual space, and setting the azimuth of a virtual ear in the virtual space, wherein the virtual ear comprises a left ear and a right ear;
simulating sound propagation of the spatial audio to be rendered in the virtual space by using an HRTF algorithm, and simulating the sound received by the virtual ears in the virtual space, wherein the spatial audio is in a stereo format or a surround sound format;
analyzing the differences between the sounds received by the left ear and the right ear, wherein the sound differences include one or more of time delay differences, phase differences, and loudness attenuation differences;
estimating the position of a virtual sound source in the virtual space according to the difference between the sounds received by the left ear and the right ear, wherein the virtual sound source is defined as the starting point of the sound propagating in the virtual space;
The audio is re-encoded into virtual surround sound audio containing spatial location information according to the estimated location of the virtual sound source within the virtual space.
The spatial rendering method provided in this embodiment is a key stage of the above virtual surround sound conversion method for spatial audio: the audio obtained by preprocessing the initial audio stream data received in real time from the speaker device in that embodiment is the spatial audio to be rendered in this embodiment.
The purpose of spatial rendering is to convert the initial audio stream data into virtual surround sound audio containing spatial location information without increasing the number of channels of the initial audio stream data, so that when the speaker device plays the virtual surround sound audio containing the spatial location information, the user perceives a stronger sense of localization and envelopment of the sound source in three-dimensional space, providing a more realistic and immersive audio experience.
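For orientation, the steps above can be strung together in a short end-to-end sketch, assuming that per-channel head-related impulse responses (HRIRs) for each channel's virtual azimuth are already available; the dictionary layout and function name are illustrative assumptions:

```python
import numpy as np
from scipy.signal import fftconvolve

def simulate_virtual_ears(channels, hrirs):
    """Simulate the sound received by the virtual left and right ears.

    channels: dict mapping channel name -> mono numpy signal;
    hrirs:    dict mapping channel name -> (hrir_left, hrir_right) pair
              modelling propagation from that channel's virtual azimuth.
    """
    n = max(len(sig) + max(len(hrirs[name][0]), len(hrirs[name][1])) - 1
            for name, sig in channels.items())
    ear_l = np.zeros(n)
    ear_r = np.zeros(n)
    for name, sig in channels.items():
        hrir_l, hrir_r = hrirs[name]
        yl = fftconvolve(sig, hrir_l)   # propagation path to the left ear
        yr = fftconvolve(sig, hrir_r)   # propagation path to the right ear
        ear_l[:len(yl)] += yl
        ear_r[:len(yr)] += yr
    return ear_l, ear_r
```

The resulting ear signals would then feed the difference analysis and source-position estimation sketched earlier, after which the enhancement adjustments are applied to re-encode the audio with spatial location information.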
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing is merely illustrative of embodiments of this application. It will be appreciated by those skilled in the art that variations and modifications may be made without departing from the principles of the application, and all such modifications and variations are intended to fall within the scope of the application.

Claims (14)

1. A spatial audio system based on a speaker apparatus, comprising the speaker apparatus and an audio processor integrated with an HRTF algorithm, the audio processor receiving in real time initial audio stream data in a stereo format or a surround sound format from the speaker apparatus and converting it into virtual surround sound audio without adding channels, the converting comprising:
preprocessing the initial audio stream data to unify the audio loudness of each channel;
Performing spatial rendering operation on the preprocessed audio, including:
presetting a virtual space, and setting the azimuth of a virtual ear in the virtual space, wherein the virtual ear comprises a left ear and a right ear;
simulating sound propagation of the preprocessed audio in the virtual space by using an HRTF algorithm, and simulating the sound received by the virtual ear in the virtual space, including: separating, by the audio processor, the left and right channel signals of the preprocessed audio in the stereo format, or separating the channel signals of the preprocessed audio in the surround sound format, to obtain mono audio corresponding to each channel; and simulating the propagation of the mono audio corresponding to each channel in the virtual space to obtain, by simulation, the mono audio received by the left ear and the right ear;
analyzing the difference between the sounds received by the left ear and the right ear, including: performing time-domain analysis on the mono audio to determine the time difference between the sounds received by the left ear and the right ear; and/or performing phase analysis on the mono audio to determine the phase difference between the sounds received by the left ear and the right ear; and/or performing frequency-domain analysis on the mono audio to determine the frequency distribution of the sounds received by the left ear and the right ear, wherein the sound differences include one or more of time delay differences, phase differences, and loudness attenuation differences;
estimating the position of a virtual sound source in the virtual space according to the difference between the sounds received by the left ear and the right ear, including: presetting an initialization source point located on the perpendicular bisector of the line connecting the left ear and the right ear; after determining the time difference between the sounds received by the left ear and the right ear, determining a first offset vector based on acoustic transmission characteristics stored in an HRTF database; and/or, after determining the phase difference between the sounds received by the left ear and the right ear, determining a second offset vector based on the acoustic transmission characteristics stored in the HRTF database; and/or, after determining the frequency distribution of the sounds received by the left ear and the right ear, determining a third offset vector based on the acoustic transmission characteristics stored in the HRTF database; and determining a composite offset vector and offsetting the initialization source point accordingly to obtain the position of the virtual sound source in the virtual space, wherein the virtual sound source is defined as the starting point of the sound propagating in the virtual space;
after spatial rendering is completed, re-encoding the audio into virtual surround sound audio containing spatial location information, including: performing enhancement processing on the preprocessed audio according to the estimated position of the virtual sound source in the virtual space, the enhancement processing including: increasing the audio width; and/or using the HRTF algorithm to simulate the audio signal being received by the left ear and the right ear from different directions, and evaluating the corresponding audio effects to determine a direction adjustment of the audio; and/or using the HRTF algorithm to simulate the audio signal being received by the left ear and the right ear with different volumes and filtering effects, and evaluating the corresponding audio effects to determine a frequency adjustment and a phase adjustment of the audio; and/or using the HRTF algorithm to simulate the audio signal being received by the left ear and the right ear with different reverberation and echo effects, and evaluating the corresponding audio effects to determine a reflection adjustment of the audio; and returning the virtual surround sound audio to the speaker apparatus.
2. The speaker-based spatial audio system of claim 1, wherein the speaker apparatus is configured with two or more speakers, and the two speakers are spaced more than 15 cm apart;
the audio processor is integrated in the speaker device; or, the speaker device is configured with a client application, and the audio processor is arranged at a server side or a cloud corresponding to the client application.
3. The speaker-based spatial audio system of claim 2, wherein the virtual space is designed in advance according to the spacing of the speakers of the speaker device using an HRTF algorithm: the larger the spacing between the speakers, the larger the virtual space;
the speaker device comprises one or more of a mobile phone, a tablet computer, a notebook computer, a PC, an audio playback terminal, a television, and a vehicle-mounted audio device.
4. The spatial audio system based on a speaker device according to any one of claims 1-3, wherein the system further comprises a human-machine interaction module electrically connected to the audio processor and configured with one or more of the following adjustment units:
a loudness unification adjustment unit for adjusting the loudness value to which the audio loudness of each channel of the initial audio stream data is unified;
a virtual space adjustment unit for adjusting the size of the virtual space and/or the position of the virtual ear in the virtual space, and for resetting the virtual space to its initial setting parameters with one key;
a spatial reverberation setting unit for selecting a desired reverberation mode from a plurality of preset spatial reverberation modes, wherein the audio processor is configured to match associated channel equalization parameters and mixing parameters according to the selected reverberation mode, the preset spatial reverberation modes including a plate reverberation mode, a room reverberation mode, and a hall reverberation mode.
5. The speaker-device-based spatial audio system of claim 4, wherein the associated channel equalization parameters and mixing parameters are preset in advance for the spatial reverberation modes by:
operating the audio system of the speaker device multiple times for each spatial reverberation mode;
evaluating each run according to preset evaluation factors to calculate a sound quality score for that run; and
after a preset number of runs, selecting the channel equalization parameters and mixing parameters of the run with the highest sound quality score as the associated optimized parameters; or stopping the runs once the sound quality score reaches a preset optimization score threshold, and taking the channel equalization parameters and mixing parameters of the last run as the associated optimized parameters.
6. An audio processor for use in a spatial audio system as claimed in any one of claims 1 to 5, the audio processor processing audio to obtain virtual surround sound audio by:
acquiring initial audio stream data in a stereo format or a surround sound format;
preprocessing the initial audio stream data to unify the audio loudness of each channel;
Performing spatial rendering operation on the preprocessed audio, including:
presetting a virtual space, and setting the azimuth of a virtual ear in the virtual space, wherein the virtual ear comprises a left ear and a right ear;
simulating sound propagation of the preprocessed audio in the virtual space by using an HRTF algorithm, and simulating the sound received by the virtual ears in the virtual space;
analyzing the differences between the sounds received by the left ear and the right ear, wherein the sound differences include one or more of time delay differences, phase differences, and loudness attenuation differences;
estimating the position of a virtual sound source in the virtual space according to the difference between the sounds received by the left ear and the right ear, wherein the virtual sound source is defined as the starting point of the sound propagating in the virtual space;
after spatial rendering is completed, re-encoding the audio into virtual surround sound audio containing spatial location information, the channels of which are the same as those of the initial audio stream data.
7. The audio processor of claim 6, wherein the left and right channel signals of the preprocessed audio in the stereo format are separated, or the channel signals of the preprocessed audio in the surround sound format are separated, to obtain mono audio corresponding to each channel;
the propagation of the mono audio corresponding to each channel in the virtual space is simulated to obtain, by simulation, the mono audio received by the left ear and the right ear;
analyzing the difference between the sounds received by the left ear and the right ear includes: performing time-domain analysis on the mono audio to determine the time difference between the sounds received by the left ear and the right ear; and/or performing phase analysis on the mono audio to determine the phase difference between the sounds received by the left ear and the right ear; and/or performing frequency-domain analysis on the mono audio to determine the frequency distribution of the sounds received by the left ear and the right ear.
8. The audio processor of claim 7, wherein the position of the virtual sound source within the virtual space is estimated by: presetting an initialization source point located on the perpendicular bisector of the line connecting the left ear and the right ear; after determining the time difference between the sounds received by the left ear and the right ear, determining a first offset vector based on acoustic transmission characteristics stored in an HRTF database; and/or, after determining the phase difference between the sounds received by the left ear and the right ear, determining a second offset vector based on the acoustic transmission characteristics stored in the HRTF database; and/or, after determining the frequency distribution of the sounds received by the left ear and the right ear, determining a third offset vector based on the acoustic transmission characteristics stored in the HRTF database; and determining a composite offset vector and offsetting the initialization source point accordingly to obtain the position of the virtual sound source in the virtual space;
or, re-encoding the audio into virtual surround sound audio containing spatial location information includes: performing enhancement processing on the preprocessed audio according to the estimated position of the virtual sound source in the virtual space, the enhancement processing including: increasing the audio width; and/or using the HRTF algorithm to simulate the audio signal being received by the left ear and the right ear from different directions, and evaluating the corresponding audio effects to determine a direction adjustment of the audio; and/or using the HRTF algorithm to simulate the audio signal being received by the left ear and the right ear with different volumes and filtering effects, and evaluating the corresponding audio effects to determine a frequency adjustment and a phase adjustment of the audio; and/or using the HRTF algorithm to simulate the audio signal being received by the left ear and the right ear with different reverberation and echo effects, and evaluating the corresponding audio effects to determine a reflection adjustment of the audio.
9. A virtual surround sound conversion method of spatial audio, wherein, based on the spatial audio system according to any one of claims 1 to 5, the conversion of spatial audio into virtual surround sound is achieved without increasing the number of channels, the method comprising the steps of:
receiving initial audio stream data in a stereo format or a surround sound format;
preprocessing the initial audio stream data to unify the audio loudness of each channel;
Performing spatial rendering operation on the preprocessed audio, including:
presetting a virtual space, and setting the azimuth of a virtual ear in the virtual space, wherein the virtual ear comprises a left ear and a right ear; and simulating sound propagation of the preprocessed audio in the virtual space by using an HRTF algorithm, and simulating the sound received by the virtual ears in the virtual space;
analyzing the differences between the sounds received by the left ear and the right ear, wherein the sound differences include one or more of time delay differences, phase differences, and loudness attenuation differences;
estimating the position of a virtual sound source in the virtual space according to the difference between the sounds received by the left ear and the right ear, wherein the virtual sound source is defined as the starting point of the sound propagating in the virtual space;
after spatial rendering is completed, the audio is re-encoded into virtual surround sound audio containing spatial location information.
10. The method of claim 9, further comprising, prior to analyzing the difference between the sounds received by the left ear and the right ear: separating the left and right channel signals of the preprocessed audio in the stereo format, or separating the channel signals of the preprocessed audio in the surround sound format, to obtain mono audio corresponding to each channel; and simulating the propagation of the mono audio corresponding to each channel in the virtual space to obtain, by simulation, the mono audio received by the left ear and the right ear;
analyzing the difference between the sounds received by the left ear and the right ear includes: performing time-domain analysis on the mono audio to determine the time difference between the sounds received by the left ear and the right ear; and/or performing phase analysis on the mono audio to determine the phase difference between the sounds received by the left ear and the right ear; and/or performing frequency-domain analysis on the mono audio to determine the frequency distribution of the sounds received by the left ear and the right ear.
11. The method of virtual surround sound conversion for spatial audio according to claim 10, wherein the position of a virtual sound source within the virtual space is estimated by:
presetting an initialization source point located on the perpendicular bisector of the line connecting the left ear and the right ear;
after determining the time difference between the sounds received by the left ear and the right ear, determining a first offset vector based on acoustic transmission characteristics stored in an HRTF database; and/or, after determining the phase difference between the sounds received by the left ear and the right ear, determining a second offset vector based on the acoustic transmission characteristics stored in the HRTF database; and/or, after determining the frequency distribution of the sounds received by the left ear and the right ear, determining a third offset vector based on the acoustic transmission characteristics stored in the HRTF database;
and determining a composite offset vector and offsetting the initialization source point accordingly to obtain the position of the virtual sound source in the virtual space.
12. The method of claim 10, wherein re-encoding the audio into virtual surround sound audio containing spatial location information comprises:
increasing the audio width;
and/or using the HRTF algorithm to simulate the audio signal being received by the left ear and the right ear from different directions, and evaluating the corresponding audio effects to determine a direction adjustment of the audio;
and/or using the HRTF algorithm to simulate the audio signal being received by the left ear and the right ear with different volumes and filtering effects, and evaluating the corresponding audio effects to determine a frequency adjustment and a phase adjustment of the audio;
and/or using the HRTF algorithm to simulate the audio signal being received by the left ear and the right ear with different reverberation and echo effects, and evaluating the corresponding audio effects to determine a reflection adjustment of the audio.
13. A spatial rendering method of spatial audio based on a spatial audio system according to any of claims 1 to 5, comprising the steps of:
presetting a virtual space, and setting the azimuth of a virtual ear in the virtual space, wherein the virtual ear comprises a left ear and a right ear;
simulating sound propagation of the spatial audio to be rendered in the virtual space by using an HRTF algorithm, and simulating the sound received by the virtual ears in the virtual space, wherein the spatial audio is in a stereo format or a surround sound format;
analyzing the differences between the sounds received by the left ear and the right ear, wherein the sound differences include one or more of time delay differences, phase differences, and loudness attenuation differences;
estimating the position of a virtual sound source in the virtual space according to the difference between the sounds received by the left ear and the right ear, wherein the virtual sound source is defined as the starting point of the sound propagating in the virtual space;
The audio is re-encoded into virtual surround sound audio containing spatial location information according to the estimated location of the virtual sound source within the virtual space.
14. A vehicle comprising a spatial audio system according to any one of claims 1 to 5, wherein the spatial audio system comprises a speaker device that is a car audio device.
CN202410694626.0A 2024-05-31 2024-05-31 Speaker-based spatial audio system, audio processor, vehicle, virtual surround sound conversion method, and audio rendering method Active CN118264971B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410694626.0A CN118264971B (en) 2024-05-31 2024-05-31 Speaker-based spatial audio system, audio processor, vehicle, virtual surround sound conversion method, and audio rendering method

Publications (2)

Publication Number Publication Date
CN118264971A CN118264971A (en) 2024-06-28
CN118264971B (en) 2024-09-27

Family

ID=91609252

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410694626.0A Active CN118264971B (en) 2024-05-31 2024-05-31 Speaker-based spatial audio system, audio processor, vehicle, virtual surround sound conversion method, and audio rendering method

Country Status (1)

Country Link
CN (1) CN118264971B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107040862A (en) * 2016-02-03 2017-08-11 腾讯科技(深圳)有限公司 Audio-frequency processing method and processing system
CN114900783A (en) * 2022-04-01 2022-08-12 珠海华章科技有限公司 Method and device for realizing reflective virtual surround sound

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8000485B2 (en) * 2009-06-01 2011-08-16 Dts, Inc. Virtual audio processing for loudspeaker or headphone playback
US20230247384A1 (en) * 2020-07-02 2023-08-03 Sony Group Corporation Information processing device, output control method, and program
CN118020319A (en) * 2021-09-29 2024-05-10 北京字跳网络技术有限公司 System, method and electronic device for spatial audio rendering
CN117037815A (en) * 2023-08-22 2023-11-10 苏州灵境影音技术有限公司 Home theater personalized surround sound generating system and method based on HRTF data information
CN117295004B (en) * 2023-11-22 2024-02-09 苏州灵境影音技术有限公司 Method, device and sound system for converting multichannel surround sound

Also Published As

Publication number Publication date
CN118264971A (en) 2024-06-28

Similar Documents

Publication Publication Date Title
EP3197182B1 (en) Method and device for generating and playing back audio signal
JP4921470B2 (en) Method and apparatus for generating and processing parameters representing head related transfer functions
US8532306B2 (en) Method and an apparatus of decoding an audio signal
US8374365B2 (en) Spatial audio analysis and synthesis for binaural reproduction and format conversion
CN105578379B (en) Device and method for generating the output signal at least two output channels
CN110326310B (en) Dynamic equalization for crosstalk cancellation
Gardner Transaural 3-D audio
CN113170271B (en) Method and apparatus for processing stereo signals
Jot et al. Binaural simulation of complex acoustic scenes for interactive audio
GB2471089A (en) Audio processing device using a library of virtual environment effects
CN118264971B (en) Speaker-based spatial audio system, audio processor, vehicle, virtual surround sound conversion method, and audio rendering method
WO2012104297A1 (en) Generation of user-adapted signal processing parameters
Lee et al. A real-time audio system for adjusting the sweet spot to the listener's position
CN113691927B (en) Audio signal processing method and device
CN113194400B (en) Audio signal processing method, device, equipment and storage medium
EP4325485A1 (en) Three-dimensional audio signal encoding method and apparatus, and encoder
US20240056735A1 (en) Stereo headphone psychoacoustic sound localization system and method for reconstructing stereo psychoacoustic sound signals using same
Huopaniemi et al. Virtual acoustics—Applications and technology trends
JP2023113505A (en) Loudness measurement device and program
Zhao et al. A simplified model for generating 3D realistic sound in the multimedia and virtual reality systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant