WO2023040820A1 - Audio playback method and apparatus, computer-readable storage medium, and electronic device - Google Patents

Audio playback method and apparatus, computer-readable storage medium, and electronic device

Info

Publication number
WO2023040820A1
WO2023040820A1 · PCT/CN2022/118396 · CN 2022118396 W
Authority
WO
WIPO (PCT)
Prior art keywords
sound
audio
audio playback
audio signal
zone
Prior art date
Application number
PCT/CN2022/118396
Other languages
English (en)
French (fr)
Inventor
刘松
朱长宝
牛建伟
Original Assignee
深圳地平线机器人科技有限公司 (Shenzhen Horizon Robotics Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳地平线机器人科技有限公司 (Shenzhen Horizon Robotics Technology Co., Ltd.)
Publication of WO2023040820A1 publication Critical patent/WO2023040820A1/zh

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 — Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/02 — Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L 21/0272 — Voice signal separating
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/24 — Speech recognition using non-acoustical features
    • G10L 15/25 — Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 — Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/02 — Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L 21/0208 — Noise filtering
    • G10L 21/0216 — Noise filtering characterised by the method used for estimating noise
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 — Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/02 — Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L 21/0272 — Voice signal separating
    • G10L 21/028 — Voice signal separating using properties of sound source
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/27 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L 25/30 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L 25/60 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 — Execution procedure of a spoken command
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 — Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/02 — Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L 21/0208 — Noise filtering
    • G10L 21/0216 — Noise filtering characterised by the method used for estimating noise
    • G10L 2021/02161 — Number of inputs available containing the signal or the noise to be suppressed
    • G10L 2021/02166 — Microphone arrays; Beamforming

Definitions

  • the present disclosure relates to the field of computer technology, in particular to an audio playing method, device, computer-readable storage medium and electronic equipment.
  • the current mainstream solution is to set up a separate microphone, held or worn by the user, to collect the user's voice.
  • an additional device containing a microphone must be installed in the vehicle as a pickup terminal.
  • in this way, the microphone can capture the user's voice while the playback sound from the speaker is blocked.
  • Embodiments of the present disclosure provide an audio playing method, device, computer-readable storage medium, and electronic equipment.
  • An embodiment of the present disclosure provides an audio playback method, the method comprising: determining an audio playback sound zone from a preset number of sound zones in a target space; acquiring at least one original audio signal collected by a preset microphone array; performing signal separation on the at least one original audio signal to obtain at least one separated audio signal; determining the sound zones corresponding to the at least one separated audio signal; and controlling the audio playback device in the target space to play the separated audio signal corresponding to the audio playback sound zone.
  • an audio playback device includes: a first determination module, configured to determine the audio playback sound zone from a preset number of sound zones in the target space; a first acquisition module, configured to obtain at least one original audio signal collected by the preset microphone array; a separation module, configured to perform signal separation on the at least one original audio signal to obtain at least one separated audio signal; a second determination module, configured to determine the sound zones corresponding to the at least one separated audio signal; and a control module, configured to control the audio playback device in the target space to play the separated audio signal corresponding to the audio playback sound zone.
  • a computer-readable storage medium stores a computer program, and the computer program is used to execute the above audio playing method.
  • an electronic device includes: a processor; and a memory for storing instructions executable by the processor; the processor is configured to read the executable instructions from the memory and execute them to implement the above audio playing method.
  • the audio playback sound zone is determined from a preset number of sound zones in the target space; then at least one original audio signal collected by the microphone array is obtained and signal separation is performed on it to obtain at least one separated audio signal; the sound zones corresponding to the separated audio signals are then determined, and finally the audio playback device is controlled to play the separated audio signal corresponding to the audio playback sound zone.
  • the fixed microphone array is thus used effectively to collect and play back the audio signal emitted in a given sound zone.
  • FIG. 1 is a system diagram to which the present disclosure applies.
  • Fig. 2 is a schematic flowchart of an audio playing method provided by an exemplary embodiment of the present disclosure.
  • Fig. 3 is a schematic diagram of an application scenario of an audio playing method according to an embodiment of the present disclosure.
  • Fig. 4 is a schematic flowchart of an audio playing method provided by another exemplary embodiment of the present disclosure.
  • Fig. 5 is a schematic flowchart of an audio playing method provided by another exemplary embodiment of the present disclosure.
  • Fig. 6 is a schematic flowchart of an audio playing method provided by another exemplary embodiment of the present disclosure.
  • Fig. 7 is a schematic structural diagram of an audio playback device provided by an exemplary embodiment of the present disclosure.
  • Fig. 8 is a schematic structural diagram of an audio playback device provided by another exemplary embodiment of the present disclosure.
  • Fig. 9 is a structural diagram of an electronic device provided by an exemplary embodiment of the present disclosure.
  • "plural" may refer to two or more than two, and "at least one" may refer to one, two, or more than two.
  • Embodiments of the present disclosure may be applied to electronic devices such as terminal devices, computer systems, servers, etc., which may operate with numerous other general purpose or special purpose computing system environments or configurations.
  • Examples of well known terminal devices, computing systems, environments and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick client computers, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing technology environments including any of the foregoing, among others.
  • FIG. 1 shows an exemplary system architecture 100 of an audio playback method or an audio playback device to which an embodiment of the present disclosure can be applied.
  • the system architecture 100 may include a terminal device 101 , a network 102 , a server 103 , a microphone array 104 and an audio playback device 105 .
  • the network 102 is a medium for providing a communication link between the terminal device 101 and the server 103 .
  • Network 102 may include various connection types, such as wires, wireless communication links, or fiber optic cables, among others.
  • the microphone array 104 can collect audio signals emitted within the target space.
  • the audio playing device 105 can play the audio signal collected by the microphone array.
  • the user can use the terminal device 101 to interact with the server 103 through the network 102 to receive or send messages and the like.
  • Various communication client applications such as multimedia applications, search applications, web browser applications, shopping applications, and instant messaging tools, may be installed on the terminal device 101 .
  • the terminal device 101 can be various electronic devices, including but not limited to mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (such as car navigation terminals), and stationary terminals such as digital TVs and desktop computers.
  • the terminal device 101 can control the voice interaction device (the voice interaction device may be the terminal device 101 itself, or other devices connected to the terminal device 101) to perform voice interaction.
  • the server 103 may be a server that provides various services, such as a background server that processes audio signals uploaded by the terminal device 101 .
  • the background server can process the received at least one original audio signal, determine the sound zones, etc., and obtain the processing result (for example, the audio signal corresponding to the audio playback sound zone).
  • the audio playing method provided by the embodiments of the present disclosure may be executed by the server 103 or by the terminal device 101 .
  • the numbers of the terminal device 101 , the network 102 , the server 103 , the microphone array 104 and the audio playback device 105 in FIG. 1 are only illustrative. According to implementation requirements, there may be any number of terminal devices 101 , network 102 , server 103 , microphone array 104 and audio playback device 105 .
  • the above system architecture may not include the network 102 and the server 103 , but only include the microphone array 104 , the terminal device 101 and the audio playback device 105 .
  • Fig. 2 is a schematic flowchart of an audio playing method provided by an exemplary embodiment of the present disclosure. This embodiment can be applied to electronic equipment (terminal equipment 101 or server 103 as shown in Figure 1), as shown in Figure 2, the method includes the following steps:
  • Step 201 determine the audio playback sound zone from a preset number of sound zones in the target space.
  • the electronic device may determine the audio playback sound zone from a preset number of sound zones in the target space.
  • the target space may be various spaces, such as inside a car, inside a room, and the like.
  • the sound zone can be a plurality of areas that artificially divide the target space.
  • the sound zone may be the spaces where the driver's seat, the co-pilot's seat, and the seats on both sides of the rear seat are respectively located.
  • the space where the four seats are located can be divided into corresponding sound zones, including 1L, 1R, 2L, and 2R.
  • the audio playback sound zone may be a sound zone that collects and plays sounds (such as human voices, animal calls, musical instruments, etc.) emitted by objects (such as people, animals, musical instruments, etc.) located therein.
  • the target space is the space inside the car
  • the audio playback sound zone may be the space where the driver is.
  • the electronic device can determine the audio playback sound zone based on various methods. For example, the audio playback sound zone may be determined according to the user's operation of manually setting the audio playback sound zone. It is also possible to determine all sound zones as audio playback sound zones.
  • the microphone array collects the singing voice of the passengers.
  • the playback device replays the sound of passengers singing.
  • Step 202 acquiring at least one original audio signal collected by a preset microphone array.
  • the electronic device may acquire at least one original audio signal collected by a preset microphone array.
  • the microphone array (microphone array 104 shown in FIG. 1 ) is used to collect the sound emitted in the target space to obtain at least one original audio signal, and each original audio signal corresponds to one microphone.
  • the microphones a, b, c, and d are respectively set beside the four seats; that is, microphones a, b, c, and d respectively collect the audio signals of sound zones 1L, 1R, 2L, and 2R.
  • Step 203 performing signal separation on at least one original audio signal to obtain at least one separated audio signal.
  • the electronic device may perform signal separation on at least one original audio signal to obtain at least one separated audio signal.
  • the electronic device may use an existing blind source separation technology to perform signal separation on at least one channel of original audio signals.
  • blind source separation refers to the process of recovering each independent component from the source signal without knowing the parameters of the source signal and the transmission channel.
  • Blind source separation can use existing algorithms, such as ICA (Independent Component Analysis, independent component analysis).
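The ICA step mentioned above can be illustrated with a minimal, self-contained FastICA-style sketch (tanh nonlinearity with deflation). This is a toy example on synthetic signals, not the patent's implementation; every function name and parameter here is hypothetical.

```python
import numpy as np

def fast_ica(x, n_iter=200, seed=0):
    """Minimal FastICA for illustration: center, whiten, then extract
    one independent component at a time by deflation."""
    rng = np.random.default_rng(seed)
    x = x - x.mean(axis=1, keepdims=True)            # center each channel
    d, e = np.linalg.eigh(np.cov(x))                 # eigendecompose covariance
    z = e @ np.diag(d ** -0.5) @ e.T @ x             # whitened data
    n = z.shape[0]
    w_all = np.zeros((n, n))
    for i in range(n):
        w = rng.standard_normal(n)
        for _ in range(n_iter):
            wz = w @ z
            # FastICA fixed-point update with g = tanh
            w_new = (z * np.tanh(wz)).mean(axis=1) - (1 - np.tanh(wz) ** 2).mean() * w
            w_new -= w_all[:i].T @ (w_all[:i] @ w_new)   # deflate against found rows
            w = w_new / np.linalg.norm(w_new)
        w_all[i] = w
    return w_all @ z                                 # estimated sources

# Two synthetic "voices" picked up by two "microphones" with cross-talk.
t = np.linspace(0, 1, 4000)
sources = np.vstack([np.sin(2 * np.pi * 7 * t),
                     np.sign(np.sin(2 * np.pi * 11 * t))])
mixed = np.array([[0.6, 0.4], [0.45, 0.55]]) @ sources   # original audio signals
separated = fast_ica(mixed)
```

Each row of `separated` should match one source up to sign and ordering, which is the usual ambiguity of blind source separation.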
  • each of the at least one separated audio signal obtained after separation can be determined as an audio signal collected from a certain sound zone.
  • At least one original audio signal may be preprocessed first, and the preprocessing method may adopt an existing technology.
  • the audio signals collected by the microphones and the reference signal played by the audio playback device are obtained, and the reference signal, filtered through an adaptively fitted acoustic propagation path, is used to perform adaptive acoustic feedback cancellation on the at least one original audio signal.
  • in this way, the playback sound picked up by the microphones is filtered out of the at least one original audio signal, avoiding acoustic feedback and the howling or smearing that can occur when the sound is replayed.
  • the obtained preprocessed audio is separated, so that the noise collected by the microphone array can be filtered out.
  • multi-channel noise reduction may be performed on the separated signal.
  • after separation, each signal can be adaptively filtered using the other signals as a reference, exploiting each signal's own characteristics to filter out components that do not belong to it.
  • Step 204 determining sound zones corresponding to at least one channel of separated audio signals.
  • the electronic device may determine the sound zones corresponding to the at least one separated audio signal. Specifically, the separated audio signals may not correspond one-to-one with the actual sound zones; therefore, each separated audio signal needs to be matched against the original audio signals (or the audio signals after the above-mentioned preprocessing) to determine the sound zone corresponding to each separated audio signal.
  • the similarity between each separated audio signal and each original audio signal can be determined; for each separated audio signal, the original audio signal with the maximum similarity is identified, and the sound zone of the separated audio signal is then determined from the microphone corresponding to that original audio signal.
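The similarity matching described above can be sketched with absolute Pearson correlation as the (assumed) similarity measure; the function name, zone labels, and test signals are all hypothetical.

```python
import numpy as np

def assign_zones(separated, originals, zones):
    """Match each separated signal to the microphone (and hence sound zone)
    whose original recording it most resembles, by absolute correlation."""
    assignment = {}
    for i, s in enumerate(separated):
        sims = [abs(np.corrcoef(s, o)[0, 1]) for o in originals]
        assignment[i] = zones[int(np.argmax(sims))]
    return assignment

rng = np.random.default_rng(1)
a, b = rng.standard_normal(1000), rng.standard_normal(1000)
originals = [a + 0.1 * b, b + 0.1 * a]   # each mic mostly hears its own zone
separated = [a, b]                       # ideal separation result
result = assign_zones(separated, originals, ["1L", "1R"])
```

Here `result` maps separated signal 0 to zone "1L" and signal 1 to "1R", since each separated signal correlates most strongly with the microphone nearest its source.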
  • the microphone may be set between two sound zones. After the microphone corresponding to a certain separated audio signal is determined, other methods may be used to determine which sound zone the separated audio signal was collected from. For example, a camera can be set up in the target space to capture images of each sound zone and perform lip-movement recognition on the captured images, so as to determine which of the sound zones corresponding to the same microphone contains the sound source, and then determine the sound zone corresponding to the separated audio signal. Alternatively, existing sound source localization technology can be used to determine the positional relationship between the microphone and the sound source, thereby determining the sound zone where the source is located and hence the sound zone corresponding to the separated audio signal.
  • Step 205 controlling the audio playback device in the target space to play the separated audio signal corresponding to the audio playback sound zone.
  • the electronic device may control an audio playback device (such as the audio playback device 105 shown in FIG. 1 ) in the target space to play the separated audio signal corresponding to the audio playback sound zone.
  • the separated audio signal corresponding to the audio playback sound zone can be determined.
  • the electronic device may generate an instruction for instructing the audio playback device to play the separated audio signal, and the audio playback device plays the corresponding separated audio signal based on the instruction.
  • the audio playback sound zone is determined from a preset number of sound zones in the target space; then at least one original audio signal collected by the microphone array is obtained, signal separation is performed on it to obtain at least one separated audio signal, the sound zones corresponding to the separated audio signals are determined, and finally the audio playback device is controlled to play the separated audio signal corresponding to the audio playback sound zone. The fixed microphone array is thus used effectively to collect and play audio signals from a given sound zone: no separate microphone needs to be set up, and the user does not need to hold one or move to the position of a separately installed microphone, which saves hardware resources and simplifies operation. At the same time, audio signals collected from the other, non-playback sound zones can be shielded during playback, improving the quality of audio playback.
  • step 201 may be performed as follows:
  • the audio playing mode may include multiple types. For example, a mode of collecting and playing sounds in a single sound zone, a mode of collecting and playing sounds in multiple sound zones, and the like.
  • the audio playback sound zone is determined from a preset number of sound zones.
  • the audio playing mode is the first mode
  • the first mode may be a manually set mode, or a default mode. For example, on a vehicle, when the singing application is turned on, it defaults to the first mode, that is, the chorus mode. At this point, each sound zone is an audio playback sound zone.
  • the audio playing sound zone is determined from a preset number of sound zones based on the user's sound zone selection operation.
  • the second mode is a mode that supports the user to select at least one sound zone as the audio playback sound zone, and the second mode may be a manually set mode or a default mode.
  • the user can select an audio playback zone. For example, in a vehicle, passengers can select the sound zone corresponding to a certain seat as the audio playback sound zone by touching the screen, pressing a button, or voice triggering.
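The two modes described above can be sketched as a small dispatch function. The mode names, zone identifiers, and selection interface below are illustrative assumptions, not terms from the patent.

```python
# Hypothetical sketch of mode-based playback-zone selection for a four-seat
# vehicle cabin divided into zones 1L, 1R, 2L, 2R.
ZONES = ["1L", "1R", "2L", "2R"]

def playback_zones(mode, user_selection=None):
    """Return the list of audio playback sound zones for the given mode."""
    if mode == "first":     # e.g. default chorus mode: every zone plays
        return list(ZONES)
    if mode == "second":    # user picks zones via touch, button, or voice
        return [z for z in (user_selection or []) if z in ZONES]
    raise ValueError(f"unknown mode: {mode}")
```

For example, `playback_zones("second", ["2R"])` returns only the rear-right zone, while unknown zone ids in the selection are ignored.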
  • the audio playback sound zone can be flexibly set.
  • the user does not need to move a microphone or change position for the electronic device to collect and play the sound of the audio playback sound zone, thereby improving the convenience of audio playback.
  • the method may also include the following steps:
  • Step 401 acquiring a voice signal collected from a voice uttered by a user.
  • the speech signal may be a signal collected by the above-mentioned microphone array.
  • Step 402 recognizing the voice signal to obtain a voice recognition result.
  • the method for recognizing the voice signal can adopt the existing technology, and the voice recognition result can be expressed in words.
  • Step 403 based on the voice recognition result, update the audio playback sound zone from a preset number of sound zones.
  • a preset keyword may be extracted from the speech recognition result, and the audio playback sound zone may be updated according to the keyword. For example, if a collected voice signal contains the keyword "I want to sing", the sound zone that produced the voice signal can be determined according to step 204 above and set as the audio playback sound zone.
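The keyword-driven update can be sketched as a small lookup over the transcript. The phrases, actions, and zone ids below are assumptions for illustration only.

```python
# Hypothetical keyword table mapping recognized phrases to zone actions.
KEYWORDS = {
    "i want to sing": "add",          # add the speaker's zone to playback
    "i don't want to sing": "remove", # remove the speaker's zone
}

def update_zones(transcript, speaker_zone, active):
    """Update the set of active playback zones from a recognized transcript.
    speaker_zone is the sound zone that produced the voice signal (step 204)."""
    active = set(active)
    for phrase, action in KEYWORDS.items():
        if phrase in transcript.lower():
            (active.add if action == "add" else active.discard)(speaker_zone)
    return active
```

A real system would use intent classification rather than substring matching, but the control flow — recognize, locate the speaker's zone, then update the zone set — follows the steps above.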
  • steps 401 to 403 may be performed at any time after the above step 201, for example, during the audio playback device playing the separated audio signal, or before or after playing the separated audio signal.
  • the audio playback sound zone can be adjusted flexibly and conveniently through voice interaction, and the user does not need to manually operate, which greatly improves the audio playback. convenience.
  • step 401 may be performed as follows:
  • the electronic device may determine the keyword indicating the target sound zone from the speech recognition result. For example, if the speech recognition result includes "chorus", all sound zones can be determined as target sound zones; if it includes "front row reception", then 1L and 1R as shown in Fig. 3 are the target sound zones. If the speech recognition result does not contain a keyword indicating the target sound zone but does contain a keyword for adjusting the audio playback sound zone, the sound zone that produced the voice signal can be determined as the target sound zone according to the method described in step 204 above.
  • if the speech recognition result includes keywords such as "I don't want to sing" or "I want to sing", it can be determined that the result does not contain a keyword indicating a sound zone, but does contain a keyword for adjusting the audio playback sound zone.
  • the audio playback device is controlled to stop playing the separated audio signal corresponding to the target sound zone.
  • the speech recognition result includes the keyword "I don't want to sing anymore"
  • the target sound zone for generating the voice signal is an audio playback sound zone
  • This implementation determines the target sound zone indicated by the speech recognition result, and when the result indicates that playback of the separated audio signal corresponding to the target sound zone should stop, the audio playback device is controlled to stop playing it. The user can thus flexibly and accurately control the audio playback device by voice, without manual operation, further improving the convenience of audio playback.
  • the method may further include:
  • the target sound zone is adjusted to the audio playback sound zone.
  • This implementation mode adjusts the audio playback sound zone through voice control, so that users at any location can conveniently participate in the sound playback process without manual operation, which further improves the convenience of audio playback.
  • the method may further include:
  • the speech recognition result is information indicating playing a preset sound effect
  • in response to determining that the speech recognition result is information indicating playing a preset sound effect, a sound effect corresponding to the speech recognition result is determined and played.
  • the speech recognition result includes "good singing”
  • the corresponding sound effect audio such as clapping and cheering may be extracted and played.
  • the content of audio playback can be enriched.
  • the method may further include:
  • Step 404 in response to determining that the speech recognition result is information indicating that the target sound zone is adjusted to the main audio playback sound zone, adjusting the target sound zone to the main audio playback sound zone, and adjusting the audio playback sound zones other than the target sound zone to secondary audio playback sound zones.
  • the number of audio playback sound zones in this step is at least two, one of which is adjusted as the main audio playback sound zone, and the other audio playback sound zones are auxiliary audio playback sound zones.
  • the target sound zone for generating the voice signal may be determined as the main audio playback sound zone.
  • Step 405 suppressing the separated audio signal corresponding to the secondary audio playback sound zone to obtain the suppressed audio signal.
  • the volume of the audio corresponding to the secondary audio playback sound zone may be reduced when playing.
  • harmony processing may be performed on the audio signals collected from the main and secondary audio playback sound zones; that is, the audio signal from the main audio playback sound zone is used as the main melody, and the audio signals from the secondary audio playback sound zones are mixed in as harmony during playback.
  • Step 406 Mix and play the separated audio signal and the suppressed audio signal corresponding to the main audio playback sound zone.
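Steps 405-406 above can be sketched as attenuate-then-sum mixing. The 0.3 suppression gain and the function name are arbitrary illustrative choices, not values from the patent.

```python
import numpy as np

def mix_with_suppression(main_sig, secondary_sigs, gain=0.3):
    """Mix the main-zone separated signal with attenuated secondary-zone
    signals, so the main zone dominates the mixed playback."""
    out = np.asarray(main_sig, dtype=float).copy()
    for s in secondary_sigs:
        out += gain * np.asarray(s, dtype=float)   # suppressed secondary signal
    return out
```

For instance, mixing a main signal `[1.0, 1.0]` with two secondary signals `[1.0, 0.0]` and `[0.0, 1.0]` at gain 0.3 yields `[1.3, 1.3]`, with the secondary content at less than a third of its original level.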
  • This implementation determines the main and secondary audio playback sound zones through voice control and suppresses the separated audio signals corresponding to the secondary zones, so that the mixed playback highlights the sound of the main audio playback sound zone and users can more clearly distinguish it. Users can also flexibly adjust the main and secondary audio playback sound zones, which enriches the playback control methods and further improves the convenience of audio playback.
  • the method may further include:
  • Step 206 determine the current audio playback mode.
  • Step 207 in response to determining that the audio playback mode is the third mode, score the separated audio signals respectively played in at least one audio playback zone.
  • the third mode is a mode for scoring the played separated audio signal.
  • the electronic device detects that the current audio playing mode is the third mode, it starts to score the played separated audio signal.
  • the method for scoring can adopt the existing method for scoring audio, for example, in the scene of singing, score according to whether the frequency of the user's voice is aligned with the reference frequency, whether the volume is appropriate, etc.
  • Step 208 based on the score, select the manager sound zone from at least one audio play sound zone.
  • the audio playback sound zone corresponding to the highest score may be determined as the manager sound zone.
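Selecting the manager sound zone from the scores reduces to an argmax over zones; the scoring itself is assumed given. The function name and score values are hypothetical.

```python
# Illustrative selection of the "manager" sound zone as the highest-scoring
# audio playback sound zone.
def pick_manager(scores):
    """scores: dict mapping zone id -> singing score; returns the winning zone."""
    return max(scores, key=scores.get)
```

For example, with scores `{"1L": 72, "1R": 88, "2L": 65}`, zone "1R" becomes the manager sound zone.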
  • Step 209 acquiring the voice signal corresponding to the manager sound zone.
  • voice signals of users located in the manager sound zone can be collected, while voice signals of users in other sound zones are shielded.
  • Step 210 performing a voice interaction operation based on the voice signal corresponding to the manager sound zone.
  • the voice signal corresponding to the manager sound zone can be recognized, and the audio playback process can be controlled according to the voice recognition result; for example, voice commands can update the audio playback sound zone, select the track to play, and so on.
  • This implementation method scores the separated audio signals played by each audio playback zone in the third mode, and switches the manager’s zone based on the scoring results, which can further enrich the audio playback process and improve the efficiency of switching the manager’s zone. degree of automation.
  • Fig. 7 is a schematic structural diagram of an audio playback apparatus provided by an exemplary embodiment of the present disclosure. This embodiment can be applied to electronic devices.
  • As shown in Fig. 7, the audio playback apparatus includes: a first determination module 701, configured to determine the audio playback sound zone from a preset number of sound zones in the target space; a first acquisition module 702, configured to acquire at least one original audio signal collected by a preset microphone array; a separation module 703, configured to perform signal separation on the at least one original audio signal to obtain at least one separated audio signal; a second determination module 704, configured to determine the sound zones respectively corresponding to the at least one separated audio signal; and a control module 705, configured to control the audio playback device in the target space to play the separated audio signal corresponding to the audio playback sound zone.
  • In this embodiment, the first determination module 701 may determine the audio playback sound zone from a preset number of sound zones in the target space.
  • The target space may be any of various spaces, such as the interior of a car or a room.
  • The sound zones may be multiple areas into which the target space is artificially divided.
  • For example, when the target space is the interior of a car, the sound zones may be the spaces where the driver's seat, the front passenger seat, and the seats on both sides of the rear row are respectively located.
  • As shown in Fig. 3, the spaces where the four seats are located can be divided into corresponding sound zones: 1L, 1R, 2L, and 2R.
  • An audio playback sound zone is a sound zone whose emitted sound is collected and played back.
  • For example, when the target space is the interior of a car, the audio playback sound zone may be the space where the driver is located.
  • The first determination module 701 may determine the audio playback sound zone in various ways, for example according to a user operation that manually sets the audio playback sound zone, or by designating all sound zones as audio playback sound zones.
  • In one scenario, a passenger in a vehicle who wants to sing can select the audio playback sound zone through the vehicle's touch screen. In the subsequent steps, the microphone array collects the passenger's singing, and after processing, the audio playback device replays it.
  • In this embodiment, the first acquisition module 702 may acquire at least one original audio signal collected by the preset microphone array.
  • The microphone array (such as the microphone array 104 shown in FIG. 1) collects the sound emitted in the target space to obtain at least one original audio signal, each original audio signal corresponding to one microphone.
  • As an example, as shown in Fig. 3, when the target space is the interior of a car, microphones a, b, c, and d are respectively arranged beside the four seats; that is, they respectively collect the audio signals of the four sound zones 1L, 1R, 2L, and 2R.
  • In this embodiment, the separation module 703 may perform signal separation on the at least one original audio signal to obtain at least one separated audio signal.
  • As an example, the separation module 703 may use an existing blind source separation technique to separate the at least one original audio signal.
  • Blind source separation refers to recovering the individual independent components from mixed signals without knowing the parameters of the source signals or the transmission channels, and can use existing algorithms such as ICA (Independent Component Analysis).
  • Each separated audio signal obtained in this way can be regarded as the audio signal collected from a certain sound zone.
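As a rough sketch of this separation step, scikit-learn's FastICA can stand in for whatever blind source separation algorithm a real in-vehicle system would run (a production system would use a real-time, frequency-domain variant; the function and its parameters below are illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import FastICA

def separate_sources(mic_signals, n_sources=None, random_state=0):
    """Blind source separation of multi-microphone recordings via FastICA.

    mic_signals: array of shape (n_mics, n_samples), one row per microphone.
    Returns an array of shape (n_sources, n_samples) of separated signals,
    recovered only up to permutation and scaling, as is inherent to ICA.
    """
    X = np.asarray(mic_signals, dtype=float).T  # (n_samples, n_mics)
    ica = FastICA(n_components=n_sources, random_state=random_state)
    S = ica.fit_transform(X)                    # (n_samples, n_sources)
    return S.T
```

For example, mixing a sine wave and a sawtooth through a 2x2 matrix and calling `separate_sources` recovers two components, each strongly correlated with one of the original sources; the permutation ambiguity is why the zone-matching step described later is needed.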
  • Optionally, before performing signal separation, the at least one original audio signal may first be preprocessed using existing techniques.
  • For example, the audio signals collected by the microphones and the reference signal being played by the audio playback device are obtained, and adaptive acoustic feedback cancellation is performed on the at least one original audio signal: the reference signal is filtered through an adaptively fitted model of the acoustic propagation path, and the playback sound picked up by the microphones is filtered out of the original audio signals.
  • This prevents acoustic feedback from forming during sound replay, which would otherwise cause howling or smearing.
  • After such preprocessing, separating the resulting audio also filters out the noise collected by the microphone array.
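The feedback-cancellation idea can be sketched with a normalized LMS (NLMS) adaptive filter that learns the loudspeaker-to-microphone path from the reference signal. The filter length and step size below are illustrative assumptions; a production canceller would be considerably more elaborate:

```python
import numpy as np

def cancel_feedback(mic, ref, num_taps=64, mu=0.5, eps=1e-8):
    """Adaptive acoustic feedback cancellation with NLMS.

    mic: microphone signal = local speech + loudspeaker sound via room path.
    ref: reference signal fed to the loudspeaker.
    Returns the error signal, i.e. the microphone signal with the estimated
    loudspeaker contribution removed.
    """
    mic = np.asarray(mic, dtype=float)
    ref = np.asarray(ref, dtype=float)
    w = np.zeros(num_taps)          # adaptive estimate of the acoustic path
    buf = np.zeros(num_taps)        # most recent reference samples
    out = np.zeros_like(mic)
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = ref[n]
        y = w @ buf                 # predicted feedback component
        e = mic[n] - y              # residual = estimate of the local sound
        w += (mu / (buf @ buf + eps)) * e * buf   # NLMS weight update
        out[n] = e
    return out
```

With a short simulated room impulse response and no local speech, the residual energy drops far below the microphone energy once the filter has converged.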
  • Optionally, after signal separation, multi-channel noise reduction may be performed on the separated signals.
  • Specifically, for each separated signal, an adaptive filtering algorithm can take the remaining signals as references and, using each signal's characteristics, adaptively filter out residual noise and speech from non-corresponding sound zones. Alternatively, a pre-trained deep learning model can take the separated signals as input and output clean separated audio signals.
  • In this embodiment, the second determination module 704 may determine the sound zones respectively corresponding to the at least one separated audio signal. Specifically, the separated audio signals may not correspond one-to-one to the actual sound zones, so each separated audio signal needs to be matched against the original audio signals (or the audio signals after the preprocessing described above) to determine its sound zone.
  • As an example, the pairwise similarity between each separated audio signal and each original audio signal can be computed; for each separated audio signal, the original audio signal with the maximum similarity is identified, and the sound zone is then determined from the microphone corresponding to that original audio signal.
  • It should be noted that microphones and sound zones need not correspond one-to-one; for example, a microphone may be placed between two sound zones. After the microphone corresponding to a separated audio signal is determined, other methods may be used to decide from which of its sound zones the signal was collected. For example, a camera in the target space can capture images of each sound zone, and lip-movement recognition on the captured images can determine which of the zones served by the same microphone contains the active sound source. Alternatively, existing sound source localization techniques can determine the positional relationship between the microphone and the sound source, and hence the sound zone of the separated audio signal.
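A minimal sketch of this matching step, using the absolute Pearson correlation as the similarity measure (the patent leaves the exact similarity metric open, so this choice, and the simple microphone-to-zone table, are assumptions):

```python
import numpy as np

def assign_zones(separated, originals, mic_to_zone):
    """Map each separated signal to a sound zone.

    separated: array (n_sep, n_samples) of separated audio signals.
    originals: array (n_mics, n_samples) of per-microphone signals.
    mic_to_zone: dict mapping microphone index -> zone label.
    Returns a list of zone labels, one per separated signal.
    """
    separated = np.asarray(separated, dtype=float)
    originals = np.asarray(originals, dtype=float)
    zones = []
    for s in separated:
        # ICA may flip signs and rescale, so compare |correlation| against
        # every microphone signal and keep the best match.
        sims = [abs(np.corrcoef(s, o)[0, 1]) for o in originals]
        zones.append(mic_to_zone[int(np.argmax(sims))])
    return zones
```

Even when the separated outputs come back permuted, negated, or rescaled, each one still correlates most strongly with the microphone that dominated its mixture, which recovers the zone labels.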
  • In this embodiment, the control module 705 may control an audio playback device in the target space (such as the audio playback device 105 shown in FIG. 1) to play the separated audio signal corresponding to the audio playback sound zone.
  • Specifically, once the sound zone of each separated audio signal is determined, the separated audio signal corresponding to the audio playback sound zone is known.
  • The control module 705 may generate an instruction directing the audio playback device to play that separated audio signal, and the audio playback device plays it based on the instruction.
  • Referring to FIG. 8, FIG. 8 is a schematic structural diagram of an audio playback apparatus provided by another exemplary embodiment of the present disclosure.
  • In some optional implementations, the first determination module 701 may include: a first determination unit 7011, configured to determine the current audio playback mode; and a second determination unit 7012, configured to determine the audio playback sound zone from the preset number of sound zones based on the audio playback mode.
  • In some optional implementations, the apparatus may further include: a second acquisition module 706, configured to acquire a voice signal collected from speech uttered by a user; a recognition module 707, configured to recognize the voice signal to obtain a speech recognition result; and an update module 708, configured to update the audio playback sound zone among the preset number of sound zones based on the speech recognition result.
  • In some optional implementations, the update module 708 may include: a third determination unit 7081, configured to determine the target sound zone indicated by the speech recognition result; and a control unit 7082, configured to control the audio playback device to stop playing the separated audio signal corresponding to the target sound zone in response to determining that the target sound zone is an audio playback sound zone and that the speech recognition result indicates stopping playback of the separated audio signal corresponding to the target sound zone.
  • In some optional implementations, the update module 708 further includes: a first adjustment unit 7083, configured to adjust the target sound zone to an audio playback sound zone in response to determining that the speech recognition result indicates adjusting the target sound zone to an audio playback sound zone.
  • In some optional implementations, the apparatus may further include: a first playing module 709, configured to determine the sound effect corresponding to the speech recognition result and play that sound effect, in response to determining that the speech recognition result indicates playing a preset sound effect.
  • In some optional implementations, the apparatus may further include: a second adjustment module 710, configured to adjust the target sound zone to the main audio playback sound zone and adjust the audio playback sound zones other than the target sound zone to auxiliary audio playback sound zones, in response to determining that the speech recognition result indicates adjusting the target sound zone to the main audio playback sound zone; a suppression module 711, configured to suppress the separated audio signals corresponding to the auxiliary audio playback sound zones to obtain suppressed audio signals; and a second playing module 712, configured to mix and play the separated audio signal corresponding to the main audio playback sound zone and the suppressed audio signals.
  • In some optional implementations, the apparatus may further include: a third determination module 713, configured to determine the current audio playback mode; a scoring module 714, configured to score the separated audio signals respectively played in at least one audio playback sound zone in response to determining that the audio playback mode is the third mode; a selection module 715, configured to select the manager sound zone from the at least one audio playback sound zone based on the scores; a third acquisition module 716, configured to acquire the voice signal corresponding to the manager sound zone; and an interaction module 717, configured to perform voice interaction operations based on the voice signal corresponding to the manager sound zone.
  • The audio playback apparatus provided by the above embodiments of the present disclosure determines the audio playback sound zone from a preset number of sound zones in the target space, acquires at least one original audio signal collected by the microphone array, performs signal separation on it to obtain at least one separated audio signal, determines the sound zone corresponding to each separated audio signal, and finally controls the audio playback device to play the separated audio signal corresponding to the audio playback sound zone. A fixed microphone array is thus used effectively to collect and play back the audio emitted in a given sound zone: no separate microphone needs to be provided, users need not hold one or move to it, hardware resources are saved, operation is simplified, and audio collected from other, non-playback sound zones is shielded during playback, improving playback quality.
  • Next, an electronic device according to an embodiment of the present disclosure is described with reference to FIG. 9. The electronic device may be either or both of the terminal device 101 and the server 103 shown in FIG. 1, or a stand-alone device independent of them that can communicate with them to receive the collected input signals.
  • FIG. 9 illustrates a block diagram of an electronic device according to an embodiment of the present disclosure.
  • an electronic device 900 includes one or more processors 901 and a memory 902 .
  • The processor 901 may be a central processing unit (CPU) or another form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 900 to perform desired functions.
  • Memory 902 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • the volatile memory may include random access memory (RAM) and/or cache memory (cache), etc., for example.
  • Non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, and the like.
  • One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 901 may execute the program instructions to implement the above audio playback methods and/or other desired functions of the various embodiments of the present disclosure.
  • Various contents such as raw audio signals may also be stored in the computer readable storage medium.
  • In one example, the electronic device 900 may further include an input device 903 and an output device 904, interconnected through a bus system and/or other forms of connection mechanisms (not shown).
  • For example, when the electronic device is the terminal device 101 or the server 103, the input device 903 may be a microphone, a mouse, a keyboard, or the like, used to input original audio signals, various instructions, and so on.
  • When the electronic device is a stand-alone device, the input device 903 may be a communication network connector for receiving the input original audio signals, various instructions, etc. from the terminal device 101 and the server 103.
  • the output device 904 can output various information to the outside, including the separated audio signal.
  • the output device 904 may include, for example, a display, a speaker, a printer, a communication network and remote output devices connected thereto, and the like.
  • Of course, for simplicity, FIG. 9 shows only some of the components of the electronic device 900 that are relevant to the present disclosure, omitting components such as buses and input/output interfaces. Depending on the specific application, the electronic device 900 may further include any other appropriate components.
  • In addition to the above methods and devices, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps of the audio playback method according to the various embodiments of the present disclosure described in the "Exemplary Method" section of this specification.
  • the methods and apparatus of the present disclosure may be implemented in many ways.
  • the methods and apparatuses of the present disclosure may be implemented by software, hardware, firmware or any combination of software, hardware, and firmware.
  • the above sequence of steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the sequence specifically described above unless specifically stated otherwise.
  • the present disclosure can also be implemented as programs recorded in recording media, the programs including machine-readable instructions for realizing the method according to the present disclosure.
  • the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
  • In the apparatus, device, and method of the present disclosure, each component or step can be decomposed and/or recombined; such decompositions and/or recombinations should be regarded as equivalent solutions of the present disclosure.


Abstract

An audio playback method and apparatus, a computer-readable storage medium, and an electronic device. The method includes: determining an audio playback sound zone from a preset number of sound zones in a target space; acquiring at least one original audio signal collected by a preset microphone array; performing signal separation on the at least one original audio signal to obtain at least one separated audio signal; determining the sound zones respectively corresponding to the at least one separated audio signal; and controlling an audio playback device in the target space to play the separated audio signal corresponding to the audio playback sound zone. There is no need to provide a separate microphone to collect audio signals, and users can have their audio collected and played back without holding a microphone or moving to the position of a separately provided one, which saves hardware resources and facilitates user operation; in addition, audio signals collected from other, non-playback sound zones can be shielded during playback, improving the quality of audio playback.

Description

Audio playback method and apparatus, computer-readable storage medium, and electronic device
This disclosure claims priority to Chinese patent application No. 202111095336.7, filed on September 17, 2021 and entitled "Audio playback method and apparatus, computer-readable storage medium, and electronic device", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer technology, and in particular to an audio playback method and apparatus, a computer-readable storage medium, and an electronic device.
Background
In some spaces containing multiple people, the sound emitted by certain people or in certain areas needs to be collected and played back. The current mainstream solution is to provide a separate microphone and collect the user's voice by having the user hold or wear it. For example, when a user sings inside a vehicle, additional devices including microphones must be provided in the vehicle as sound pickup terminals, and parameters such as the microphone sensitivity and directivity of these terminals are designed so that the microphones can shield the loudspeaker's playback sound while capturing the user's voice.
Summary
Embodiments of the present disclosure provide an audio playback method and apparatus, a computer-readable storage medium, and an electronic device.
Embodiments of the present disclosure provide an audio playback method, including: determining an audio playback sound zone from a preset number of sound zones in a target space; acquiring at least one original audio signal collected by a preset microphone array; performing signal separation on the at least one original audio signal to obtain at least one separated audio signal; determining the sound zones respectively corresponding to the at least one separated audio signal; and controlling an audio playback device in the target space to play the separated audio signal corresponding to the audio playback sound zone.
According to another aspect of the embodiments of the present disclosure, an audio playback apparatus is provided, including: a first determination module configured to determine an audio playback sound zone from a preset number of sound zones in a target space; a first acquisition module configured to acquire at least one original audio signal collected by a preset microphone array; a separation module configured to perform signal separation on the at least one original audio signal to obtain at least one separated audio signal; a second determination module configured to determine the sound zones respectively corresponding to the at least one separated audio signal; and a control module configured to control an audio playback device in the target space to play the separated audio signal corresponding to the audio playback sound zone.
According to another aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, storing a computer program for executing the above audio playback method.
According to another aspect of the embodiments of the present disclosure, an electronic device is provided, including: a processor; and a memory for storing processor-executable instructions; the processor is configured to read the executable instructions from the memory and execute them to implement the above audio playback method.
Based on the audio playback method and apparatus, computer-readable storage medium, and electronic device provided by the above embodiments of the present disclosure, an audio playback sound zone is determined from a preset number of sound zones in the target space; at least one original audio signal collected by the microphone array is acquired and separated to obtain at least one separated audio signal; the sound zone corresponding to each separated audio signal is determined; and the audio playback device is controlled to play the separated audio signal corresponding to the audio playback sound zone. A fixed microphone array is thus used effectively to collect and play back the audio emitted in a given sound zone: no separate microphone needs to be provided, users need not hold one or move to it, hardware resources are saved, operation is simplified, and audio collected from other, non-playback sound zones is shielded during playback, improving playback quality.
The technical solutions of the present disclosure are described in further detail below with reference to the drawings and embodiments.
Brief Description of the Drawings
The above and other objects, features, and advantages of the present disclosure will become more apparent from the more detailed description of its embodiments in conjunction with the accompanying drawings. The drawings provide a further understanding of the embodiments of the present disclosure and form a part of the specification; together with the embodiments they serve to explain the present disclosure and do not limit it. In the drawings, the same reference numerals generally denote the same components or steps.
Fig. 1 is a diagram of a system to which the present disclosure is applicable.
Fig. 2 is a schematic flowchart of an audio playback method provided by an exemplary embodiment of the present disclosure.
Fig. 3 is a schematic diagram of an application scenario of the audio playback method of an embodiment of the present disclosure.
Fig. 4 is a schematic flowchart of an audio playback method provided by another exemplary embodiment of the present disclosure.
Fig. 5 is a schematic flowchart of an audio playback method provided by yet another exemplary embodiment of the present disclosure.
Fig. 6 is a schematic flowchart of an audio playback method provided by yet another exemplary embodiment of the present disclosure.
Fig. 7 is a schematic structural diagram of an audio playback apparatus provided by an exemplary embodiment of the present disclosure.
Fig. 8 is a schematic structural diagram of an audio playback apparatus provided by another exemplary embodiment of the present disclosure.
Fig. 9 is a structural diagram of an electronic device provided by an exemplary embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described in detail below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure, and it should be understood that the present disclosure is not limited by the exemplary embodiments described here.
It should be noted that, unless specifically stated otherwise, the relative arrangement of components and steps, numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure.
Those skilled in the art will understand that terms such as "first" and "second" in the embodiments of the present disclosure are only used to distinguish different steps, devices, modules, etc., and represent neither any particular technical meaning nor a necessary logical order between them.
It should also be understood that in the embodiments of the present disclosure, "multiple" may mean two or more, and "at least one" may mean one, two, or more.
Embodiments of the present disclosure can be applied to electronic devices such as terminal devices, computer systems, and servers, which can operate with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with such electronic devices include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments including any of the above, and so on.
Exemplary System
Fig. 1 shows an exemplary system architecture 100 to which the audio playback method or audio playback apparatus of embodiments of the present disclosure can be applied.
As shown in Fig. 1, the system architecture 100 may include a terminal device 101, a network 102, a server 103, a microphone array 104, and an audio playback device 105. The network 102 is the medium providing a communication link between the terminal device 101 and the server 103, and may include various connection types, such as wired or wireless communication links, or fiber optic cables.
The microphone array 104 can collect the audio signals emitted in the target space, and the audio playback device 105 can play the audio signals collected by the microphone array.
A user may use the terminal device 101 to interact with the server 103 through the network 102 to receive or send messages and the like. Various communication client applications may be installed on the terminal device 101, such as multimedia applications, search applications, web browser applications, shopping applications, and instant messaging tools.
The terminal device 101 may be any of various electronic devices, including but not limited to mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (e.g., vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The terminal device 101 may control a voice interaction device (which may be the terminal device 101 itself or another device connected to it) to perform voice interaction.
The server 103 may be a server providing various services, for example a backend server that processes the audio signals uploaded by the terminal device 101. The backend server may perform processing such as separating the received at least one original audio signal and determining sound zones, and obtain processing results (e.g., the audio signal corresponding to the audio playback sound zone).
It should be noted that the audio playback method provided by embodiments of the present disclosure may be executed by the server 103 or by the terminal device 101.
It should be understood that the numbers of terminal devices 101, networks 102, servers 103, microphone arrays 104, and audio playback devices 105 in Fig. 1 are merely illustrative; there may be any number of each, as required by the implementation. For example, when the audio signals do not need remote processing, the system architecture may include only the microphone array 104, the terminal device 101, and the audio playback device 105, without the network 102 and the server 103.
Exemplary Method
Fig. 2 is a schematic flowchart of an audio playback method provided by an exemplary embodiment of the present disclosure. This embodiment can be applied to an electronic device (such as the terminal device 101 or the server 103 shown in Fig. 1). As shown in Fig. 2, the method includes the following steps:
Step 201: determine an audio playback sound zone from a preset number of sound zones in the target space.
In this embodiment, the electronic device may determine the audio playback sound zone from a preset number of sound zones in the target space. The target space may be any of various spaces, such as the interior of a car or a room. The sound zones may be multiple areas into which the target space is artificially divided. For example, when the target space is the interior of a car, the sound zones may be the spaces where the driver's seat, the front passenger seat, and the seats on both sides of the rear row are respectively located. As shown in Fig. 3, the spaces where the four seats are located can be divided into corresponding sound zones: 1L, 1R, 2L, and 2R.
An audio playback sound zone is a sound zone in which the sounds emitted by objects within it (e.g., human voices, animal calls, the playing of musical instruments) are collected and played back. For example, when the target space is the interior of a car, the audio playback sound zone may be the space where the driver is located. The electronic device may determine the audio playback sound zone in various ways, for example according to a user operation that manually sets it, or by designating all sound zones as audio playback sound zones.
In one scenario, a passenger in a vehicle who wants to sing can select the audio playback sound zone through the vehicle's touch screen. In the subsequent steps, the microphone array collects the passenger's singing, and after processing, the audio playback device replays it.
Step 202: acquire at least one original audio signal collected by a preset microphone array.
In this embodiment, the electronic device may acquire at least one original audio signal collected by the preset microphone array. The microphone array (the microphone array 104 shown in Fig. 1) collects the sound emitted in the target space to obtain at least one original audio signal, each original audio signal corresponding to one microphone.
As an example, as shown in Fig. 3, when the target space is the interior of a car, microphones a, b, c, and d are respectively arranged beside the four seats; that is, they respectively collect the audio signals of the four sound zones 1L, 1R, 2L, and 2R.
Step 203: perform signal separation on the at least one original audio signal to obtain at least one separated audio signal.
In this embodiment, the electronic device may perform signal separation on the at least one original audio signal to obtain at least one separated audio signal. As an example, the electronic device may use an existing blind source separation technique. Blind source separation refers to recovering the individual independent components from mixed signals without knowing the parameters of the source signals or the transmission channels, and can use existing algorithms such as ICA (Independent Component Analysis). Each separated audio signal obtained in this way can be regarded as the audio signal collected from a certain sound zone.
Optionally, before signal separation, the at least one original audio signal may first be preprocessed using existing techniques. For example, the audio signals collected by the microphones and the reference signal played by the audio playback device are obtained, and adaptive acoustic feedback cancellation is performed on the at least one original audio signal: the reference signal is filtered through an adaptively fitted model of the acoustic propagation path, and the playback sound picked up by the microphones is filtered out of the original audio signals, preventing acoustic feedback, howling, or smearing during sound replay. After such preprocessing, separating the resulting audio also filters out the noise collected by the microphone array.
Optionally, after signal separation, multi-channel noise reduction may also be performed on the separated signals. Specifically, for each separated signal, an adaptive filtering algorithm can take the remaining signals as references and, using each signal's characteristics, adaptively filter out residual noise and speech from non-corresponding sound zones. A deep learning model may also be used: the separated signals are fed into a pre-trained model, which outputs clean separated audio signals.
Step 204: determine the sound zones respectively corresponding to the at least one separated audio signal.
In this embodiment, the electronic device may determine the sound zones respectively corresponding to the at least one separated audio signal. Specifically, the separated audio signals may not correspond one-to-one to the actual sound zones, so each separated audio signal needs to be matched against the original audio signals (or the audio signals after the preprocessing described above) to determine its sound zone.
As an example, the pairwise similarity between each separated audio signal and each original audio signal (or preprocessed audio signal) can be computed; for each separated audio signal, the original audio signal with the maximum similarity is identified, and the sound zone is determined from the microphone corresponding to that original audio signal.
It should be noted that microphones and sound zones need not correspond one-to-one. For example, a microphone may be placed between two sound zones; after the microphone corresponding to a separated audio signal is determined, other methods may be used to decide from which of its sound zones the signal was collected. For example, a camera in the target space can capture images of each sound zone, and lip-movement recognition on the captured images can determine which of the zones served by the same microphone contains a sound source. Alternatively, existing sound source localization techniques can determine the positional relationship between the microphone and the sound source, hence the sound zone where the source is located, and thus the sound zone corresponding to the separated audio signal.
Step 205: control an audio playback device in the target space to play the separated audio signal corresponding to the audio playback sound zone.
In this embodiment, the electronic device may control an audio playback device in the target space (e.g., the audio playback device 105 shown in Fig. 1) to play the separated audio signal corresponding to the audio playback sound zone.
Specifically, once the sound zone of each separated audio signal is determined, the separated audio signal corresponding to the audio playback sound zone is known. The electronic device may generate an instruction directing the audio playback device to play that separated audio signal, and the audio playback device plays it based on the instruction.
The method provided by the above embodiment of the present disclosure determines the audio playback sound zone from a preset number of sound zones in the target space, acquires at least one original audio signal collected by the microphone array, separates it to obtain at least one separated audio signal, determines the sound zone corresponding to each separated audio signal, and finally controls the audio playback device to play the separated audio signal corresponding to the audio playback sound zone. A fixed microphone array is thus used effectively to collect and play back the audio emitted in a given sound zone: no separate microphone needs to be provided, users need not hold one or move to it, hardware resources are saved, operation is simplified, and audio collected from other, non-playback sound zones is shielded during playback, improving playback quality.
In some optional implementations, step 201 may be performed as follows:
First, determine the current audio playback mode. There may be multiple audio playback modes, for example a mode in which the sound of a single sound zone is collected and played, and a mode in which the sounds of multiple sound zones are collected and played.
Then, based on the audio playback mode, determine the audio playback sound zone from the preset number of sound zones.
As an example, if the audio playback mode is the first mode, all of the preset number of sound zones are determined to be audio playback sound zones. The first mode may be manually set or be the default mode. For example, in a vehicle, when a singing application is launched, the first mode, i.e., chorus mode, is the default; in this case, every sound zone is an audio playback sound zone.
If the audio playback mode is the second mode, the audio playback sound zone is determined from the preset number of sound zones based on a zone-selection operation performed by the user. The second mode supports the user selecting at least one sound zone as the audio playback sound zone, and may likewise be manually set or the default. In the second mode, the user may select the audio playback sound zone; for example, in a vehicle, a passenger may select the sound zone corresponding to a seat as the audio playback sound zone by touching the screen, pressing a button, voice triggering, or the like.
By providing audio playback modes, this implementation allows the audio playback sound zone to be set flexibly: the electronic device can collect and play the sound of the audio playback sound zone without the user moving a microphone or changing position, which improves the convenience of audio playback.
In some optional implementations, as shown in Fig. 4, the method may further include the following steps:
Step 401: acquire a voice signal collected from speech uttered by a user. The voice signal may be a signal collected by the above microphone array.
Step 402: recognize the voice signal to obtain a speech recognition result. The recognition may use existing techniques, and the result may be represented as text.
Step 403: based on the speech recognition result, update the audio playback sound zone among the preset number of sound zones.
Specifically, preset keywords can be extracted from the speech recognition result, and the audio playback sound zone updated according to the keywords. For example, if a collected voice signal contains the keyword "I want to sing", the sound zone corresponding to that voice signal can be determined according to step 204 above and designated as an audio playback sound zone.
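A toy sketch of this keyword-driven update is shown below. The keyword table and the zone bookkeeping are illustrative assumptions; a real system would sit behind an ASR engine and likely use intent classification rather than substring matching:

```python
# Hypothetical keyword table mapping recognized phrases to zone updates.
KEYWORD_ACTIONS = {
    "i want to sing": "add_zone",
    "i don't want to sing": "remove_zone",
}

def update_playback_zones(recognized_text, speaker_zone, playback_zones):
    """Update the set of audio playback sound zones from an ASR result.

    recognized_text: text output of speech recognition.
    speaker_zone: zone the voice signal was matched to (per step 204).
    playback_zones: current playback zone labels; an updated set is returned.
    """
    zones = set(playback_zones)
    text = recognized_text.strip().lower()
    for phrase, action in KEYWORD_ACTIONS.items():
        if phrase in text:
            if action == "add_zone":
                zones.add(speaker_zone)
            elif action == "remove_zone":
                zones.discard(speaker_zone)
    return zones
```

For example, a passenger in zone 2L saying "I want to sing" adds 2L to the playback set, while a current singer saying "I don't want to sing" removes their own zone.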
It should be noted that steps 401 to 403 may be executed at any time after step 201, for example while the audio playback device is playing the separated audio signal, or before or after it does so.
By recognizing the user's speech and updating the audio playback sound zone based on the recognition result, this implementation allows the audio playback sound zone to be adjusted flexibly and conveniently through voice interaction, without manual operation, which greatly improves the convenience of audio playback.
In some optional implementations, step 401 may be performed as follows:
First, determine the target sound zone indicated by the speech recognition result. Specifically, the electronic device may identify keywords indicating the target sound zone in the speech recognition result. For example, if the result contains "chorus", all sound zones may be determined as target sound zones; if it contains "front-row pickup", 1L and 1R as shown in Fig. 3 are the target sound zones. If the result contains no keyword indicating a target sound zone but does contain a keyword for adjusting the audio playback sound zone, the sound zone that produced the voice signal can be determined as the target sound zone according to the method described in step 204. For example, if the speech recognition result includes keywords such as "I don't want to sing anymore" or "I want to sing", it contains no zone-indicating keyword, but these keywords are used for adjusting the audio playback sound zone.
Then, in response to determining that the target sound zone is an audio playback sound zone and that the speech recognition result indicates stopping playback of the separated audio signal corresponding to the target sound zone, control the audio playback device to stop playing that separated audio signal.
Following the above example, if the result includes the keyword "I don't want to sing anymore" and the target sound zone producing the voice signal is an audio playback sound zone, it can be determined that the speech recognition result indicates stopping playback of the separated audio signal corresponding to the target sound zone, and an instruction is generated to make the audio playback device stop playing that signal.
By determining the target sound zone indicated by the speech recognition result and stopping playback of its separated audio signal when so instructed, this implementation lets the user flexibly control the audio playback device by voice, without manual operation, further improving the convenience of audio playback.
In some optional implementations, after determining the target sound zone indicated by the speech recognition result, the method may further include: in response to determining that the speech recognition result indicates adjusting the target sound zone to an audio playback sound zone, adjusting the target sound zone to an audio playback sound zone.
Specifically, as an example, if the result contains the keyword "front-row pickup", 1L and 1R as shown in Fig. 3 are adjusted to audio playback sound zones; if it contains the keyword "rear-row pickup", 2L and 2R are adjusted to audio playback sound zones.
By adjusting the audio playback sound zone through voice control, a user at any position can conveniently join the sound replay process without manual operation, further improving the convenience of audio playback.
In some optional implementations, after step 402, the method may further include:
In response to determining that the speech recognition result indicates playing a preset sound effect, determining the sound effect corresponding to the speech recognition result and playing it.
As an example, if the speech recognition result includes "great singing", corresponding applause or cheering sound effects can be retrieved and played.
By recognizing speech and playing corresponding sound effects, this implementation makes the played content richer.
In some optional implementations, as shown in Fig. 5, after step 402 the method may further include:
Step 404: in response to determining that the speech recognition result indicates adjusting the target sound zone to the main audio playback sound zone, adjust the target sound zone to the main audio playback sound zone, and adjust the audio playback sound zones other than the target sound zone to auxiliary audio playback sound zones.
In this step there are at least two audio playback sound zones; one is adjusted to be the main audio playback sound zone, and the others become auxiliary audio playback sound zones. As an example, if the speech recognition result includes "I'll take lead vocals", the target sound zone producing the voice signal can be designated as the main audio playback sound zone.
Step 405: suppress the separated audio signals corresponding to the auxiliary audio playback sound zones to obtain suppressed audio signals.
As an example, the playback volume of the audio corresponding to the auxiliary zones can be reduced. Alternatively, harmony processing can be applied to the audio collected from the main and auxiliary zones: the audio collected from the main zone is treated as the main melody, and the audio collected from the auxiliary zones is mixed in as harmony parts.
Step 406: mix and play the separated audio signal corresponding to the main audio playback sound zone together with the suppressed audio signals.
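A minimal sketch of this suppress-and-mix step, assuming suppression is a simple gain reduction on the auxiliary zones (one of the options mentioned above; the 0.3 gain is an arbitrary illustrative value):

```python
import numpy as np

def mix_with_main(zone_signals, main_zone, aux_gain=0.3):
    """Mix per-zone separated signals, attenuating all but the main zone.

    zone_signals: dict mapping zone label -> 1-D array (equal lengths).
    main_zone: label of the main audio playback sound zone.
    Returns the mixed signal, rescaled only if it would clip beyond [-1, 1].
    """
    mix = np.zeros_like(np.asarray(next(iter(zone_signals.values())),
                                   dtype=float))
    for zone, sig in zone_signals.items():
        gain = 1.0 if zone == main_zone else aux_gain  # suppress auxiliaries
        mix += gain * np.asarray(sig, dtype=float)
    peak = np.max(np.abs(mix))
    return mix / peak if peak > 1.0 else mix
```

Because the auxiliary zones enter the mix at a lower gain, the main zone's voice dominates the playback while the others remain audible in the background.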
By determining the main and auxiliary audio playback sound zones through voice control and suppressing the separated audio signals corresponding to the auxiliary zones, the mixed playback highlights the sound of the main audio playback sound zone; users can more clearly distinguish the sound collected from the main zone and can flexibly adjust the main and auxiliary zones, which enriches the audio playback control methods and further improves the convenience of audio playback.
In some optional implementations, as shown in Fig. 6, after step 205 the method may further include:
Step 206: determine the current audio playback mode.
Step 207: in response to determining that the audio playback mode is the third mode, score the separated audio signals respectively played in at least one audio playback sound zone.
The third mode is a mode in which the played separated audio signals are scored. As an example, if the user taps a button on the screen for switching to the third mode (e.g., a button labeled "PK"), or the user's recognized speech includes a keyword indicating switching to the third mode (e.g., "PK" or "singing contest"), the current audio playback mode is switched to the third mode. When the electronic device detects that the current audio playback mode is the third mode, it starts scoring the played separated audio signals. The scoring may use existing audio-scoring methods; for example, in a singing scenario, the score can reflect whether the pitch of the user's voice is aligned with a reference pitch and whether the volume is appropriate.
Step 208: based on the scores, select the manager sound zone from the at least one audio playback sound zone. As an example, the audio playback sound zone with the highest score may be designated as the manager sound zone.
Step 209: acquire the voice signal corresponding to the manager sound zone. Specifically, through the microphone array, the voice signals of users located in the manager sound zone can be collected, while the voice signals of users in other sound zones are shielded.
Step 210: perform voice interaction operations based on the voice signal corresponding to the manager sound zone. Specifically, that voice signal can be recognized and the audio playback process controlled according to the recognition result, for example updating the audio playback sound zone by voice or selecting the track to play.
By scoring the separated audio signals played in each audio playback sound zone in the third mode and switching the manager sound zone based on the scoring results, this implementation further enriches the playback process and increases the degree of automation in switching the manager sound zone.
Exemplary Apparatus
Fig. 7 is a schematic structural diagram of an audio playback apparatus provided by an exemplary embodiment of the present disclosure. This embodiment can be applied to electronic devices. As shown in Fig. 7, the audio playback apparatus includes: a first determination module 701, configured to determine the audio playback sound zone from a preset number of sound zones in the target space; a first acquisition module 702, configured to acquire at least one original audio signal collected by a preset microphone array; a separation module 703, configured to perform signal separation on the at least one original audio signal to obtain at least one separated audio signal; a second determination module 704, configured to determine the sound zones respectively corresponding to the at least one separated audio signal; and a control module 705, configured to control the audio playback device in the target space to play the separated audio signal corresponding to the audio playback sound zone.
In this embodiment, the first determination module 701 may determine the audio playback sound zone from a preset number of sound zones in the target space. The target space may be any of various spaces, such as the interior of a car or a room. The sound zones may be multiple areas into which the target space is artificially divided. For example, when the target space is the interior of a car, the sound zones may be the spaces where the driver's seat, the front passenger seat, and the seats on both sides of the rear row are respectively located. As shown in Fig. 3, the spaces where the four seats are located can be divided into corresponding sound zones: 1L, 1R, 2L, and 2R.
An audio playback sound zone is a sound zone whose emitted sound is collected and played back. For example, when the target space is the interior of a car, the audio playback sound zone may be the space where the driver is located. The first determination module 701 may determine the audio playback sound zone in various ways, for example according to a user operation that manually sets it, or by designating all sound zones as audio playback sound zones.
In one scenario, a passenger in a vehicle who wants to sing can select the audio playback sound zone through the vehicle's touch screen. In the subsequent steps, the microphone array collects the passenger's singing, and after processing, the audio playback device replays it.
In this embodiment, the first acquisition module 702 may acquire at least one original audio signal collected by the preset microphone array. The microphone array (the microphone array 104 shown in Fig. 1) collects the sound emitted in the target space to obtain at least one original audio signal, each corresponding to one microphone. As an example, as shown in Fig. 3, when the target space is the interior of a car, microphones a, b, c, and d are respectively arranged beside the four seats; that is, they respectively collect the audio signals of the four sound zones 1L, 1R, 2L, and 2R.
In this embodiment, the separation module 703 may perform signal separation on the at least one original audio signal to obtain at least one separated audio signal, for example using an existing blind source separation technique. Blind source separation refers to recovering the individual independent components from mixed signals without knowing the parameters of the source signals or the transmission channels, and can use existing algorithms such as ICA (Independent Component Analysis). Each separated audio signal obtained in this way can be regarded as the audio signal collected from a certain sound zone.
Optionally, before signal separation, the at least one original audio signal may first be preprocessed using existing techniques, for example adaptive acoustic feedback cancellation: the reference signal played by the audio playback device is filtered through an adaptively fitted model of the acoustic propagation path, and the playback sound picked up by the microphones is filtered out of the original audio signals, preventing acoustic feedback, howling, or smearing during sound replay. After such preprocessing, separating the resulting audio also filters out the noise collected by the microphone array.
Optionally, after signal separation, multi-channel noise reduction may also be performed on the separated signals: for each separated signal, an adaptive filtering algorithm can take the remaining signals as references and, using each signal's characteristics, adaptively filter out residual noise and speech from non-corresponding sound zones; alternatively, a pre-trained deep learning model can take the separated signals as input and output clean separated audio signals.
In this embodiment, the second determination module 704 may determine the sound zones respectively corresponding to the at least one separated audio signal. Specifically, the separated audio signals may not correspond one-to-one to the actual sound zones, so each separated audio signal needs to be matched against the original audio signals (or the audio signals after the preprocessing described above). As an example, the pairwise similarity between each separated audio signal and each original audio signal can be computed; for each separated audio signal, the original audio signal with the maximum similarity is identified, and the sound zone is determined from the corresponding microphone. It should be noted that microphones and sound zones need not correspond one-to-one; for example, a microphone may be placed between two sound zones, in which case lip-movement recognition on camera images of each sound zone, or existing sound source localization techniques, can determine which of the zones served by the same microphone contains the sound source, and hence the sound zone of the separated audio signal.
In this embodiment, the control module 705 may control the audio playback device in the target space (e.g., the audio playback device 105 shown in Fig. 1) to play the separated audio signal corresponding to the audio playback sound zone. Specifically, once the sound zone of each separated audio signal is determined, the separated audio signal corresponding to the audio playback sound zone is known; the control module 705 may generate an instruction directing the audio playback device to play that separated audio signal, and the audio playback device plays it based on the instruction.
Referring to Fig. 8, Fig. 8 is a schematic structural diagram of an audio playback apparatus provided by another exemplary embodiment of the present disclosure.
In some optional implementations, the first determination module 701 may include: a first determination unit 7011, configured to determine the current audio playback mode; and a second determination unit 7012, configured to determine the audio playback sound zone from the preset number of sound zones based on the audio playback mode.
In some optional implementations, the apparatus may further include: a second acquisition module 706, configured to acquire a voice signal collected from speech uttered by a user; a recognition module 707, configured to recognize the voice signal to obtain a speech recognition result; and an update module 708, configured to update the audio playback sound zone among the preset number of sound zones based on the speech recognition result.
In some optional implementations, the update module 708 may include: a third determination unit 7081, configured to determine the target sound zone indicated by the speech recognition result; and a control unit 7082, configured to control the audio playback device to stop playing the separated audio signal corresponding to the target sound zone in response to determining that the target sound zone is an audio playback sound zone and that the speech recognition result indicates stopping playback of that separated audio signal.
In some optional implementations, the update module 708 further includes: a first adjustment unit 7083, configured to adjust the target sound zone to an audio playback sound zone in response to determining that the speech recognition result indicates adjusting the target sound zone to an audio playback sound zone.
In some optional implementations, the apparatus may further include: a first playing module 709, configured to determine the sound effect corresponding to the speech recognition result and play it, in response to determining that the speech recognition result indicates playing a preset sound effect.
In some optional implementations, the apparatus may further include: a second adjustment module 710, configured to adjust the target sound zone to the main audio playback sound zone and adjust the other audio playback sound zones to auxiliary audio playback sound zones, in response to determining that the speech recognition result indicates adjusting the target sound zone to the main audio playback sound zone; a suppression module 711, configured to suppress the separated audio signals corresponding to the auxiliary audio playback sound zones to obtain suppressed audio signals; and a second playing module 712, configured to mix and play the separated audio signal corresponding to the main audio playback sound zone and the suppressed audio signals.
In some optional implementations, the apparatus may further include: a third determination module 713, configured to determine the current audio playback mode; a scoring module 714, configured to score the separated audio signals respectively played in at least one audio playback sound zone in response to determining that the audio playback mode is the third mode; a selection module 715, configured to select the manager sound zone from the at least one audio playback sound zone based on the scores; a third acquisition module 716, configured to acquire the voice signal corresponding to the manager sound zone; and an interaction module 717, configured to perform voice interaction operations based on the voice signal corresponding to the manager sound zone.
The audio playback apparatus provided by the above embodiments of the present disclosure determines the audio playback sound zone from a preset number of sound zones in the target space, acquires at least one original audio signal collected by the microphone array, separates it to obtain at least one separated audio signal, determines the sound zone corresponding to each separated audio signal, and finally controls the audio playback device to play the separated audio signal corresponding to the audio playback sound zone. A fixed microphone array is thus used effectively to collect and play back the audio emitted in a given sound zone: no separate microphone needs to be provided, users need not hold one or move to it, hardware resources are saved, operation is simplified, and audio collected from other, non-playback sound zones is shielded during playback, improving playback quality.
Exemplary Electronic Device
An electronic device according to an embodiment of the present disclosure is described below with reference to Fig. 9. The electronic device may be either or both of the terminal device 101 and the server 103 shown in Fig. 1, or a stand-alone device independent of them that can communicate with them to receive the collected input signals.
Fig. 9 illustrates a block diagram of an electronic device according to an embodiment of the present disclosure.
As shown in Fig. 9, the electronic device 900 includes one or more processors 901 and a memory 902.
The processor 901 may be a central processing unit (CPU) or another form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 900 to perform desired functions.
The memory 902 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory; the non-volatile memory may include, for example, read-only memory (ROM), hard disks, and flash memory. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 901 may execute the program instructions to implement the audio playback methods of the various embodiments of the present disclosure described above and/or other desired functions. Various contents such as the original audio signals may also be stored in the computer-readable storage medium.
In one example, the electronic device 900 may further include an input device 903 and an output device 904, interconnected through a bus system and/or other forms of connection mechanisms (not shown).
For example, when the electronic device is the terminal device 101 or the server 103, the input device 903 may be a microphone, a mouse, a keyboard, or the like, used to input original audio signals, various instructions, and so on. When the electronic device is a stand-alone device, the input device 903 may be a communication network connector for receiving the input original audio signals, various instructions, etc. from the terminal device 101 and the server 103.
The output device 904 can output various information to the outside, including the separated audio signals, and may include, for example, a display, a speaker, a printer, a communication network and the remote output devices connected to it, and so on.
Of course, for simplicity, Fig. 9 shows only some of the components of the electronic device 900 that are relevant to the present disclosure, omitting components such as buses and input/output interfaces. Depending on the specific application, the electronic device 900 may further include any other appropriate components.
Exemplary Computer Program Product and Computer-Readable Storage Medium
In addition to the above methods and devices, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps of the audio playback method according to the various embodiments of the present disclosure described in the "Exemplary Method" section of this specification.
The basic principles of the present disclosure have been described above in conjunction with specific embodiments. However, it should be pointed out that the advantages, strengths, effects, and the like mentioned in the present disclosure are merely examples and not limitations, and must not be regarded as necessary for each embodiment of the present disclosure. In addition, the specific details disclosed above are for the purpose of illustration and ease of understanding only, not limitation; the present disclosure is not required to be implemented with those specific details.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and for the same or similar parts the embodiments may be referred to one another. Since the system embodiments basically correspond to the method embodiments, their description is relatively brief; for relevant details, refer to the description of the method embodiments.
The block diagrams of components, apparatuses, devices, and systems involved in the present disclosure are only illustrative examples and are not intended to require or imply that they must be connected, arranged, or configured in the manner shown in the block diagrams; as those skilled in the art will recognize, they can be connected, arranged, and configured in any manner.
The methods and apparatus of the present disclosure may be implemented in many ways, for example by software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Furthermore, in some embodiments, the present disclosure may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the method according to the present disclosure; thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
It should also be pointed out that, in the apparatus, device, and method of the present disclosure, each component or step can be decomposed and/or recombined; such decompositions and/or recombinations should be regarded as equivalent solutions of the present disclosure.
The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other aspects without departing from the scope of the present disclosure. Therefore, the present disclosure is not intended to be limited to the aspects shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

  1. An audio playback method, comprising:
    determining an audio playback sound zone from a preset number of sound zones in a target space;
    acquiring at least one original audio signal collected by a preset microphone array;
    performing signal separation on the at least one original audio signal to obtain at least one separated audio signal;
    determining the sound zones respectively corresponding to the at least one separated audio signal;
    controlling an audio playback device in the target space to play the separated audio signal corresponding to the audio playback sound zone.
  2. The method according to claim 1, wherein the determining an audio playback sound zone from a preset number of sound zones in a target space comprises:
    determining a current audio playback mode;
    determining the audio playback sound zone from the preset number of sound zones based on the audio playback mode.
  3. The method according to claim 1, wherein the method further comprises:
    acquiring a speech signal collected from speech uttered by a user;
    recognizing the speech signal to obtain a speech recognition result;
    updating the audio playback sound zone among the preset number of sound zones based on the speech recognition result.
  4. The method according to claim 3, wherein the updating the audio playback sound zone among the preset number of sound zones based on the speech recognition result comprises:
    determining a target sound zone indicated by the speech recognition result;
    in response to determining that the target sound zone is an audio playback sound zone and that the speech recognition result is information indicating that playback of the separated audio signal corresponding to the target sound zone is to be stopped, controlling the audio playback device to stop playing the separated audio signal corresponding to the target sound zone.
  5. The method according to claim 3, wherein after the obtaining a speech recognition result, the method further comprises:
    in response to determining that the speech recognition result is information indicating that a preset sound effect is to be played, determining the sound effect corresponding to the speech recognition result, and playing the sound effect.
  6. The method according to claim 3, wherein after the obtaining a speech recognition result, the method further comprises:
    in response to determining that the speech recognition result is information indicating that a target sound zone is to be set as a main audio playback sound zone, setting the target sound zone as the main audio playback sound zone, and setting the audio playback sound zones other than the target sound zone as auxiliary audio playback sound zones;
    suppressing the separated audio signals corresponding to the auxiliary audio playback sound zones to obtain suppressed audio signals;
    mixing and playing the separated audio signal corresponding to the main audio playback sound zone and the suppressed audio signals.
  7. The method according to claim 1, wherein after the playing the separated audio signal corresponding to the audio playback sound zone, the method further comprises:
    determining a current audio playback mode;
    in response to determining that the audio playback mode is a third mode, scoring the separated audio signals respectively played in at least one audio playback sound zone;
    selecting a manager sound zone from the at least one audio playback sound zone based on the scores;
    acquiring a speech signal corresponding to the manager sound zone;
    performing a voice interaction operation based on the speech signal corresponding to the manager sound zone.
  8. An audio playback apparatus, comprising:
    a first determination module, configured to determine an audio playback sound zone from a preset number of sound zones in a target space;
    a first acquisition module, configured to acquire at least one original audio signal collected by a preset microphone array;
    a separation module, configured to perform signal separation on the at least one original audio signal to obtain at least one separated audio signal;
    a second determination module, configured to determine the sound zones respectively corresponding to the at least one separated audio signal;
    a control module, configured to control an audio playback device in the target space to play the separated audio signal corresponding to the audio playback sound zone.
  9. A computer-readable storage medium storing a computer program, the computer program being used to execute the method according to any one of claims 1-7.
  10. An electronic device, comprising:
    a processor;
    a memory for storing instructions executable by the processor;
    the processor being configured to read the executable instructions from the memory and execute the instructions to implement the method according to any one of claims 1-7.
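As a minimal sketch of the suppress-and-mix step recited in claim 6: auxiliary-zone signals are attenuated and then summed sample-wise with the main-zone signal. The attenuation factor and the simple additive mix are assumptions, since the claim does not specify how suppression or mixing is performed.

```python
def suppress(signal, factor=0.2):
    # Attenuate an auxiliary-zone signal; the factor 0.2 is an assumed value,
    # not one specified by the claim.
    return [x * factor for x in signal]

def mix(main_signal, aux_signals):
    # Sample-wise sum of the main-zone signal with each suppressed
    # auxiliary-zone signal (all signals assumed equal length).
    mixed = list(main_signal)
    for aux in aux_signals:
        suppressed = suppress(aux)
        mixed = [m + s for m, s in zip(mixed, suppressed)]
    return mixed

main = [1.0, 1.0]           # main audio playback sound zone
aux = [[0.5, 0.5]]          # one auxiliary audio playback sound zone
out = mix(main, aux)        # auxiliary content survives, but attenuated
```

The effect is that the target zone dominates the playback mix while the other zones remain faintly audible rather than being cut entirely.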
PCT/CN2022/118396 2021-09-17 2022-09-13 Audio playback method and apparatus, computer-readable storage medium, and electronic device WO2023040820A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111095336.7 2021-09-17
CN202111095336.7A CN113808611A (zh) 2021-09-17 2021-12-17 Audio playback method and apparatus, computer-readable storage medium, and electronic device

Publications (1)

Publication Number Publication Date
WO2023040820A1 true WO2023040820A1 (zh) 2023-03-23

Family

ID=78895850




Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030073408A1 (en) * 2001-10-12 2003-04-17 Harrell Michael R. Automated system and method for automative time-based audio verification
US20170242651A1 (en) * 2016-02-22 2017-08-24 Sonos, Inc. Audio Response Playback
CN110996308A * 2019-12-10 2020-04-10 Goertek Inc. Sound playback device and control method, control apparatus, and readable storage medium therefor
CN112435682A * 2020-11-10 2021-03-02 Guangzhou Xiaopeng Motors Technology Co., Ltd. Vehicle noise reduction system, method, apparatus, vehicle, and storage medium
CN113014983A * 2021-03-08 2021-06-22 Guangdong OPPO Mobile Telecommunications Corp., Ltd. Video playback method and apparatus, storage medium, and electronic device
CN113345401A * 2021-05-31 2021-09-03 RDA Microelectronics (Shanghai) Co., Ltd. Calibration method and apparatus for the active noise cancellation system of a wearable device, storage medium, and terminal
CN113808611A * 2021-09-17 2021-12-17 Shenzhen Horizon Robotics Technology Co., Ltd. Audio playback method and apparatus, computer-readable storage medium, and electronic device

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109785819B * 2018-12-22 2023-03-07 Shenzhen Tang'en Technology Co., Ltd. Method for associating multiple microphones, storage medium, microphone, and singing system
CN109637532A * 2018-12-25 2019-04-16 Baidu Online Network Technology (Beijing) Co., Ltd. Speech recognition method and apparatus, vehicle-mounted terminal, vehicle, and storage medium
CN109922290A * 2018-12-27 2019-06-21 NIO Co., Ltd. Audio and video synthesis method, apparatus, system, and device for a vehicle, and vehicle
JP7383942B2 * 2019-09-06 2023-11-21 Yamaha Corporation In-vehicle acoustic system and vehicle
CN113270082A * 2020-02-14 2021-08-17 Guangzhou Automobile Group Co., Ltd. Vehicle-mounted KTV control method and apparatus, and vehicle-mounted intelligent connected terminal
CN112397065A * 2020-11-04 2021-02-23 Shenzhen Horizon Robotics Technology Co., Ltd. Voice interaction method and apparatus, computer-readable storage medium, and electronic device
CN113225716A * 2021-04-19 2021-08-06 Beijing Saibin Technology Co., Ltd. Vehicle-mounted karaoke implementation method, system, device, and storage medium


Also Published As

Publication number Publication date
CN113808611A (zh) 2021-12-17


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 22869189; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 18-07-2024))