
CN114639395A - Voice control method and device for vehicle-mounted virtual character and vehicle with voice control device - Google Patents

Voice control method and device for vehicle-mounted virtual character and vehicle with voice control device

Info

Publication number
CN114639395A
Authority
CN
China
Prior art keywords
user
audio information
vehicle
emotional
emotion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011489364.2A
Other languages
Chinese (zh)
Inventor
蔡汉嘉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qoros Automotive Co Ltd
Original Assignee
Qoros Automotive Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qoros Automotive Co Ltd filed Critical Qoros Automotive Co Ltd
Priority to CN202011489364.2A priority Critical patent/CN114639395A/en
Publication of CN114639395A publication Critical patent/CN114639395A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/18 - Speech classification or search using natural language modelling
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 - Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • General Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Child & Adolescent Psychology (AREA)
  • Artificial Intelligence (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a voice control method and device for a vehicle-mounted virtual character, and a vehicle equipped with the device. The method comprises the following steps: collecting audio information of a user, and recognizing emotion words from the audio information; parsing user semantics from the emotion words, so as to analyze the user's current psychological condition according to the user semantics; and generating an emotional action corresponding to the current psychological condition according to the amplitude of the audio information, and controlling the vehicle-mounted virtual character to express the current psychological condition and/or perform the emotional action. According to the voice control method for the vehicle-mounted virtual character, the user's current psychological and emotional state is analyzed and judged either locally without a network or through a big-data network, the depth of the audio amplitude is used as the basis of the emotional action, and the result is displayed in an integrated manner on the central control large screen, which enriches the enjoyment of using the vehicle and greatly improves the user experience.

Description

Voice control method and device for vehicle-mounted virtual character and vehicle with voice control device
Technical Field
The application relates to the technical field of vehicles, in particular to a voice control method and device for a vehicle-mounted virtual character and a vehicle with the same.
Background
At present, most voice semantic modules on vehicles only integrate the processing of voice semantics and control the vehicle to perform corresponding actions according to the user's instructions, such as turning the air conditioner on or off, raising or lowering a window, and the like.
However, the related art provides no technical solution for deriving corresponding emotional actions in a vehicle from the user's speech semantics.
Summary of the invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, a first object of the invention is to provide a voice control method for a vehicle-mounted virtual character, which analyzes and judges the user's current psychological and emotional state either locally without a network or through a big-data network, uses the depth of the audio amplitude as the basis of an emotional action, and displays the result in an integrated manner on the central control large screen, thereby enriching the enjoyment of using the vehicle and greatly improving the user experience.
A second object of the invention is to provide a voice control device for the vehicle-mounted virtual character.
A third object of the invention is to propose a vehicle.
In order to achieve the above object, an embodiment of the first aspect of the present application provides a voice control method for a vehicle-mounted virtual character, including the following steps:
collecting audio information of a user, and recognizing emotion words from the audio information;
parsing user semantics from the emotion words, so as to analyze the current psychological condition of the user according to the user semantics; and
generating an emotional action corresponding to the current psychological condition according to the amplitude of the audio information, and controlling the vehicle-mounted virtual character to express the current psychological condition and/or perform the emotional action.
In addition, the voice control method for the vehicle-mounted virtual character according to the above embodiment of the present invention may further have the following additional technical features:
Optionally, before the collecting of the audio information of the user, the method further includes:
judging whether the voice instruction of the user is a wake-up instruction;
and after the voice instruction is detected to be a wake-up instruction, picking up the user's voice to obtain the audio information.
Optionally, before the collecting of the audio information of the user, the method further includes:
judging whether a wake-up key is triggered;
and after the wake-up key is detected to be triggered, picking up the user's voice to obtain the audio information.
Optionally, the parsing of the user semantics from the emotion words further includes:
matching the emotion words with user semantics in a local database;
if the matching is successful, determining the user semantics according to the matching result; otherwise, sending the emotion words to a server, and receiving the matching result obtained by the server according to the emotion words.
Optionally, the generating of the emotional action corresponding to the current psychological condition according to the amplitude of the audio information includes:
processing the audio information, and screening it to obtain the audio information that meets a preset requirement;
calculating the average amplitude of the audio information, and generating the emotional action according to the average amplitude.
In order to achieve the above object, a second aspect of the present application provides a voice control apparatus for a vehicle-mounted virtual character, including:
the acquisition module is used for collecting audio information of a user and recognizing emotion words from the audio information;
the parsing module is used for parsing user semantics from the emotion words, so as to analyze the current psychological condition of the user according to the user semantics; and
the control module is used for generating an emotional action corresponding to the current psychological condition according to the amplitude of the audio information, and controlling the vehicle-mounted virtual character to express the current psychological condition and/or perform the emotional action.
Optionally, before the acquiring the audio information of the user, the acquiring module further includes:
the first judgment unit is used for judging whether the voice instruction of the user is a wake-up instruction or not;
and the second acquisition unit is used for picking up the user's voice to obtain the audio information after the voice instruction is detected to be a wake-up instruction.
Optionally, before the acquiring the audio information of the user, the acquiring module further includes:
the first judgment unit is used for judging whether the awakening key is triggered or not;
and the second acquisition unit is used for picking up sound of the user after the wake-up key is detected to be triggered so as to obtain the audio information.
Optionally, the parsing module further includes:
the matching unit is used for matching the emotion words with the user semantics of the local database;
the receiving unit is used for determining the user semantics according to a matching result if the matching is successful, and otherwise sending the emotion words to a server and receiving the matching result obtained by the server according to the emotion words;
the control module includes:
the processing unit is used for processing the audio information and screening it to obtain the audio information that meets the preset requirement;
and the computing unit is used for computing the average amplitude of the audio information and generating the emotional action according to the average amplitude.
In order to achieve the above object, a third embodiment of the present application provides a vehicle including the above voice control apparatus for an in-vehicle avatar.
Therefore, the audio information of the user can be collected, emotion words can be recognized from the audio information, the user semantics can be parsed from the emotion words, the user's current psychological condition can be analyzed according to the user semantics, the emotional action corresponding to the current psychological condition can be generated according to the amplitude of the audio information, and the vehicle-mounted virtual character can be controlled to express the current psychological condition and/or perform the emotional action. In this way, the user's current psychological and emotional state is analyzed and judged either locally without a network or through a big-data network, the depth of the audio amplitude is used as the basis of the emotional action, the result is passed through interfaces of the voice semantic module such as a UART, and the central control large screen displays it in an integrated manner, which enriches the enjoyment of using the vehicle and greatly improves the user experience.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flowchart of a voice control method for a vehicle-mounted virtual character according to an embodiment of the present application;
FIG. 2 is an exemplary diagram of a central control large screen integration interface according to one embodiment of the present application;
FIG. 3 is an exemplary diagram of a central control large screen integration interface according to one embodiment of the present application;
FIG. 4 is an exemplary diagram of a central control large screen integration interface according to another embodiment of the present application;
FIG. 5 is an exemplary diagram of a central control large screen integration interface according to yet another embodiment of the present application;
FIG. 6 is a flowchart of a voice control method for a vehicle-mounted avatar according to one embodiment of the present application;
FIG. 7 is a schematic diagram of a speech semantic module architecture according to one embodiment of the present application;
FIG. 8 is a block diagram of a voice control apparatus for a vehicle-mounted avatar according to an embodiment of the present application;
FIG. 9 is a block schematic diagram of a vehicle according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
The voice control method and apparatus for a vehicle-mounted virtual character and a vehicle having the same according to an embodiment of the present invention will be described below with reference to the accompanying drawings, and first, the voice control method for a vehicle-mounted virtual character according to an embodiment of the present invention will be described with reference to the accompanying drawings.
Specifically, fig. 1 is a schematic flow chart of a voice control method for a vehicle-mounted virtual character according to an embodiment of the present application.
As shown in fig. 1, the voice control method of the vehicle-mounted virtual character comprises the following steps:
in step S101, audio information of the user is collected, and emotional characters are identified from the audio information.
It can be understood that the embodiment of the application can acquire the audio information of the user through the electronic device to analyze and process the audio information of the user. The audio information refers to electronic information acquired by the electronic device according to the words spoken by the user, for example, after a microphone is turned on, sound is picked up, and picked voice audio is converted into words, for example, if the audio information of the user is collected as "i want to be angry", the emotion words identified by the audio information may be "angry", and if the audio information of the user is collected as "good chat", the emotion words identified by the audio information may be "boring".
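As a rough, illustrative sketch of this step, the snippet below scans the recognized text for entries of a small emotion lexicon; the lexicon contents, the function name, and the keyword-matching approach are assumptions added for illustration, since the embodiment does not disclose how the emotion words are actually recognized.

    # Hypothetical sketch: pick out emotion words from the transcribed speech.
    # The lexicon below is illustrative only.
    EMOTION_LEXICON = {"angry", "boring", "happy", "sad"}

    def extract_emotion_words(transcript: str) -> list:
        """Return every lexicon entry that appears in the recognized text."""
        text = transcript.lower()
        return [word for word in EMOTION_LEXICON if word in text]

    print(extract_emotion_words("I am getting angry"))  # ['angry']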
It should be noted that the execution subject of this embodiment may be an electronic device with related data-processing functions, such as a microphone, a tablet computer (Pad), a computer with a wireless transceiving function, or an electronic device used in self-driving, and it is not specifically limited here.
In addition, the electronic device may obtain the audio data of the user by, but not limited to, the following several ways:
(1) capturing, by audio processing software within the electronic device, audio data of the user;
(2) collecting the audio data of the user by using an audio processing device, such as a microphone, arranged in the electronic equipment.
Both methods can be used in the present step either alternatively or in combination.
It should be noted that, in order to improve the accuracy of audio acquisition, the embodiment of the present application may determine whether acquisition is needed before acquiring the audio information of the user, which is explained below with reference to a specific embodiment.
As a possible implementation manner, in some embodiments, before acquiring the audio information of the user, the method further includes: judging whether the voice instruction of the user is a wake-up instruction; and after the voice instruction is detected to be a wake-up instruction, picking up the user's voice to obtain the audio information.
It can be understood that, when the audio information of the user is detected, the embodiment of the present application first determines whether the voice instruction of the user is a wake-up instruction. For example, the wake-up instruction may be "hello, whit" (which can be changed according to personal preference); when the user's voice instruction is determined to be "hello, whit", the speech subsequently spoken by the user starts to be collected as the audio information.
As another possible implementation manner, in some embodiments, before collecting the audio information of the user, the method further includes: judging whether a wake-up key is triggered; and after the wake-up key is detected to be triggered, picking up the user's voice to obtain the audio information.
It is understood that the wake-up key may be a physical button disposed on the vehicle; when the user presses the wake-up key with a short press or a long press, the sound-pickup function can be triggered, so that the audio information of the user is collected.
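The following minimal sketch illustrates how these two wake-up paths could gate the sound pickup; the wake word, the function names, and the record_audio() stub are assumptions for illustration and are not defined by the embodiment.

    # Minimal sketch: wake on the voice wake-up word or on the physical key,
    # then start picking up the user's voice. All names here are illustrative.
    WAKE_WORD = "hello, whit"

    def record_audio() -> bytes:
        """Placeholder for the actual sound pickup; would return raw PCM audio."""
        return b""

    def should_wake(voice_command: str = "", key_pressed: bool = False) -> bool:
        """Wake when the wake word is spoken or the wake-up key is triggered."""
        if key_pressed:                                   # short or long press
            return True
        return voice_command.strip().lower() == WAKE_WORD

    def collect_audio(voice_command: str = "", key_pressed: bool = False):
        """Return picked-up audio only after a wake-up event."""
        if should_wake(voice_command, key_pressed):
            return record_audio()
        return None

    print(should_wake(voice_command="hello, whit"))       # True
    print(should_wake(key_pressed=True))                  # True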
In step S102, user semantics are analyzed from the emotion words to analyze the current psychological condition of the user according to the user semantics.
Optionally, in some embodiments, parsing the user semantics from the emotion words further comprises: matching the emotion words with the user semantics in a local database; if the matching is successful, determining the user semantics according to the matching result; otherwise, sending the emotion words to the server, and receiving the matching result obtained by the server according to the emotion words.
The local database may be a database preset by a person skilled in the art, or may be a database obtained through big data analysis, and is not specifically limited herein.
It can be understood that, in the embodiment of the present application, the emotion words recognized in step S101 may be matched against the user semantics in the local database. For example, if the recognized emotion word is "angry" and "angry" exists among the user semantics in the local database, the emotion word is successfully matched with the user semantics in the local database. If the recognized emotion word is "boring" and neither "boring" nor any related word exists among the user semantics in the local database, the matching fails; in that case the emotion word needs to be sent to the server, the server further analyzes the emotion word, and the matching result is received after the server has finished its analysis.
It should be noted that the server returns a parsing result regardless of whether its parsing succeeds.
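A minimal sketch of this local-first matching with a cloud fallback is shown below; the local table, the query_cloud() stand-in, and the return values are assumptions, since the embodiment does not specify the database schema or the server protocol.

    # Sketch: try the local database first, otherwise fall back to the server.
    LOCAL_SEMANTICS = {"angry": "user is angry", "happy": "user is happy"}

    def query_cloud(emotion_word: str) -> str:
        """Stand-in for sending the emotion word to the server; the server
        returns a result whether or not its own parsing succeeds."""
        return "cloud result for " + emotion_word

    def resolve_user_semantics(emotion_word: str) -> str:
        local = LOCAL_SEMANTICS.get(emotion_word)
        if local is not None:              # matching succeeded locally
            return local
        return query_cloud(emotion_word)   # otherwise ask the cloud server

    print(resolve_user_semantics("angry"))   # local hit
    print(resolve_user_semantics("boring"))  # falls back to the cloud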
In step S103, an emotional action corresponding to the current psychological condition is generated according to the amplitude of the audio information, and the vehicle-mounted virtual character is controlled to express the current psychological condition and/or to perform the emotional action.
Optionally, in some embodiments, generating the emotional action corresponding to the current psychological condition from the amplitude of the audio information comprises: processing the audio information, and screening it to obtain the audio information that meets a preset requirement; and calculating the average amplitude of that audio information, and generating the emotional action according to the average amplitude.
It is understood that the embodiment of the present application may generate the emotional action according to the average amplitude (OO dB) of the collected audio of the user. To do so, the depth of the amplitude is graded in advance, for example A <= XX dB is small, B >= XX dB is medium, and C >= XX dB is large. Concretely, if the processed audio averages 39 dB or less it is judged small; if it averages 40 dB or more it is judged medium; and if it averages 60 dB or more it is judged large (that is, A <= 39 dB is small, B >= 40 dB is medium, and C >= 60 dB is large). It should be noted that noise reduction and echo cancellation may be performed on the audio before the average amplitude or decibel level is computed. The average amplitude of a segment of audio is then mapped to an emotion depth, such as a slight smile, a general smile, or a wild smile. For example, if the processed audio averages below 39 decibels, a slight smile is shown; if the processed audio averages 40 to 60 decibels, a general smile is shown; and if the processed audio averages above 60 decibels, a wild smile is shown, which can be displayed on the central control large screen of the vehicle.
For example, referring to fig. 2 to 5, assume that the collected emotion text is "my temper is about to explode". When the processed audio averages below 39 dB, the interface shown in fig. 2 or fig. 3 may be displayed on the central control large screen of the vehicle; at this time the avatar may be in a lively state, with a small inverted (thumbs-down) hand beside it, or a lightning bolt. When the processed audio averages above 40 dB and below 60 dB, the interface shown in fig. 4 may be displayed; at this time the avatar may be slightly enlarged, and the hand beside it enlarged correspondingly. When the processed audio averages above 60 decibels, the interface shown in fig. 5 may be displayed; the avatar is even larger and the lightning bolt is larger, which further highlights, very vividly, that the user's temper is exploding.
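The following sketch restates this grading in code, using the example thresholds from the passage above (39 dB and below is small, roughly 40 to 60 dB is medium, above 60 dB is large); the function and dictionary names are illustrative assumptions.

    # Sketch of the amplitude-depth grading and the expression it selects.
    def emotion_depth(average_db: float) -> str:
        """Grade the average level of a segment of audio: small / medium / large."""
        if average_db <= 39:
            return "small"     # e.g. slight smile, zoomed-out illustration
        if average_db <= 60:
            return "medium"    # e.g. general smile
        return "large"         # e.g. wild smile, zoomed-in illustration

    SMILE_FOR_DEPTH = {"small": "slight smile",
                       "medium": "general smile",
                       "large": "wild smile"}

    print(SMILE_FOR_DEPTH[emotion_depth(35.0)])   # slight smile
    print(SMILE_FOR_DEPTH[emotion_depth(65.0)])   # wild smile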
Therefore, with the voice semantic module, emotion judgment can first be performed on the local semantics; if no result is obtained, emotion judgment is performed on the cloud semantics. The user's dominant emotion is judged from these combined sources, which effectively simulates and gauges the driver's psychological state, and the depth of the average amplitude of a segment of audio grades the emotional action into a slight smile, a big laugh, a wild laugh and the like, from which zoomed-out or zoomed-in action illustrations such as a small thumbs-up, a general thumbs-up, a big thumbs-up and the like are then derived.
Finally, the voice semantic module pre-processes the two parts, semantic understanding and emotional action, and then transmits them to the central control large screen through a Universal Asynchronous Receiver/Transmitter (UART) or a similar interface. Because the module performs this pre-processing, reprocessing on the central control large screen is reduced; from a software point of view, duplicated code can be avoided and the code on the central control large screen can also be reduced.
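As an illustration of handing the pre-processed result to the screen, the sketch below packs the emotion and its depth into a tiny frame that could be written out over a UART; the byte layout, the identifier tables, and the serial-port example are assumptions, since the embodiment only states that the semantics and the emotional-action depth are transmitted via UART or a similar interface.

    # Hypothetical framing of (emotion, depth) for the UART link to the screen.
    import struct

    EMOTION_IDS = {"angry": 1, "happy": 2, "boring": 3}
    DEPTH_IDS = {"small": 1, "medium": 2, "large": 3}

    def build_uart_frame(emotion: str, depth: str) -> bytes:
        """Pack a 4-byte frame: start byte, emotion id, depth id, checksum."""
        emotion_id = EMOTION_IDS[emotion]
        depth_id = DEPTH_IDS[depth]
        checksum = (emotion_id + depth_id) & 0xFF
        return struct.pack("BBBB", 0xAA, emotion_id, depth_id, checksum)

    frame = build_uart_frame("angry", "large")
    print(frame.hex())                     # aa010304
    # With pyserial, the frame could then be written out, for example:
    #   serial.Serial("/dev/ttyS1", 115200).write(frame)   # port name assumed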
In order to enable those skilled in the art to further understand the voice control method of the vehicle-mounted virtual character according to the embodiment of the present application, the following detailed description is provided with reference to specific embodiments.
As shown in fig. 6, the voice control method for the vehicle-mounted virtual character includes the following steps:
s601, the user wakes up the microphone on the vehicle.
The microphone may be disposed on the central control large screen of the vehicle, or in another position, which is not specifically limited here. In the embodiment of the application, wake-up can be triggered through a physical voice-control key or a voice wake-up word, and the software voice recognition (short-lived) is started from the hardware level (resident).
S602, collecting the audio information of the user.
S603, recognizing emotion words from the audio information.
It can be understood that, when the software voice recognition is started, the picked-up audio can be temporarily stored for the subsequent speech-to-text conversion.
S604, performing semantic understanding on the text locally.
S605, judging whether the local semantic understanding of the text produces a result; if so, executing step S607, otherwise executing step S606.
It can be understood that, in the embodiment of the present application, it may be determined in advance whether the local semantic understanding of the text is emotion-related, so as to facilitate subsequent processing.
S606, performing semantic understanding on the text through the cloud server, and then jumping to step S609.
It can be understood that performing semantic understanding on the text through the cloud server indicates that the local emotion analysis produced no result, so the emotion analysis needs to be carried out by the cloud.
S607, the emotion-related semantics are parsed successfully on the local side.
It is to be understood that this step indicates that the local parsing of the emotion-related semantics has succeeded.
S608, calculating the average amplitude OO dB of a segment of audio (after echo, noise, and the like are removed). The amplitude depth graded in advance serves as the basis of the emotional action, and the average audio amplitude assists in the depth determination (small, medium, large), where A <= XX dB is small, B >= XX dB is medium, and C >= XX dB is large.
It should be noted that, regardless of whether the semantic result comes from the local side or the cloud, the average amplitude of the audio in the session is calculated to provide the depth of the emotional action.
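A minimal sketch of computing such an average level from a noise-reduced sample buffer is given below; it expresses the level in dB relative to 16-bit full scale, which is an assumption, since the embodiment does not state the dB reference (the 39 dB and 60 dB thresholds above presumably refer to a calibrated sound-pressure scale).

    # Sketch: average level of a segment of 16-bit PCM audio, in dB re full scale.
    import math

    def average_db(samples: list, full_scale: float = 32768.0) -> float:
        """RMS level of the samples, expressed in decibels relative to full scale."""
        if not samples:
            return float("-inf")
        rms = math.sqrt(sum(s * s for s in samples) / len(samples))
        if rms == 0:
            return float("-inf")
        return 20.0 * math.log10(rms / full_scale)

    # Example: a small square wave sits about 30 dB below full scale.
    print(round(average_db([1000, -1000, 1000, -1000]), 1))   # -30.3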
S609, the voice semantic module pre-processes the semantics (local or cloud) together with the emotional-action depth, and provides them to the central control large screen for integration through a UART (Universal Asynchronous Receiver/Transmitter) or a similar interface.
That is to say, in the embodiment of the present application, the local or cloud semantics processed by the voice semantic module can be combined with the determined emotional-action depth and then output to the central control large screen through the UART pin signals. As shown in fig. 7, the voice semantic module at least includes: a 5V pin, a ground (GND) pin, a UART RX receiving pin, a UART TX transmitting pin, an MCU (Micro Controller Unit), Flash memory space, and a microphone. Their functions are consistent with the methods used in the related art and are not described in detail here to avoid redundancy.
S610, the central control large screen receives the emotion information.
That is to say, in the embodiment of the present application, the central control large screen may receive, for example from the UART, information that includes the local or cloud emotion semantic-understanding result together with the emotional-action depth, and both pieces of information can then be analyzed.
S611, the central control large screen obtains the dominant emotion of the semantics and the emotional-action depth result from the UART.
That is, the central control large screen determines the dominant emotion from the semantics received over the UART, and then decides the expression, such as a slight smile, a general smile, or a wild smile, according to the emotion depth.
S612, the action graphic can be changed according to the UART emotion depth result, and the 2D/3D action graphic can be enlarged or reduced.
That is, once the central control large screen of the embodiment of the present application knows the emotion level from the UART, it can decide the action, such as a small thumbs-up, a medium thumbs-up, a big thumbs-up, and the like, and zoom the 2D/3D emotional-action illustration out or in accordingly.
S613, determining the emotion-derived action, and changing the UI (User Interface).
That is, once the emotion determination is finished, the result information is passed to the UI to change the character's emotional expression and action.
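A sketch of this screen-side handling is given below: it decodes the same hypothetical 4-byte frame introduced earlier and maps the depth to a scale factor for the 2D/3D action illustration; the frame layout and the scale factors are assumptions for illustration.

    # Sketch: decode the received frame and choose a zoom scale for the action.
    SCALE_FOR_DEPTH = {1: 0.8, 2: 1.0, 3: 1.3}   # small / medium / large

    def apply_emotion_frame(frame: bytes):
        """Return (emotion_id, illustration_scale) for a received 4-byte frame."""
        start, emotion_id, depth_id, checksum = frame
        if start != 0xAA or checksum != ((emotion_id + depth_id) & 0xFF):
            raise ValueError("corrupted frame")
        return emotion_id, SCALE_FOR_DEPTH[depth_id]

    print(apply_emotion_frame(bytes([0xAA, 1, 3, 4])))   # (1, 1.3)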
In summary, the application provides a voice semantic module for the emotional actions of the vehicle-mounted character. It mainly performs semantic understanding, with the help of a big-data platform, on the text obtained from voice recognition, can infer the psychological condition (joy, anger, sadness) from the semantics analyzed through big data, and maps that psychological condition onto the central control large screen to display the emotion of the character's avatar. The voice semantic module can pre-process the local or cloud semantics together with the audio-amplitude depth information, and provide them through interfaces such as a UART for the central control large screen to process.
Secondly, the depth of the responsive action is analyzed according to the average amplitude of the audio from beginning to end: if the audio amplitude is smaller on average, the avatar of the UI character gives a thumbs-up and the action illustration is reduced; if the average audio amplitude is larger, the avatar of the UI character appears with a clenched fist and the action is magnified.
According to the voice control method for the vehicle-mounted virtual character provided by the embodiment of the application, the audio information of the user can be collected, emotion words can be recognized from the audio information, the user semantics can be parsed from the emotion words, the user's current psychological condition can be analyzed according to the user semantics, the emotional action corresponding to the current psychological condition can be generated according to the amplitude of the audio information, and the vehicle-mounted virtual character can be controlled to express the current psychological condition and/or perform the emotional action. In this way, the user's current psychological and emotional state is analyzed and judged either locally without a network or through a big-data network, the depth of the audio amplitude is used as the basis of the emotional action, the result is passed through interfaces of the voice semantic module such as a UART, and the central control large screen displays it in an integrated manner, which enriches the enjoyment of using the vehicle and greatly improves the user experience.
Next, a voice control apparatus of a vehicle-mounted virtual character proposed according to an embodiment of the present application is described with reference to the drawings.
Fig. 8 is a block diagram schematically illustrating a voice control apparatus for a vehicle-mounted virtual character according to an embodiment of the present application.
As shown in fig. 8, the voice control device 10 for the vehicle-mounted virtual character includes: an acquisition module 100, a parsing module 200 and a control module 300.
The acquisition module 100 is used for collecting audio information of a user and recognizing emotion words from the audio information;
the parsing module 200 is used for parsing user semantics from the emotion words, so as to analyze the current psychological condition of the user according to the user semantics; and
the control module 300 is configured to generate an emotional action corresponding to the current psychological condition according to the amplitude of the audio information, and to control the vehicle-mounted virtual character to express the current psychological condition and/or perform the emotional action.
Optionally, in some embodiments, before capturing the audio information of the user, the capturing module 100 further includes:
the first judgment unit is used for judging whether the voice instruction of the user is a wake-up instruction or not;
and the second acquisition unit is used for picking up the user's voice to obtain the audio information after the voice instruction is detected to be a wake-up instruction.
Optionally, in some embodiments, before capturing the audio information of the user, the capturing module 100 further includes:
the first judgment unit is used for judging whether the awakening key is triggered or not;
and the second acquisition unit is used for picking up sound of the user after the wake-up key is detected to be triggered so as to obtain the audio information.
Optionally, in some embodiments, the parsing module 200 further comprises:
the matching unit is used for matching the emotion words with the user semantics of the local database;
the receiving unit is used for determining the user semantics according to a matching result if the matching is successful, and otherwise sending the emotion words to the server and receiving the matching result obtained by the server according to the emotion words;
the control module 300 includes:
the processing unit is used for processing the audio information and screening it to obtain the audio information that meets the preset requirement;
and the computing unit is used for computing the average amplitude of the audio information and generating emotional action according to the average amplitude.
It should be noted that the foregoing explanation of the embodiment of the voice control method for the vehicle-mounted virtual character is also applicable to the voice control apparatus for the vehicle-mounted virtual character of the embodiment, and details are not repeated here.
According to the voice control device for the vehicle-mounted virtual character provided by the embodiment of the application, the audio information of the user can be collected, emotion words can be recognized from the audio information, the user semantics can be parsed from the emotion words, the user's current psychological condition can be analyzed according to the user semantics, the emotional action corresponding to the current psychological condition can be generated according to the amplitude of the audio information, and the vehicle-mounted virtual character can be controlled to express the current psychological condition and/or perform the emotional action. In this way, the user's current psychological and emotional state is analyzed and judged either locally without a network or through a big-data network, the depth of the audio amplitude is used as the basis of the emotional action, the result is passed through interfaces of the voice semantic module such as a UART, and the central control large screen displays it in an integrated manner, which enriches the enjoyment of using the vehicle and greatly improves the user experience.
In addition, as shown in fig. 9, the embodiment of the present application also proposes a vehicle 20, where the vehicle 20 includes the above-mentioned voice control device 10 for the vehicle-mounted virtual character.
According to the vehicle provided by the embodiment of the application, by means of the voice control device for the vehicle-mounted virtual character, the user's current psychological and emotional state can be analyzed and judged either locally without a network or through a big-data network, the depth of the audio amplitude is used as the basis of the emotional action, and the result is passed through interfaces of the voice semantic module such as a UART and displayed in an integrated manner on the central control large screen, which enriches the enjoyment of using the vehicle and greatly improves the user experience.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "N" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more N executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of implementing the embodiments of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or N wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the N steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (10)

1. A voice control method of a vehicle-mounted virtual character is characterized by comprising the following steps:
collecting audio information of a user, and recognizing emotion words from the audio information;
analyzing user semantics from the emotion words to analyze the current psychological condition of the user according to the user semantics; and
generating emotional action corresponding to the current psychological condition according to the amplitude of the audio information, and controlling the vehicle-mounted virtual character to express the current psychological condition and/or execute the emotional action.
2. The method of claim 1, prior to the capturing audio information of the user, further comprising:
judging whether the voice instruction of the user is a wake-up instruction or not;
and after the voice instruction is detected to be a wake-up instruction, picking up the user's voice to obtain the audio information.
3. The method of claim 1, prior to the capturing audio information of the user, further comprising:
judging whether a wake-up key is triggered;
and after the wake-up key is detected to be triggered, picking up the user's voice to obtain the audio information.
4. The method of claim 1, wherein the parsing of user semantics from the emotion words further comprises:
matching the emotion words with user semantics of the local database;
if the matching is successful, determining the user semantics according to the matching result, otherwise, sending the emotion words to a server, and receiving the matching result obtained by the server according to the emotion words.
5. The method of claim 1, wherein the generating of the emotional action corresponding to the current psychological condition based on the amplitude of the audio information comprises:
processing the audio information, and screening it to obtain the audio information that meets a preset requirement;
calculating the average amplitude of the audio information, and generating the emotional action according to the average amplitude.
6. A voice control device for a vehicle-mounted virtual character, comprising:
the acquisition module is used for collecting audio information of a user and recognizing emotion words from the audio information;
the analysis module is used for analyzing user semantics from the emotion characters so as to analyze the current psychological condition of the user according to the user semantics; and
and the control module is used for generating emotional actions corresponding to the current psychological condition according to the amplitude of the audio information, and controlling the vehicle-mounted virtual character to express the current psychological condition and/or execute the emotional actions.
7. The apparatus of claim 6, wherein prior to the capturing the audio information of the user, the capturing module further comprises:
the first judgment unit is used for judging whether the voice instruction of the user is a wake-up instruction or not;
and the second acquisition unit is used for picking up the user's voice to obtain the audio information after the voice instruction is detected to be a wake-up instruction.
8. The apparatus of claim 6, wherein prior to the capturing the audio information of the user, the capturing module further comprises:
the first judgment unit is used for judging whether the wake-up key is triggered;
and the second acquisition unit is used for picking up sound of the user after the wake-up key is detected to be triggered so as to obtain the audio information.
9. The apparatus of claim 6, wherein the parsing module further comprises:
the matching unit is used for matching the emotion words with the user semantics of the local database;
the receiving unit is used for determining the user semantics according to a matching result if the matching is successful, and otherwise sending the emotion words to a server and receiving the matching result obtained by the server according to the emotion words;
the control module includes:
the processing unit is used for processing the audio information and screening it to obtain the audio information that meets the preset requirement;
and the calculating unit is used for calculating the average amplitude of the audio information and generating the emotional action according to the average amplitude.
10. A vehicle, characterized by comprising: the voice control apparatus of the vehicle-mounted virtual character of any one of claims 6 to 9.
CN202011489364.2A 2020-12-16 2020-12-16 Voice control method and device for vehicle-mounted virtual character and vehicle with voice control device Pending CN114639395A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011489364.2A CN114639395A (en) 2020-12-16 2020-12-16 Voice control method and device for vehicle-mounted virtual character and vehicle with voice control device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011489364.2A CN114639395A (en) 2020-12-16 2020-12-16 Voice control method and device for vehicle-mounted virtual character and vehicle with voice control device

Publications (1)

Publication Number Publication Date
CN114639395A true CN114639395A (en) 2022-06-17

Family

ID=81945437

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011489364.2A Pending CN114639395A (en) 2020-12-16 2020-12-16 Voice control method and device for vehicle-mounted virtual character and vehicle with voice control device

Country Status (1)

Country Link
CN (1) CN114639395A (en)

Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101669090A (en) * 2007-04-26 2010-03-10 福特全球技术公司 Emotive advisory system and method
CN104486331A (en) * 2014-12-11 2015-04-01 上海元趣信息技术有限公司 Multimedia file processing method, client terminals and interaction system
US20150331666A1 (en) * 2014-05-15 2015-11-19 Tyco Safety Products Canada Ltd. System and Method for Processing Control Commands in a Voice Interactive System
CN105206275A (en) * 2015-08-31 2015-12-30 小米科技有限责任公司 Device control method, apparatus and terminal
CN105345818A (en) * 2015-11-04 2016-02-24 深圳好未来智能科技有限公司 3D video interaction robot with emotion module and expression module
CN105931644A (en) * 2016-04-15 2016-09-07 广东欧珀移动通信有限公司 Voice recognition method and mobile terminal
CN106297782A (en) * 2016-07-28 2017-01-04 北京智能管家科技有限公司 A kind of man-machine interaction method and system
CN106782539A (en) * 2017-01-16 2017-05-31 上海智臻智能网络科技股份有限公司 A kind of intelligent sound exchange method, apparatus and system
JP2017181667A (en) * 2016-03-29 2017-10-05 トヨタ自動車株式会社 Voice recognition apparatus and voice recognition method
CN107861626A (en) * 2017-12-06 2018-03-30 北京光年无限科技有限公司 The method and system that a kind of virtual image is waken up
WO2018141140A1 (en) * 2017-02-06 2018-08-09 中兴通讯股份有限公司 Method and device for semantic recognition
CN108962255A (en) * 2018-06-29 2018-12-07 北京百度网讯科技有限公司 Emotion identification method, apparatus, server and the storage medium of voice conversation
CN109754256A (en) * 2017-11-08 2019-05-14 徐蔚 Model, device, system, methods and applications based on code chain
CN109801625A (en) * 2018-12-29 2019-05-24 百度在线网络技术(北京)有限公司 Control method, device, user equipment and the storage medium of virtual speech assistant
CN109885277A (en) * 2019-02-26 2019-06-14 百度在线网络技术(北京)有限公司 Human-computer interaction device, mthods, systems and devices
US20190251965A1 (en) * 2018-02-15 2019-08-15 DMAI, Inc. System and method for conversational agent via adaptive caching of dialogue tree
CN110262665A (en) * 2019-06-26 2019-09-20 北京百度网讯科技有限公司 Method and apparatus for output information
CN110377761A (en) * 2019-07-12 2019-10-25 深圳传音控股股份有限公司 A kind of method and device enhancing video tastes
CN110390933A (en) * 2018-04-20 2019-10-29 比亚迪股份有限公司 State methods of exhibiting, device and the displaying vehicle system of vehicle intelligent voice system
CN110874557A (en) * 2018-09-03 2020-03-10 阿里巴巴集团控股有限公司 Video generation method and device for voice-driven virtual human face
CN111124123A (en) * 2019-12-24 2020-05-08 苏州思必驰信息科技有限公司 Voice interaction method and device based on virtual robot image and intelligent control system of vehicle-mounted equipment

Patent Citations (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101669090A (en) * 2007-04-26 2010-03-10 福特全球技术公司 Emotive advisory system and method
US20150331666A1 (en) * 2014-05-15 2015-11-19 Tyco Safety Products Canada Ltd. System and Method for Processing Control Commands in a Voice Interactive System
CN104486331A (en) * 2014-12-11 2015-04-01 上海元趣信息技术有限公司 Multimedia file processing method, client terminals and interaction system
CN105206275A (en) * 2015-08-31 2015-12-30 小米科技有限责任公司 Device control method, apparatus and terminal
CN105345818A (en) * 2015-11-04 2016-02-24 深圳好未来智能科技有限公司 3D video interaction robot with emotion module and expression module
JP2017181667A (en) * 2016-03-29 2017-10-05 トヨタ自動車株式会社 Voice recognition apparatus and voice recognition method
CN105931644A (en) * 2016-04-15 2016-09-07 广东欧珀移动通信有限公司 Voice recognition method and mobile terminal
CN106297782A (en) * 2016-07-28 2017-01-04 北京智能管家科技有限公司 A kind of man-machine interaction method and system
CN106782539A (en) * 2017-01-16 2017-05-31 上海智臻智能网络科技股份有限公司 A kind of intelligent sound exchange method, apparatus and system
WO2018141140A1 (en) * 2017-02-06 2018-08-09 中兴通讯股份有限公司 Method and device for semantic recognition
CN108399919A (en) * 2017-02-06 2018-08-14 中兴通讯股份有限公司 A kind of method for recognizing semantics and device
CN109754256A (en) * 2017-11-08 2019-05-14 徐蔚 Model, device, system, methods and applications based on code chain
CN107861626A (en) * 2017-12-06 2018-03-30 北京光年无限科技有限公司 The method and system that a kind of virtual image is waken up
US20190251965A1 (en) * 2018-02-15 2019-08-15 DMAI, Inc. System and method for conversational agent via adaptive caching of dialogue tree
CN110390933A (en) * 2018-04-20 2019-10-29 比亚迪股份有限公司 State methods of exhibiting, device and the displaying vehicle system of vehicle intelligent voice system
CN108962255A (en) * 2018-06-29 2018-12-07 北京百度网讯科技有限公司 Emotion identification method, apparatus, server and the storage medium of voice conversation
CN110874557A (en) * 2018-09-03 2020-03-10 阿里巴巴集团控股有限公司 Video generation method and device for voice-driven virtual human face
CN109801625A (en) * 2018-12-29 2019-05-24 百度在线网络技术(北京)有限公司 Control method, device, user equipment and the storage medium of virtual speech assistant
CN109885277A (en) * 2019-02-26 2019-06-14 百度在线网络技术(北京)有限公司 Human-computer interaction device, mthods, systems and devices
CN110262665A (en) * 2019-06-26 2019-09-20 北京百度网讯科技有限公司 Method and apparatus for output information
CN110377761A (en) * 2019-07-12 2019-10-25 深圳传音控股股份有限公司 A kind of method and device enhancing video tastes
CN111124123A (en) * 2019-12-24 2020-05-08 苏州思必驰信息科技有限公司 Voice interaction method and device based on virtual robot image and intelligent control system of vehicle-mounted equipment

Similar Documents

Publication Publication Date Title
JP2006030447A (en) Voice recognition system and moving body and vehicle having the system
CN109254669B (en) Expression picture input method and device, electronic equipment and system
CN111325386B (en) Method, device, terminal and storage medium for predicting running state of vehicle
CN110598576A (en) Sign language interaction method and device and computer medium
JP2004237022A (en) Information processing device and method, program and recording medium
CN113327620B (en) Voiceprint recognition method and device
JP2012059107A (en) Emotion estimation device, emotion estimation method and program
CN109302486B (en) Method and system for pushing music according to environment in vehicle
DE112018007847B4 (en) INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD AND PROGRAM
CN110808038A (en) Mandarin assessment method, device, equipment and storage medium
CN109166571A (en) Wake-up word training method, device and the household appliance of household appliance
CN108780644A (en) The system and method for means of transport, speech pause length for adjusting permission in voice input range
CN113643704A (en) Test method, upper computer, system and storage medium of vehicle-mounted machine voice system
CN113808621A (en) Method and device for marking voice conversation in man-machine interaction, equipment and medium
CN111192583B (en) Control device, agent device, and computer-readable storage medium
CN115205917A (en) Man-machine interaction method and electronic equipment
CN114639395A (en) Voice control method and device for vehicle-mounted virtual character and vehicle with voice control device
CN111429882B (en) Voice playing method and device and electronic equipment
KR20210063698A (en) Electronic device and method for controlling the same, and storage medium
CN116483305A (en) Intelligent network-connected automobile digital virtual person application system, application method thereof and vehicle
CN112951216B (en) Vehicle-mounted voice processing method and vehicle-mounted information entertainment system
CN115497478A (en) Method and device for vehicle internal and external communication and readable storage medium
CN115171284A (en) Old people care method and device
CN115482583A (en) Vehicle-mounted sign language translation method
CN113270087A (en) Processing method, mobile terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination