CN115440221B - Vehicle-mounted intelligent voice interaction method and system based on cloud computing - Google Patents
- Publication number: CN115440221B (application CN202211395643.1A)
- Authority
- CN
- China
- Prior art keywords
- information
- instruction
- interactive
- vehicle
- target user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- B60R16/0373—Voice control
- B60W50/08—Interaction between the driver and the control system
- G06F16/3343—Query execution using phonetics
- G06F16/3344—Query execution using natural language analysis
- G06F16/35—Clustering; Classification
- G06F16/367—Ontology
- G06F40/30—Semantic analysis
- G10L15/063—Training of speech recognition systems
- G10L15/16—Speech classification or search using artificial neural networks
- G10L15/34—Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
Abstract
The invention discloses a vehicle-mounted intelligent voice interaction method and system based on cloud computing, comprising the following steps: acquiring attention information of the driving user during driving in the vehicle-mounted environment, and analyzing the driving user's state information in combination with vehicle-mounted environment information to generate the driving scene of the current trip; acquiring the position and identity information of a target user through voice information, and initializing an instruction hierarchical graph; performing semantic recognition on the voice information based on machine learning, searching the instruction hierarchical graph to generate an interactive instruction, generating comprehensive constraints according to the driving scene, and correcting the interactive instruction through those constraints; and acquiring the target user's feedback on the interactive instruction, analyzing the target user's interactive instruction habits, and compensating the correction of the interactive instruction accordingly. The method and system analyze interactive instructions intelligently on the basis of the driving scene, better match the voice-interaction behavior of vehicle users while preserving recognition efficiency, and improve the in-vehicle interaction experience.
Description
Technical Field
The invention relates to the technical field of voice interaction, in particular to a vehicle-mounted intelligent voice interaction method and system based on cloud computing.
Background
With the rise of the Internet of Vehicles and intelligent automobiles, intelligent transportation has become a topic of wide concern, and vehicle head units carry ever more functions. Voice interaction technology has already been applied successfully in scenarios such as smart speakers and input methods; in the vehicle, it helps reduce the driver's manual dependence on in-vehicle equipment and improves driving safety. Against the background of rapidly developing intelligent technologies, the key technologies of voice interaction comprise speech recognition, semantic understanding and speech synthesis. Speech recognition converts the human voice signal, i.e. natural language, into corresponding text or instructions. Semantic understanding processes the received text or instructions so as to convert natural language into a form a machine can understand, thereby grasping the user's intent.
In vehicle-mounted voice interaction, speech recognition serves not only to recognize text content but also to identify the speaker, so that differentiated services can be provided to the driver and passengers according to the application scenario. Compared with the voice interaction already widely deployed in vehicles, voiceprint recognition is a field with a higher technical threshold. The problems to be solved are therefore how to use voiceprint recognition to capture user habits more accurately, and how to construct an intelligent voice interaction model for the special vehicle-mounted scenario, so as to better match the voice-interaction behavior of vehicle users, improve the in-vehicle interaction experience, and strengthen vehicle safety protection, thereby safeguarding the driver and improving the interaction experience during driving.
Disclosure of Invention
In order to solve the technical problems, the invention provides a vehicle-mounted intelligent voice interaction method and system based on cloud computing.
The invention provides a vehicle-mounted intelligent voice interaction method based on cloud computing, which comprises the following steps:
acquiring attention information of a driving user in a driving process in a vehicle-mounted environment, analyzing state information of the driving user according to the attention information and the vehicle-mounted environment information, and generating a driving scene in the current driving process of a vehicle;
acquiring interactive voice information in the vehicle-mounted environment, acquiring the position and identity information of a target user through the interactive voice information, judging whether the target user is the driving user, matching an interactive instruction set corresponding to the target user according to the judgment result, and initializing an instruction hierarchical graph;
the cloud carries out semantic recognition on interactive voice information based on machine learning, searches in an instruction hierarchical graph to generate an interactive instruction, generates comprehensive constraints according to the driving scene, and corrects the interactive instruction through the comprehensive constraints;
feedback information of the target user to the interactive instruction is obtained, instruction habit information of the target user is analyzed through matching of voiceprint information and the feedback information of the target user, and correction of the interactive instruction is compensated based on the instruction habit information.
In the scheme, the state information of a driving user is analyzed by combining the attention information with the vehicle-mounted environment information, and the driving scene in the driving process of the current vehicle is generated, specifically:
acquiring facial frame image data of a driver through an in-vehicle camera, preprocessing the facial frame image data, and extracting a key frame of the facial frame image data;
extracting human face characteristic points of a driving user according to key frames of the facial frame image data, and acquiring human face orientation information, human eye closing degree and sight line direction according to the human face characteristic points;
comparing and analyzing the acquired face orientation information, the eye closing degree and the sight line direction with a preset threshold value, reading the attention information of a driving user, setting weight information according to road condition information of a current driving road section, and adjusting the attention threshold value by using the weight information;
evaluating the attention information of the driving user according to the attention threshold value at the current moment, acquiring vehicle-mounted environment information and an attention evaluation result, and performing matching analysis on the state information of the driving user;
when the state information of the driver is in a fatigue state, generating voice information to remind the driver, making a decision according to the vehicle-mounted environment information to generate a vehicle-mounted environment change suggestion, and acquiring voice feedback of the driver to execute a corresponding instruction;
in addition, the driving scene in the current vehicle driving process is generated through the state information of the driving user, the vehicle-mounted environment information and the vehicle driving information.
In this scheme, acquiring the interactive voice information in the vehicle-mounted environment and acquiring the position and identity information of the target user through the interactive voice information specifically comprises:
acquiring interactive voice information in the vehicle-mounted environment through a voice receiving module in the vehicle-mounted environment, filtering and denoising the interactive voice information, and dividing the vehicle-mounted environment into a preset number of sub-areas;
acquiring sound energy information and arrival time difference of the received interactive voice information in each subarea, and judging a source subarea of the interactive voice information according to the sound energy information and the arrival time difference;
determining the position of the interactive voice information and then performing voiceprint recognition, retrieving identity information through big data at the cloud according to the voiceprint recognition result, and calculating the similarity between the voiceprint corresponding to the interactive voice information and cloud-stored data;
acquiring the data whose similarity meets a preset similarity standard and extracting the corresponding identity information as the identity information of the target user, reading the matched stored voice habit characteristics through the identity information, and, if no cloud-stored data meets the preset similarity standard, creating a voiceprint sequence and storing it in the cloud;
and matching an interactive instruction set corresponding to the functional information according to the position information of the target user, and initializing an instruction hierarchical graph based on the identity information through the interactive instruction set.
In the scheme, the cloud carries out semantic recognition on interactive voice information based on machine learning, searches in an instruction hierarchical graph to generate an interactive instruction, generates comprehensive constraints according to the driving scene, and corrects the interactive instruction through the comprehensive constraints, specifically comprising the following steps:
preprocessing the interactive voice information, extracting word vectors from the preprocessed interactive voice information through a Word2vec model, constructing sentence vector representations as weighted averages of the word vectors, and taking the word vectors and sentence vector representations as semantic features;
establishing a key information extraction model based on the bidirectional long short-term memory neural network model, inputting the semantic features into the key information extraction model, and configuring differentiated weights in combination with an attention mechanism and context to obtain the key information in the interactive voice information;
classifying by using the key information and labeling category labels, retrieving in the initialized instruction hierarchical graph to obtain the instruction corresponding to the key information, and determining the intention of the target user;
when the retrieval path in the instruction hierarchical graph corresponds to more than one instruction, setting question-back voice information according to the retrieved content, updating the intention according to the feedback of the target user, and matching the corresponding interactive instruction through the updated intention;
and setting comprehensive constraints based on the current driving scene, judging whether the matched interactive instruction falls within the range of the comprehensive constraints, and if not, correcting the interactive instruction and then querying the target user for feedback by voice.
In this scheme, the method further comprises monitoring the attention information of the driving user during voice interaction, specifically:
after receiving the interactive instruction, acquiring the state information of the current timestamp of the driving user, and creating a temporary attention monitoring task based on the state information of the current timestamp;
acquiring the sight line drop point frequency of a driving user in the driving scene of the current timestamp to acquire a watching hot spot area, acquiring the sight line drop point of each timestamp in the attention monitoring task through the sight line direction of the driving user, and marking the watching duration of the sight line drop point;
judging whether a sight line drop point of each timestamp in the attention monitoring task falls in the watching hotspot area, if the watching duration that the sight line drop point does not fall in the watching hotspot area is longer than a preset threshold, judging whether voice interaction is suspended according to the type of the interaction instruction, and generating voice prompt;
and after the voice interaction is suspended, when the condition that the sight line drop point of the driving user returns to the watching hot spot area is detected, the voice interaction scene is recovered, and the instruction corresponding operation is realized according to the historical interaction instruction.
In the scheme, feedback information of the target user on the interactive instruction is acquired, instruction habit information of the target user is analyzed by matching the target user's voiceprint information with the feedback information, and the correction of the interactive instruction is compensated based on the instruction habit information, specifically:
after the interactive instruction is executed, feedback information of the target user on the interactive instruction is obtained, a supplementary data set of each interactive instruction is set through the feedback information, and the supplementary data set is set as a voiceprint information tag of the target user;
performing supplementary correction on the instruction hierarchical graph based on the supplementary data set of each interactive instruction, extracting a graph structure from the corrected instruction hierarchical graph, and training a graph convolutional neural network on the extracted graph structure to obtain the instruction habit information of the target user;
establishing a personalized database of the target user by combining the instruction habit information with the corresponding vehicle-mounted environment, learning from the personalized database, and compensating the correction precision of the interactive instruction so that the interactive instruction achieves the effect expected by the target user in a single pass;
and presetting a cloud storage time threshold, and deleting the personalized database of the target user when the time for which the personalized database corresponding to the target user's voiceprint information has not been called exceeds the preset storage time threshold.
The second aspect of the present invention further provides a cloud-computing-based vehicle-mounted intelligent voice interaction system, which comprises a memory and a processor, wherein the memory stores a cloud-computing-based vehicle-mounted intelligent voice interaction method program, and when the processor executes the program, the following steps are implemented:
acquiring attention information of a driving user in a driving process in a vehicle-mounted environment, analyzing state information of the driving user according to the attention information and the vehicle-mounted environment information, and generating a driving scene in the current driving process of a vehicle;
acquiring interactive voice information in the vehicle-mounted environment, acquiring the position and identity information of a target user through the interactive voice information, judging whether the target user is the driving user, matching an interactive instruction set corresponding to the target user according to the judgment result, and initializing an instruction hierarchical graph;
the cloud carries out semantic recognition on interactive voice information based on machine learning, searches in an instruction hierarchical graph to generate an interactive instruction, generates comprehensive constraints according to the driving scene, and corrects the interactive instruction through the comprehensive constraints;
feedback information of the target user to the interactive instruction is obtained, instruction habit information of the target user is analyzed through matching of voiceprint information and the feedback information of the target user, and correction of the interactive instruction is compensated based on the instruction habit information.
In summary, the invention discloses a vehicle-mounted intelligent voice interaction method and system based on cloud computing, comprising: acquiring attention information of the driving user during driving in the vehicle-mounted environment, and analyzing the driving user's state information in combination with vehicle-mounted environment information to generate the driving scene of the current trip; acquiring the position and identity information of a target user through voice information, and initializing an instruction hierarchical graph; performing semantic recognition on the voice information based on machine learning, searching the instruction hierarchical graph to generate an interactive instruction, generating comprehensive constraints according to the driving scene, and correcting the interactive instruction through those constraints; and acquiring the target user's feedback on the interactive instruction, analyzing the target user's interactive instruction habits, and compensating the correction of the interactive instruction accordingly. The method and system analyze interactive instructions intelligently on the basis of the driving scene, better match the voice-interaction behavior of vehicle users while preserving recognition efficiency, and improve the in-vehicle interaction experience.
Drawings
FIG. 1 is a flow chart of a cloud computing-based vehicle-mounted intelligent voice interaction method according to the invention;
FIG. 2 is a flowchart illustrating a method for obtaining location and identity information of a target user through interactive voice information according to the present invention;
FIG. 3 is a flow chart of a method for semantic recognition of interactive voice information based on machine learning according to the present invention;
FIG. 4 shows a block diagram of a cloud computing-based vehicle-mounted intelligent voice interaction system according to the invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
Fig. 1 shows a flow chart of a cloud computing-based vehicle-mounted intelligent voice interaction method of the invention.
As shown in fig. 1, a first aspect of the present invention provides a cloud-computing-based vehicle-mounted intelligent voice interaction method, including:
S102, acquiring attention information of the driving user during driving in the vehicle-mounted environment, analyzing the state information of the driving user according to the attention information and the vehicle-mounted environment information, and generating the driving scene of the current vehicle driving process;
S104, acquiring interactive voice information in the vehicle-mounted environment, acquiring the position and identity information of a target user through the interactive voice information, judging whether the target user is the driving user, matching an interactive instruction set corresponding to the target user according to the judgment result, and initializing an instruction hierarchical graph;
S106, performing semantic recognition on the interactive voice information at the cloud based on machine learning, searching the instruction hierarchical graph to generate an interactive instruction, generating comprehensive constraints according to the driving scene, and correcting the interactive instruction through the comprehensive constraints;
and S108, acquiring feedback information of the target user on the interactive instruction, analyzing the instruction habit information of the target user by matching the target user's voiceprint information with the feedback information, and compensating the correction of the interactive instruction based on the instruction habit information.
Specifically, facial frame image data of the driving user are acquired through the in-vehicle camera, the frame image data are preprocessed, and key frames of the facial frame image data are extracted; facial feature points of the driving user are extracted from the key frames, and face orientation information, eye closing degree and sight line direction are obtained from them. Corresponding threshold intervals are set for the extracted data and matched to attention levels; the acquired face orientation information, eye closing degree and sight line direction are compared with the preset thresholds, the attention information of the driving user is read, and the driving user's current attention level is judged. Weight information is set according to the road conditions of the current road section and used to adjust the attention threshold; for example, when the current road section is congested, the corresponding attention threshold is lowered so that the driving user stays more focused while driving. The attention information is then evaluated against the attention threshold at the current moment, vehicle-mounted environment information is acquired, and the driving user's state information is obtained by matching it against the attention evaluation result, where the vehicle-mounted environment information includes the number of occupants, in-vehicle sound, in-vehicle temperature, in-vehicle air quality and the like. When the driving user's state information indicates fatigue, voice information is generated to remind the driving user, a vehicle-mounted environment change suggestion is generated by deciding on the vehicle-mounted environment information, and the driving user's voice feedback is acquired to execute the corresponding instruction; for example, when the driving user is slightly fatigued, voice interaction asks whether to open a window, play music or lower the air-conditioning temperature. In addition, the driving scene of the current vehicle driving process is generated from the driving user's state information, the vehicle-mounted environment information and the vehicle driving information.
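To make the evaluation step concrete, the following Python sketch illustrates one plausible reading of the threshold comparison and road-condition weighting described above; the feature names, base thresholds and congestion weight are illustrative assumptions, not values taken from the patent.

```python
# Minimal sketch of the attention evaluation described above.
# All thresholds and weights are illustrative assumptions.

def adjust_threshold(base_threshold: float, congestion_level: float) -> float:
    """Lower the attention threshold on congested road sections so the
    evaluation demands more focus from the driver (weight is assumed)."""
    weight = 1.0 - 0.3 * congestion_level  # congestion_level in [0, 1]
    return base_threshold * weight

def evaluate_attention(eye_closure: float, gaze_deviation: float,
                       congestion_level: float) -> str:
    """Map eye-closure / gaze features to a coarse attention-state label."""
    closure_limit = adjust_threshold(0.35, congestion_level)    # assumed base
    deviation_limit = adjust_threshold(25.0, congestion_level)  # degrees, assumed
    if eye_closure > closure_limit:
        return "fatigued"       # triggers a voice reminder and a change suggestion
    if gaze_deviation > deviation_limit:
        return "distracted"
    return "attentive"

print(evaluate_attention(eye_closure=0.4, gaze_deviation=10.0, congestion_level=0.8))
```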
Fig. 2 is a flowchart illustrating a method for obtaining location and identity information of a target user through interactive voice information according to the present invention.
According to the embodiment of the invention, the interactive voice information in the vehicle-mounted environment is acquired, and the position and identity information of the target user is acquired through the interactive voice information, which specifically comprises the following steps:
S202, acquiring interactive voice information in the vehicle-mounted environment through a voice receiving module in the vehicle-mounted environment, filtering and denoising the interactive voice information, and dividing the vehicle-mounted environment into a preset number of sub-regions;
S204, acquiring sound energy information and arrival time difference of the received interactive voice information in each sub-region, and judging the source sub-region of the interactive voice information according to the sound energy information and the arrival time difference;
S206, determining the position of the interactive voice information and then performing voiceprint recognition, retrieving identity information through big data at the cloud according to the voiceprint recognition result, and calculating the similarity between the voiceprint corresponding to the interactive voice information and cloud-stored data;
S208, acquiring the data whose similarity meets a preset similarity standard and extracting the corresponding identity information as the identity information of the target user, reading the matched stored voice habit characteristics through the identity information, and, if no cloud-stored data meets the preset similarity standard, creating a voiceprint sequence and storing it in the cloud;
and S210, matching an interactive instruction set corresponding to the function information according to the position information of the target user, and initializing an instruction hierarchical graph based on the identity information through the interactive instruction set.
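As a rough illustration of steps S204 through S208, the sketch below scores sub-regions by sound energy and arrival delay, then matches the extracted voiceprint against cloud-stored embeddings by cosine similarity. The scoring weights, the embedding form and the 0.8 similarity threshold are all assumptions for illustration.

```python
# Illustrative sketch of S204-S208: pick the most likely source sub-region,
# then match the voiceprint against stored embeddings by cosine similarity.
import numpy as np

def locate_subregion(energies: dict[str, float],
                     arrival_delays: dict[str, float]) -> str:
    # High energy and small arrival delay both point to the speaker;
    # the 0.5/0.5 weighting between the two cues is an assumption.
    max_delay = max(arrival_delays.values()) or 1e-9
    max_energy = max(energies.values())
    scores = {r: 0.5 * (energies[r] / max_energy)
                 + 0.5 * (1.0 - arrival_delays[r] / max_delay)
              for r in energies}
    return max(scores, key=scores.get)

def match_voiceprint(query: np.ndarray, stored: dict[str, np.ndarray],
                     threshold: float = 0.8) -> str | None:
    # Cosine similarity against each cloud-stored voiceprint; None means
    # "no match", i.e. create and store a new voiceprint sequence.
    best_id, best_sim = None, threshold
    for user_id, emb in stored.items():
        sim = float(query @ emb / (np.linalg.norm(query) * np.linalg.norm(emb)))
        if sim >= best_sim:
            best_id, best_sim = user_id, sim
    return best_id

region = locate_subregion({"driver": 0.9, "rear_left": 0.4},
                          {"driver": 0.001, "rear_left": 0.004})
print(region)  # -> "driver"
```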
It should be noted that an interactive instruction set corresponding to the function information is matched according to the position information of the target user, and the instruction hierarchical graph is initialized through the interactive instruction set based on the identity information, specifically: interactive instructions are classified based on position information within the vehicle-mounted environment, keyword information corresponding to each interactive instruction is obtained through big-data retrieval, and an interactive instruction knowledge graph is constructed from the interactive instructions, the keyword information and the classification results; a corresponding interactive instruction set is extracted from the interactive instruction knowledge graph according to the source sub-region of the target user's interactive voice information; similar historical driving scenes are retrieved according to the current driving scene, the usage frequency of each interactive instruction in those similar scenes is extracted, and the interactive instructions in the set are ranked by priority according to usage frequency; if the target user's identity information is stored in the cloud, a user portrait is constructed from the historical interactive instructions corresponding to that identity information, the priorities are adjusted through the user portrait, and the instruction hierarchical graph is generated from the interactive instruction set according to the adjusted priorities.
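The ranking logic just described can be sketched briefly; the blend weights between scene-level and personal usage frequency below are invented for illustration.

```python
# Hypothetical sketch of initializing the instruction hierarchy: commands
# from the matched instruction set are ranked by usage frequency in similar
# historical driving scenes, then re-ranked by the user's own history.
from collections import Counter

def build_hierarchy(instruction_set: list[str],
                    scene_history: list[str],
                    user_history: list[str]) -> list[str]:
    scene_freq = Counter(c for c in scene_history if c in instruction_set)
    user_freq = Counter(c for c in user_history if c in instruction_set)

    # Blend scene-level and personal frequencies; the 0.6/0.4 split is assumed.
    def priority(cmd: str) -> float:
        return 0.6 * scene_freq[cmd] + 0.4 * user_freq[cmd]

    return sorted(instruction_set, key=priority, reverse=True)

tiers = build_hierarchy(
    ["open_window", "play_music", "navigate", "set_ac"],
    scene_history=["play_music", "navigate", "play_music"],
    user_history=["set_ac", "play_music"])
print(tiers)  # most likely intents are searched first in the hierarchy
```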
FIG. 3 is a flow chart illustrating a method for semantic recognition of interactive voice information based on machine learning according to the present invention.
According to the embodiment of the invention, the cloud carries out semantic recognition on interactive voice information based on machine learning, searches in an instruction hierarchical graph to generate an interactive instruction, generates comprehensive constraints according to the driving scene, and corrects the interactive instruction through the comprehensive constraints, specifically:
S302, preprocessing the interactive voice information, extracting word vectors from the preprocessed interactive voice information through a Word2vec model, constructing sentence vector representations as weighted averages of the word vectors, and taking the word vectors and sentence vector representations as semantic features;
S304, establishing a key information extraction model based on the bidirectional long short-term memory neural network model, inputting the semantic features into the key information extraction model, and configuring differentiated weights in combination with an attention mechanism and context to obtain the key information in the interactive voice information;
S306, classifying by using the key information and labeling category labels, retrieving in the initialized instruction hierarchical graph to obtain the instruction corresponding to the key information, and determining the intention of the target user;
S308, when the retrieval path in the instruction hierarchical graph corresponds to more than one instruction, setting question-back voice information according to the retrieved content, updating the intention according to the feedback of the target user, and matching the corresponding interactive instruction according to the updated intention;
S310, setting comprehensive constraints based on the current driving scene, judging whether the matched interactive instruction falls within the range of the comprehensive constraints, and if not, correcting the interactive instruction and then querying the target user for feedback by voice.
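As a minimal sketch of the word-vector step (S302), the following code trains a Word2vec model with gensim and forms a sentence vector as a weighted average of word vectors. The patent does not specify the weighting scheme, so an inverse-frequency weighting is assumed here.

```python
# Sketch of S302: Word2vec word vectors plus a weighted-average sentence
# vector. Corpus, dimensions and the weighting scheme are assumptions.
from collections import Counter
import numpy as np
from gensim.models import Word2Vec

corpus = [["open", "the", "window"], ["play", "some", "music"],
          ["navigate", "to", "the", "office"]]
w2v = Word2Vec(corpus, vector_size=50, window=3, min_count=1, epochs=50)
freq = Counter(tok for sent in corpus for tok in sent)

def sentence_vector(tokens: list[str], model: Word2Vec) -> np.ndarray:
    vecs, weights = [], []
    for tok in tokens:
        if tok in model.wv:
            vecs.append(model.wv[tok])
            # Rarer words get larger weights (assumed weighting scheme).
            weights.append(1.0 / (1.0 + freq[tok]))
    return np.average(vecs, axis=0, weights=weights)

print(sentence_vector(["open", "the", "window"], w2v).shape)  # (50,)
```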
It should be noted that after an interactive instruction is received, the state information of the driving user at the current timestamp is obtained, and a temporary attention monitoring task is created based on it; the sight line drop point frequency of the driving user in the driving scene of the current timestamp is acquired to obtain a watching hotspot area, the sight line drop point at each timestamp in the attention monitoring task is acquired from the driving user's sight line direction, and the watching duration of each drop point is marked; whether the sight line drop point at each timestamp falls within the watching hotspot area is judged, and if the duration for which the drop point stays outside the hotspot area exceeds a preset threshold, whether to suspend voice interaction is decided according to the type of the interactive instruction and a voice prompt is generated; after voice interaction is suspended, when the driving user's sight line drop point is detected to return to the watching hotspot area, the voice interaction scene is restored and the operation corresponding to the instruction is carried out according to the historical interactive instruction. The sight line drop point of the target user can be detected by eye-movement hardware detection equipment or by a pupil projection space method.
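A compact state machine captures the pause/resume behavior described above; the 2-second dwell limit and the discrete region labels are assumptions for illustration.

```python
# Minimal state machine for the attention-monitoring task: pause voice
# interaction when the sight line drop point stays outside the watching
# hotspot area for too long, resume when it returns.

class GazeMonitor:
    def __init__(self, hotspot: set[str], max_outside_s: float = 2.0):
        self.hotspot = hotspot              # assumed hotspot region labels
        self.max_outside_s = max_outside_s  # assumed dwell threshold
        self.outside_since: float | None = None
        self.paused = False

    def update(self, timestamp: float, gaze_region: str) -> None:
        if gaze_region in self.hotspot:
            self.outside_since = None
            if self.paused:            # drop point returned: resume and
                self.paused = False    # replay the historical instruction
                print("resume voice interaction")
        elif self.outside_since is None:
            self.outside_since = timestamp
        elif timestamp - self.outside_since > self.max_outside_s and not self.paused:
            self.paused = True
            print("pause voice interaction, issue voice prompt")

mon = GazeMonitor(hotspot={"road_ahead", "mirror"})
for t, region in [(0.0, "road_ahead"), (1.0, "console"),
                  (3.5, "console"), (4.0, "road_ahead")]:
    mon.update(t, region)
```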
The contextual features of the interactive voice information are integrated based on the bidirectional long short-term memory neural network model to ensure semantic integrity. A data set is constructed by analyzing interactive-instruction keywords through big data and divided into a training set and a verification set; the training set is represented as word vectors and input into the bidirectional long short-term memory network combined with the attention mechanism for training, and key information in the interactive voice information is extracted by the trained model.
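The extractor can be sketched in a few lines of PyTorch. This is a minimal, assumed architecture: the dimensions, the tag set and the additive attention head are illustrative, not the patent's exact network.

```python
# Sketch of the key-information extractor: a bidirectional LSTM over word
# vectors, with an attention layer assigning differentiated per-token weights.
import torch
import torch.nn as nn

class BiLSTMAttention(nn.Module):
    def __init__(self, emb_dim=50, hidden=64, n_tags=5):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hidden, 1)         # per-token attention score
        self.tagger = nn.Linear(2 * hidden, n_tags)  # key-information labels

    def forward(self, x):                   # x: (batch, seq_len, emb_dim)
        h, _ = self.lstm(x)                 # (batch, seq_len, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)  # context-dependent weights
        h = h * weights                     # differentiated weighting per token
        return self.tagger(h)               # (batch, seq_len, n_tags)

model = BiLSTMAttention()
logits = model(torch.randn(2, 7, 50))       # two utterances of 7 tokens each
print(logits.shape)                         # torch.Size([2, 7, 5])
```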
In addition, comprehensive constraints are set based on the current driving scene: risk factors in the scene are analyzed, and constraint information is attached to some interactive instructions according to those risk factors. When the interactive instruction corresponding to the target user's interactive voice information falls outside the preset constraint range, a voice prompt is generated and the corresponding instruction is executed according to the target user's feedback; for example, when a child is detected in the vehicle from the vehicle-mounted environment information, the opening height of the window next to the child is constrained.
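The window example suggests a simple shape for the constraint check; everything in the sketch below (field names, the 0.5 opening limit, the prompt text) is a hypothetical illustration.

```python
# Illustrative comprehensive-constraint check: risk factors from the driving
# scene clamp certain instructions, e.g. rear-window height near a child.

def apply_constraints(instruction: dict, scene: dict) -> tuple[dict, str | None]:
    if (instruction["action"] == "set_window_height"
            and scene.get("child_in_rear")
            and instruction["target"].startswith("rear")):
        limit = 0.5                            # assumed safe opening ratio
        if instruction["value"] > limit:
            corrected = {**instruction, "value": limit}
            return corrected, "Window limited for child safety - confirm by voice?"
    return instruction, None                   # instruction within constraints

cmd = {"action": "set_window_height", "target": "rear_left", "value": 1.0}
corrected, prompt = apply_constraints(cmd, {"child_in_rear": True})
print(corrected["value"], "|", prompt)
```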
After the interactive instruction is executed, the target user's feedback information on the interactive instruction is acquired, a supplementary data set is built for each interactive instruction from the feedback information, and the supplementary data set is attached as a label to the target user's voiceprint information. The instruction hierarchical graph is supplemented and corrected based on the supplementary data set of each interactive instruction, a graph structure representing the target user's habits with respect to the interactive instructions is extracted from the corrected graph, and a graph convolutional neural network is trained on the extracted graph structure to obtain the target user's instruction habit information. Message propagation is performed between mutually connected target-user and interactive-instruction nodes in the graph structure; the propagation process comprises feature transformation, neighborhood aggregation and nonlinear activation, so that a target-user node carrying its own attribute features also incorporates local neighborhood information, expressed in vector form, specifically:
$$e_u^{(k+1)} = \sum_{i \in \mathcal{N}_u} \frac{1}{\sqrt{|\mathcal{N}_u|}\,\sqrt{|\mathcal{N}_i|}}\, e_i^{(k)}$$

where $e_u^{(k+1)}$ denotes the feature vector representation of target-user node $u$ after the $(k+1)$-th convolution, $\mathcal{N}_u$ denotes the set of neighboring interactive-instruction nodes of $u$, $|\mathcal{N}_u|$ and $|\mathcal{N}_i|$ denote the degrees of target-user node $u$ and of historical interactive-instruction node $i$, and $e_i^{(k)}$ denotes the feature vector representation of interactive-instruction node $i$ after the $k$-th convolution.
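A numeric toy version of this propagation step is given below; the two-command graph and the 8-dimensional embeddings are made up purely to show the degree-normalised aggregation.

```python
# One graph-convolution step: aggregate instruction-node vectors into the
# user node, normalised by the square roots of the node degrees.
import numpy as np

def propagate_user(user: str, neighbors: dict[str, list[str]],
                   emb: dict[str, np.ndarray]) -> np.ndarray:
    n_u = neighbors[user]
    out = np.zeros_like(emb[user])
    for i in n_u:
        norm = np.sqrt(len(n_u)) * np.sqrt(len(neighbors[i]))  # sqrt(|N_u|)*sqrt(|N_i|)
        out += emb[i] / norm
    return out

neighbors = {"user": ["open_window", "play_music"],
             "open_window": ["user"], "play_music": ["user"]}
emb = {k: np.random.rand(8) for k in neighbors}
print(propagate_user("user", neighbors, emb).shape)  # (8,)
```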
A personalized database of the target user is constructed by combining the instruction habit information with the corresponding vehicle-mounted environment; learning from the personalized database compensates the correction precision of the interactive instruction, so that the interactive instruction achieves the effect expected by the target user in a single pass. A cloud storage time threshold is preset, and when the time for which the personalized database corresponding to the target user's voiceprint information has not been called exceeds the preset storage time threshold, the personalized database of that target user is deleted.
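The retention rule admits a one-function sketch; the 90-day threshold and the in-memory profile store are assumptions standing in for the cloud database.

```python
# Sketch of the cloud-retention rule: personalised databases whose last
# call is older than the storage threshold are deleted.
import time

STORE_TTL_S = 90 * 24 * 3600        # assumed cloud storage time threshold

def purge_stale_profiles(profiles: dict[str, dict], now: float | None = None) -> None:
    now = now or time.time()
    for voiceprint_id in list(profiles):
        if now - profiles[voiceprint_id]["last_called"] > STORE_TTL_S:
            del profiles[voiceprint_id]  # personalised database expired

profiles = {"vp_001": {"last_called": time.time() - 100 * 24 * 3600},
            "vp_002": {"last_called": time.time()}}
purge_stale_profiles(profiles)
print(list(profiles))  # ['vp_002']
```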
Fig. 4 shows a block diagram of a cloud computing-based vehicle-mounted intelligent voice interaction system.
The second aspect of the present invention further provides a cloud computing-based vehicle-mounted intelligent voice interaction system 4, which includes: the memory 41 and the processor 42, where the memory includes a cloud-computing-based vehicle-mounted intelligent voice interaction method program, and when executed by the processor, the cloud-computing-based vehicle-mounted intelligent voice interaction method program implements the following steps:
acquiring attention information of a driving user in a driving process in a vehicle-mounted environment, analyzing state information of the driving user according to the attention information and the vehicle-mounted environment information, and generating a driving scene in the current driving process of a vehicle;
acquiring interactive voice information in the vehicle-mounted environment, acquiring the position and identity information of a target user through the interactive voice information, judging whether the target user is the driving user, matching an interactive instruction set corresponding to the target user according to the judgment result, and initializing an instruction hierarchical graph;
the cloud carries out semantic recognition on interactive voice information based on machine learning, searches in an instruction hierarchical graph to generate an interactive instruction, generates comprehensive constraints according to the driving scene, and corrects the interactive instruction through the comprehensive constraints;
feedback information of the target user on the interactive instruction is obtained, instruction habit information of the target user is analyzed through matching of voiceprint information and the feedback information of the target user, and correction of the interactive instruction is compensated based on the instruction habit information.
It should be noted that facial frame image data of the driving user are acquired through the in-vehicle camera, the frame image data are preprocessed, and key frames of the facial frame image data are extracted; facial feature points of the driving user are extracted from the key frames, and face orientation information, eye closing degree and sight line direction are obtained from them. Corresponding threshold intervals are set for the extracted data and matched to attention levels; the acquired face orientation information, eye closing degree and sight line direction are compared with the preset thresholds, the attention information of the driving user is read, and the driving user's current attention level is judged. Weight information is set according to the road conditions of the current road section and used to adjust the attention threshold; for example, when the current road section is congested, the corresponding attention threshold is lowered so that the driving user stays more focused while driving. The attention information is then evaluated against the attention threshold at the current moment, vehicle-mounted environment information is acquired, and the driving user's state information is obtained by matching it against the attention evaluation result, where the vehicle-mounted environment information includes the number of occupants, in-vehicle sound, in-vehicle temperature, in-vehicle air quality and the like. When the driving user's state information indicates fatigue, voice information is generated to remind the driving user, a vehicle-mounted environment change suggestion is generated by deciding on the vehicle-mounted environment information, and the driving user's voice feedback is acquired to execute the corresponding instruction; for example, when the driving user is slightly fatigued, voice interaction asks whether to open a window, play music or lower the air-conditioning temperature. In addition, the driving scene of the current vehicle driving process is generated from the driving user's state information, the vehicle-mounted environment information and the vehicle driving information.
According to the embodiment of the invention, the interactive voice information in the vehicle-mounted environment is acquired, and the position and identity information of the target user is acquired through the interactive voice information, which specifically comprises the following steps:
acquiring interactive voice information in the vehicle-mounted environment through a voice receiving module in the vehicle-mounted environment, filtering and denoising the interactive voice information, and dividing the vehicle-mounted environment into a preset number of sub-areas;
acquiring sound energy information and arrival time difference of the received interactive voice information in each subarea, and judging a source subarea of the interactive voice information according to the sound energy information and the arrival time difference;
determining the position of the interactive voice information and then performing voiceprint recognition, retrieving identity information through big data at the cloud according to the voiceprint recognition result, and calculating the similarity between the voiceprint corresponding to the interactive voice information and cloud-stored data;
acquiring the data whose similarity meets a preset similarity standard and extracting the corresponding identity information as the identity information of the target user, reading the matched stored voice habit characteristics through the identity information, and, if no cloud-stored data meets the preset similarity standard, creating a voiceprint sequence and storing it in the cloud;
and matching an interactive instruction set corresponding to the functional information according to the position information of the target user, and initializing an instruction hierarchical graph based on the identity information through the interactive instruction set.
It should be noted that an interactive instruction set corresponding to the function information is matched according to the position information of the target user, and the instruction hierarchical graph is initialized through the interactive instruction set based on the identity information, specifically: interactive instructions are classified based on position information within the vehicle-mounted environment, keyword information corresponding to each interactive instruction is obtained through big-data retrieval, and an interactive instruction knowledge graph is constructed from the interactive instructions, the keyword information and the classification results; a corresponding interactive instruction set is extracted from the interactive instruction knowledge graph according to the source sub-region of the target user's interactive voice information; similar historical driving scenes are retrieved according to the current driving scene, the usage frequency of each interactive instruction in those similar scenes is extracted, and the interactive instructions in the set are ranked by priority according to usage frequency; if the target user's identity information is stored in the cloud, a user portrait is constructed from the historical interactive instructions corresponding to that identity information, the priorities are adjusted through the user portrait, and the instruction hierarchical graph is generated from the interactive instruction set according to the adjusted priorities.
According to the embodiment of the invention, the cloud carries out semantic recognition on interactive voice information based on machine learning, searches in an instruction hierarchical graph to generate an interactive instruction, generates comprehensive constraints according to the driving scene, and corrects the interactive instruction through the comprehensive constraints, specifically:
preprocessing the interactive voice information, extracting word vectors from the preprocessed interactive voice information through a Word2vec model, constructing sentence vector representations as weighted averages of the word vectors, and taking the word vectors and sentence vector representations as semantic features;
establishing a key information extraction model based on the bidirectional long short-term memory neural network model, inputting the semantic features into the key information extraction model, and configuring differentiated weights in combination with an attention mechanism and context to obtain the key information in the interactive voice information;
classifying by using the key information and labeling category labels, retrieving in the initialized instruction hierarchical graph to obtain the instruction corresponding to the key information, and determining the intention of the target user;
when the retrieval path in the instruction hierarchical graph corresponds to more than one instruction, setting question-back voice information according to the retrieved content, updating the intention according to the feedback of the target user, and matching the corresponding interactive instruction according to the updated intention;
and setting comprehensive constraints based on the current driving scene, judging whether the matched interactive instruction falls within the range of the comprehensive constraints, and if not, correcting the interactive instruction and then querying the target user for feedback by voice.
It should be noted that after an interactive instruction is received, the state information of the driving user at the current timestamp is obtained, and a temporary attention monitoring task is created based on it; the sight line drop point frequency of the driving user in the driving scene of the current timestamp is acquired to obtain a watching hotspot area, the sight line drop point at each timestamp in the attention monitoring task is acquired from the driving user's sight line direction, and the watching duration of each drop point is marked; whether the sight line drop point at each timestamp falls within the watching hotspot area is judged, and if the duration for which the drop point stays outside the hotspot area exceeds a preset threshold, whether to suspend voice interaction is decided according to the type of the interactive instruction and a voice prompt is generated; after voice interaction is suspended, when the driving user's sight line drop point is detected to return to the watching hotspot area, the voice interaction scene is restored and the operation corresponding to the instruction is carried out according to the historical interactive instruction. The sight line drop point of the target user can be detected by eye-movement hardware detection equipment or by a pupil projection space method.
The contextual features of the interactive voice information are integrated based on the bidirectional long short-term memory neural network model to ensure semantic integrity. A data set is constructed by analyzing interactive-instruction keywords through big data and divided into a training set and a verification set; the training set is represented as word vectors and input into the bidirectional long short-term memory network combined with the attention mechanism for training, and key information in the interactive voice information is extracted by the trained model.
In addition, comprehensive constraints are set based on the current driving scene: risk factors in the current driving scene are analyzed, and constraint information is attached to some interactive instructions according to those risk factors. When the interactive instruction corresponding to the target user's interactive voice information does not fall within the preset constraint range, a voice prompt is generated and the corresponding instruction is executed according to the target user's feedback; for example, when a child is detected in the vehicle from the vehicle-mounted environment information, the opening height of the window next to the child is constrained.
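One way to picture the constraint check is the sketch below; the constraint table, the field names, and the 30% window limit are illustrative assumptions that mirror the child/window example.

```python
# A minimal sketch of the comprehensive-constraint check: risk factors
# in the current driving scene attach limits to certain instructions.
def apply_constraints(instruction, scene):
    """Clamp an instruction to its constraint range, or flag it for a
    voice confirmation prompt when it falls outside that range."""
    constraints = {}
    if scene.get("child_in_rear_seat"):
        # Limit how far the window next to the child may open (assumed 30%).
        constraints["rear_window_open_pct"] = (0, 30)

    key = instruction["target"]
    if key in constraints:
        lo, hi = constraints[key]
        if not lo <= instruction["value"] <= hi:
            corrected = dict(instruction,
                             value=max(lo, min(hi, instruction["value"])))
            return corrected, "needs_voice_confirmation"
    return instruction, "ok"

cmd, status = apply_constraints(
    {"target": "rear_window_open_pct", "value": 100},
    {"child_in_rear_seat": True},
)
```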
After an interactive instruction is executed, feedback information of the target user on the interactive instruction is acquired, a supplementary data set is built for each interactive instruction from the feedback information, and the supplementary data set is attached as a tag to the target user's voiceprint information. The instruction hierarchical diagram is supplemented and corrected based on the supplementary data set of each interactive instruction, a graph structure representing the target user's habits with respect to the interactive instructions is extracted from the corrected instruction hierarchical diagram, and a graph convolutional neural network is trained on the extracted graph structure to obtain the instruction habit information of the target user. Message propagation is performed between target users and interactive instructions that are connected in the graph structure; the propagation process comprises feature transformation, neighborhood aggregation, and nonlinear activation, so that each target user node carries local neighborhood information in addition to its own attribute features, expressed in vector form, specifically:
$$h_u^{(l+1)} = \sigma\Big(W^{(l)} \sum_{i \in \mathcal{N}(u)} \frac{1}{c_{u,i}}\, h_i^{(l)}\Big)$$

wherein $h_u^{(l+1)}$ represents the feature vector of target user node $u$ after the $(l+1)$-th convolution; $\mathcal{N}(u)$ represents the set of neighboring interactive instruction nodes, i.e., the historical interactive instruction nodes $i$ that respond to interactions with target user node $u$; $h_i^{(l)}$ represents the feature vector of interactive instruction node $i$ after the $l$-th convolution; $W^{(l)}$ is the trainable weight matrix of the $l$-th layer, $c_{u,i}$ is a normalization constant for the edge between $u$ and $i$, and $\sigma(\cdot)$ is a nonlinear activation function;
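A minimal Python sketch of one round of this message propagation on the user-instruction graph follows; mean aggregation standing in for the normalized sum, and all dimensions, are assumptions consistent with the formula rather than details from the patent.

```python
# One message-passing round: feature transformation, neighborhood
# aggregation, and nonlinear activation over the user-instruction graph.
import torch
import torch.nn as nn

def gcn_step(h_user, h_instr, neighbors, W):
    """h_user: (n_users, d); h_instr: (n_instr, d); neighbors[u] lists
    the instruction nodes user u has historically interacted with."""
    out = torch.zeros_like(h_user)
    for u, nbrs in neighbors.items():
        if nbrs:                                # users with no history keep zeros
            agg = h_instr[nbrs].mean(dim=0)     # neighborhood aggregation
            out[u] = torch.relu(W(agg))         # transformation + nonlinearity
    return out

W = nn.Linear(16, 16, bias=False)               # trainable weight matrix W^(l)
h_next = gcn_step(torch.randn(3, 16), torch.randn(5, 16),
                  {0: [1, 2], 1: [0], 2: []}, W)
```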
A personalized database of the target user is constructed by combining the instruction habit information with the corresponding vehicle-mounted environment; learning from this personalized data compensates the correction precision of the interactive instructions, so that an interactive instruction achieves the effect expected by the target user in a single pass. A cloud storage time threshold is preset, and when the time for which the personalized database corresponding to the target user's voiceprint information has gone uncalled exceeds this threshold, the personalized database of that target user is deleted.
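The storage-lifetime rule can be sketched as below; the in-memory store layout and the 30-day default threshold are assumptions.

```python
# A minimal sketch of the cloud-storage lifetime rule: personalized
# databases keyed by voiceprint are dropped once unused too long.
import time

def purge_stale_profiles(store, ttl_seconds=30 * 24 * 3600):
    """store: {voiceprint_id: {"last_called": epoch_s, "profile": ...}};
    deletes entries whose non-calling time exceeds the threshold."""
    now = time.time()
    stale = [vid for vid, rec in store.items()
             if now - rec["last_called"] > ttl_seconds]
    for vid in stale:
        del store[vid]
    return stale  # ids of the deleted personalized databases
```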
A third aspect of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a program of the cloud-computing-based vehicle-mounted intelligent voice interaction method, and when the program is executed by a processor, the steps of the cloud-computing-based vehicle-mounted intelligent voice interaction method are implemented.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division into units is only a logical functional division, and other divisions are possible in actual implementation, such as: multiple units or components may be combined, or integrated into another system, or some features may be omitted or not implemented. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be through certain interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may serve as a unit separately, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that all or part of the steps for realizing the above method embodiments may be completed by hardware under the control of program instructions; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The aforementioned storage medium includes: a removable storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or various other media capable of storing program code.
Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as an independent product. Based on such an understanding, the technical solutions of the embodiments of the present invention, or the portions thereof contributing to the prior art, may be embodied in the form of a software product; the software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. The aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic disk, an optical disk, or various other media that can store program code.
The above description covers only specific embodiments of the present invention, but the scope of the present invention is not limited thereto; any person skilled in the art can easily conceive of changes or substitutions within the technical scope disclosed by the present invention, and all such changes or substitutions shall be covered by the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A vehicle-mounted intelligent voice interaction method based on cloud computing is characterized by comprising the following steps:
acquiring attention information of a driving user in a driving process in a vehicle-mounted environment, analyzing state information of the driving user according to the attention information and the vehicle-mounted environment information, and generating a driving scene in the current driving process of a vehicle;
acquiring interactive voice information in a vehicle-mounted environment, acquiring the position and identity information of a target user through the interactive voice information, judging whether the target user is a driving user, matching an interactive instruction set corresponding to the target user according to the judgment result, and initializing an instruction hierarchical diagram;
the cloud carries out semantic recognition on interactive voice information based on machine learning, searches in an instruction hierarchical graph to generate an interactive instruction, generates comprehensive constraints according to the driving scene, and corrects the interactive instruction through the comprehensive constraints;
feedback information of the target user to the interactive instruction is obtained, instruction habit information of the target user is analyzed through matching of voiceprint information and the feedback information of the target user, and correction of the interactive instruction is compensated based on the instruction habit information.
2. The cloud-computing-based vehicle-mounted intelligent voice interaction method according to claim 1, characterized in that state information of a driving user is analyzed according to attention information and vehicle-mounted environment information, and a driving scene in a current vehicle driving process is generated, specifically:
acquiring facial frame image data of a driver through an in-vehicle camera, preprocessing the facial frame image data, and extracting a key frame of the facial frame image data;
extracting face feature points of the driving user according to the key frames of the facial frame image data, and acquiring face orientation information, eye-closure degree, and line-of-sight direction according to the face feature points;
comparing the acquired face orientation information, eye-closure degree, and line-of-sight direction with preset thresholds, reading the attention information of the driving user, setting weight information according to the road condition information of the current driving section, and adjusting the attention threshold with the weight information;
evaluating the attention information of the driving user against the attention threshold at the current moment, and performing matching analysis on the vehicle-mounted environment information and the attention evaluation result to obtain the state information of the driving user;
when the state information of the driving user indicates a fatigue state, generating voice information to remind the driving user, making a decision according to the vehicle-mounted environment information to generate a vehicle-mounted environment change suggestion, and acquiring the voice feedback of the driving user to execute the corresponding instruction;
in addition, the driving scene in the current vehicle driving process is generated through the state information of the driving user, the vehicle-mounted environment information and the vehicle driving information.
3. The cloud-computing-based vehicle-mounted intelligent voice interaction method according to claim 1, characterized in that interactive voice information in a vehicle-mounted environment is acquired, and position and identity information of a target user is acquired through the interactive voice information, specifically:
acquiring interactive voice information in the vehicle-mounted environment through a voice receiving module in the vehicle-mounted environment, filtering and denoising the interactive voice information, and dividing the vehicle-mounted environment into a preset number of sub-areas;
acquiring sound energy information and arrival time difference of the received interactive voice information in each subarea, and judging a source subarea of the interactive voice information according to the sound energy information and the arrival time difference;
determining the position of the interactive voice information and then performing voiceprint recognition, retrieving identity information in the cloud through big data according to the voiceprint recognition result, and performing similarity calculation between the voiceprint corresponding to the interactive voice information and the cloud-stored data;
acquiring the data whose similarity meets a preset similarity standard and extracting the corresponding identity information as the identity information of the target user, retrieving the stored voice habit features matched with that identity information, and, if no cloud-stored data meets the preset similarity standard, creating a voiceprint sequence and storing it in the cloud;
and matching an interactive instruction set corresponding to the functional information according to the position information of the target user, and initializing an instruction hierarchical graph based on the identity information through the interactive instruction set.
4. The vehicle-mounted intelligent voice interaction method based on cloud computing according to claim 1, wherein the cloud performs semantic recognition on interaction voice information based on machine learning, searches in an instruction hierarchical diagram to generate an interaction instruction, generates a comprehensive constraint according to the driving scenario, and corrects the interaction instruction through the comprehensive constraint, specifically:
preprocessing the interactive voice information, extracting word vectors from the preprocessed interactive voice information through a Word2vec model, constructing a sentence vector representation as a weighted average of the word vectors, and taking the word vectors and the sentence vector representation as semantic features;
establishing a key information extraction model based on a bidirectional long short-term memory neural network model, inputting the semantic features into the key information extraction model, and assigning differentiated weights to context through an attention mechanism to obtain the key information in the interactive voice information;
classifying the key information and labeling it with category labels, retrieving in the initialized instruction hierarchical diagram to obtain the instructions corresponding to the key information, and determining the intention of the target user;
when the retrieval path in the instruction hierarchical diagram corresponds to more than one instruction, generating a follow-up question as voice output according to the retrieved content, updating the intention according to the feedback of the target user, and matching the corresponding interactive instruction through the updated intention;
and setting comprehensive constraints based on the current driving scenario, judging whether the matched interactive instruction falls within the range of the comprehensive constraints, and if not, correcting the interactive instruction and querying the target user for feedback by voice.
5. The cloud-computing-based vehicle-mounted intelligent voice interaction method according to claim 1, further comprising: monitoring attention information of a driving user during voice interaction, specifically comprising:
after an interactive instruction is received, acquiring the state information of the driving user at the current timestamp, and creating a temporary attention monitoring task based on the state information of the current timestamp;
acquiring the gaze drop frequency of the driving user in the driving scene at the current timestamp to obtain a gaze hotspot area, obtaining the gaze drop point at each timestamp within the attention monitoring task from the line-of-sight direction of the driving user, and marking the dwell duration of each gaze drop point;
judging whether the gaze drop point at each timestamp within the attention monitoring task falls within the gaze hotspot area; if the duration for which the gaze drop point stays outside the gaze hotspot area is longer than a preset threshold, judging whether to suspend the voice interaction according to the type of the interactive instruction, and generating a voice prompt;
and after the voice interaction is suspended, when the gaze drop point of the driving user is detected to have returned to the gaze hotspot area, restoring the voice interaction scene and carrying out the operation corresponding to the instruction according to the historical interactive instruction.
6. The cloud-computing-based vehicle-mounted intelligent voice interaction method according to claim 1, characterized in that feedback information of a target user on an interaction instruction is acquired, instruction habit information of the target user is analyzed through matching of voiceprint information and the feedback information of the target user, and correction of the interaction instruction is compensated based on the instruction habit information, specifically:
after the interactive instruction is executed, acquiring feedback information of the target user on the interactive instruction, building a supplementary data set for each interactive instruction from the feedback information, and attaching the supplementary data set as a tag to the target user's voiceprint information;
supplementing and correcting the instruction hierarchical diagram based on the supplementary data set of each interactive instruction, extracting a graph structure from the corrected instruction hierarchical diagram, and training a graph convolutional neural network on the extracted graph structure to obtain the instruction habit information of the target user;
constructing a personalized database of the target user by combining the instruction habit information with the corresponding vehicle-mounted environment, learning from the personalized data, and compensating the correction precision of the interactive instruction, so that the interactive instruction achieves the effect expected by the target user in a single pass;
and presetting a cloud storage time threshold, and deleting the personalized database of the target user when the time for which the personalized database corresponding to the target user's voiceprint information has gone uncalled exceeds the preset storage time threshold.
7. A cloud-computing-based vehicle-mounted intelligent voice interaction system, characterized in that the system comprises: a memory and a processor, wherein the memory stores a program of the cloud-computing-based vehicle-mounted intelligent voice interaction method, and when the program is executed by the processor, the following steps are implemented:
acquiring attention information of a driving user in a driving process in a vehicle-mounted environment, analyzing state information of the driving user according to the attention information and the vehicle-mounted environment information, and generating a driving scene in the current driving process of a vehicle;
acquiring interactive voice information in a vehicle-mounted environment, acquiring the position and identity information of a target user through the interactive voice information, judging whether the target user is a driving user, matching an interactive instruction set corresponding to the target user according to the judgment result, and initializing an instruction hierarchical diagram;
the cloud carries out semantic recognition on interactive voice information based on machine learning, searches in an instruction hierarchical graph to generate an interactive instruction, generates comprehensive constraints according to the driving scene, and corrects the interactive instruction through the comprehensive constraints;
feedback information of the target user on the interactive instruction is obtained, instruction habit information of the target user is analyzed through matching of voiceprint information and the feedback information of the target user, and correction of the interactive instruction is compensated based on the instruction habit information.
8. The vehicle-mounted intelligent voice interaction system based on cloud computing as claimed in claim 7, wherein interactive voice information in a vehicle-mounted environment is acquired, and the position and identity information of a target user is acquired through the interactive voice information, and specifically:
acquiring interactive voice information in the vehicle-mounted environment through a voice receiving module in the vehicle-mounted environment, filtering and denoising the interactive voice information, and dividing the vehicle-mounted environment into a preset number of sub-areas;
acquiring sound energy information and arrival time difference of the received interactive voice information in each subarea, and judging a source subarea of the interactive voice information according to the sound energy information and the arrival time difference;
determining the position of the interactive voice information and then performing voiceprint recognition, retrieving identity information in the cloud through big data according to the voiceprint recognition result, and performing similarity calculation between the voiceprint corresponding to the interactive voice information and the cloud-stored data;
acquiring the data whose similarity meets a preset similarity standard and extracting the corresponding identity information as the identity information of the target user, retrieving the stored voice habit features matched with that identity information, and, if no cloud-stored data meets the preset similarity standard, creating a voiceprint sequence and storing it in the cloud;
and matching an interactive instruction set corresponding to the functional information according to the position information of the target user, and initializing an instruction hierarchical graph based on the identity information through the interactive instruction set.
9. The cloud-computing-based vehicle-mounted intelligent voice interaction system according to claim 7, wherein the cloud performs semantic recognition on interaction voice information based on machine learning, searches in an instruction hierarchical diagram to generate an interaction instruction, generates a comprehensive constraint according to the driving scenario, and corrects the interaction instruction through the comprehensive constraint, specifically:
preprocessing the interactive voice information, extracting word vectors from the preprocessed interactive voice information through a Word2vec model, constructing a sentence vector representation as a weighted average of the word vectors, and taking the word vectors and the sentence vector representation as semantic features;
establishing a key information extraction model based on a bidirectional long short-term memory neural network model, inputting the semantic features into the key information extraction model, and assigning differentiated weights to context through an attention mechanism to obtain the key information in the interactive voice information;
classifying the key information and labeling it with category labels, retrieving in the initialized instruction hierarchical diagram to obtain the instructions corresponding to the key information, and determining the intention of the target user;
when the retrieval path in the instruction hierarchical diagram corresponds to more than one instruction, generating a follow-up question as voice output according to the retrieved content, updating the intention according to the feedback of the target user, and matching the corresponding interactive instruction through the updated intention;
and setting comprehensive constraints based on the current driving scenario, judging whether the matched interactive instruction falls within the range of the comprehensive constraints, and if not, correcting the interactive instruction and querying the target user for feedback by voice.
10. The cloud-computing-based vehicle-mounted intelligent voice interaction system according to claim 7, wherein feedback information of a target user on an interaction instruction is acquired, instruction habit information of the target user is analyzed through matching of voiceprint information and the feedback information of the target user, and correction of the interaction instruction is compensated based on the instruction habit information, and specifically:
after the interactive instruction is executed, acquiring feedback information of the target user on the interactive instruction, building a supplementary data set for each interactive instruction from the feedback information, and attaching the supplementary data set as a tag to the target user's voiceprint information;
supplementing and correcting the instruction hierarchical diagram based on the supplementary data set of each interactive instruction, extracting a graph structure from the corrected instruction hierarchical diagram, and training a graph convolutional neural network on the extracted graph structure to obtain the instruction habit information of the target user;
constructing a personalized database of the target user by combining the instruction habit information with the corresponding vehicle-mounted environment, learning from the personalized data, and compensating the correction precision of the interactive instruction, so that the interactive instruction achieves the effect expected by the target user in a single pass;
and presetting a cloud storage time threshold, and deleting the personalized database of the target user when the time for which the personalized database corresponding to the target user's voiceprint information has gone uncalled exceeds the preset storage time threshold.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211395643.1A CN115440221B (en) | 2022-11-09 | 2022-11-09 | Vehicle-mounted intelligent voice interaction method and system based on cloud computing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115440221A CN115440221A (en) | 2022-12-06 |
CN115440221B true CN115440221B (en) | 2023-03-24 |
Family
ID=84252910
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211395643.1A Active CN115440221B (en) | 2022-11-09 | 2022-11-09 | Vehicle-mounted intelligent voice interaction method and system based on cloud computing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115440221B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118248135A (en) * | 2022-12-23 | 2024-06-25 | 北京罗克维尔斯科技有限公司 | Voice interaction method and device of intelligent equipment and vehicle |
CN116741175B (en) * | 2023-08-14 | 2023-11-03 | 深圳市实信达科技开发有限公司 | Block chain-based intelligent data transmission supervision system and method |
CN117115788B (en) * | 2023-10-19 | 2024-01-02 | 天津所托瑞安汽车科技有限公司 | Intelligent interaction method for vehicle, back-end server and front-end equipment |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9085303B2 (en) * | 2012-11-15 | 2015-07-21 | Sri International | Vehicle personal assistant |
CN108896061A (en) * | 2018-05-11 | 2018-11-27 | 京东方科技集团股份有限公司 | A kind of man-machine interaction method and onboard navigation system based on onboard navigation system |
CN110019740B (en) * | 2018-05-23 | 2021-10-01 | 京东方科技集团股份有限公司 | Interaction method of vehicle-mounted terminal, server and storage medium |
CN110874202B (en) * | 2018-08-29 | 2024-04-19 | 斑马智行网络(香港)有限公司 | Interaction method, device, medium and operating system |
CN111653277A (en) * | 2020-06-10 | 2020-09-11 | 北京百度网讯科技有限公司 | Vehicle voice control method, device, equipment, vehicle and storage medium |
US11897331B2 (en) * | 2021-01-14 | 2024-02-13 | Baidu Usa Llc | In-vehicle acoustic monitoring system for driver and passenger |
CN115294976A (en) * | 2022-06-23 | 2022-11-04 | 中国第一汽车股份有限公司 | Error correction interaction method and system based on vehicle-mounted voice scene and vehicle thereof |
CN115273797A (en) * | 2022-06-28 | 2022-11-01 | 智己汽车科技有限公司 | Sound-based automobile interaction method and system and storage medium |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | 
| SE01 | Entry into force of request for substantive examination | 
| GR01 | Patent grant | 