
CN111737670B - Method, system and vehicle-mounted multimedia device for multi-mode data collaborative man-machine interaction - Google Patents

Method, system and vehicle-mounted multimedia device for multi-mode data collaborative man-machine interaction

Info

Publication number
CN111737670B
CN111737670B (application CN201910225815.2A)
Authority
CN
China
Prior art keywords
data
user
intention
request
priority
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910225815.2A
Other languages
Chinese (zh)
Other versions
CN111737670A (en)
Inventor
冉光伟
张莹
王金华
张宗煜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Automobile Group Co Ltd
Original Assignee
Guangzhou Automobile Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Automobile Group Co Ltd filed Critical Guangzhou Automobile Group Co Ltd
Priority to CN201910225815.2A
Publication of CN111737670A
Application granted
Publication of CN111737670B
Legal status: Active (current)


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31User authentication
    • G06F21/32User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • User Interface Of Digital Computer (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The application discloses a method, a system and a vehicle-mounted multimedia device for collaborative man-machine interaction based on multi-mode data. The method comprises the following steps: step S10, acquiring multi-mode interaction data in real time through a vehicle-mounted multimedia device; step S11, performing feature acquisition on each type of multi-mode interaction data acquired in real time, and comparing each type with the pre-stored standard feature data corresponding to that type, so as to obtain the user intention corresponding to each type of multi-mode interaction data; step S12, associating a corresponding priority with the user intention obtained from each type of multi-mode interaction data and grouping the intentions by priority; and step S13, performing priority arbitration on the user intentions according to the priority grouping information, determining one of them as the selected user intention and outputting it. By implementing the method and the device, multi-mode data can be coordinated during human-computer interaction, thereby improving the human-computer interaction experience of the vehicle-mounted terminal.

Description

Method, system and vehicle-mounted multimedia device for multi-mode data collaborative man-machine interaction
Technical Field
The application belongs to the field of vehicle-mounted terminals, and particularly relates to a method and a system for collaborative human-computer interaction of multi-mode data and a vehicle-mounted multimedia device.
Background
The man-machine interaction system in a current vehicle-mounted terminal generally adopts a voice recognition system, a gesture recognition system, a face recognition system, a fingerprint recognition system or a software/hardware key recognition system. However, these existing systems run independent algorithm logic and cannot effectively coordinate their data.
For example, in some prior art, a voice recognition system alone can only act directly on the recognized keywords or key phrases and cannot locate content on the display interface, while a gesture recognition system alone can only operate on content already shown on the display interface and cannot capture the user's keywords or key phrases. In some examples, only the voice content is decomposed into data and the decomposed data is reclassified, which does not solve the fluency of coordinated man-machine interaction; in other examples, only the user's current environment and physiological state are identified, and specific interactive content cannot be realized. Where semantic recognition and gesture recognition do interact, the system must first analyze the semantics accurately, then analyze the gesture action, and only then compute the result the user wants: for example, the user says "measure length", the gesture "clicks" one point of an object as the measurement starting point, the gesture then "clicks" another point as the end point, and only after both steps can the system compute the length of the object. The prior art therefore suffers from insufficient fluency and insufficient accuracy in man-machine interaction.
Disclosure of Invention
The technical problem to be solved by the embodiments of the application is to provide a method, a system and a vehicle-mounted multimedia device for collaborative human-computer interaction based on multi-modal data, which can coordinate multi-modal input data and identify the user's actual interaction intention more accurately.
In one aspect of the present application, a method for collaborative human-computer interaction of multi-modal data is provided, which includes the following steps:
step S10, acquiring multi-mode interaction data in real time through a vehicle-mounted multimedia device, wherein the multi-mode interaction data comprises at least one of user fingerprint data, face data, sound wave data, hand data, vehicle-mounted terminal key data and vehicle-mounted screen touch data;
step S11, performing characteristic acquisition on each type of multi-mode interaction data acquired in real time, and respectively comparing the acquired multi-mode interaction data with pre-stored standard characteristic data corresponding to the type of multi-mode interaction data to acquire user intention corresponding to each type of multi-mode interaction data;
step S12, associating a corresponding priority for the user intention corresponding to each type of multi-mode interaction data, and carrying out priority grouping;
and step S13, performing priority arbitration on the various user intentions according to the priority grouping information, determining one user intention as the selected user intention, and outputting the user intention.
Wherein the method further comprises the steps of:
the step of obtaining standard characteristic data corresponding to various multi-mode interaction data in advance specifically comprises the following steps:
recording and obtaining the thread pattern characteristics of the user fingerprint as standard characteristic data of the user fingerprint data;
recording and obtaining facial image features of a user as standard feature data of the facial data of the user;
recording and obtaining iris pattern characteristics of a user as standard characteristic data of iris data of the user;
recording and obtaining voiceprint characteristics in user voice as standard characteristic data of user voice data;
recording and obtaining lip characteristics of the users corresponding to the keywords, and taking the lip characteristics as standard characteristic data of the lip data of the users;
recording and obtaining hand type/gesture characteristics of a user corresponding to each instruction, wherein the hand type/gesture characteristics are used as standard characteristic data of hand data;
recording and obtaining facial expression characteristics of the user corresponding to each instruction, and taking the facial expression characteristics as standard characteristic data of facial expression data.
Wherein, the step S10 includes:
and the fingerprint data, face data, sound wave data, hand data, vehicle-mounted terminal key data and vehicle-mounted screen touch data corresponding to the user are acquired through a fingerprint acquisition module, an image acquisition module, an audio acquisition module, a touch screen module and a key module which are connected with the vehicle-mounted multimedia device.
Wherein, the step S11 further includes:
when the acquired multi-mode interaction data are a face image, a fingerprint image and an iris image, respectively matching the acquired multi-mode interaction data with corresponding standard feature data, and respectively acquiring a face authentication request, a fingerprint authentication request and an iris authentication request after successful matching;
when the acquired multi-mode interaction data are lip images, matching the feature data corresponding to the lip images with basic lip data, and obtaining corresponding lip instruction requests after successful matching;
when the acquired multi-mode interaction data are voice data, matching the voice data with basic voiceprint data, and obtaining a voice command request corresponding to a keyword corresponding to the voice after successful matching;
when the acquired multi-mode interaction data are hand data, matching the hand data with basic hand data to obtain a hand instruction request corresponding to the gesture or hand;
when the acquired multi-modal interaction data is a facial expression image, matching the facial expression with basic expression data to obtain an expression instruction request corresponding to the expression.
Wherein, the step S12 further includes:
confirming the face authentication request, the fingerprint authentication request and the iris authentication request as identity authentication intention, and associating a first group of priorities;
confirming a key request, a touch control click request and the hand instruction request as action intentions, and associating the action intentions with a second group of priorities;
confirming the lip instruction request and the voice instruction request as semantic intents and associating the semantic intents with a third group of priorities;
confirming the expression instruction request as emotion intention and associating the emotion instruction request as a fourth group of priorities;
the priority levels of the first group of priorities, the second group of priorities, the third group of priorities and the fourth group of priorities are arranged from high to low; each type of request in each set of priorities corresponds to a different priority.
Wherein, the step S13 further includes:
and carrying out priority arbitration on each obtained intention within a preset time, taking the user intention corresponding to the request with the highest priority as the selected user intention, and outputting an instruction request corresponding to the user intention.
Wherein, the step S13 further includes:
if the user intention with higher priority is detected to appear in real time in the process of outputting the instruction request corresponding to the user intention, the current output is interrupted.
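As a non-limiting illustration of the arbitration in steps S12 and S13, the core behaviour reduces to selecting the pending request with the highest priority. The sketch below assumes numeric priorities (a smaller number means a higher priority) and invented request names; none of these values are fixed by the application.

```python
def arbitrate(intents):
    """Pick the pending user intention with the highest priority (smallest number).

    `intents` is assumed to be a list of (priority, instruction_request) pairs produced
    by the per-modality matching of step S11 and the grouping of step S12, where
    identity-authentication intents rank above action intents, action intents above
    semantic intents, and semantic intents above emotion intents.
    """
    if not intents:
        return None
    _, request = min(intents, key=lambda item: item[0])
    return request

# Usage: a key press (action group) outranks a simultaneous voice command (semantic group).
selected = arbitrate([(2, "KEY_VOLUME_UP"), (3, "VOICE_OPEN_NAVIGATION")])
assert selected == "KEY_VOLUME_UP"
```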
Correspondingly, in another aspect of the embodiment of the present application, there is also provided a system for collaborative human-computer interaction with multi-modal data, including:
the multi-mode interaction data acquisition unit is used for acquiring multi-mode interaction data in real time through the vehicle-mounted multimedia device, wherein the multi-mode interaction data comprise at least one of user fingerprint data, face data, sound wave data, hand data, vehicle-mounted terminal key data and vehicle-mounted screen touch data;
the matching processing unit is used for carrying out characteristic acquisition on each type of multi-mode interaction data acquired in real time, and respectively comparing the acquired multi-mode interaction data with the pre-stored standard characteristic data corresponding to the type of multi-mode interaction data to acquire user intention corresponding to each type of multi-mode interaction data;
the priority grouping association unit is used for associating a corresponding priority for the user intention corresponding to each type of multi-mode interaction data and carrying out priority grouping;
and the user intention arbitration unit is used for carrying out priority arbitration on the various user intentions according to the priority grouping information, determining one user intention as the selected user intention and outputting the user intention.
Wherein, further include:
the standard characteristic data acquisition unit is used for acquiring standard characteristic data corresponding to various multi-mode interaction data in advance, and the standard characteristic data acquisition unit specifically acquires the standard characteristic data in the following mode:
recording and obtaining the thread pattern characteristics of the user fingerprint as standard characteristic data of the user fingerprint data;
recording and obtaining facial image features of a user as standard feature data of the facial data of the user;
recording and obtaining iris pattern characteristics of a user as standard characteristic data of iris data of the user;
recording and obtaining voiceprint characteristics in user voice as standard characteristic data of user voice data;
recording and obtaining lip characteristics of the users corresponding to the keywords, and taking the lip characteristics as standard characteristic data of the lip data of the users;
recording and obtaining hand type/gesture characteristics of a user corresponding to each instruction, wherein the hand type/gesture characteristics are used as standard characteristic data of hand data;
recording and obtaining facial expression characteristics of the user corresponding to each instruction, and taking the facial expression characteristics as standard characteristic data of facial expression data.
The device for the multi-mode data collaborative human-computer interaction is connected with a fingerprint acquisition module, an image acquisition module, an audio acquisition module, a touch screen module and a key module and is used for acquiring fingerprint data, face data, sound wave data, hand data, vehicle-mounted terminal key data and vehicle-mounted screen touch data corresponding to a user.
Wherein the matching processing unit further includes:
the authentication matching processing unit is used for respectively matching the acquired multi-mode interaction data with the corresponding standard characteristic data when the acquired multi-mode interaction data are a face image, a fingerprint image and an iris image, and respectively obtaining a face authentication request, a fingerprint authentication request and an iris authentication request after the matching is successful;
the lip command request matching unit is used for matching the feature data corresponding to the lip image with the basic lip data when the acquired multi-mode interaction data is the lip image, and obtaining the corresponding lip command request after successful matching;
the voice command request matching unit is used for matching the voice data with basic voiceprint data when the acquired multi-mode interaction data are voice data, and acquiring a corresponding voice command request of a keyword corresponding to the voice after the voice data are successfully matched;
the hand instruction request matching unit is used for matching the hand data with the basic hand data when the acquired multi-mode interaction data are the hand data, and obtaining a hand instruction request corresponding to the gesture or the hand type;
and the expression instruction request matching unit is used for matching the facial expression with the basic expression data when the acquired multi-modal interaction data is a facial expression image, so as to obtain an expression instruction request corresponding to the expression.
Wherein the priority packet associating unit further includes:
the identity authentication intention association unit is used for confirming the face authentication request, the fingerprint authentication request and the iris authentication request as identity authentication intention and associating a first group of priorities;
the action intention association unit is used for confirming the key request, the touch control click request and the hand instruction request as action intention and associating the action intention with a second group of priorities;
the semantic intention association unit confirms the lip instruction request and the voice instruction request as semantic intention and associates the lip instruction request and the voice instruction request as a third group of priority;
an emotion intention association unit configured to confirm the emotion instruction request as an emotion intention and associate the emotion intention as a fourth group of priorities;
the priority levels of the first group of priorities, the second group of priorities, the third group of priorities and the fourth group of priorities are arranged from high to low; each type of request in each set of priorities also corresponds to a different priority.
Wherein the user intention arbitration unit further includes:
an arbitration unit, configured to perform priority arbitration on each obtained intention within a predetermined time, and regarding a user intention corresponding to a request with a highest priority as a selected user intention;
the output unit is used for outputting the instruction request corresponding to the selected user intention determined by the arbitration unit;
and the interruption unit is used for interrupting the current output when detecting that the user intention with higher priority appears in real time in the process of outputting the instruction request corresponding to the user intention.
Correspondingly, in still another aspect of the present application, a vehicle-mounted multimedia device is further provided, which is connected with a fingerprint acquisition module, an image acquisition module, an audio acquisition module, a touch screen module and a key module, and is characterized in that the vehicle-mounted multimedia device comprises the system for collaborative man-machine interaction of multi-mode data.
The embodiment of the application has the following beneficial effects:
according to the method, the system and the vehicle-mounted multimedia device for the multi-mode data collaborative human-computer interaction, provided by the application, the multi-mode sensor is utilized to collect the identity characteristic information and the intention information of the driver, so that the multi-mode human-computer interaction data can be collected;
in the embodiment of the application, the priority ranking can be carried out on various characteristics of multiple modes, and multiple input sources can be processed to determine corresponding priority levels; meanwhile, data fusion is carried out on various characteristics of multiple modes, so that accuracy of multi-dimensional input source intention can be improved;
in the embodiment of the application, each intention of data fusion can be arbitrated, and interrupt processing is realized by utilizing the priority of multiple intents and multiple outputs; the system can more accurately and rapidly identify the intention of the driver, improve the diversity and accuracy of the man-machine interaction scene to identify the intention of the driver, and improve the interaction flexibility.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions of the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the application, and other drawings may be obtained by those skilled in the art from these drawings without inventive effort.
FIG. 1 is a schematic view of an application environment of a method for collaborative human-computer interaction of multimodal data provided by the present application;
FIG. 2 is a schematic diagram of a main flow of an embodiment of a method for collaborative human-computer interaction with multimodal data provided by the present application;
FIG. 3 is a more detailed flow chart of one embodiment of step S13 of FIG. 2;
FIG. 4 is a schematic structural diagram of an embodiment of a system for collaborative human-computer interaction with multimodal data provided by the present application;
FIG. 5 is a schematic diagram of the matching processing unit of FIG. 4;
fig. 6 is a schematic diagram of a structure of the priority packet associating unit in fig. 4;
fig. 7 is a schematic diagram of a structure of the user intention arbitration unit in fig. 4.
Detailed Description
The present application will be described in further detail with reference to the accompanying drawings, for the purpose of making the objects, technical solutions and advantages of the present application more apparent.
The method for multi-mode data collaborative human-computer interaction provided by the application can be applied to the application environment shown in FIG. 1. The vehicle-mounted multimedia device communicates with each sensor (acquisition module) through a bus. The vehicle-mounted multimedia device may be regarded as an electronic device comprising a processor, a nonvolatile storage medium, an internal memory and input/output devices (such as a touch screen module, a key module and a network interface) connected through a system bus. The system for multi-mode data collaborative human-computer interaction stored in the nonvolatile storage medium is used to implement the method for multi-mode data collaborative human-computer interaction. The processor provides computing and control capabilities to support the operation of the entire vehicle-mounted multimedia device. The internal memory in the vehicle-mounted multimedia device provides a running environment for the system for multi-mode data collaborative human-computer interaction stored in the nonvolatile storage medium. Specifically, the system performs matching analysis, priority association and priority arbitration on the multi-mode data acquired by the acquisition modules, so as to obtain the user's real intention; specific details are described later. The vehicle-mounted multimedia device may be a vehicle-mounted terminal, and may also be any of various personal computers, notebook computers, smart phones, tablet computers and portable wearable devices.
FIG. 2 is a schematic diagram of the main flow of an embodiment of the method for multi-mode data collaborative human-computer interaction provided by the application. In this embodiment, the method is implemented by the system for multi-mode data collaborative human-computer interaction in the vehicle-mounted multimedia device of FIG. 1. Specifically, the method includes:
step S10, acquiring multi-mode interaction data in real time through a vehicle-mounted multimedia device, wherein the multi-mode interaction data comprises at least one of user fingerprint data, face data, sound wave data, hand data, vehicle-mounted terminal key data and vehicle-mounted screen touch data;
step S11, performing characteristic acquisition on each type of multi-mode interaction data acquired in real time, and respectively comparing the acquired multi-mode interaction data with pre-stored standard characteristic data corresponding to the type of multi-mode interaction data to acquire user intention corresponding to each type of multi-mode interaction data;
step S12, associating a corresponding priority for the user intention corresponding to each type of multi-mode interaction data, and carrying out priority grouping;
and step S13, performing priority arbitration on the various user intentions according to the priority grouping information, determining one user intention as the selected user intention, and outputting the user intention.
In order to facilitate understanding of the present application, each step to which the present application relates will be described in detail with reference to specific examples.
It can be understood that, in the implementation of the application, before step S10 there is a step of obtaining in advance the standard feature data corresponding to each type of multi-mode interaction data (a minimal data-structure sketch is given after this list), where the step specifically includes:
recording and obtaining the thread pattern characteristics of the user fingerprint as standard characteristic data of the user fingerprint data;
recording and obtaining facial image features of a user as standard feature data of the facial data of the user;
recording and obtaining iris pattern characteristics of a user as standard characteristic data of iris data of the user;
recording and obtaining voiceprint characteristics in user voice as standard characteristic data of user voice data;
recording and obtaining lip characteristics of the users corresponding to the keywords, and taking the lip characteristics as standard characteristic data of the lip data of the users;
recording and obtaining hand type/gesture characteristics of a user corresponding to each instruction, wherein the hand type/gesture characteristics are used as standard characteristic data of hand data;
recording and obtaining facial expression characteristics of the user corresponding to each instruction, and taking the facial expression characteristics as standard characteristic data of facial expression data.
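By way of non-limiting illustration only, the pre-recorded standard feature data described above can be organised as a single container with one entry per modality. The field names and types below are assumptions of this sketch and are not part of the disclosure.

```python
from dataclasses import dataclass, field

@dataclass
class StandardFeatureStore:
    """Illustrative container for the pre-recorded standard feature data, one entry per modality."""
    fingerprint_patterns: list = field(default_factory=list)   # ridge/thread pattern features
    face_features: list = field(default_factory=list)          # facial image features
    iris_patterns: list = field(default_factory=list)          # iris pattern features
    voiceprints: list = field(default_factory=list)            # voiceprint features
    lip_features: dict = field(default_factory=dict)           # keyword -> lip features
    hand_gestures: dict = field(default_factory=dict)          # instruction -> hand type/gesture features
    expressions: dict = field(default_factory=dict)            # instruction -> facial expression features
```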
Wherein, the step S10 includes:
and the fingerprint data, face data, sound wave data, hand data, vehicle-mounted terminal key data and vehicle-mounted screen touch data corresponding to the user are acquired through a fingerprint acquisition module, an image acquisition module, an audio acquisition module, a touch screen module and a key module which are connected with the vehicle-mounted multimedia device.
Wherein, the step S11 further includes: performing feature extraction on each type of acquired multi-mode interaction data and matching it against the standard feature data of the corresponding type. The feature extraction process for each type of multi-mode interaction data, such as feature segmentation of the collected face and lip data or semantic processing of the voice data, is well known in the art, is readily understood and available to those skilled in the art, and is not described in detail here. The feature matching process is mainly described below, and specifically includes the following steps:
when the acquired multi-mode interaction data are a face image, a fingerprint image and an iris image, the acquired data are respectively matched with the corresponding standard feature data, and a face authentication request, a fingerprint authentication request and an iris authentication request are respectively obtained after successful matching. Specifically, the fingerprint data collected in real time are matched with the corresponding standard feature data; if the matching similarity is greater than a set threshold, the matching is considered to pass and the output result is 1, otherwise the output result is -1. When the output result is -1, the other enrolled fingerprints are traversed for matching, and if a fingerprint still cannot be matched successfully, the waiting time may be increased before each further attempt. Similarly, the face image is matched with its corresponding standard feature data; if the matching similarity is greater than a set threshold, the matching is successful and the output result is 1, otherwise the output result is -1, in which case the waiting time may be prolonged before the next matching attempt. A similar matching process is used for the iris image. It will be appreciated that the matching processes of all three authentication methods are shown here, whereas in practical applications only one or a few of them may be used.
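A minimal sketch of this threshold-based matching loop is given below; the similarity function, the threshold value, the retry count and the back-off factor are all assumptions of the sketch rather than values fixed by the application.

```python
import time

MATCH_THRESHOLD = 0.8  # assumed value; the text only requires "greater than a set threshold"

def match_identity(sample, enrolled_templates, similarity,
                   base_wait_s=0.5, max_rounds=3):
    """Return 1 if any enrolled template matches the sample, otherwise -1.

    Mirrors the behaviour described above: on failure the remaining templates are
    traversed, and each new round waits longer before matching is attempted again.
    """
    wait_s = base_wait_s
    for _ in range(max_rounds):
        for template in enrolled_templates:        # traverse the other fingerprints/faces/irises
            if similarity(sample, template) > MATCH_THRESHOLD:
                return 1                           # matching passes -> authentication request issued
        time.sleep(wait_s)                         # unsuccessful round: wait before trying again
        wait_s *= 2                                # increase the waiting time each round
    return -1                                      # no enrolled template matched
```

For instance, `match_identity(live_fingerprint, store.fingerprint_patterns, similarity_fn)` could draw on the illustrative feature store sketched earlier, with `similarity_fn` standing in for whatever fingerprint comparison the implementation actually uses.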
When the acquired multi-mode interaction data are lip images, matching the feature data corresponding to the lip images with basic lip data, and obtaining corresponding lip instruction requests after successful matching (the similarity of the feature data and the basic lip data is larger than a preset threshold value);
when the acquired multi-mode interaction data are voice data, matching the voice data with basic voiceprint data; after successful matching (the similarity between the voice data and the basic voiceprint data is greater than a preset threshold), the keyword or key phrase with the closest pronunciation is output, and the corresponding voice instruction is looked up in an engine library through that keyword or key phrase, so as to obtain the voice instruction request corresponding to the voice;
when the acquired multi-mode interaction data are hand data, matching the hand data with basic hand data, and after successful matching (the similarity of the hand data and the basic hand data is greater than a preset threshold value), obtaining a hand instruction request corresponding to the gesture or the hand;
when the acquired multi-mode interaction data is a facial expression image, matching the facial expression with basic expression data, and obtaining an expression instruction request corresponding to the expression after successful matching (the similarity of the facial expression and the basic expression data is larger than a preset threshold value).
Wherein, the step S12 further includes:
confirming the face authentication request, the fingerprint authentication request and the iris authentication request as identity authentication intention, and associating a first group of priorities;
confirming a key request, a touch control click request and the hand instruction request as action intentions, and associating the action intentions with a second group of priorities;
confirming the lip instruction request and the voice instruction request as semantic intents and associating the semantic intents with a third group of priorities;
confirming the expression instruction request as emotion intention and associating the emotion instruction request as a fourth group of priorities;
the priority levels of the first group of priorities, the second group of priorities, the third group of priorities and the fourth group of priorities are arranged from high to low; each type of request in each set of priorities corresponds to a different priority.
In a specific example, the priority level of a passed face identity match request may be set to A1, that of a passed iris identity match request to A2, that of a passed fingerprint identity match request to A3, that of a key request to B1, that of a touch click request to B2, that of a hand instruction request to B3, that of a lip instruction request to C2, that of a voiceprint (voice) instruction request to C1, and that of an expression instruction request to D.
Classifying the matching results corresponding to the priorities A1, A2 and A3, and determining the matching results as identity authentication intents with the priorities A; within the group, A1 has a priority greater than that of A2, A2 has a priority greater than that of A3;
classifying the matching results corresponding to the priorities B1, B2 and B3, and determining the action intention of the group B; within the group, B1 has a priority greater than that of B2, and B2 has a priority greater than that of B3;
classifying the matching results corresponding to the priorities C1 and C2, and determining them as the semantic intention of the priority C group; within the group, the priority of C1 is greater than the priority of C2;
classifying the matching results corresponding to the priority level D, and determining the emotion intention of the priority level D group.
Overall, the priority of group A is greater than that of group B, the priority of group B is greater than that of group C, and the priority of group C is greater than that of group D.
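As a non-limiting sketch, the example priority scheme above (A1 > A2 > A3 > B1 > B2 > B3 > C1 > C2 > D) can be encoded as an ordered enumeration together with a group table; all names below are illustrative assumptions.

```python
from enum import IntEnum

class Priority(IntEnum):
    """Lower value = higher priority; encodes the example ordering A1 > A2 > A3 > ... > D."""
    A1_FACE_AUTH = 1
    A2_IRIS_AUTH = 2
    A3_FINGERPRINT_AUTH = 3
    B1_KEY_REQUEST = 4
    B2_TOUCH_CLICK = 5
    B3_HAND_INSTRUCTION = 6
    C1_VOICE_INSTRUCTION = 7
    C2_LIP_INSTRUCTION = 8
    D_EXPRESSION_INSTRUCTION = 9

# Group membership for the A/B/C/D intention classes (identity, action, semantic, emotion).
GROUPS = {
    "A_identity": {Priority.A1_FACE_AUTH, Priority.A2_IRIS_AUTH, Priority.A3_FINGERPRINT_AUTH},
    "B_action": {Priority.B1_KEY_REQUEST, Priority.B2_TOUCH_CLICK, Priority.B3_HAND_INSTRUCTION},
    "C_semantic": {Priority.C1_VOICE_INSTRUCTION, Priority.C2_LIP_INSTRUCTION},
    "D_emotion": {Priority.D_EXPRESSION_INSTRUCTION},
}
```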
Wherein, the step S13 further includes:
and carrying out priority arbitration on each obtained intention within a preset time, taking the user intention corresponding to the request with the highest priority as the selected user intention, and outputting an instruction request corresponding to the user intention.
Wherein, the step S13 further includes:
if the user intention with higher priority is detected to appear in real time in the process of outputting the instruction request corresponding to the user intention, the current output is interrupted.
In a specific example, step S13 may perform real-time judgment in order of priority from high to low, and may specifically include the following steps (a toy arbitration loop is sketched after these steps):
step S130, confirming that the identity characteristic authentication passes, namely the identity authentication of the group A priority is intended to pass;
step S131, judging whether to execute action intention; namely, judging whether the action intention of the B group priority exists or not, if so, outputting a corresponding action intention result in the step S134; otherwise, go to step S132;
step S132, judging whether to execute semantic intention; namely judging whether the semantic intention of the priority of the group C exists or not, if so, outputting a corresponding semantic intention result in the step S134; otherwise, go to step S133;
step S133, judging whether to execute the expression intention; judging whether the expression intention of the priority of the group D exists or not, and if so, outputting a corresponding expression intention result in the step S134; otherwise, the process goes to step S131 to resume the determination flow.
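The following toy arbiter combines the predetermined arbitration window of step S13 with the interrupt behaviour described above; it assumes the illustrative Priority enumeration from the previous sketch (a smaller value means a higher priority) and is a sketch rather than a definitive implementation.

```python
import heapq
import time

class IntentArbiter:
    """Toy arbiter: collect intents for a fixed window, output the highest-priority one,
    and interrupt the current output when a higher-priority intent arrives."""

    def __init__(self, window_s=0.5):
        self.window_s = window_s
        self._pending = []                      # min-heap of (priority value, instruction request)

    def submit(self, priority, request):
        heapq.heappush(self._pending, (int(priority), request))

    def arbitrate_once(self):
        """Wait for the predetermined time, then return the highest-priority pending request."""
        time.sleep(self.window_s)
        if not self._pending:
            return None
        _, request = heapq.heappop(self._pending)
        return request                          # the selected user intention's instruction request

    def should_interrupt(self, current_priority):
        """True if a pending intent outranks the intention currently being output."""
        return bool(self._pending) and self._pending[0][0] < int(current_priority)
```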
FIG. 4 is a schematic diagram illustrating the architecture of one embodiment of a system 2 for collaborative human-machine interaction with multimodal data provided by the present application; which may be implemented in the application environment of fig. 1, in conjunction with fig. 5-7. In this embodiment, the system 2 comprises:
the multi-modal interaction data collection unit 25 is configured to collect multi-modal interaction data in real time through the vehicle-mounted multimedia device, where the multi-modal interaction data includes at least one of user fingerprint data, face data, acoustic wave data, hand data, vehicle-mounted terminal key data, and vehicle-mounted screen touch data;
the matching processing unit 20 is configured to perform feature acquisition on each type of multimodal interaction data acquired in real time, and compare the acquired multimodal interaction data with pre-stored standard feature data corresponding to the type of multimodal interaction data respectively to obtain user intention corresponding to each type of multimodal interaction data;
a priority grouping association unit 21, configured to associate a corresponding priority for the user intention corresponding to each type of the multi-mode interaction data, and perform priority grouping;
and the user intention arbitration unit 22 is configured to perform priority arbitration on the various user intentions according to the priority grouping information, determine one of the user intentions as the selected user intention, and output the determined user intention.
Wherein, further include: the standard feature data obtaining unit 24 is configured to obtain standard feature data corresponding to various multi-mode interaction data in advance, where the standard feature data obtaining unit specifically obtains the standard feature data in the following manner:
recording and obtaining the thread pattern characteristics of the user fingerprint as standard characteristic data of the user fingerprint data;
recording and obtaining facial image features of a user as standard feature data of the facial data of the user;
recording and obtaining iris pattern characteristics of a user as standard characteristic data of iris data of the user;
recording and obtaining voiceprint characteristics in user voice as standard characteristic data of user voice data;
recording and obtaining lip characteristics of the users corresponding to the keywords, and taking the lip characteristics as standard characteristic data of the lip data of the users;
recording and obtaining hand type/gesture characteristics of a user corresponding to each instruction, wherein the hand type/gesture characteristics are used as standard characteristic data of hand data;
recording and obtaining facial expression characteristics of the user corresponding to each instruction, and taking the facial expression characteristics as standard characteristic data of facial expression data.
The device for the multi-mode data collaborative human-computer interaction is connected with a fingerprint acquisition module, an image acquisition module, an audio acquisition module, a touch screen module and a key module and is used for acquiring corresponding user fingerprint data, face data, sound wave data, hand data, vehicle-mounted terminal key data and vehicle-mounted screen touch data.
Wherein the matching process unit 20 further comprises:
the authentication matching processing unit 200 is configured to match the acquired multimodal interaction data with corresponding standard feature data when the acquired multimodal interaction data is a face image, a fingerprint image, and an iris image, and obtain a face authentication request, a fingerprint authentication request, and an iris authentication request after the matching is successful;
the lip command request matching unit 201 is configured to match feature data corresponding to the lip image with basic lip data when the acquired multimodal interaction data is a lip image, and obtain a corresponding lip command request after the matching is successful;
the voice command request matching unit 202 is configured to, when the acquired multimodal interaction data is voice data, match the voice data with basic voiceprint data, and obtain a corresponding voice command request of a keyword corresponding to the voice after the matching is successful;
the hand instruction request matching unit 203 is configured to match the hand data with the base hand data when the acquired multimodal interaction data is hand data, so as to obtain a hand instruction request corresponding to the gesture or the hand pattern;
and the expression instruction request matching unit 204 is configured to match the facial expression with the basic expression data when the acquired multimodal interaction data is a facial expression image, so as to obtain an expression instruction request corresponding to the expression.
Wherein the priority packet associating unit 21 further includes:
an identity authentication intention association unit 210, configured to confirm the face authentication request, the fingerprint authentication request, and the iris authentication request as identity authentication intention, and associate a first set of priorities;
an action intention association unit 211, configured to confirm a key request, a touch click request, and the hand instruction request as action intention, and associate the action intention with a second group of priorities;
a semantic intention association unit 212, configured to identify the lip instruction request and the voice instruction request as semantic intention, and associate the lip instruction request and the voice instruction request as a third group of priorities;
an emotional intent association unit 213 for confirming the emotional instruction request as an emotional intent and associating as a fourth group of priorities;
the priority levels of the first group of priorities, the second group of priorities, the third group of priorities and the fourth group of priorities are arranged from high to low; each type of request in each set of priorities also corresponds to a different priority.
Wherein the user intention arbitration unit 22 further includes:
an arbitration unit 220, configured to arbitrate the obtained intents in priority for each type, and take the user intention corresponding to the request with the highest priority as the selected user intention;
an output unit 221, configured to output an instruction request corresponding to the selected user intention determined by the arbitration unit;
and the interruption unit 222 is configured to interrupt the current output when detecting that the user intention with the higher priority appears in real time during the process of outputting the instruction request corresponding to the user intention.
For further details, reference may be made to the description of FIG. 2 above, which is not repeated here.
The embodiment of the application has the following beneficial effects:
the application provides a method, a system and a vehicle-mounted multimedia device for collaborative man-machine interaction of multi-mode data. The multi-mode sensor is utilized to collect the identity characteristic information and the intention information of the driver, so that multi-mode man-machine interaction data can be collected;
in the embodiment of the application, the priority ranking can be carried out on various characteristics of multiple modes, and multiple input sources can be processed to determine corresponding priority levels; meanwhile, data fusion is carried out on various characteristics of multiple modes, so that accuracy of multi-dimensional input source intention can be improved;
in the embodiment of the application, each intention of data fusion can be arbitrated, and interrupt processing is realized by utilizing the priority of multiple intents and multiple outputs; the system can more accurately and rapidly identify the intention of the driver, improve the diversity and accuracy of the man-machine interaction scene to identify the intention of the driver, and improve the interaction flexibility.
The above disclosure is only a preferred embodiment of the present application and certainly does not limit the scope of the application; equivalent changes made according to the claims of the present application therefore still fall within the scope of the application.

Claims (14)

1. The method for collaborative human-computer interaction of the multi-mode data is characterized by comprising the following steps:
step S10, acquiring multi-mode interaction data in real time through a vehicle-mounted multimedia device, wherein the multi-mode interaction data comprises at least one type of user fingerprint data, face data, sound wave data, hand data, vehicle-mounted terminal key data and vehicle-mounted screen touch data;
step S11, performing characteristic acquisition on each type of multi-mode interaction data acquired in real time, and respectively comparing the acquired multi-mode interaction data with pre-stored standard characteristic data corresponding to the type of multi-mode interaction data to acquire user intention corresponding to each type of multi-mode interaction data;
step S12, associating a corresponding priority for the user intention corresponding to each type of multi-mode interaction data, and carrying out priority grouping;
and step S13, carrying out priority arbitration on each user intention according to the priority grouping information, determining one user intention as the selected user intention and outputting the user intention.
2. The method as recited in claim 1, further comprising the step of:
the step of obtaining standard characteristic data corresponding to various multi-mode interaction data in advance specifically comprises the following steps:
recording and obtaining the thread pattern characteristics of the user fingerprint as standard characteristic data of the user fingerprint data;
recording and obtaining facial image features of a user as standard feature data of the facial data of the user;
recording and obtaining iris pattern characteristics of a user as standard characteristic data of iris data of the user;
recording and obtaining voiceprint characteristics in user voice as standard characteristic data of user voice data;
recording and obtaining lip characteristics of the users corresponding to the keywords, and taking the lip characteristics as standard characteristic data of the lip data of the users;
recording and obtaining hand type/gesture characteristics of a user corresponding to each instruction, wherein the hand type/gesture characteristics are used as standard characteristic data of hand data;
recording and obtaining facial expression characteristics of the user corresponding to each instruction, and taking the facial expression characteristics as standard characteristic data of facial expression data.
3. The method according to claim 2, wherein the step S10 includes:
and the fingerprint data, face data, sound wave data, hand data, vehicle-mounted terminal key data and vehicle-mounted screen touch data corresponding to the user are acquired through a fingerprint acquisition module, an image acquisition module, an audio acquisition module, a touch screen module and a key module which are connected with the vehicle-mounted multimedia device.
4. A method according to claim 2 or 3, wherein said step S11 further comprises:
when the acquired multi-mode interaction data are a face image, a fingerprint image and an iris image, respectively matching the acquired multi-mode interaction data with corresponding standard feature data, and respectively acquiring a face authentication request, a fingerprint authentication request and an iris authentication request after successful matching;
when the acquired multi-mode interaction data are lip images, matching the feature data corresponding to the lip images with basic lip data, and obtaining corresponding lip instruction requests after successful matching;
when the acquired multi-mode interaction data are voice data, matching the voice data with basic voiceprint data, and obtaining a voice command request corresponding to a keyword corresponding to the voice after successful matching;
when the acquired multi-mode interaction data are hand data, matching the hand data with basic hand data to obtain a hand instruction request corresponding to the gesture or hand;
when the acquired multi-modal interaction data is a facial expression image, matching the facial expression with basic expression data to obtain an expression instruction request corresponding to the expression.
5. The method of claim 4, wherein the step S12 further comprises:
confirming the face authentication request, the fingerprint authentication request and the iris authentication request as identity authentication intention, and associating a first group of priorities;
confirming a key request, a touch control click request and the hand instruction request as action intentions, and associating the action intentions with a second group of priorities;
confirming the lip instruction request and the voice instruction request as semantic intents and associating the semantic intents with a third group of priorities;
confirming the expression instruction request as emotion intention and associating the emotion instruction request as a fourth group of priorities;
the priority levels of the first group of priorities, the second group of priorities, the third group of priorities and the fourth group of priorities are arranged from high to low; each type of request in each set of priorities corresponds to a different priority.
6. The method of claim 5, wherein the step S13 further comprises:
and carrying out priority arbitration on each obtained intention within a preset time, taking the user intention corresponding to the request with the highest priority as the selected user intention, and outputting an instruction request corresponding to the user intention.
7. The method of claim 6, wherein the step S13 further comprises:
if the user intention with higher priority is detected to appear in real time in the process of outputting the instruction request corresponding to the user intention, the current output is interrupted.
8. The system for collaborative human-computer interaction of the multi-mode data is characterized by comprising:
the multi-mode interaction data acquisition unit is used for acquiring multi-mode interaction data in real time through the vehicle-mounted multimedia device, wherein the multi-mode interaction data comprise at least one type of user fingerprint data, face data, sound wave data, hand data, vehicle-mounted terminal key data and vehicle-mounted screen touch data;
the matching processing unit is used for carrying out characteristic acquisition on each type of multi-mode interaction data acquired in real time, and respectively comparing the acquired multi-mode interaction data with the pre-stored standard characteristic data corresponding to the type of multi-mode interaction data to acquire user intention corresponding to each type of multi-mode interaction data;
the priority grouping association unit is used for associating a corresponding priority for the user intention corresponding to each type of multi-mode interaction data and carrying out priority grouping;
and the user intention arbitration unit is used for carrying out priority arbitration on each user intention according to the priority grouping information, determining one user intention as the selected user intention and outputting the user intention.
9. The system as recited in claim 8, further comprising:
the standard characteristic data acquisition unit is used for acquiring standard characteristic data corresponding to various multi-mode interaction data in advance, and the standard characteristic data acquisition unit specifically acquires the standard characteristic data in the following mode:
recording and obtaining the thread pattern characteristics of the user fingerprint as standard characteristic data of the user fingerprint data;
recording and obtaining facial image features of a user as standard feature data of the facial data of the user;
recording and obtaining iris pattern characteristics of a user as standard characteristic data of iris data of the user;
recording and obtaining voiceprint characteristics in user voice as standard characteristic data of user voice data;
recording and obtaining lip characteristics of the users corresponding to the keywords, and taking the lip characteristics as standard characteristic data of the lip data of the users;
recording and obtaining hand type/gesture characteristics of a user corresponding to each instruction, wherein the hand type/gesture characteristics are used as standard characteristic data of hand data;
recording and obtaining facial expression characteristics of the user corresponding to each instruction, and taking the facial expression characteristics as standard characteristic data of facial expression data.
10. The system of claim 9, wherein the multi-modal data collaborative human-computer interaction device is connected to a fingerprint acquisition module, an image acquisition module, an audio acquisition module, a touch screen module, and a key module for acquiring fingerprint data, face data, sound wave data, hand data, vehicle terminal key data, and vehicle screen touch data corresponding to the user.
11. The system of claim 9 or 10, wherein the matching processing unit further comprises:
the authentication matching processing unit is used for respectively matching the acquired multi-mode interaction data with the corresponding standard characteristic data when the acquired multi-mode interaction data are a face image, a fingerprint image and an iris image, and respectively obtaining a face authentication request, a fingerprint authentication request and an iris authentication request after the matching is successful;
the lip command request matching unit is used for matching the feature data corresponding to the lip image with the basic lip data when the acquired multi-mode interaction data is the lip image, and obtaining the corresponding lip command request after successful matching;
the voice command request matching unit is used for matching the voice data with basic voiceprint data when the acquired multi-mode interaction data are voice data, and acquiring a corresponding voice command request of a keyword corresponding to the voice after the voice data are successfully matched;
the hand instruction request matching unit is used for matching the hand data with the basic hand data when the acquired multi-mode interaction data are the hand data, and obtaining a hand instruction request corresponding to the gesture or the hand type;
and the expression instruction request matching unit is used for matching the facial expression with the basic expression data when the acquired multi-modal interaction data is a facial expression image, so as to obtain an expression instruction request corresponding to the expression.
12. The system of claim 11, wherein the priority packet association unit further comprises:
an identity authentication intention association unit, configured to identify the face authentication request, the fingerprint authentication request and the iris authentication request as an identity authentication intention and associate it with a first priority group;
an action intention association unit, configured to identify the key request, the touch click request and the hand instruction request as an action intention and associate it with a second priority group;
a semantic intention association unit, configured to identify the lip instruction request and the voice instruction request as a semantic intention and associate it with a third priority group;
and an emotion intention association unit, configured to identify the expression instruction request as an emotion intention and associate it with a fourth priority group;
wherein the first, second, third and fourth priority groups are ranked from highest to lowest priority, and each type of request within a group also corresponds to a distinct priority.
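Claim 12 defines four intention groups whose priorities decrease from identity authentication to action, semantic and emotion intention, with a further per-request ordering inside each group. One way to encode such a two-level priority table is a tuple per request type, as in the sketch below; the particular in-group ordering shown is an assumption made for illustration only.

# Hypothetical two-level priority table: a smaller tuple means higher priority.
PRIORITY = {
    # group 1: identity authentication intentions
    "face_authentication_request":        (1, 1),
    "fingerprint_authentication_request": (1, 2),
    "iris_authentication_request":        (1, 3),
    # group 2: action intentions
    "key_request":                        (2, 1),
    "touch_click_request":                (2, 2),
    "hand_instruction_request":           (2, 3),
    # group 3: semantic intentions
    "voice_instruction_request":          (3, 1),
    "lip_instruction_request":            (3, 2),
    # group 4: emotion intentions
    "expression_instruction_request":     (4, 1),
}


def higher_priority(a: str, b: str) -> str:
    """Return whichever request has the higher priority (smaller tuple)."""
    return a if PRIORITY[a] < PRIORITY[b] else b


print(higher_priority("voice_instruction_request", "expression_instruction_request"))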
13. The system of claim 12, wherein the user intent arbitration unit further comprises:
an arbitration unit, configured to perform priority arbitration on the intentions obtained within a predetermined time and select the user intention corresponding to the highest-priority request as the selected user intention;
an output unit, configured to output the instruction request corresponding to the selected user intention determined by the arbitration unit;
and an interruption unit, configured to interrupt the current output when a user intention with a higher priority is detected in real time while the instruction request corresponding to the selected user intention is being output.
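Claim 13 describes collecting intentions within a predetermined time, selecting the one with the highest-priority request, and interrupting a lower-priority output when a higher-priority intention is detected. A minimal sketch of that arbitrate/output/interrupt loop is given below; it reuses the two-level priority tuples from the previous sketch in abbreviated form, and the window handling and print-based output are assumptions, not the patented implementation.

# Hypothetical arbitration-with-interruption sketch. PRIORITY repeats the
# two-level scheme of the previous sketch, abbreviated so this runs on its own.
from typing import List, Optional

PRIORITY = {
    "face_authentication_request":    (1, 1),
    "hand_instruction_request":       (2, 3),
    "voice_instruction_request":      (3, 1),
    "expression_instruction_request": (4, 1),
}


class Arbiter:
    def __init__(self) -> None:
        self.current: Optional[str] = None  # request currently being output

    def arbitrate(self, window_requests: List[str]) -> Optional[str]:
        """Pick the highest-priority request collected within one time window."""
        if not window_requests:
            return None
        return min(window_requests, key=lambda r: PRIORITY[r])

    def output(self, request: str) -> None:
        # Interrupt the current output if the new request has a higher priority.
        if self.current is not None and PRIORITY[request] < PRIORITY[self.current]:
            print(f"interrupting {self.current}")
        self.current = request
        print(f"executing {request}")


arbiter = Arbiter()
selected = arbiter.arbitrate(["expression_instruction_request",
                              "voice_instruction_request"])
arbiter.output(selected)                       # outputs the voice request first
arbiter.output("face_authentication_request")  # higher priority: interrupts it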
14. A vehicle-mounted multimedia device connected with a fingerprint acquisition module, an image acquisition module, an audio acquisition module, a touch screen module and a key module, wherein the vehicle-mounted multimedia device comprises the multi-mode data collaborative human-computer interaction system according to any one of claims 8, 9 and 11-13.
CN201910225815.2A 2019-03-25 2019-03-25 Method, system and vehicle-mounted multimedia device for multi-mode data collaborative man-machine interaction Active CN111737670B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910225815.2A CN111737670B (en) 2019-03-25 2019-03-25 Method, system and vehicle-mounted multimedia device for multi-mode data collaborative man-machine interaction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910225815.2A CN111737670B (en) 2019-03-25 2019-03-25 Method, system and vehicle-mounted multimedia device for multi-mode data collaborative man-machine interaction

Publications (2)

Publication Number Publication Date
CN111737670A CN111737670A (en) 2020-10-02
CN111737670B true CN111737670B (en) 2023-08-18

Family

ID=72645922

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910225815.2A Active CN111737670B (en) 2019-03-25 2019-03-25 Method, system and vehicle-mounted multimedia device for multi-mode data collaborative man-machine interaction

Country Status (1)

Country Link
CN (1) CN111737670B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113460067B (en) * 2020-12-30 2023-06-23 安波福电子(苏州)有限公司 Human-vehicle interaction system
CN113591659B (en) * 2021-07-23 2023-05-30 重庆长安汽车股份有限公司 Gesture control intention recognition method and system based on multi-mode input
WO2023005362A1 (en) * 2021-07-30 2023-02-02 深圳传音控股股份有限公司 Processing method, processing device and storage medium
CN114348000A (en) * 2022-02-15 2022-04-15 安波福电子(苏州)有限公司 Driver attention management system and method
CN116434027A (en) * 2023-06-12 2023-07-14 深圳星寻科技有限公司 Artificial intelligent interaction system based on image recognition

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102385860A (en) * 2010-08-26 2012-03-21 索尼公司 Information processing apparatus, information processing method, and program
CN106239506A (en) * 2016-08-11 2016-12-21 北京光年无限科技有限公司 The multi-modal input data processing method of intelligent robot and robot operating system
CN106489148A (en) * 2016-06-29 2017-03-08 深圳狗尾草智能科技有限公司 A kind of intention scene recognition method that is drawn a portrait based on user and system
CN106569613A (en) * 2016-11-14 2017-04-19 中国电子科技集团公司第二十八研究所 Multi-modal man-machine interaction system and control method thereof
CN106845624A (en) * 2016-12-16 2017-06-13 北京光年无限科技有限公司 The multi-modal exchange method relevant with the application program of intelligent robot and system
CN107728780A (en) * 2017-09-18 2018-02-23 北京光年无限科技有限公司 A kind of man-machine interaction method and device based on virtual robot
CN108121721A (en) * 2016-11-28 2018-06-05 渡鸦科技(北京)有限责任公司 Intension recognizing method and device
CN108334583A (en) * 2018-01-26 2018-07-27 上海智臻智能网络科技股份有限公司 Affective interaction method and device, computer readable storage medium, computer equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102405463B (en) * 2009-04-30 2015-07-29 三星电子株式会社 Utilize the user view reasoning device and method of multi-modal information
CN106776936B (en) * 2016-12-01 2020-02-18 上海智臻智能网络科技股份有限公司 Intelligent interaction method and system

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102385860A (en) * 2010-08-26 2012-03-21 索尼公司 Information processing apparatus, information processing method, and program
CN106489148A (en) * 2016-06-29 2017-03-08 深圳狗尾草智能科技有限公司 A kind of intention scene recognition method that is drawn a portrait based on user and system
CN106239506A (en) * 2016-08-11 2016-12-21 北京光年无限科技有限公司 The multi-modal input data processing method of intelligent robot and robot operating system
CN106569613A (en) * 2016-11-14 2017-04-19 中国电子科技集团公司第二十八研究所 Multi-modal man-machine interaction system and control method thereof
CN108121721A (en) * 2016-11-28 2018-06-05 渡鸦科技(北京)有限责任公司 Intension recognizing method and device
CN106845624A (en) * 2016-12-16 2017-06-13 北京光年无限科技有限公司 The multi-modal exchange method relevant with the application program of intelligent robot and system
CN107728780A (en) * 2017-09-18 2018-02-23 北京光年无限科技有限公司 A kind of man-machine interaction method and device based on virtual robot
CN108334583A (en) * 2018-01-26 2018-07-27 上海智臻智能网络科技股份有限公司 Affective interaction method and device, computer readable storage medium, computer equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zheng Binbin et al., "A speech intention understanding method based on multi-modal information fusion," Sciencepaper Online (中国科技论文在线), 2011, Vol. 6, No. 7, pp. 495-500. *

Also Published As

Publication number Publication date
CN111737670A (en) 2020-10-02

Similar Documents

Publication Publication Date Title
CN111737670B (en) Method, system and vehicle-mounted multimedia device for multi-mode data collaborative man-machine interaction
KR102222421B1 (en) Save metadata related to captured images
EP3982236B1 (en) Invoking automated assistant function(s) based on detected gesture and gaze
US10733987B1 (en) System and methods for providing unplayed content
US10067740B2 (en) Multimodal input system
US10811005B2 (en) Adapting voice input processing based on voice input characteristics
CN110148416A (en) Audio recognition method, device, equipment and storage medium
CN110727346A (en) Man-machine interaction method and device, vehicle and storage medium
CN107533599B (en) Gesture recognition method and device and electronic equipment
JP2022095768A (en) Method, device, apparatus, and medium for dialogues for intelligent cabin
US20210110815A1 (en) Method and apparatus for determining semantic meaning of pronoun
CN111179935A (en) Voice quality inspection method and device
CN115291724A (en) Man-machine interaction method and device, storage medium and electronic equipment
CN109032345A (en) Apparatus control method, device, equipment, server-side and storage medium
CN112908325B (en) Voice interaction method and device, electronic equipment and storage medium
Pinto et al. Audiovisual classification of group emotion valence using activity recognition networks
CN111444321A (en) Question answering method, device, electronic equipment and storage medium
WO2024179519A1 (en) Semantic recognition method and apparatus
CN114595692A (en) Emotion recognition method, system and terminal equipment
CN118259747A (en) Multi-mode interaction method, device, controller, system, automobile and storage medium
CN112951216B (en) Vehicle-mounted voice processing method and vehicle-mounted information entertainment system
CN112417197B (en) Sorting method, sorting device, machine readable medium and equipment
CN115981542A (en) Intelligent interactive touch control method, system, equipment and medium for touch screen
CN115019788A (en) Voice interaction method, system, terminal equipment and storage medium
CN113448429A (en) Method and device for controlling electronic equipment based on gestures, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant