
CN111737670B - Method, system and vehicle-mounted multimedia device for multi-mode data collaborative man-machine interaction - Google Patents

Method, system and vehicle-mounted multimedia device for multi-mode data collaborative man-machine interaction

Info

Publication number
CN111737670B
CN111737670B (application CN201910225815.2A)
Authority
CN
China
Prior art keywords
data
user
intention
request
priority
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910225815.2A
Other languages
Chinese (zh)
Other versions
CN111737670A (en)
Inventor
冉光伟
张莹
王金华
张宗煜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Automobile Group Co Ltd
Original Assignee
Guangzhou Automobile Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Automobile Group Co Ltd filed Critical Guangzhou Automobile Group Co Ltd
Priority to CN201910225815.2A
Publication of CN111737670A
Application granted
Publication of CN111737670B
Legal status: Active (current)


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31User authentication
    • G06F21/32User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • User Interface Of Digital Computer (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The application discloses a method, a system and a vehicle-mounted multimedia device for collaborative man-machine interaction based on multi-mode data. The method comprises the following steps: step S10, acquiring multi-mode interaction data in real time through a vehicle-mounted multimedia device; step S11, performing feature acquisition on each type of multi-mode interaction data acquired in real time, and comparing each type with the pre-stored standard feature data corresponding to that type, so as to obtain the user intention corresponding to each type of multi-mode interaction data; step S12, associating a corresponding priority with the user intention obtained from each type of multi-mode interaction data and grouping the intentions by priority; and step S13, performing priority arbitration on the user intentions according to the priority grouping information, determining one of them as the selected user intention and outputting it. By implementing the method and the device, multi-mode data can be coordinated during human-computer interaction, thereby improving the human-computer interaction experience of the vehicle-mounted terminal.

Description

Method, system and vehicle-mounted multimedia device for multi-mode data collaborative man-machine interaction
Technical Field
The application belongs to the field of vehicle-mounted terminals, and particularly relates to a method and a system for collaborative human-computer interaction of multi-mode data and a vehicle-mounted multimedia device.
Background
The man-machine interaction system in a current vehicle-mounted terminal generally adopts a voice recognition system, a gesture recognition system, a face recognition system, a fingerprint recognition system or a software/hardware key recognition system. However, these existing systems run independent algorithm logic and cannot effectively coordinate their data.
For example, in some prior art, a voice recognition system alone can only act directly on the recognized keywords or key phrases and cannot locate content on the display interface, while a gesture recognition system alone can only operate on content already shown on the display interface and cannot capture the user's keywords or key phrases. In some examples, only the voice content is decomposed into data and the decomposed data is reclassified, which does not solve the fluency of coordinated man-machine interaction; in other examples, only the user's current environment and physiological state are identified, and specific interactive content cannot be realized. Where semantic recognition and gesture recognition do interact, the system must first analyze the semantics accurately, then analyze the gesture action, and only then compute the result the user wants: for example, the user says "measure length", the gesture "clicks" one point of an object as the measurement starting point, the gesture then "clicks" another point as the end point, and only after both steps can the system compute the length of the object. The prior art therefore suffers from insufficient fluency and insufficient accuracy in man-machine interaction.
Disclosure of Invention
The technical problem to be solved by the embodiments of the application is to provide a method, a system and a vehicle-mounted multimedia device for collaborative human-computer interaction based on multi-modal data, which can coordinate multi-modal input data and identify the user's actual interaction intention more accurately.
In one aspect of the present application, a method for collaborative human-computer interaction of multi-modal data is provided, which includes the following steps:
step S10, acquiring multi-mode interaction data in real time through a vehicle-mounted multimedia device, wherein the multi-mode interaction data comprises at least one of user fingerprint data, face data, sound wave data, hand data, vehicle-mounted terminal key data and vehicle-mounted screen touch data;
step S11, performing characteristic acquisition on each type of multi-mode interaction data acquired in real time, and respectively comparing the acquired multi-mode interaction data with pre-stored standard characteristic data corresponding to the type of multi-mode interaction data to acquire user intention corresponding to each type of multi-mode interaction data;
step S12, associating a corresponding priority for the user intention corresponding to each type of multi-mode interaction data, and carrying out priority grouping;
and step S13, performing priority arbitration on the various user intentions according to the priority grouping information, determining one user intention as the selected user intention, and outputting the user intention.
Wherein the method further comprises the steps of:
the step of obtaining standard characteristic data corresponding to various multi-mode interaction data in advance specifically comprises the following steps:
recording and obtaining the thread pattern characteristics of the user fingerprint as standard characteristic data of the user fingerprint data;
recording and obtaining facial image features of a user as standard feature data of the facial data of the user;
recording and obtaining iris pattern characteristics of a user as standard characteristic data of iris data of the user;
recording and obtaining voiceprint characteristics in user voice as standard characteristic data of user voice data;
recording and obtaining lip characteristics of the users corresponding to the keywords, and taking the lip characteristics as standard characteristic data of the lip data of the users;
recording and obtaining hand type/gesture characteristics of a user corresponding to each instruction, wherein the hand type/gesture characteristics are used as standard characteristic data of hand data;
recording and obtaining facial expression characteristics of the user corresponding to each instruction, and taking the facial expression characteristics as standard characteristic data of facial expression data.
Wherein, the step S10 includes:
and the fingerprint data, face data, sound wave data, hand data, vehicle-mounted terminal key data and vehicle-mounted screen touch data corresponding to the user are acquired through a fingerprint acquisition module, an image acquisition module, an audio acquisition module, a touch screen module and a key module which are connected with the vehicle-mounted multimedia device.
Wherein, the step S11 further includes:
when the acquired multi-mode interaction data are a face image, a fingerprint image and an iris image, respectively matching the acquired multi-mode interaction data with corresponding standard feature data, and respectively acquiring a face authentication request, a fingerprint authentication request and an iris authentication request after successful matching;
when the acquired multi-mode interaction data are lip images, matching the feature data corresponding to the lip images with basic lip data, and obtaining corresponding lip instruction requests after successful matching;
when the acquired multi-mode interaction data are voice data, matching the voice data with basic voiceprint data, and obtaining a voice command request corresponding to a keyword corresponding to the voice after successful matching;
when the acquired multi-mode interaction data are hand data, matching the hand data with basic hand data to obtain a hand instruction request corresponding to the gesture or hand;
when the acquired multi-modal interaction data is a facial expression image, matching the facial expression with basic expression data to obtain an expression instruction request corresponding to the expression.
Wherein, the step S12 further includes:
confirming the face authentication request, the fingerprint authentication request and the iris authentication request as identity authentication intention, and associating a first group of priorities;
confirming a key request, a touch control click request and the hand instruction request as action intentions, and associating the action intentions with a second group of priorities;
confirming the lip instruction request and the voice instruction request as semantic intents and associating the semantic intents with a third group of priorities;
confirming the expression instruction request as emotion intention and associating the emotion instruction request as a fourth group of priorities;
the priority levels of the first group of priorities, the second group of priorities, the third group of priorities and the fourth group of priorities are arranged from high to low; each type of request in each set of priorities corresponds to a different priority.
Wherein, the step S13 further includes:
and carrying out priority arbitration on each obtained intention within a preset time, taking the user intention corresponding to the request with the highest priority as the selected user intention, and outputting an instruction request corresponding to the user intention.
Wherein, the step S13 further includes:
if the user intention with higher priority is detected to appear in real time in the process of outputting the instruction request corresponding to the user intention, the current output is interrupted.
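As a non-limiting illustration of the arbitration in steps S12 and S13, the core behaviour reduces to selecting the pending request with the highest priority. The sketch below assumes numeric priorities (a smaller number means a higher priority) and invented request names; none of these values are fixed by the application.

```python
def arbitrate(intents):
    """Pick the pending user intention with the highest priority (smallest number).

    `intents` is assumed to be a list of (priority, instruction_request) pairs produced
    by the per-modality matching of step S11 and the grouping of step S12, where
    identity-authentication intents rank above action intents, action intents above
    semantic intents, and semantic intents above emotion intents.
    """
    if not intents:
        return None
    _, request = min(intents, key=lambda item: item[0])
    return request

# Usage: a key press (action group) outranks a simultaneous voice command (semantic group).
selected = arbitrate([(2, "KEY_VOLUME_UP"), (3, "VOICE_OPEN_NAVIGATION")])
assert selected == "KEY_VOLUME_UP"
```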
Correspondingly, in another aspect of the embodiment of the present application, there is also provided a system for collaborative human-computer interaction with multi-modal data, including:
the multi-mode interaction data acquisition unit is used for acquiring multi-mode interaction data in real time through the vehicle-mounted multimedia device, wherein the multi-mode interaction data comprise at least one of user fingerprint data, face data, sound wave data, hand data, vehicle-mounted terminal key data and vehicle-mounted screen touch data;
the matching processing unit is used for carrying out characteristic acquisition on each type of multi-mode interaction data acquired in real time, and respectively comparing the acquired multi-mode interaction data with the pre-stored standard characteristic data corresponding to the type of multi-mode interaction data to acquire user intention corresponding to each type of multi-mode interaction data;
the priority grouping association unit is used for associating a corresponding priority for the user intention corresponding to each type of multi-mode interaction data and carrying out priority grouping;
and the user intention arbitration unit is used for carrying out priority arbitration on the various user intentions according to the priority grouping information, determining one user intention as the selected user intention and outputting the user intention.
Wherein, further include:
the standard characteristic data acquisition unit is used for acquiring standard characteristic data corresponding to various multi-mode interaction data in advance, and the standard characteristic data acquisition unit specifically acquires the standard characteristic data in the following mode:
recording and obtaining the thread pattern characteristics of the user fingerprint as standard characteristic data of the user fingerprint data;
recording and obtaining facial image features of a user as standard feature data of the facial data of the user;
recording and obtaining iris pattern characteristics of a user as standard characteristic data of iris data of the user;
recording and obtaining voiceprint characteristics in user voice as standard characteristic data of user voice data;
recording and obtaining lip characteristics of the users corresponding to the keywords, and taking the lip characteristics as standard characteristic data of the lip data of the users;
recording and obtaining hand type/gesture characteristics of a user corresponding to each instruction, wherein the hand type/gesture characteristics are used as standard characteristic data of hand data;
recording and obtaining facial expression characteristics of the user corresponding to each instruction, and taking the facial expression characteristics as standard characteristic data of facial expression data.
The device for the multi-mode data collaborative human-computer interaction is connected with a fingerprint acquisition module, an image acquisition module, an audio acquisition module, a touch screen module and a key module and is used for acquiring fingerprint data, face data, sound wave data, hand data, vehicle-mounted terminal key data and vehicle-mounted screen touch data corresponding to a user.
Wherein the matching processing unit further includes:
the authentication matching processing unit is used for respectively matching the acquired multi-mode interaction data with the corresponding standard characteristic data when the acquired multi-mode interaction data are a face image, a fingerprint image and an iris image, and respectively obtaining a face authentication request, a fingerprint authentication request and an iris authentication request after the matching is successful;
the lip command request matching unit is used for matching the feature data corresponding to the lip image with the basic lip data when the acquired multi-mode interaction data is the lip image, and obtaining the corresponding lip command request after successful matching;
the voice command request matching unit is used for matching the voice data with basic voiceprint data when the acquired multi-mode interaction data are voice data, and acquiring a corresponding voice command request of a keyword corresponding to the voice after the voice data are successfully matched;
the hand instruction request matching unit is used for matching the hand data with the basic hand data when the acquired multi-mode interaction data are the hand data, and obtaining a hand instruction request corresponding to the gesture or the hand type;
and the expression instruction request matching unit is used for matching the facial expression with the basic expression data when the acquired multi-modal interaction data is a facial expression image, so as to obtain an expression instruction request corresponding to the expression.
Wherein the priority packet associating unit further includes:
the identity authentication intention association unit is used for confirming the face authentication request, the fingerprint authentication request and the iris authentication request as identity authentication intention and associating a first group of priorities;
the action intention association unit is used for confirming the key request, the touch control click request and the hand instruction request as action intention and associating the action intention with a second group of priorities;
the semantic intention association unit confirms the lip instruction request and the voice instruction request as semantic intention and associates the lip instruction request and the voice instruction request as a third group of priority;
an emotion intention association unit configured to confirm the emotion instruction request as an emotion intention and associate the emotion intention as a fourth group of priorities;
the priority levels of the first group of priorities, the second group of priorities, the third group of priorities and the fourth group of priorities are arranged from high to low; each type of request in each set of priorities also corresponds to a different priority.
Wherein the user intention arbitration unit further includes:
an arbitration unit, configured to perform priority arbitration on each obtained intention within a predetermined time, and regarding a user intention corresponding to a request with a highest priority as a selected user intention;
the output unit is used for outputting the instruction request corresponding to the selected user intention determined by the arbitration unit;
and the interruption unit is used for interrupting the current output when detecting that the user intention with higher priority appears in real time in the process of outputting the instruction request corresponding to the user intention.
Correspondingly, in still another aspect of the present application, a vehicle-mounted multimedia device is further provided, which is connected with a fingerprint acquisition module, an image acquisition module, an audio acquisition module, a touch screen module and a key module, and is characterized in that the vehicle-mounted multimedia device comprises the system for collaborative man-machine interaction of multi-mode data.
The embodiment of the application has the following beneficial effects:
according to the method, the system and the vehicle-mounted multimedia device for the multi-mode data collaborative human-computer interaction, provided by the application, the multi-mode sensor is utilized to collect the identity characteristic information and the intention information of the driver, so that the multi-mode human-computer interaction data can be collected;
in the embodiment of the application, the priority ranking can be carried out on various characteristics of multiple modes, and multiple input sources can be processed to determine corresponding priority levels; meanwhile, data fusion is carried out on various characteristics of multiple modes, so that accuracy of multi-dimensional input source intention can be improved;
in the embodiment of the application, each intention of data fusion can be arbitrated, and interrupt processing is realized by utilizing the priority of multiple intents and multiple outputs; the system can more accurately and rapidly identify the intention of the driver, improve the diversity and accuracy of the man-machine interaction scene to identify the intention of the driver, and improve the interaction flexibility.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions of the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the application, and other drawings may be obtained by those skilled in the art from these drawings without inventive effort.
FIG. 1 is a schematic view of an application environment of a method for collaborative human-computer interaction of multimodal data provided by the present application;
FIG. 2 is a schematic diagram of a main flow of an embodiment of a method for collaborative human-computer interaction with multimodal data provided by the present application;
FIG. 3 is a more detailed flow chart of one embodiment of step S13 of FIG. 2;
FIG. 4 is a schematic structural diagram of an embodiment of a system for collaborative human-computer interaction with multimodal data provided by the present application;
FIG. 5 is a schematic diagram of the matching processing unit of FIG. 4;
fig. 6 is a schematic diagram of a structure of the priority packet associating unit in fig. 4;
fig. 7 is a schematic diagram of a structure of the user intention arbitration unit in fig. 4.
Detailed Description
The present application will be described in further detail with reference to the accompanying drawings, for the purpose of making the objects, technical solutions and advantages of the present application more apparent.
The method for multi-mode data collaborative human-computer interaction provided by the application can be applied to the application environment shown in FIG. 1. The vehicle-mounted multimedia device communicates with each sensor (acquisition module) through a bus. The vehicle-mounted multimedia device may be regarded as an electronic device comprising a processor, a nonvolatile storage medium, an internal memory and input/output devices (such as a touch screen module, a key module and a network interface) connected through a system bus. The system for multi-mode data collaborative human-computer interaction stored in the nonvolatile storage medium is used to implement the method for multi-mode data collaborative human-computer interaction. The processor provides computing and control capabilities to support the operation of the entire vehicle-mounted multimedia device. The internal memory in the vehicle-mounted multimedia device provides a running environment for the system for multi-mode data collaborative human-computer interaction stored in the nonvolatile storage medium. Specifically, the system performs matching analysis, priority association and priority arbitration on the multi-mode data acquired by the acquisition modules, so as to obtain the user's real intention; specific details are described later. The vehicle-mounted multimedia device may be a vehicle-mounted terminal, and may also be any of various personal computers, notebook computers, smart phones, tablet computers and portable wearable devices.
FIG. 2 is a schematic diagram of the main flow of an embodiment of the method for multi-mode data collaborative human-computer interaction provided by the application. In this embodiment, the method is implemented by the system for multi-mode data collaborative human-computer interaction in the vehicle-mounted multimedia device of FIG. 1. Specifically, the method includes:
step S10, acquiring multi-mode interaction data in real time through a vehicle-mounted multimedia device, wherein the multi-mode interaction data comprises at least one of user fingerprint data, face data, sound wave data, hand data, vehicle-mounted terminal key data and vehicle-mounted screen touch data;
step S11, performing characteristic acquisition on each type of multi-mode interaction data acquired in real time, and respectively comparing the acquired multi-mode interaction data with pre-stored standard characteristic data corresponding to the type of multi-mode interaction data to acquire user intention corresponding to each type of multi-mode interaction data;
step S12, associating a corresponding priority for the user intention corresponding to each type of multi-mode interaction data, and carrying out priority grouping;
and step S13, performing priority arbitration on the various user intentions according to the priority grouping information, determining one user intention as the selected user intention, and outputting the user intention.
In order to facilitate understanding of the present application, each step to which the present application relates will be described in detail with reference to specific examples.
It can be understood that, in the implementation of the application, before step S10 there is a step of obtaining in advance the standard feature data corresponding to each type of multi-mode interaction data (a minimal data-structure sketch is given after this list), where the step specifically includes:
recording and obtaining the thread pattern characteristics of the user fingerprint as standard characteristic data of the user fingerprint data;
recording and obtaining facial image features of a user as standard feature data of the facial data of the user;
recording and obtaining iris pattern characteristics of a user as standard characteristic data of iris data of the user;
recording and obtaining voiceprint characteristics in user voice as standard characteristic data of user voice data;
recording and obtaining lip characteristics of the users corresponding to the keywords, and taking the lip characteristics as standard characteristic data of the lip data of the users;
recording and obtaining hand type/gesture characteristics of a user corresponding to each instruction, wherein the hand type/gesture characteristics are used as standard characteristic data of hand data;
recording and obtaining facial expression characteristics of the user corresponding to each instruction, and taking the facial expression characteristics as standard characteristic data of facial expression data.
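By way of non-limiting illustration only, the pre-recorded standard feature data described above can be organised as a single container with one entry per modality. The field names and types below are assumptions of this sketch and are not part of the disclosure.

```python
from dataclasses import dataclass, field

@dataclass
class StandardFeatureStore:
    """Illustrative container for the pre-recorded standard feature data, one entry per modality."""
    fingerprint_patterns: list = field(default_factory=list)   # ridge/thread pattern features
    face_features: list = field(default_factory=list)          # facial image features
    iris_patterns: list = field(default_factory=list)          # iris pattern features
    voiceprints: list = field(default_factory=list)            # voiceprint features
    lip_features: dict = field(default_factory=dict)           # keyword -> lip features
    hand_gestures: dict = field(default_factory=dict)          # instruction -> hand type/gesture features
    expressions: dict = field(default_factory=dict)            # instruction -> facial expression features
```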
Wherein, the step S10 includes:
and the fingerprint data, face data, sound wave data, hand data, vehicle-mounted terminal key data and vehicle-mounted screen touch data corresponding to the user are acquired through a fingerprint acquisition module, an image acquisition module, an audio acquisition module, a touch screen module and a key module which are connected with the vehicle-mounted multimedia device.
Wherein, the step S11 further includes: performing feature extraction on each type of acquired multi-mode interaction data and matching it against the standard feature data of the corresponding type. The feature extraction process for each type of multi-mode interaction data, such as feature segmentation of the collected face and lip data or semantic processing of the voice data, is well known in the art, is readily understood and available to those skilled in the art, and is not described in detail here. The feature matching process is mainly described below, and specifically includes the following steps:
when the acquired multi-mode interaction data are a face image, a fingerprint image and an iris image, the acquired data are respectively matched with the corresponding standard feature data, and a face authentication request, a fingerprint authentication request and an iris authentication request are respectively obtained after successful matching. Specifically, the fingerprint data collected in real time are matched with the corresponding standard feature data; if the matching similarity is greater than a set threshold, the matching is considered to pass and the output result is 1, otherwise the output result is -1. When the output result is -1, the other enrolled fingerprints are traversed for matching, and if a fingerprint still cannot be matched successfully, the waiting time may be increased before each further attempt. Similarly, the face image is matched with its corresponding standard feature data; if the matching similarity is greater than a set threshold, the matching is successful and the output result is 1, otherwise the output result is -1, in which case the waiting time may be prolonged before the next matching attempt. A similar matching process is used for the iris image. It will be appreciated that the matching processes of all three authentication methods are shown here, whereas in practical applications only one or a few of them may be used.
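A minimal sketch of this threshold-based matching loop is given below; the similarity function, the threshold value, the retry count and the back-off factor are all assumptions of the sketch rather than values fixed by the application.

```python
import time

MATCH_THRESHOLD = 0.8  # assumed value; the text only requires "greater than a set threshold"

def match_identity(sample, enrolled_templates, similarity,
                   base_wait_s=0.5, max_rounds=3):
    """Return 1 if any enrolled template matches the sample, otherwise -1.

    Mirrors the behaviour described above: on failure the remaining templates are
    traversed, and each new round waits longer before matching is attempted again.
    """
    wait_s = base_wait_s
    for _ in range(max_rounds):
        for template in enrolled_templates:        # traverse the other fingerprints/faces/irises
            if similarity(sample, template) > MATCH_THRESHOLD:
                return 1                           # matching passes -> authentication request issued
        time.sleep(wait_s)                         # unsuccessful round: wait before trying again
        wait_s *= 2                                # increase the waiting time each round
    return -1                                      # no enrolled template matched
```

For instance, `match_identity(live_fingerprint, store.fingerprint_patterns, similarity_fn)` could draw on the illustrative feature store sketched earlier, with `similarity_fn` standing in for whatever fingerprint comparison the implementation actually uses.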
When the acquired multi-mode interaction data are lip images, matching the feature data corresponding to the lip images with basic lip data, and obtaining corresponding lip instruction requests after successful matching (the similarity of the feature data and the basic lip data is larger than a preset threshold value);
when the acquired multi-mode interaction data are voice data, matching the voice data with basic voiceprint data; after successful matching (the similarity between the voice data and the basic voiceprint data is greater than a preset threshold), the keyword or key phrase with the closest pronunciation is output, and the corresponding voice instruction is looked up in an engine library through that keyword or key phrase, so as to obtain the voice instruction request corresponding to the voice;
when the acquired multi-mode interaction data are hand data, matching the hand data with basic hand data, and after successful matching (the similarity of the hand data and the basic hand data is greater than a preset threshold value), obtaining a hand instruction request corresponding to the gesture or the hand;
when the acquired multi-mode interaction data is a facial expression image, matching the facial expression with basic expression data, and obtaining an expression instruction request corresponding to the expression after successful matching (the similarity of the facial expression and the basic expression data is larger than a preset threshold value).
Wherein, the step S12 further includes:
confirming the face authentication request, the fingerprint authentication request and the iris authentication request as identity authentication intention, and associating a first group of priorities;
confirming a key request, a touch control click request and the hand instruction request as action intentions, and associating the action intentions with a second group of priorities;
confirming the lip instruction request and the voice instruction request as semantic intents and associating the semantic intents with a third group of priorities;
confirming the expression instruction request as emotion intention and associating the emotion instruction request as a fourth group of priorities;
the priority levels of the first group of priorities, the second group of priorities, the third group of priorities and the fourth group of priorities are arranged from high to low; each type of request in each set of priorities corresponds to a different priority.
In a specific example, the priority level of a passed face identity match request may be set to A1, that of a passed iris identity match request to A2, that of a passed fingerprint identity match request to A3, that of a key request to B1, that of a touch click request to B2, that of a hand instruction request to B3, that of a lip instruction request to C2, that of a voiceprint (voice) instruction request to C1, and that of an expression instruction request to D.
Classifying the matching results corresponding to the priorities A1, A2 and A3, and determining the matching results as identity authentication intents with the priorities A; within the group, A1 has a priority greater than that of A2, A2 has a priority greater than that of A3;
classifying the matching results corresponding to the priorities B1, B2 and B3, and determining the action intention of the group B; within the group, B1 has a priority greater than that of B2, and B2 has a priority greater than that of B3;
classifying the matching results corresponding to the priorities C1 and C2, and determining them as the semantic intention of the priority C group; within the group, the priority of C1 is greater than the priority of C2;
classifying the matching results corresponding to the priority level D, and determining the emotion intention of the priority level D group.
Overall, the priority of group A is greater than that of group B, the priority of group B is greater than that of group C, and the priority of group C is greater than that of group D.
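As a non-limiting sketch, the example priority scheme above (A1 > A2 > A3 > B1 > B2 > B3 > C1 > C2 > D) can be encoded as an ordered enumeration together with a group table; all names below are illustrative assumptions.

```python
from enum import IntEnum

class Priority(IntEnum):
    """Lower value = higher priority; encodes the example ordering A1 > A2 > A3 > ... > D."""
    A1_FACE_AUTH = 1
    A2_IRIS_AUTH = 2
    A3_FINGERPRINT_AUTH = 3
    B1_KEY_REQUEST = 4
    B2_TOUCH_CLICK = 5
    B3_HAND_INSTRUCTION = 6
    C1_VOICE_INSTRUCTION = 7
    C2_LIP_INSTRUCTION = 8
    D_EXPRESSION_INSTRUCTION = 9

# Group membership for the A/B/C/D intention classes (identity, action, semantic, emotion).
GROUPS = {
    "A_identity": {Priority.A1_FACE_AUTH, Priority.A2_IRIS_AUTH, Priority.A3_FINGERPRINT_AUTH},
    "B_action": {Priority.B1_KEY_REQUEST, Priority.B2_TOUCH_CLICK, Priority.B3_HAND_INSTRUCTION},
    "C_semantic": {Priority.C1_VOICE_INSTRUCTION, Priority.C2_LIP_INSTRUCTION},
    "D_emotion": {Priority.D_EXPRESSION_INSTRUCTION},
}
```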
Wherein, the step S13 further includes:
and carrying out priority arbitration on each obtained intention within a preset time, taking the user intention corresponding to the request with the highest priority as the selected user intention, and outputting an instruction request corresponding to the user intention.
Wherein, the step S13 further includes:
if the user intention with higher priority is detected to appear in real time in the process of outputting the instruction request corresponding to the user intention, the current output is interrupted.
In a specific example, step S13 may perform real-time judgment in order of priority from high to low, and may specifically include the following steps (a toy arbitration loop is sketched after these steps):
step S130, confirming that the identity characteristic authentication passes, namely the identity authentication of the group A priority is intended to pass;
step S131, judging whether to execute action intention; namely, judging whether the action intention of the B group priority exists or not, if so, outputting a corresponding action intention result in the step S134; otherwise, go to step S132;
step S132, judging whether to execute semantic intention; namely judging whether the semantic intention of the priority of the group C exists or not, if so, outputting a corresponding semantic intention result in the step S134; otherwise, go to step S133;
step S133, judging whether to execute the expression intention; judging whether the expression intention of the priority of the group D exists or not, and if so, outputting a corresponding expression intention result in the step S134; otherwise, the process goes to step S131 to resume the determination flow.
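The following toy arbiter combines the predetermined arbitration window of step S13 with the interrupt behaviour described above; it assumes the illustrative Priority enumeration from the previous sketch (a smaller value means a higher priority) and is a sketch rather than a definitive implementation.

```python
import heapq
import time

class IntentArbiter:
    """Toy arbiter: collect intents for a fixed window, output the highest-priority one,
    and interrupt the current output when a higher-priority intent arrives."""

    def __init__(self, window_s=0.5):
        self.window_s = window_s
        self._pending = []                      # min-heap of (priority value, instruction request)

    def submit(self, priority, request):
        heapq.heappush(self._pending, (int(priority), request))

    def arbitrate_once(self):
        """Wait for the predetermined time, then return the highest-priority pending request."""
        time.sleep(self.window_s)
        if not self._pending:
            return None
        _, request = heapq.heappop(self._pending)
        return request                          # the selected user intention's instruction request

    def should_interrupt(self, current_priority):
        """True if a pending intent outranks the intention currently being output."""
        return bool(self._pending) and self._pending[0][0] < int(current_priority)
```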
FIG. 4 is a schematic diagram illustrating the architecture of one embodiment of a system 2 for collaborative human-machine interaction with multimodal data provided by the present application; which may be implemented in the application environment of fig. 1, in conjunction with fig. 5-7. In this embodiment, the system 2 comprises:
the multi-modal interaction data collection unit 25 is configured to collect multi-modal interaction data in real time through the vehicle-mounted multimedia device, where the multi-modal interaction data includes at least one of user fingerprint data, face data, acoustic wave data, hand data, vehicle-mounted terminal key data, and vehicle-mounted screen touch data;
the matching processing unit 20 is configured to perform feature acquisition on each type of multimodal interaction data acquired in real time, and compare the acquired multimodal interaction data with pre-stored standard feature data corresponding to the type of multimodal interaction data respectively to obtain user intention corresponding to each type of multimodal interaction data;
a priority grouping association unit 21, configured to associate a corresponding priority for the user intention corresponding to each type of the multi-mode interaction data, and perform priority grouping;
and the user intention arbitration unit 22 is configured to perform priority arbitration on the various user intentions according to the priority grouping information, determine one of the user intentions as the selected user intention, and output the determined user intention.
Wherein, further include: the standard feature data obtaining unit 24 is configured to obtain standard feature data corresponding to various multi-mode interaction data in advance, where the standard feature data obtaining unit specifically obtains the standard feature data in the following manner:
recording and obtaining the thread pattern characteristics of the user fingerprint as standard characteristic data of the user fingerprint data;
recording and obtaining facial image features of a user as standard feature data of the facial data of the user;
recording and obtaining iris pattern characteristics of a user as standard characteristic data of iris data of the user;
recording and obtaining voiceprint characteristics in user voice as standard characteristic data of user voice data;
recording and obtaining lip characteristics of the users corresponding to the keywords, and taking the lip characteristics as standard characteristic data of the lip data of the users;
recording and obtaining hand type/gesture characteristics of a user corresponding to each instruction, wherein the hand type/gesture characteristics are used as standard characteristic data of hand data;
recording and obtaining facial expression characteristics of the user corresponding to each instruction, and taking the facial expression characteristics as standard characteristic data of facial expression data.
The device for the multi-mode data collaborative human-computer interaction is connected with a fingerprint acquisition module, an image acquisition module, an audio acquisition module, a touch screen module and a key module and is used for acquiring corresponding user fingerprint data, face data, sound wave data, hand data, vehicle-mounted terminal key data and vehicle-mounted screen touch data.
Wherein the matching process unit 20 further comprises:
the authentication matching processing unit 200 is configured to match the acquired multimodal interaction data with corresponding standard feature data when the acquired multimodal interaction data is a face image, a fingerprint image, and an iris image, and obtain a face authentication request, a fingerprint authentication request, and an iris authentication request after the matching is successful;
the lip command request matching unit 201 is configured to match feature data corresponding to the lip image with basic lip data when the acquired multimodal interaction data is a lip image, and obtain a corresponding lip command request after the matching is successful;
the voice command request matching unit 202 is configured to, when the acquired multimodal interaction data is voice data, match the voice data with basic voiceprint data, and obtain a corresponding voice command request of a keyword corresponding to the voice after the matching is successful;
the hand instruction request matching unit 203 is configured to match the hand data with the base hand data when the acquired multimodal interaction data is hand data, so as to obtain a hand instruction request corresponding to the gesture or the hand pattern;
and the expression instruction request matching unit 204 is configured to match the facial expression with the basic expression data when the acquired multimodal interaction data is a facial expression image, so as to obtain an expression instruction request corresponding to the expression.
Wherein the priority packet associating unit 21 further includes:
an identity authentication intention association unit 210, configured to confirm the face authentication request, the fingerprint authentication request, and the iris authentication request as identity authentication intention, and associate a first set of priorities;
an action intention association unit 211, configured to confirm a key request, a touch click request, and the hand instruction request as action intention, and associate the action intention with a second group of priorities;
a semantic intention association unit 212, configured to identify the lip instruction request and the voice instruction request as semantic intention, and associate the lip instruction request and the voice instruction request as a third group of priorities;
an emotional intent association unit 213 for confirming the emotional instruction request as an emotional intent and associating as a fourth group of priorities;
the priority levels of the first group of priorities, the second group of priorities, the third group of priorities and the fourth group of priorities are arranged from high to low; each type of request in each set of priorities also corresponds to a different priority.
Wherein the user intention arbitration unit 22 further includes:
an arbitration unit 220, configured to arbitrate the obtained intents in priority for each type, and take the user intention corresponding to the request with the highest priority as the selected user intention;
an output unit 221, configured to output an instruction request corresponding to the selected user intention determined by the arbitration unit;
and the interruption unit 222 is configured to interrupt the current output when detecting that the user intention with the higher priority appears in real time during the process of outputting the instruction request corresponding to the user intention.
For further details, reference may be made to the description of FIG. 2 above, which is not repeated here.
The embodiment of the application has the following beneficial effects:
the application provides a method, a system and a vehicle-mounted multimedia device for collaborative man-machine interaction of multi-mode data. The multi-mode sensor is utilized to collect the identity characteristic information and the intention information of the driver, so that multi-mode man-machine interaction data can be collected;
in the embodiment of the application, the priority ranking can be carried out on various characteristics of multiple modes, and multiple input sources can be processed to determine corresponding priority levels; meanwhile, data fusion is carried out on various characteristics of multiple modes, so that accuracy of multi-dimensional input source intention can be improved;
in the embodiment of the application, each intention of data fusion can be arbitrated, and interrupt processing is realized by utilizing the priority of multiple intents and multiple outputs; the system can more accurately and rapidly identify the intention of the driver, improve the diversity and accuracy of the man-machine interaction scene to identify the intention of the driver, and improve the interaction flexibility.
The above disclosure is only a preferred embodiment of the present application and certainly does not limit the scope of the application; equivalent changes made according to the claims of the present application therefore still fall within the scope of the application.

Claims (14)

1. The method for collaborative human-computer interaction of the multi-mode data is characterized by comprising the following steps:
step S10, acquiring multi-mode interaction data in real time through a vehicle-mounted multimedia device, wherein the multi-mode interaction data comprises at least one type of user fingerprint data, face data, sound wave data, hand data, vehicle-mounted terminal key data and vehicle-mounted screen touch data;
step S11, performing characteristic acquisition on each type of multi-mode interaction data acquired in real time, and respectively comparing the acquired multi-mode interaction data with pre-stored standard characteristic data corresponding to the type of multi-mode interaction data to acquire user intention corresponding to each type of multi-mode interaction data;
step S12, associating a corresponding priority for the user intention corresponding to each type of multi-mode interaction data, and carrying out priority grouping;
and step S13, carrying out priority arbitration on each user intention according to the priority grouping information, determining one user intention as the selected user intention and outputting the user intention.
2. The method as recited in claim 1, further comprising the step of:
the step of obtaining standard characteristic data corresponding to various multi-mode interaction data in advance specifically comprises the following steps:
recording and obtaining the thread pattern characteristics of the user fingerprint as standard characteristic data of the user fingerprint data;
recording and obtaining facial image features of a user as standard feature data of the facial data of the user;
recording and obtaining iris pattern characteristics of a user as standard characteristic data of iris data of the user;
recording and obtaining voiceprint characteristics in user voice as standard characteristic data of user voice data;
recording and obtaining lip characteristics of the users corresponding to the keywords, and taking the lip characteristics as standard characteristic data of the lip data of the users;
recording and obtaining hand type/gesture characteristics of a user corresponding to each instruction, wherein the hand type/gesture characteristics are used as standard characteristic data of hand data;
recording and obtaining facial expression characteristics of the user corresponding to each instruction, and taking the facial expression characteristics as standard characteristic data of facial expression data.
3. The method according to claim 2, wherein the step S10 includes:
and the fingerprint data, face data, sound wave data, hand data, vehicle-mounted terminal key data and vehicle-mounted screen touch data corresponding to the user are acquired through a fingerprint acquisition module, an image acquisition module, an audio acquisition module, a touch screen module and a key module which are connected with the vehicle-mounted multimedia device.
4. A method according to claim 2 or 3, wherein said step S11 further comprises:
when the acquired multi-mode interaction data are a face image, a fingerprint image and an iris image, respectively matching the acquired multi-mode interaction data with corresponding standard feature data, and respectively acquiring a face authentication request, a fingerprint authentication request and an iris authentication request after successful matching;
when the acquired multi-mode interaction data are lip images, matching the feature data corresponding to the lip images with basic lip data, and obtaining corresponding lip instruction requests after successful matching;
when the acquired multi-mode interaction data are voice data, matching the voice data with basic voiceprint data, and obtaining a voice command request corresponding to a keyword corresponding to the voice after successful matching;
when the acquired multi-mode interaction data are hand data, matching the hand data with basic hand data to obtain a hand instruction request corresponding to the gesture or hand;
when the acquired multi-modal interaction data is a facial expression image, matching the facial expression with basic expression data to obtain an expression instruction request corresponding to the expression.
5. The method of claim 4, wherein the step S12 further comprises:
confirming the face authentication request, the fingerprint authentication request and the iris authentication request as identity authentication intention, and associating a first group of priorities;
confirming a key request, a touch control click request and the hand instruction request as action intentions, and associating the action intentions with a second group of priorities;
confirming the lip instruction request and the voice instruction request as semantic intents and associating the semantic intents with a third group of priorities;
confirming the expression instruction request as emotion intention and associating the emotion instruction request as a fourth group of priorities;
the priority levels of the first group of priorities, the second group of priorities, the third group of priorities and the fourth group of priorities are arranged from high to low; each type of request in each set of priorities corresponds to a different priority.
6. The method of claim 5, wherein the step S13 further comprises:
and carrying out priority arbitration on each obtained intention within a preset time, taking the user intention corresponding to the request with the highest priority as the selected user intention, and outputting an instruction request corresponding to the user intention.
7. The method of claim 6, wherein the step S13 further comprises:
if the user intention with higher priority is detected to appear in real time in the process of outputting the instruction request corresponding to the user intention, the current output is interrupted.
8. The system for collaborative human-computer interaction of the multi-mode data is characterized by comprising:
the multi-mode interaction data acquisition unit is used for acquiring multi-mode interaction data in real time through the vehicle-mounted multimedia device, wherein the multi-mode interaction data comprise at least one type of user fingerprint data, face data, sound wave data, hand data, vehicle-mounted terminal key data and vehicle-mounted screen touch data;
the matching processing unit is used for carrying out characteristic acquisition on each type of multi-mode interaction data acquired in real time, and respectively comparing the acquired multi-mode interaction data with the pre-stored standard characteristic data corresponding to the type of multi-mode interaction data to acquire user intention corresponding to each type of multi-mode interaction data;
the priority grouping association unit is used for associating a corresponding priority for the user intention corresponding to each type of multi-mode interaction data and carrying out priority grouping;
and the user intention arbitration unit is used for carrying out priority arbitration on each user intention according to the priority grouping information, determining one user intention as the selected user intention and outputting the user intention.
9. The system as recited in claim 8, further comprising:
the standard characteristic data acquisition unit is used for acquiring standard characteristic data corresponding to various multi-mode interaction data in advance, and the standard characteristic data acquisition unit specifically acquires the standard characteristic data in the following mode:
recording and obtaining the thread pattern characteristics of the user fingerprint as standard characteristic data of the user fingerprint data;
recording and obtaining facial image features of a user as standard feature data of the facial data of the user;
recording and obtaining iris pattern characteristics of a user as standard characteristic data of iris data of the user;
recording and obtaining voiceprint characteristics in user voice as standard characteristic data of user voice data;
recording and obtaining lip characteristics of the users corresponding to the keywords, and taking the lip characteristics as standard characteristic data of the lip data of the users;
recording and obtaining hand type/gesture characteristics of a user corresponding to each instruction, wherein the hand type/gesture characteristics are used as standard characteristic data of hand data;
recording and obtaining facial expression characteristics of the user corresponding to each instruction, and taking the facial expression characteristics as standard characteristic data of facial expression data.
10. The system of claim 9, wherein the multi-modal data collaborative human-computer interaction device is connected to a fingerprint acquisition module, an image acquisition module, an audio acquisition module, a touch screen module, and a key module for acquiring fingerprint data, face data, sound wave data, hand data, vehicle terminal key data, and vehicle screen touch data corresponding to the user.
11. The system of claim 9 or 10, wherein the matching processing unit further comprises:
the authentication matching processing unit is used for respectively matching the acquired multi-mode interaction data with the corresponding standard characteristic data when the acquired multi-mode interaction data are a face image, a fingerprint image and an iris image, and respectively obtaining a face authentication request, a fingerprint authentication request and an iris authentication request after the matching is successful;
the lip command request matching unit is used for matching the feature data corresponding to the lip image with the basic lip data when the acquired multi-mode interaction data is the lip image, and obtaining the corresponding lip command request after successful matching;
the voice command request matching unit is used for matching the voice data with basic voiceprint data when the acquired multi-mode interaction data are voice data, and acquiring a corresponding voice command request of a keyword corresponding to the voice after the voice data are successfully matched;
the hand instruction request matching unit is used for matching the hand data with the basic hand data when the acquired multi-mode interaction data are the hand data, and obtaining a hand instruction request corresponding to the gesture or the hand type;
and the expression instruction request matching unit is used for matching the facial expression with the basic expression data when the acquired multi-modal interaction data is a facial expression image, so as to obtain an expression instruction request corresponding to the expression.
12. The system of claim 11, wherein the priority packet association unit further comprises:
an identity authentication intention association unit, configured to identify the face authentication request, the fingerprint authentication request and the iris authentication request as an identity authentication intention and associate it with a first priority group;
an action intention association unit, configured to identify the key request, the touch click request and the hand instruction request as an action intention and associate it with a second priority group;
a semantic intention association unit, configured to identify the lip instruction request and the voice instruction request as a semantic intention and associate it with a third priority group;
and an emotion intention association unit, configured to identify the expression instruction request as an emotion intention and associate it with a fourth priority group;
wherein the first, second, third and fourth priority groups are ranked from highest to lowest priority, and each type of request within a group also corresponds to a distinct priority.
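Claim 12 defines four intention groups whose priorities decrease from identity authentication to action, semantic and emotion intention, with a further per-request ordering inside each group. One way to encode such a two-level priority table is a tuple per request type, as in the sketch below; the particular in-group ordering shown is an assumption made for illustration only.

# Hypothetical two-level priority table: a smaller tuple means higher priority.
PRIORITY = {
    # group 1: identity authentication intentions
    "face_authentication_request":        (1, 1),
    "fingerprint_authentication_request": (1, 2),
    "iris_authentication_request":        (1, 3),
    # group 2: action intentions
    "key_request":                        (2, 1),
    "touch_click_request":                (2, 2),
    "hand_instruction_request":           (2, 3),
    # group 3: semantic intentions
    "voice_instruction_request":          (3, 1),
    "lip_instruction_request":            (3, 2),
    # group 4: emotion intentions
    "expression_instruction_request":     (4, 1),
}


def higher_priority(a: str, b: str) -> str:
    """Return whichever request has the higher priority (smaller tuple)."""
    return a if PRIORITY[a] < PRIORITY[b] else b


print(higher_priority("voice_instruction_request", "expression_instruction_request"))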
13. The system of claim 12, wherein the user intent arbitration unit further comprises:
an arbitration unit, configured to perform priority arbitration on the intentions obtained within a predetermined time and select the user intention corresponding to the highest-priority request as the selected user intention;
an output unit, configured to output the instruction request corresponding to the selected user intention determined by the arbitration unit;
and an interruption unit, configured to interrupt the current output when a user intention with a higher priority is detected in real time while the instruction request corresponding to the selected user intention is being output.
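Claim 13 describes collecting intentions within a predetermined time, selecting the one with the highest-priority request, and interrupting a lower-priority output when a higher-priority intention is detected. A minimal sketch of that arbitrate/output/interrupt loop is given below; it reuses the two-level priority tuples from the previous sketch in abbreviated form, and the window handling and print-based output are assumptions, not the patented implementation.

# Hypothetical arbitration-with-interruption sketch. PRIORITY repeats the
# two-level scheme of the previous sketch, abbreviated so this runs on its own.
from typing import List, Optional

PRIORITY = {
    "face_authentication_request":    (1, 1),
    "hand_instruction_request":       (2, 3),
    "voice_instruction_request":      (3, 1),
    "expression_instruction_request": (4, 1),
}


class Arbiter:
    def __init__(self) -> None:
        self.current: Optional[str] = None  # request currently being output

    def arbitrate(self, window_requests: List[str]) -> Optional[str]:
        """Pick the highest-priority request collected within one time window."""
        if not window_requests:
            return None
        return min(window_requests, key=lambda r: PRIORITY[r])

    def output(self, request: str) -> None:
        # Interrupt the current output if the new request has a higher priority.
        if self.current is not None and PRIORITY[request] < PRIORITY[self.current]:
            print(f"interrupting {self.current}")
        self.current = request
        print(f"executing {request}")


arbiter = Arbiter()
selected = arbiter.arbitrate(["expression_instruction_request",
                              "voice_instruction_request"])
arbiter.output(selected)                       # outputs the voice request first
arbiter.output("face_authentication_request")  # higher priority: interrupts it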
14. A vehicle-mounted multimedia device connected with a fingerprint acquisition module, an image acquisition module, an audio acquisition module, a touch screen module and a key module, wherein the vehicle-mounted multimedia device comprises the multi-mode data collaborative human-computer interaction system according to any one of claims 8, 9 and 11-13.
CN201910225815.2A 2019-03-25 2019-03-25 Method, system and vehicle-mounted multimedia device for multi-mode data collaborative man-machine interaction Active CN111737670B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910225815.2A CN111737670B (en) 2019-03-25 2019-03-25 Method, system and vehicle-mounted multimedia device for multi-mode data collaborative man-machine interaction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910225815.2A CN111737670B (en) 2019-03-25 2019-03-25 Method, system and vehicle-mounted multimedia device for multi-mode data collaborative man-machine interaction

Publications (2)

Publication Number Publication Date
CN111737670A CN111737670A (en) 2020-10-02
CN111737670B true CN111737670B (en) 2023-08-18

Family

ID=72645922

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910225815.2A Active CN111737670B (en) 2019-03-25 2019-03-25 Method, system and vehicle-mounted multimedia device for multi-mode data collaborative man-machine interaction

Country Status (1)

Country Link
CN (1) CN111737670B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113460067B (en) * 2020-12-30 2023-06-23 安波福电子(苏州)有限公司 Human-vehicle interaction system
CN113591659B (en) * 2021-07-23 2023-05-30 重庆长安汽车股份有限公司 Gesture control intention recognition method and system based on multi-mode input
WO2023005362A1 (en) * 2021-07-30 2023-02-02 深圳传音控股股份有限公司 Processing method, processing device and storage medium
CN114348000A (en) * 2022-02-15 2022-04-15 安波福电子(苏州)有限公司 Driver attention management system and method
CN116434027A (en) * 2023-06-12 2023-07-14 深圳星寻科技有限公司 Artificial intelligent interaction system based on image recognition

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102385860A (en) * 2010-08-26 2012-03-21 索尼公司 Information processing apparatus, information processing method, and program
CN106239506A (en) * 2016-08-11 2016-12-21 北京光年无限科技有限公司 The multi-modal input data processing method of intelligent robot and robot operating system
CN106489148A (en) * 2016-06-29 2017-03-08 深圳狗尾草智能科技有限公司 A kind of intention scene recognition method that is drawn a portrait based on user and system
CN106569613A (en) * 2016-11-14 2017-04-19 中国电子科技集团公司第二十八研究所 Multi-modal man-machine interaction system and control method thereof
CN106845624A (en) * 2016-12-16 2017-06-13 北京光年无限科技有限公司 The multi-modal exchange method relevant with the application program of intelligent robot and system
CN107728780A (en) * 2017-09-18 2018-02-23 北京光年无限科技有限公司 A kind of man-machine interaction method and device based on virtual robot
CN108121721A (en) * 2016-11-28 2018-06-05 渡鸦科技(北京)有限责任公司 Intension recognizing method and device
CN108334583A (en) * 2018-01-26 2018-07-27 上海智臻智能网络科技股份有限公司 Affective interaction method and device, computer readable storage medium, computer equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102405463B (en) * 2009-04-30 2015-07-29 三星电子株式会社 Utilize the user view reasoning device and method of multi-modal information
CN106776936B (en) * 2016-12-01 2020-02-18 上海智臻智能网络科技股份有限公司 Intelligent interaction method and system

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102385860A (en) * 2010-08-26 2012-03-21 索尼公司 Information processing apparatus, information processing method, and program
CN106489148A (en) * 2016-06-29 2017-03-08 深圳狗尾草智能科技有限公司 A kind of intention scene recognition method that is drawn a portrait based on user and system
CN106239506A (en) * 2016-08-11 2016-12-21 北京光年无限科技有限公司 The multi-modal input data processing method of intelligent robot and robot operating system
CN106569613A (en) * 2016-11-14 2017-04-19 中国电子科技集团公司第二十八研究所 Multi-modal man-machine interaction system and control method thereof
CN108121721A (en) * 2016-11-28 2018-06-05 渡鸦科技(北京)有限责任公司 Intension recognizing method and device
CN106845624A (en) * 2016-12-16 2017-06-13 北京光年无限科技有限公司 The multi-modal exchange method relevant with the application program of intelligent robot and system
CN107728780A (en) * 2017-09-18 2018-02-23 北京光年无限科技有限公司 A kind of man-machine interaction method and device based on virtual robot
CN108334583A (en) * 2018-01-26 2018-07-27 上海智臻智能网络科技股份有限公司 Affective interaction method and device, computer readable storage medium, computer equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zheng Binbin et al., "A speech intention understanding method based on multi-modal information fusion," Sciencepaper Online (中国科技论文在线), 2011, Vol. 6, No. 7, pp. 495-500. *

Also Published As

Publication number Publication date
CN111737670A (en) 2020-10-02

Similar Documents

Publication Publication Date Title
CN111737670B (en) Method, system and vehicle-mounted multimedia device for multi-mode data collaborative man-machine interaction
KR102222421B1 (en) Save metadata related to captured images
EP3982236B1 (en) Invoking automated assistant function(s) based on detected gesture and gaze
US10733987B1 (en) System and methods for providing unplayed content
US10067740B2 (en) Multimodal input system
US10811005B2 (en) Adapting voice input processing based on voice input characteristics
CN110148416A (en) Audio recognition method, device, equipment and storage medium
CN110727346A (en) Man-machine interaction method and device, vehicle and storage medium
CN107533599B (en) Gesture recognition method and device and electronic equipment
JP2022095768A (en) Method, device, apparatus, and medium for dialogues for intelligent cabin
US20210110815A1 (en) Method and apparatus for determining semantic meaning of pronoun
CN111179935A (en) Voice quality inspection method and device
CN115291724A (en) Man-machine interaction method and device, storage medium and electronic equipment
CN109032345A (en) Apparatus control method, device, equipment, server-side and storage medium
CN112908325B (en) Voice interaction method and device, electronic equipment and storage medium
Pinto et al. Audiovisual classification of group emotion valence using activity recognition networks
CN111444321A (en) Question answering method, device, electronic equipment and storage medium
WO2024179519A1 (en) Semantic recognition method and apparatus
CN114595692A (en) Emotion recognition method, system and terminal equipment
CN118259747A (en) Multi-mode interaction method, device, controller, system, automobile and storage medium
CN112951216B (en) Vehicle-mounted voice processing method and vehicle-mounted information entertainment system
CN112417197B (en) Sorting method, sorting device, machine readable medium and equipment
CN115981542A (en) Intelligent interactive touch control method, system, equipment and medium for touch screen
CN115019788A (en) Voice interaction method, system, terminal equipment and storage medium
CN113448429A (en) Method and device for controlling electronic equipment based on gestures, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant