CN116543452A - Gesture recognition and gesture interaction method and device - Google Patents
Gesture recognition and gesture interaction method and device
- Publication number: CN116543452A
- Application number: CN202310363839.0A
- Authority: CN (China)
- Prior art keywords: gesture, hand key point data, gesture recognition
- Legal status: Pending (status as listed by the publisher; an assumption, not a legal conclusion)
Classifications
- G06V40/28 — Recognition of hand or arm movements, e.g. recognition of deaf sign language (under G06V40/20, movements or behaviour, e.g. gesture recognition)
- G06F3/017 — Gesture based interaction, e.g. based on a set of recognized hand gestures (under G06F3/01, input arrangements for interaction between user and computer)
- G06N3/048 — Activation functions (under G06N3/04, neural network architecture)
- G06N3/08 — Learning methods (under G06N3/02, neural networks)
- G06V10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/82 — Image or video recognition or understanding using neural networks
Abstract
The invention provides a gesture recognition method, a gesture interaction method, and corresponding devices. The gesture recognition method comprises: determining hand key point data to be recognized; and inputting the hand key point data to be recognized into a gesture recognition model to obtain the target gesture category corresponding to that data. The gesture recognition model is trained on sample hand key point data and the gesture category labels corresponding to those samples, where the sample hand key point data is obtained by performing pose normalization on initial sample hand key point data. The method and device achieve accurate, fast gesture recognition, support multi-function human-computer interaction, and offer high stability and robustness.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a gesture recognition and gesture interaction method and device.
Background
Naked-eye (glasses-free) 3D display technology reproduces a three-dimensional image with spatial depth information on a stereoscopic display device, allowing a viewer to observe the stereoscopic image with the naked eye without wearing any auxiliary equipment. This greatly lowers the observation barrier on the viewer's side, so naked-eye 3D display is regarded as an important direction for the future of the display field. When interacting with a naked-eye 3D display, traditional input modes such as keyboard and mouse can hardly satisfy users' demand for richer information exchange, and novel interaction modes such as voice interaction, eye tracking, and gesture interaction have gradually become the preferred choice. Because people convey a large amount of information through gestures and thereby communicate rapidly, gesture interaction is an important interaction technology for 3D light fields, and gesture recognition is a key research topic for experts and scholars in the field of human-computer interaction.
Conventional gesture recognition methods usually classify gestures from RGB-D image sequences via deep learning. However, such methods suffer from long image processing times and limited image definition, so they cannot guarantee recognition accuracy for complex gestures, which in turn degrades the subsequent naked-eye 3D light-field human-computer interaction that is based on the recognition results.
Disclosure of Invention
The invention provides a gesture recognition and gesture interaction method and device to overcome the defects of low recognition accuracy for complex gestures and difficult human-computer interaction in the prior art, thereby achieving accurate and fast gesture recognition, supporting multi-function human-computer interaction, and offering high stability and robustness.
The invention provides a gesture recognition method, which comprises the following steps:
determining hand key point data to be recognized;
inputting the hand key point data to be recognized into a gesture recognition model to obtain a target gesture category corresponding to the hand key point data to be recognized;
wherein the gesture recognition model is trained on sample hand key point data and the gesture category labels corresponding to the sample hand key point data, and the sample hand key point data is obtained by performing pose normalization on initial sample hand key point data.
According to the gesture recognition method provided by the invention, the determination of the sample hand key point data comprises the following steps:
determining a target coordinate system;
and carrying out pose normalization on the initial sample hand key point data based on the target coordinate system to obtain the sample hand key point data.
According to the gesture recognition method provided by the invention, performing pose normalization on the initial sample hand key point data based on the target coordinate system to obtain the sample hand key point data comprises the following steps:
performing pose normalization on the initial sample hand key point data based on the target coordinate system to obtain normalized hand key point data;
and acquiring intra-class data differences of normalized hand key point data of each gesture class, and determining normalized hand key point data with the intra-class data differences smaller than or equal to a first threshold value as the sample hand key point data.
According to the gesture recognition method provided by the invention, performing pose normalization on the initial sample hand key point data based on the target coordinate system to obtain normalized hand key point data comprises the following steps:
acquiring palm key points, palm direction vectors and finger direction vectors of the initial sample hand key point data;
and shifting the palm key point to the original point of the target coordinate system, and rotating the palm direction vector and the finger direction vector to the direction of the coordinate axis of the target coordinate system to obtain the normalized hand key point data.
According to the gesture recognition method provided by the invention, the gesture recognition model is trained by the following modes:
determining a multi-layer perceptron neural network;
inputting the sample hand key point data into the multi-layer perceptron neural network to obtain a predicted gesture category label corresponding to the sample hand key point data;
and updating model parameters of the multi-layer perceptron neural network according to the gesture category label and the predicted gesture category label, so as to obtain the gesture recognition model through training.
The invention also provides a gesture interaction method, which comprises the following steps:
acquiring hand key point data to be recognized, and determining a target gesture category corresponding to the hand key point data to be recognized based on the gesture recognition method described above;
and determining the target human-computer interaction function corresponding to the target gesture category based on a predefined correspondence between gesture categories and human-computer interaction functions.
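As a loose illustration of this step (the gesture names and function names below are hypothetical, not taken from the patent), the predefined correspondence between gesture categories and interaction functions can be a simple lookup table:

```python
# Hypothetical gesture-category -> interaction-function correspondence table.
# The patent only requires that such a predefined mapping exists; the entries
# here are illustrative examples (fist, five fingers open, C-shape, etc.).
GESTURE_TO_FUNCTION = {
    "fist": "grab_object",
    "five_fingers_open": "release_object",
    "c_shape": "rotate_view",
    "index_extended": "select_point",
}

def dispatch(gesture_category: str) -> str:
    """Return the interaction function bound to a recognized gesture,
    or a no-op for unmapped categories."""
    return GESTURE_TO_FUNCTION.get(gesture_category, "no_op")
```

A recognized category from the model would then be passed straight to `dispatch` to trigger the corresponding light-field interaction.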
The invention also provides a gesture recognition device, which comprises:
the acquisition module is used for determining hand key point data to be identified;
the recognition module is used for inputting the hand key point data to be recognized into a gesture recognition model to obtain a target gesture category corresponding to the hand key point data to be recognized;
the gesture recognition model is obtained by training based on sample hand key point data and gesture type labels corresponding to the sample hand key point data, and the sample hand key point data is obtained by performing pose normalization on initial sample hand key point data.
According to the gesture recognition device provided by the invention, the acquisition module adopts a motion-sensing controller, and the hand key point data to be recognized comprises the three-dimensional coordinates of the hand key points and the hand instantaneous motion direction vector.
The invention also provides a gesture interaction device, which comprises:
the gesture recognition module is used for acquiring the hand key point data to be recognized and determining a target gesture category corresponding to the hand key point data to be recognized based on the gesture recognition method;
and the gesture interaction module is used for determining a target human-computer interaction function corresponding to the target gesture category based on the corresponding relation between the predefined gesture category and the human-computer interaction function.
The invention also provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements any of the gesture recognition or gesture interaction methods described above.
The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a gesture recognition method or a gesture interaction method as described in any of the above.
The invention also provides a computer program product comprising a computer program which when executed by a processor implements a gesture recognition method or a gesture interaction method as described in any of the above.
According to the gesture recognition and gesture interaction method and device, collecting hand key point data instead of traditional images improves the accuracy and robustness of the gesture data and of gesture classification and recognition. Meanwhile, because the gesture recognition model is trained on sample hand key point data and the corresponding gesture category labels, it recognizes complex gesture data well. In addition, the sample hand key point data used in training is obtained by pose normalization of the initial sample hand key point data, which further improves the accuracy and robustness of gesture classification and recognition.
Drawings
In order to more clearly illustrate the technical solutions of the invention or of the prior art, the drawings used in the embodiments or in the description of the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of a gesture recognition method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of deep learning with an MLP neural network according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a gesture recognition apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of a gesture interaction method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of light field display and human-computer interaction provided by an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a gesture interaction apparatus according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions, and advantages of the present invention clearer, the technical solutions of the invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
The following describes a gesture recognition method of the present invention with reference to fig. 1-2, and as shown in fig. 1, an embodiment of the present invention discloses a gesture recognition method, which at least includes the following steps:
Step 101, determining hand key point data to be recognized;
Step 102, inputting the hand key point data to be recognized into a gesture recognition model to obtain a target gesture category corresponding to that data;
wherein the gesture recognition model is trained on sample hand key point data and the corresponding gesture category labels, and the sample hand key point data is obtained by performing pose normalization on initial sample hand key point data.
It should be noted that the hand key point data to be recognized is collected by a motion-sensing controller (Leap Motion). Within its effective recognition range, the controller samples the three-dimensional coordinates of the skeletal key points of the human hand together with the instantaneous motion direction of the hand, i.e. the hand instantaneous motion direction vector. The controller can track key point data for all ten fingers of both hands; specifically, for each finger (thumb, index, middle, ring, and little finger) the three-dimensional coordinates cover the metacarpal, proximal phalanx, intermediate phalanx, and distal phalanx key points. As an example, the data for each hand may also include the palm direction vector (perpendicular to the palm) and the direction vector from the palm center to the root of the middle finger.
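For illustration only, the per-hand data described above might be grouped in a small structure whose flattened coordinates later feed the network input layer (the `HandFrame` name and the 21-joint count are assumptions of this sketch, not fixed by the patent):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class HandFrame:
    """One sampled hand: (N, 3) joint coordinates plus the two direction
    vectors described above (palm normal and instantaneous motion)."""
    keypoints: np.ndarray         # shape (N, 3), e.g. N = 21 joints
    palm_normal: np.ndarray       # unit vector perpendicular to the palm plane
    motion_direction: np.ndarray  # hand instantaneous motion direction vector

    def feature_vector(self) -> np.ndarray:
        # Flatten to the (x, y, z) triplets that feed the network input layer.
        return self.keypoints.reshape(-1)
```

With 21 joints per hand, the flattened feature vector has 63 coordinates, which matches the input-layer sizing described later.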
In the initial sample hand key point data, differences in collection position and initial rotation angle can make data of the same gesture category differ greatly. If such data were fed directly into a neural network for deep-learning gesture recognition, the features of some gestures would be learned insufficiently and an accurate recognition result could not be obtained. The gesture recognition model of the embodiment of the invention adopts a conventional neural network model as its base and is trained on a large amount of sample hand key point data. Because the sample data is obtained by performing pose normalization on the initial sample hand key point data, the training input is effectively normalized, which greatly reduces the data variance under each gesture category label and improves both the reliability of the training data and the recognition accuracy of the model.
Because the data collected in this way is hand key point data, the hand pose corresponding to the currently collected data can be determined from all key point positions. After the hand key point data to be recognized is input into the trained gesture recognition model, the detected target gesture category is output and can then be fed into the light field for use in light-field human-computer interaction.
In addition, it should be noted that the embodiment of the invention defines various gestures and the actions they generate, and the gesture recognition model mainly recognizes gestures corresponding to static hand poses. A gesture category may be a common human-computer interaction gesture, such as a fist or five fingers open, or a user-defined gesture for complex interaction scenarios, such as a C-shaped gesture or extending a particular finger.
Compared with the traditional approach of collecting gesture images and performing image recognition to obtain the gesture category, the gesture recognition method of the embodiment obtains three-dimensional information of each gesture key point by collecting hand key point data, so a large set of images is not needed as training data. Because the key point data is small, model training can be much faster. Meanwhile, the method achieves better recognition accuracy for complex gestures and can be widely applied in subsequent human-computer interaction scenarios.
In some embodiments, determining sample hand keypoint data comprises:
determining a target coordinate system;
and carrying out pose normalization on the initial sample hand key point data based on the target coordinate system to obtain sample hand key point data.
It should be noted that the initial sample hand key point data may be obtained by querying an existing database, or by manually collecting it and assigning custom gesture category labels. During collection, one group of initial sample hand key points corresponds to one gesture, and the target coordinate system is determined from the initial sample hand key points of any one collected gesture; for example, the coordinate system of the initial sample hand key points of the first collected gesture may be used. Pose normalization based on the target coordinate system means unifying the initial sample hand key point data of every gesture into the target coordinate system, which removes errors caused by differences in the spatial position of acquisition between gestures.
Specifically, the target coordinate system takes the palm-center key point of that group of initial sample hand key points as the coordinate origin. Because for each hand the palm direction vector, which is perpendicular to the palm plane, and the finger-root direction vector are mutually perpendicular, the directions of these two vectors are taken as two coordinate axes of the target coordinate system.
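A minimal sketch of constructing such a target coordinate system from the two vectors, assuming they are only approximately perpendicular and using Gram-Schmidt to enforce orthogonality (the function name is mine, not the patent's):

```python
import numpy as np

def target_basis(palm_normal: np.ndarray, finger_root_dir: np.ndarray) -> np.ndarray:
    """Build an orthonormal basis (rows of a rotation matrix) from the palm
    normal and the palm-center-to-finger-root direction described above."""
    x = palm_normal / np.linalg.norm(palm_normal)
    # Remove any small non-perpendicular component (Gram-Schmidt step).
    z = finger_root_dir - np.dot(finger_root_dir, x) * x
    z = z / np.linalg.norm(z)
    y = np.cross(z, x)            # third axis completes the basis
    return np.stack([x, y, z])    # rows x, y, z form the basis
```

The returned matrix is orthonormal, so transforming key points with it is a pure rotation into the target coordinate system.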
According to the gesture recognition method provided by the embodiment of the invention, the collected initial sample hand key point data of each gesture can be unified to the same coordinate system by setting the target coordinate system, so that the normalization of the pose is realized, the intra-group fluctuation of the training data is reduced, and the model recognition precision is improved.
In some embodiments, pose normalization is performed on initial sample hand keypoint data based on a target coordinate system to obtain sample hand keypoint data, including:
performing pose normalization on the initial sample hand key point data based on a target coordinate system to obtain normalized hand key point data;
and acquiring intra-class data differences of the normalized hand key point data of each gesture class, and determining the normalized hand key point data with the intra-class data differences smaller than or equal to a first threshold value as sample hand key point data.
Although normalized, the normalized hand key point data may still contain data distorted by shake during acquisition or by the collected hand pose. To prevent such data from affecting the model training result, the embodiment of the invention sets a first threshold before acquiring the sample data required for training and screens out data that does not meet the normalization requirement. The first threshold is the maximum allowed intra-class difference for each gesture category.
For any gesture, the coordinate difference of same-position key points can be computed either between every two groups of the normalized hand key point data of that gesture, or between each group of normalized hand key point data and standard key point data for that gesture. The coordinate differences over all key points of each group are averaged, and this average is used as the intra-class data difference of the gesture category and compared with the first threshold; if the intra-class data difference is less than or equal to the first threshold, all groups of normalized hand key point data of that gesture category are determined to be its sample hand key point data. The normalized hand key point data of all gesture categories is screened with the first threshold in turn, finally yielding the sample hand key point data.
Specifically, the first threshold may be set between 5 mm and 1 cm.
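The screening step might be sketched as follows, using the second variant described above (mean per-key-point distance to a per-class reference pose as the intra-class difference; the function name and default threshold are assumptions within the stated 5 mm to 1 cm range):

```python
import numpy as np

def screen_class(samples: np.ndarray, reference: np.ndarray,
                 threshold_mm: float = 7.5) -> np.ndarray:
    """Keep only normalized samples whose mean per-key-point distance to the
    class reference pose is <= the first threshold.

    samples:   (M, N, 3) — M normalized groups of N key points per gesture
    reference: (N, 3)    — standard key point data for this gesture category
    """
    dists = np.linalg.norm(samples - reference, axis=2)  # (M, N) distances
    intra_class = dists.mean(axis=1)                     # one value per group
    return samples[intra_class <= threshold_mm]
```

Groups whose average deviation exceeds the threshold (e.g. distorted by shake during acquisition) are dropped before training.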
In the gesture recognition method described above, setting the first threshold further screens the normalized hand key point data, so the finally obtained sample hand key point data fluctuates less within each gesture category, which effectively improves classification accuracy after subsequent model training.
In some embodiments, pose normalization is performed on initial sample hand keypoint data based on a target coordinate system to obtain normalized hand keypoint data, including:
acquiring palm key points, palm direction vectors and finger direction vectors of initial sample hand key point data;
and shifting the palm key point to the original point of the target coordinate system, and rotating the palm direction vector and the finger direction vector to the direction of the coordinate axis of the target coordinate system to obtain normalized hand key point data.
It should be noted that the initial sample hand key point data includes the three-dimensional coordinates of all key points as well as a palm direction vector and a finger direction vector: the palm direction vector is the vector at the palm-center key point perpendicular to the palm plane, and the finger direction vector runs from the palm-center key point to any finger-root key point of the same palm. In general, the two vectors are perpendicular to each other. The embodiment of the invention normalizes the collected initial sample hand key point data through translation and rotation transformations. Because all collected key point skeleton data are based on the same right-handed Cartesian coordinate system, applying a pose transformation to each gesture in space constrains the data of the same gesture and thereby preprocesses the hand key point skeleton data.
Specifically, all key points of each gesture are first translated so that every gesture in space lies near the origin of the target coordinate system (XYZ), with the palm-center key point coinciding with the origin as the mark that translation is complete. This finishes the adjustment of the three-dimensional coordinate positions, but because the orientations have not yet been adjusted, the same gesture can still take different orientations in space and the significant differences between data groups are not yet reduced, so the rotation of the gesture must be processed next. In the embodiment, the palm direction vector and the finger direction vector are rotated to fixed coordinate axes of the target coordinate system; for example, the palm direction vector is rotated to the positive X-axis and the finger direction vector to the positive Z-axis, or alternatively the palm direction vector to the negative X-axis and the finger direction vector to the positive Y-axis.
Through this translation and orientation transformation of the initial sample hand key point data, the palm-center coordinates of the normalized hand key point data of the same gesture lie at the origin of the target coordinate system, and all other key point skeleton data across multiple groups of the same gesture fluctuate only within a small range. The same gesture is thus normalized: the significant pose differences within the collected data of one gesture disappear, improving the reliability of the training data.
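Putting the translation and rotation together, a minimal sketch of the pose normalization under one of the axis conventions mentioned above (palm direction to +X, finger direction to +Z; function name and Gram-Schmidt cleanup are my assumptions):

```python
import numpy as np

def normalize_pose(keypoints: np.ndarray, palm_center: np.ndarray,
                   palm_normal: np.ndarray, finger_root_dir: np.ndarray) -> np.ndarray:
    """Translate the palm-center key point to the origin, then rotate so the
    palm direction vector maps to +X and the finger direction vector to +Z.

    keypoints: (N, 3) hand key point coordinates in the acquisition frame.
    """
    x = palm_normal / np.linalg.norm(palm_normal)
    # Enforce perpendicularity of the finger direction (Gram-Schmidt).
    z = finger_root_dir - np.dot(finger_root_dir, x) * x
    z = z / np.linalg.norm(z)
    y = np.cross(z, x)
    R = np.stack([x, y, z])             # world -> target rotation matrix
    return (keypoints - palm_center) @ R.T
```

After this transform, the palm center of every group sits at the origin and same-gesture groups differ only by small residual fluctuations.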
In some embodiments, the gesture recognition model is trained by:
determining a multi-layer perceptron neural network;
inputting the sample hand key point data into the multi-layer perceptron neural network to obtain a predicted gesture category label corresponding to the sample hand key point data;
and updating model parameters of the multi-layer perceptron neural network according to the gesture type label and the predicted gesture type label so as to train and obtain a gesture recognition model.
It should be noted that, in the multi-layer perceptron (Multilayer Perceptron, MLP) neural network according to the embodiment of the present invention, as shown in fig. 2, the Input Layer is the input layer of the network and is responsible for receiving data; the Hidden Layer is the hidden layer of the network, fully connected to the input layer and simulating neurons; and the Output Layer is the output layer of the network, which outputs the classification result. P0 is any one of the sample hand key point data. The MLP neural network provided by the embodiment of the invention learns the transition relations among the data points, uses these relations as gesture features, and finally applies them to gesture classification and recognition.
Since neural networks simulate and emulate animal neuron systems, the basic structure of the multi-layer perceptron (MLP) can be derived from a biological neuron model. Most typically, the MLP comprises three layers: an input layer, a hidden layer, and an output layer. Adjacent layers of the MLP neural network are fully connected, i.e., any neuron of the upper layer is connected to all neurons of the lower layer. The MLP is composed of multiple node layers, each fully connected to the next; except for the input nodes, each node is a neuron with a nonlinear activation function. The multi-layer perceptron works similarly to human neurons and can simulate how neurons change during human learning: it first learns, then stores data in its weights, adjusts the weights with a training algorithm to reduce the deviation, and finally achieves the effect of data prediction.
Specifically, in the embodiment of the invention, the number of output-layer nodes of the MLP neural network equals the number of gesture types to be recognized, and this number can be set by the user. For each gesture, hand key points are collected, and each key point can be unfolded into a set of three values x, y and z. The total count of x, y and z values over all collected hand key points determines the number of Input Layer nodes of the neural network, and the number of Output Layer nodes equals the number of gesture types to be classified and recognized.
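The training steps and the layer sizing described above can be sketched as a small numpy MLP. The sizes (21 key points, 5 gesture classes, 64 hidden units) are assumed for illustration, as are all function names; the patent leaves these configurable by the user.

```python
import numpy as np

NUM_KEYPOINTS = 21  # assumed number of collected hand key points
NUM_CLASSES = 5     # assumed number of gesture categories to recognize
HIDDEN = 64         # assumed hidden-layer width

rng = np.random.default_rng(0)
# Input layer width = 3 * key points (one x, y, z triple per key point).
W1 = rng.normal(0, 0.1, (NUM_KEYPOINTS * 3, HIDDEN))
b1 = np.zeros(HIDDEN)
# Output layer width = number of gesture categories.
W2 = rng.normal(0, 0.1, (HIDDEN, NUM_CLASSES))
b2 = np.zeros(NUM_CLASSES)

def forward(x):
    h = np.maximum(0, x @ W1 + b1)  # hidden layer with ReLU activation
    logits = h @ W2 + b2            # output layer: one score per gesture class
    return h, logits

def train_step(x, labels, lr=1e-2):
    """One parameter update from the predicted vs. true gesture labels."""
    global W1, b1, W2, b2
    h, logits = forward(x)
    # Softmax cross-entropy gradient with respect to the logits.
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(labels)), labels] -= 1.0
    p /= len(labels)
    # Backpropagate through the fully connected layers and update weights.
    dW2 = h.T @ p; db2 = p.sum(axis=0)
    dh = (p @ W2.T) * (h > 0)
    dW1 = x.T @ dh; db1 = dh.sum(axis=0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1
```

Predicted gesture category labels are then `np.argmax(logits, axis=1)`, and repeating `train_step` over the normalized sample data drives the predicted labels toward the ground-truth gesture category labels.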
According to the gesture recognition method, the gesture data are normalized, so that the reliability of training data can be improved, and the feature learning of the subsequent MLP neural network can be realized.
The gesture recognition apparatus provided by the present invention is described below; the gesture recognition apparatus described below and the gesture recognition method described above may be cross-referenced. As shown in fig. 3, a gesture recognition apparatus according to an embodiment of the present invention includes:
the acquisition module 301 is configured to determine hand key point data to be identified;
the recognition module 302 is configured to input the hand key point data to be recognized into a gesture recognition model, so as to obtain a target gesture category corresponding to the hand key point data to be recognized;
the gesture recognition model is obtained by training based on sample hand key point data and gesture category labels corresponding to the sample hand key point data, and the sample hand key point data is obtained by performing pose normalization on initial sample hand key point data.
Compared with the traditional method of acquiring gesture pictures for image recognition to obtain the gesture category, the gesture recognition device provided by the embodiment of the invention acquires three-dimensional information of each gesture key point by collecting the hand key point data, and therefore has better recognition accuracy for complex gestures. In addition, the sample hand key point data used in training is obtained by pose normalization of the initial sample hand key point data, which can effectively improve the accuracy and robustness of gesture classification and recognition.
In some embodiments, the acquisition module 301 employs a motion sensing controller, and the hand keypoints to be identified comprise three-dimensional coordinates of the hand keypoints and hand instantaneous motion direction vectors.
It should be noted that the somatosensory controller can collect three-dimensional information of human skeleton key points; compared with RGBD images obtained by traditional image collection methods, it can collect more accurate gesture information under shadow and occlusion conditions.
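The data described above — three-dimensional key point coordinates plus an instantaneous hand motion direction vector — can be modeled with a simple container. This is an illustrative sketch; the class and field names are assumptions, not API names from the patent or any controller SDK.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class HandFrame:
    """Hypothetical container for one capture from a motion-sensing controller."""
    keypoints: np.ndarray         # (N, 3) three-dimensional key point coordinates
    motion_direction: np.ndarray  # (3,) instantaneous hand motion direction vector
```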
Meanwhile, although traditional gesture recognition devices also include schemes that use a motion sensing controller, they usually recognize different gestures by extracting information such as finger angles. Such gestures must be custom-defined in advance, and gestures with complex finger spatial structures are difficult to handle in this definition process.
The embodiment of the invention also discloses a gesture interaction method, which at least comprises the following steps as shown in fig. 4:
step 401, acquiring hand key point data to be identified, and determining a target gesture category corresponding to the hand key point data to be identified based on the gesture identification method of the above embodiment;
step 402, determining a target human-computer interaction function corresponding to the target gesture category based on the corresponding relation between the predefined gesture category and the human-computer interaction function.
It should be noted that the correspondence between predefined gesture types and human-computer interaction functions is set by the user and stored in the controller in advance. In general, gesture types and human-computer interaction functions correspond one-to-one, as shown in fig. 5: the first row defines a fist gesture corresponding to the display function of 3D model A, the second row defines a C-shaped gesture corresponding to the display function of 3D model B, and other gesture types may also be defined, for example, opening the five fingers may correspond to a zoom function. After the hand key point data to be identified are acquired, the method performs gesture recognition through deep learning and sends the recognition result to the light field for light field human-computer interaction.
Specifically, step 401 includes:
acquiring the hand key point data to be identified, inputting it into the gesture recognition model determined by the gesture recognition method, and outputting the target gesture type in label form through the output layer of the gesture recognition model, wherein different gestures correspond to different labels.
Step 402 includes:
transmitting the label information corresponding to the target gesture category to the light field equipment;
after receiving the label information, the light field equipment compares it with a preset label table and triggers different man-machine interaction functions according to the different labels;
the corresponding relation between the gesture type and the man-machine interaction function is stored in the tag table.
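The preset label table can be sketched as a simple lookup. The label values and function names below are illustrative assumptions — the patent only specifies that the table stores the correspondence between gesture types and human-computer interaction functions, with the concrete mapping defined by the user.

```python
# Hypothetical preset label table mapping the gesture label output by the
# recognition model to a light-field interaction function name (the examples
# mirror fig. 5: fist -> show model A, C-shape -> show model B, open hand -> zoom).
LABEL_TABLE = {
    0: "display_3d_model_a",   # fist gesture
    1: "display_3d_model_b",   # C-shaped gesture
    2: "zoom",                 # five fingers open
}

def trigger_interaction(label):
    """Look up the human-computer interaction function for a recognized label."""
    return LABEL_TABLE.get(label)  # None for labels with no defined function
```

On the light field device, the returned function name (or a callable stored in its place) would then be dispatched to drive the corresponding interaction.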
The gesture interaction method provided by the embodiment of the invention addresses the problems in the prior art that image-recognition-based gesture schemes take a long time to process pictures and are limited by the definition of the images obtained by the acquisition equipment. A high-precision gesture recognition method is designed to obtain the target gesture type, improving the functional extensibility and recognition accuracy of subsequent naked-eye 3D light field human-computer interaction.
The gesture interaction device provided by the invention is described below; the gesture interaction device described below and the gesture interaction method described above may be cross-referenced. As shown in fig. 6, the gesture interaction device according to the embodiment of the present invention includes:
the gesture recognition module 601 is configured to obtain hand key point data to be recognized, and determine a target gesture category corresponding to the hand key point data to be recognized based on the gesture recognition method in the above embodiment;
the gesture interaction module 602 is configured to determine a target human-computer interaction function corresponding to the target gesture category based on a predefined correspondence between the gesture category and the human-computer interaction function.
The gesture interaction device provided by the embodiment of the invention addresses the problems in the prior art that image-recognition-based gesture schemes take a long time to process pictures and are limited by the definition of the images obtained by the acquisition equipment. A high-precision gesture recognition method is designed to obtain the target gesture type, improving the functional extensibility and recognition accuracy of subsequent naked-eye 3D light field human-computer interaction.
In some embodiments, the apparatus further comprises a light field device; for example, a light field display may be selected to implement the human-computer interaction functions.
Fig. 7 illustrates a physical schematic diagram of an electronic device, as shown in fig. 7, which may include: processor 710, communication interface (Communications Interface) 720, memory 730, and communication bus 740, wherein processor 710, communication interface 720, memory 730 communicate with each other via communication bus 740. Processor 710 may invoke logic instructions in memory 730 to perform a gesture recognition method comprising:
determining hand key point data to be identified;
inputting the hand key point data to be identified into a gesture identification model to obtain a target gesture category corresponding to the hand key point data to be identified;
the gesture recognition model is obtained by training based on sample hand key point data and gesture category labels corresponding to the sample hand key point data, and the sample hand key point data is obtained by performing pose normalization on initial sample hand key point data.
Or performing a gesture interaction method, the method comprising:
acquiring hand key point data to be identified, and determining a target gesture category corresponding to the hand key point data to be identified based on any gesture identification method;
and determining the target human-computer interaction function corresponding to the target gesture category based on the corresponding relation between the predefined gesture category and the human-computer interaction function.
Further, the logic instructions in the memory 730 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method of the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
In another aspect, the present invention also provides a computer program product, the computer program product including a computer program, the computer program being storable on a non-transitory computer readable storage medium, the computer program, when executed by a processor, being capable of executing the gesture recognition method provided by the above methods, the method comprising:
determining hand key point data to be identified;
inputting the hand key point data to be identified into a gesture identification model to obtain a target gesture category corresponding to the hand key point data to be identified;
the gesture recognition model is obtained by training based on sample hand key point data and gesture category labels corresponding to the sample hand key point data, and the sample hand key point data is obtained by performing pose normalization on initial sample hand key point data.
Or performing a gesture interaction method, the method comprising:
acquiring hand key point data to be identified, and determining a target gesture category corresponding to the hand key point data to be identified based on any gesture identification method;
and determining the target human-computer interaction function corresponding to the target gesture category based on the corresponding relation between the predefined gesture category and the human-computer interaction function.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the gesture recognition method provided by the above methods, the method comprising:
determining hand key point data to be identified;
inputting the hand key point data to be identified into a gesture identification model to obtain a target gesture category corresponding to the hand key point data to be identified;
the gesture recognition model is obtained by training based on sample hand key point data and gesture category labels corresponding to the sample hand key point data, and the sample hand key point data is obtained by performing pose normalization on initial sample hand key point data.
Or performing a gesture interaction method, the method comprising:
acquiring hand key point data to be identified, and determining a target gesture category corresponding to the hand key point data to be identified based on any gesture identification method;
and determining the target human-computer interaction function corresponding to the target gesture category based on the corresponding relation between the predefined gesture category and the human-computer interaction function.
The apparatus embodiments described above are merely illustrative: components illustrated as separate may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on such understanding, the foregoing technical solutions may be embodied essentially or in part in the form of a software product, which may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the various embodiments or methods of some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A method of gesture recognition, comprising:
determining hand key point data to be identified;
inputting the hand key point data to be identified into a gesture identification model to obtain a target gesture category corresponding to the hand key point data to be identified;
the gesture recognition model is obtained by training based on sample hand key point data and gesture type labels corresponding to the sample hand key point data, and the sample hand key point data is obtained by performing pose normalization on initial sample hand key point data.
2. The gesture recognition method of claim 1, wherein determining the sample hand keypoint data comprises:
determining a target coordinate system;
and carrying out pose normalization on the initial sample hand key point data based on the target coordinate system to obtain the sample hand key point data.
3. The gesture recognition method according to claim 2, wherein the performing pose normalization on the initial sample hand keypoint data based on the target coordinate system to obtain the sample hand keypoint data includes:
performing pose normalization on the initial sample hand key point data based on the target coordinate system to obtain normalized hand key point data;
and acquiring intra-class data differences of normalized hand key point data of each gesture class, and determining normalized hand key point data with the intra-class data differences smaller than or equal to a first threshold value as the sample hand key point data.
4. The gesture recognition method according to claim 3, wherein the performing pose normalization on the initial sample hand keypoint data based on the target coordinate system to obtain normalized hand keypoint data includes:
acquiring palm key points, palm direction vectors and finger direction vectors of the initial sample hand key point data;
and shifting the palm key point to the original point of the target coordinate system, and rotating the palm direction vector and the finger direction vector to the direction of the coordinate axis of the target coordinate system to obtain the normalized hand key point data.
5. The gesture recognition method according to any one of claims 1 to 4, wherein the gesture recognition model is trained by:
determining a multi-layer perceptron neural network;
inputting the sample hand key point data into the multi-layer perceptron neural network to obtain a predicted gesture category label corresponding to the sample hand key point data;
and updating model parameters of the multi-layer perceptron neural network according to the gesture type label and the predicted gesture type label so as to train and obtain the gesture recognition model.
6. A method of gesture interaction, comprising:
acquiring hand key point data to be identified, and determining a target gesture category corresponding to the hand key point data to be identified based on the gesture identification method of any one of claims 1 to 5;
and determining the target human-computer interaction function corresponding to the target gesture category based on the corresponding relation between the predefined gesture category and the human-computer interaction function.
7. A gesture recognition apparatus, comprising:
the acquisition module is used for determining hand key point data to be identified;
the recognition module is used for inputting the hand key point data to be recognized into a gesture recognition model to obtain a target gesture category corresponding to the hand key point data to be recognized;
the gesture recognition model is obtained by training based on sample hand key point data and gesture type labels corresponding to the sample hand key point data, and the sample hand key point data is obtained by performing pose normalization on initial sample hand key point data.
8. The gesture recognition apparatus of claim 7, wherein the acquisition module employs a motion sensing controller, and the hand keypoints to be recognized comprise three-dimensional coordinates of the hand keypoints and hand instantaneous motion direction vectors.
9. A gesture interaction device, comprising:
the gesture recognition module is used for acquiring the hand key point data to be recognized and determining a target gesture category corresponding to the hand key point data to be recognized based on the gesture recognition method of any one of claims 1 to 5;
and the gesture interaction module is used for determining a target human-computer interaction function corresponding to the target gesture category based on the corresponding relation between the predefined gesture category and the human-computer interaction function.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the gesture recognition method of any one of claims 1 to 5 or the gesture interaction method of claim 6 when the program is executed by the processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310363839.0A CN116543452A (en) | 2023-04-06 | 2023-04-06 | Gesture recognition and gesture interaction method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116543452A true CN116543452A (en) | 2023-08-04 |
Family
ID=87447955
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310363839.0A Pending CN116543452A (en) | 2023-04-06 | 2023-04-06 | Gesture recognition and gesture interaction method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116543452A (en) |
Cited By (2)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN118131915A (en) * | 2024-05-07 | 2024-06-04 | 中国人民解放军国防科技大学 | Man-machine interaction method, device, equipment and storage medium based on gesture recognition
CN118131915B (en) * | 2024-05-07 | 2024-07-12 | 中国人民解放军国防科技大学 | Man-machine interaction method, device, equipment and storage medium based on gesture recognition

2023-04-06: Application filed (CN202310363839.0A), patent CN116543452A, status Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||