Hand gestures offer a natural and intuitive means of human-computer interaction. To support them, computers must be able to recognize hand gestures visually from video input. Vision-based hand tracking and gesture recognition is, however, an extremely challenging problem: hand gestures are highly diverse because of the many degrees of freedom of the human hand. In addition, computer vision algorithms are notoriously brittle and computationally intensive, which makes most current gesture recognition systems fragile and inefficient.
This thesis proposes a new architecture for real-time vision-based hand tracking and gesture recognition that combines statistical and syntactic analysis. The fundamental idea is a divide-and-conquer strategy based on the hierarchical composition of hand gestures, so that the problem can be decoupled into two levels. The low level of the architecture focuses on hand posture detection and tracking with Haar-like features and the AdaBoost learning algorithm. The Haar-like features effectively capture the appearance properties of the hand postures. The AdaBoost learning algorithm combines a sequence of weak classifiers into an accurate cascade of classifiers, which significantly speeds up detection. To recognize different hand postures, a structure of parallel cascades is implemented, which achieves real-time performance and high classification accuracy. The 3D position of the hand is recovered according to the camera's perspective projection. To make the system robust against cluttered backgrounds, background subtraction and noise removal are applied.
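As an illustration of the Haar-like features mentioned above, the following is a minimal sketch of how a two-rectangle feature can be evaluated in constant time from an integral image (summed-area table), and how a single thresholded feature can act as a weak classifier of the kind AdaBoost combines. The function names, the toy image, and the threshold are assumptions made for this example; they are not taken from the thesis.

```python
def integral_image(img):
    """Build a (h+1) x (w+1) summed-area table with a zero top row and left column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left half minus right half of the window."""
    half = w // 2
    left = rect_sum(ii, x, y, half, h)
    right = rect_sum(ii, x + half, y, half, h)
    return left - right


def weak_classifier(value, threshold, polarity=1):
    """Decision stump on one feature value: 1 (posture) or 0 (non-posture)."""
    return 1 if polarity * value < polarity * threshold else 0


if __name__ == "__main__":
    # Hypothetical 4x4 patch: bright left half, dark right half.
    patch = [[10, 10, 0, 0] for _ in range(4)]
    table = integral_image(patch)
    feature = two_rect_feature(table, 0, 0, 4, 4)
    print(feature, weak_classifier(feature, threshold=40, polarity=-1))
```

Each rectangle sum needs only four table lookups regardless of rectangle size, which is one reason such features suit real-time detection. In a cascade, many stumps of this kind would be weighted and grouped into stages; that training step is not shown here.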
For high-level hand gesture recognition, a stochastic context-free grammar (SCFG) is used to analyze the syntactic structure of the hand gestures, with the terminal strings converted from the postures detected by the low level of the architecture. Given an input string, the corresponding hand gesture is identified, based on a similarity measurement and the probabilities associated with the production rules, as the production rule with the greatest probability of generating that string. For hand motion analysis, two SCFGs are defined to analyze two structured hand gestures with different trajectory patterns: the rectangle gesture and the diamond gesture. Based on the different probabilities associated with these two grammars, the SCFGs can effectively disambiguate distorted trajectories and classify them correctly.
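The idea of choosing between two grammars by probability can be sketched as follows. This is a toy example, not the grammars defined in the thesis: the direction symbols (`r`, `d`, `l`, `u` and the diagonals `dr`, `dl`, `ul`, `ur`), the rule probabilities (0.8 for the intended stroke, 0.1 for each neighbouring direction), and all function names are assumptions. Each grammar is in Chomsky normal form, and a Viterbi-style CYK pass returns the probability of the best parse of the observed stroke sequence.

```python
# Shared binary rules (A -> B C): a gesture is four strokes, grouped two by two.
BINARY = {("S", "A", "B"): 1.0, ("A", "P1", "P2"): 1.0, ("B", "P3", "P4"): 1.0}

# Directions adjacent to each stroke, used to tolerate distorted trajectories.
NEIGHBOURS = {
    "r": ("ur", "dr"), "d": ("dr", "dl"), "l": ("dl", "ul"), "u": ("ul", "ur"),
    "dr": ("r", "d"), "dl": ("d", "l"), "ul": ("l", "u"), "ur": ("u", "r"),
}


def make_grammar(strokes):
    """Lexical rules P_i -> symbol: 0.8 for the ideal stroke, 0.1 per neighbour."""
    lexical = {}
    for idx, stroke in enumerate(strokes):
        nonterminal = f"P{idx + 1}"
        lexical[(nonterminal, stroke)] = 0.8
        for alt in NEIGHBOURS[stroke]:
            lexical[(nonterminal, alt)] = 0.1
    return lexical, BINARY


GRAMMARS = {
    "rectangle": make_grammar(["r", "d", "l", "u"]),
    "diamond": make_grammar(["dr", "dl", "ul", "ur"]),
}


def best_parse_prob(tokens, lexical, binary, start="S"):
    """Viterbi CYK: probability of the most likely derivation of tokens from start."""
    n = len(tokens)
    if n == 0:
        return 0.0
    # chart[i][j] maps a nonterminal to its best probability over tokens[i..j].
    chart = [[{} for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        for (lhs, symbol), p in lexical.items():
            if symbol == tok and p > chart[i][i].get(lhs, 0.0):
                chart[i][i][lhs] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            cell = chart[i][j]
            for k in range(i, j):
                left, right = chart[i][k], chart[k + 1][j]
                for (lhs, b, c), p in binary.items():
                    if b in left and c in right:
                        cand = p * left[b] * right[c]
                        if cand > cell.get(lhs, 0.0):
                            cell[lhs] = cand
    return chart[0][n - 1].get(start, 0.0)


def classify(tokens):
    """Return the name of the grammar with the highest probability, or None."""
    scores = {name: best_parse_prob(tokens, lex, rules)
              for name, (lex, rules) in GRAMMARS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0.0 else None


if __name__ == "__main__":
    print(classify(["r", "dr", "l", "u"]))     # rectangle with one slanted side
    print(classify(["dr", "dl", "l", "ur"]))   # diamond with one flattened side
```

Because these toy grammars are unambiguous, the best-parse probability equals the probability of the string under each grammar. A sequence that neither grammar can derive scores zero under both and is rejected.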
An application of gesture-based interaction with a 3D gaming virtual environment is implemented. With this system, the user can navigate the 3D gaming world by driving the avatar car with a set of hand postures. To manipulate virtual objects, the user can use a set of hand gestures to select the target traffic sign and open a window showing the information of the corresponding learning object. This application demonstrates that the gesture-based interface achieves improved interaction that is more intuitive and flexible for the user.