US20230333633A1 - Twin pose detection method and system based on interactive indirect inference - Google Patents
- Publication number
- US20230333633A1 (U.S. Application No. 18/339,186)
- Authority
- US
- United States
- Prior art keywords
- pose
- mobile phone
- reasoner
- human
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/72—Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
- H04M1/724—User interfaces specially adapted for cordless or mobile telephones
- H04M1/72448—User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions
- H04M1/72454—User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions according to context-related or environment-related conditions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating 3D models or images for computer graphics
- G06T19/006—Mixed reality
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/03—Arrangements for converting the position or the displacement of a member into a coded form
- G06F3/0304—Detection arrangements using opto-electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/03—Arrangements for converting the position or the displacement of a member into a coded form
- G06F3/033—Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor
- G06F3/0346—Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor with detection of the device orientation or free movement in a 3D space, e.g. 3D mice, 6-DOF [six degrees of freedom] pointers using gyroscopes, accelerometers or tilt-sensors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/20—Scenes; Scene-specific elements in augmented reality scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/03—Recognition of patterns in medical or anatomical images
- G06V2201/033—Recognition of patterns in medical or anatomical images of skeletal patterns
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/72—Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
- H04M1/724—User interfaces specially adapted for cordless or mobile telephones
- H04M1/72448—User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions
Definitions
- This application relates to digital intelligence technology, and more particularly to a twin pose detection method and system based on interactive indirect inference. By means of digital twin technology, software empowerment and intelligent empowerment are achieved in high-end manufacturing; combined with sensing devices and intelligent devices, no additional acquisition or wearable devices are needed, and the user can access virtual reality applications with physical interaction anywhere and anytime.
- Sensors and assistive peripherals (such as specific smart wearable products and head-mounted displays (HMDs)) have long been used to build a bridge between the physical world and the virtual digital world for users. Improved sensory coherence is what users and the industry have always pursued. However, current sensor peripherals suffer from experience limitations and compatibility problems between devices. For example, the sensor peripherals provided by various hardware manufacturers are not facilities available to ordinary consumers, so virtual reality is difficult to popularize and experience consistency is low.
- Mobile smart terminals (e.g., mobile phones, tablets, smart wearables) offer a screen display, network communication and a certain degree of computing power, and are already popular and versatile. Providing a basic and universal pose detection method built on these universal devices can greatly facilitate the versatility and universality of virtual reality. In this way, specific sensors and assistive peripherals are not required, such as HMDs with wearability and compatibility limitations, handles that occupy the hands and limit convenience, additional camera capture that relies on venue and camera equipment, and wearable pose positioners that are not available anywhere and anytime, are limited in space, and are specialist and expensive.
- The above objectively depicts the current state of the popularity and real-time nature of virtual reality, and motivates a basic twin pose detection method that can be built on mobile smart terminals. This application provides a method and system for detecting twin poses based on interactive indirect inference, which enables software empowerment and intelligent empowerment through digital intelligence technologies (digital twin technology) in high-end manufacturing, and enables ordinary users to access virtual reality applications anytime and anywhere through sensing devices and smart devices, without external assistive acquisition or wearable devices.
- Chinese Patent Publication No. 113610969A (Application No. 202110974124.X) discloses a method for generating a three-dimensional (3D) human body model, including: acquiring to-be-detected images taken from a plurality of perspectives; detecting human body regions contained in the to-be-detected images, and detecting a data set of skeletal key points contained in the human body regions; constructing a fusion affinity matrix between the to-be-detected images by using the human body regions and the data set of skeletal key points; determining a matching relationship between the body regions by using the fusion affinity matrix; and performing pose construction based on the matching relationship and the data set of skeletal key points to generate the 3D human body model. The method can analyze human poses from various perspectives, extract the data of body regions and skeletal key points from the to-be-detected images, and generate a 3D human body model by using the matching relationship between body regions and the data set of skeletal key points, and thus can fully and effectively restore 3D human poses. However, the method relies on sensors and multiple perspectives, which is different from the indirect inference of the method provided in the present disclosure.
- Chinese Patent Publication No. 111311714A (Application No. 202010244300.X) discloses a method and system for pose prediction. The method includes: acquiring pose information of a target character in one or more existing frames; and inputting the pose information into a trained pose prediction model to determine predicted pose information of the target character in subsequent frames, where the pose information includes skeletal rotation angle information and gait movement information. This method differs from the method of the present disclosure in both objectives and detection techniques.
- Chinese Patent Publication No. 112132955A (Application No. 202010902457.7) discloses a digital twin construction method for the human skeleton. In this method, data at important locations of the human body is acquired via VR motion capture and sensor technology. Key data is obtained through data classification, screening, simplification and calculation via artificial intelligence. The spatial orientation and mechanical information of the target skeleton is obtained by solving the key data with human inverse dynamics and biomechanical algorithms. After fusing some of the sensor data with the computational results, simulation is performed on the target skeleton to obtain its biomechanical properties and to predict those properties in unknown poses using various prediction algorithms. Finally, the performance data is modelled and rendered to obtain a high-fidelity digital twin of the real skeleton, achieving a faithful twin mapping of the biomechanical properties of the skeleton. That disclosure uses sensors and the external sensing of VR devices; since these devices can directly acquire sensing data to complete pose detection, the approach is entirely different from the interactive indirect inference proposed in the present disclosure. A further difference is that in the present disclosure, inference engines are designed for different body parts to ensure targeted inference detection.
- Chinese Patent Publication No. 110495889A (Application No. 201910599978.7) discloses a method of pose assessment, an electronic device, a computer device and a storage medium.
- The method includes: obtaining to-be-tested images, where the to-be-tested images include a front full-body image and a side full-body image of a tester standing upright; extracting skeletal key points from the to-be-tested images; calculating, based on the skeletal key points, a pose vector of the tester; and obtaining a bending angle of the pose vector. This method relies on sensors and multiple perspectives, which is different from the indirect inference of the present disclosure.
- Chinese Patent Publication No. 113191324A (Application No. 202110565975.9) discloses a method for predicting pedestrian behavioral intention based on multi-task learning, including: constructing a training sample set; constructing a pedestrian behavioral intention prediction model using a base network, a pose detection network and an intention recognition network; and extracting image features to obtain a feature map with a single frame image of the training sample set as an input of the base network.
- An encoder of the pose detection network includes a part intensity field sub-network and a part association field sub-network.
- Pedestrian pose images are acquired by a decoder of the pose detection network according to the joint feature map and the bone feature map, where the feature map is set as an input of both the part intensity field sub-network and the part association field sub-network, a joint feature map and a bone feature map are set as an output of the part intensity field sub-network and the part association field sub-network, respectively.
- The feature map is also set as an input of the intention recognition network, and the pedestrian behavioral intention image is set as the output of the intention recognition network. The pedestrian behavioral intention prediction model is trained and used to predict the pedestrian behavioral intention. This method relies on sensors and image feature extraction, which differs from the indirect inference of the present disclosure and produces different results.
- Chinese Patent Publication No. 112884780A (Application No. 202110165636.1) discloses a method and system for estimating human poses, including: inputting and training an image in a codec network having a four-layer coding and four-layer decoding structure, and outputting a semantic segmentation result; converting the semantic probability map of each pixel obtained in the first two coding layers into an edge activation map using an energy function, where a pixel whose activation value exceeds the activation threshold is an edge pixel; obtaining an instance segmentation result by aggregating pixels belonging to the same instance based on the semantic labels in the semantic segmentation result, where the instance segmentation result includes a mask indicating the instance to which each pixel belongs; generating the human skeletal confidence map using a fully convolutional network, and outputting the skeletal component label to which each pixel belongs in each instance; and regressing the locations of nodal points through a fully connected network to create a skeletal structure.
- Chinese Patent No. 108830150B (Application No. 201810426144.1) discloses a method and device for three-dimensional (3D) pose estimation of the human body, including: (S1) acquiring, by a monocular camera, depth images and red, green and blue (RGB) images of the human body at different angles; (S2) constructing a human skeletal key point detection neural network based on the RGB images to obtain a key point-annotated image; (S3) constructing a two-dimensional (2D)-3D mapping network of hand joint nodes; (S4) calibrating the depth image and the key point-annotated image of the human body at the same angle, and performing 3D point cloud coloring transformation on the corresponding depth image to obtain a coloring depth image; (S5) predicting, by a predefined learning network, the corresponding positions of the annotated human skeletal key points in the depth image based on the key point-annotated image and the coloring depth image; and (S6) combining the outputs of steps (S3) and (S5) to achieve refined estimation of the 3D human pose.
- Chinese Patent Publication No. 109885163A (Application No. 201910122971.6) discloses a method and system for multiplayer interactive collaboration in virtual reality.
- The system includes a motion acquisition device to acquire skeletal data of a user; a plurality of clients for data modeling, which obtain pose data of the user based on the skeletal data and map it to initial joint position data of each joint point of the skeletal model; and a server used to bind the initial joint position data of the skeletal model to the scene character of the user, and to obtain and synchronously transmit the character position data to other scene characters.
- The clients are also used to update the initial joint position data of the scene character, and to combine it with the model animation of the virtual scene to form the skeletal animation of the pose movement.
- Chinese Patent Publication No. 112926550A (Application No. 202110406810.7) discloses a human-computer interaction method based on 3D image pose matching of human and an apparatus thereof.
- The method includes: initializing an interaction machine and storing a corresponding 3D image of a template pose in the interaction machine based on interaction requirements; acquiring a plurality of nodes based on a deep learning method and constructing a 3D skeleton model based on the plurality of nodes; obtaining skeleton information of the current to-be-interacted human and inputting it into the 3D skeleton model to obtain human pose features; calculating a loss function value between the human pose features and the interaction data set; and comparing the loss function value with a set threshold value to determine whether to carry out human-machine interaction.
- Although this method involves interaction and action, it differs from the indirect inference method provided in the present disclosure in the purposes of interaction, the generation modes of actions, and the sources of dependence of landing points.
- Chinese Patent Publication No. 110675474A (Application No. 201910758741.9) discloses a learning method of a virtual character model, an electronic device and a readable storage medium.
- The learning method for the virtual character model includes: obtaining first skeletal pose information corresponding to an action of a target character in a current video image frame; obtaining skeletal pose adjustment information of a virtual character model corresponding to the current video image frame based on the first skeletal pose information and second skeletal pose information, where the second skeletal pose information is the skeletal pose information of the virtual character model corresponding to a previous video image frame; and driving the virtual character model according to the skeletal pose adjustment information for the virtual character model to learn the action of the target character in the current video image frame, so that the learning process between the virtual character model and a person can be simulated to form interactive experiences between a person and a virtual character, such as training, education, and nurturing.
- Although this method involves interaction and action, it differs from the indirect inference method provided in the present disclosure in the purposes of interaction.
- Chinese Patent Publication No. 113158459A (Application No. 202110422431.7) discloses a method for human pose estimation based on fusion of vision and inertial information. Since the human pose estimation method based on 3D vision sensors cannot provide three-degree-of-freedom rotation information, in this method, by using the complementary nature of visual and inertial information, a nonlinear optimization method is used to adaptively fuse vision information, inertial information and human pose priori information to obtain the rotation angle of a skeletal node and the global position of a root skeletal node at each moment, and complete real-time estimation for poses of the human body. Although this method involves interaction and action, it is different from the indirect inference method provided in the present disclosure in purposes of interaction, generation modes of actions, and sources of dependence of landing points.
- Chinese Patent No. 108876815B (Application No. 201810403604.9) discloses a skeletal pose calculation method, a virtual character model driving method, and a storage medium, where the skeletal pose calculation method is a key step of the virtual character model driving method, which includes an iterative calculation process for skeletal poses based on inverse kinematics. Based on inverse derivation, the joint angle change of the middle joint of the human skeletal chain is calculated based on the change of pose information of the limb, so that the joint angle of each joint is close to the optimal value after each iteration, effectively ensuring the smooth gradation effect when simulating the limb action and thus meeting the application requirements of realistic simulation of limb action.
- In the present disclosure, by contrast, the rotation and movement of the intelligent device are observable, while the twin human body itself is not; the twin body is used as a constraint and mapping to solve the logical intermediate link, which is transformed into the indirect inference of the pose. The mapping relationship between the twin human body and the virtual human parts serves as a visualization constraint.
- An object of the present disclosure is to provide a method and system for twin pose detection based on interactive indirect inference to overcome the deficiencies in the prior art.
- This application provides a twin pose detection method based on interactive indirect inference, comprising:
- The plurality of reasoners comprise a reasoner for left-hand motion, a reasoner for right-hand motion, a reasoner for left-arm motion, a reasoner for right-arm motion, a reasoner for torso motion, a reasoner for head motion, a reasoner for left-leg motion and a reasoner for right-leg motion.
- The plurality of sensors comprise the nine-axis gyroscope, the acceleration sensor, the speed sensor, the infrared distance sensor, the touch sensor and the sensor capable of obtaining reference information about program operation of the mobile phone;
- In step (S3), the predetermined human mechanics constraint is applied through the following steps:
- Step (S3) is performed through the following step:
- This application further provides a twin pose detection system based on interactive indirect inference, comprising:
- The plurality of reasoners comprise a reasoner for left-hand motion, a reasoner for right-hand motion, a reasoner for left-arm motion, a reasoner for right-arm motion, a reasoner for torso motion, a reasoner for head motion, a reasoner for left-leg motion and a reasoner for right-leg motion.
- The plurality of sensors comprise the nine-axis gyroscope, the acceleration sensor, the speed sensor, the infrared distance sensor, the touch sensor and the sensor capable of obtaining program operation reference information of the mobile phone;
- The predetermined human mechanics constraint is applied by using a fifth module, a sixth module, and a seventh module;
- The fifth module is configured to acquire a plurality of unconventional human pose images that meet predetermined requirements.
- The third module is configured to perform normalization on the plurality of initial virtual skeletons under the predetermined human mechanics constraint to obtain the plurality of overall virtual skeletons satisfying the predetermined human mechanics constraint.
- FIG. 1 is a flow chart of a twin pose detection method based on interactive indirect inference according to an embodiment of the present disclosure.
- FIG. 2 is a schematic diagram of rotation of a mobile intelligent terminal under normal gravity, where the limb behavior is a temporary skeletal extension.
- FIG. 3 schematically shows an inference of a pose of the human skeleton according to an embodiment of the present disclosure.
- FIG. 4 schematically shows an inference of a pose of the human skeleton according to an embodiment of the present disclosure.
- FIG. 5 schematically shows an inference of a pose of the human skeleton according to an embodiment of the present disclosure.
- FIG. 6 schematically shows an inference of a pose of the human skeleton according to an embodiment of the present disclosure.
- FIG. 7 schematically shows an inference of a pose of the human skeleton according to an embodiment of the present disclosure.
- FIG. 8 schematically shows an inference of a pose of the human skeleton according to an embodiment of the present disclosure.
- FIG. 9 schematically shows an inference of a pose of the human skeleton according to an embodiment of the present disclosure.
- Conventionally, sensors are used to obtain accurate judgments through direct acquisition; for example, a speed sensor is used to obtain speed directly. In the present disclosure, a set of related sensing information is instead used for indirect reasoning. The indirect reasoning acquires the most probable information about time, space and pose through basic sensors, avoiding additional, hard-to-obtain sensing equipment.
- The rotational change of the device shown in FIG. 2 is not merely its own rotation in the full 720° three-dimensional (3D) space; the change in the pose of the device is caused by the skeletal linkage of the user. In the present disclosure, this 3D spatial rotation is mapped into several twin body states of the twin being used.
- This application provides a twin pose detection method based on interactive indirect inference, which includes the following steps.
- The plurality of sensors on the mobile phone include a nine-axis gyroscope, an acceleration sensor, a speed sensor, an infrared distance sensor, a touch sensor and a sensor capable of obtaining program operation reference information of the mobile phone (e.g., a screen brightness acquisition sensor, a sensor for sensing light masking, and a speaker).
- The data set obtained by the plurality of sensors on the mobile phone includes: a rotation angle of the mobile phone obtained by the nine-axis gyroscope; a horizontal movement acceleration of the mobile phone obtained by the acceleration sensor; a horizontal movement speed of the mobile phone obtained by the speed sensor; an altitude of the mobile phone obtained by the infrared distance sensor; status information about whether the screen of the mobile phone is clicked, obtained by the touch sensor; and a state of use of the mobile phone discriminated by the sensor capable of obtaining the reference information about program operation (e.g., being used or watched without being clicked).
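- For concreteness, the per-reading record assembled from these sensors might be organized as in the following minimal Python sketch; the class and field names are illustrative assumptions, not terminology from this disclosure.

```python
# Hypothetical container for one sensor reading; all names are illustrative.
from dataclasses import dataclass

@dataclass
class SensorFrame:
    rotation_deg: tuple       # nine-axis gyroscope: rotation angles of the phone
    accel_horizontal: float   # acceleration sensor: horizontal movement acceleration
    speed_horizontal: float   # speed sensor: horizontal movement speed
    altitude_m: float         # infrared distance sensor: height of the phone above ground
    screen_touched: bool      # touch sensor: whether the screen is being clicked
    in_active_use: bool       # program-operation reference: used/watched without clicks
```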
- The poses of individual parts of the human skeleton include hand poses, arm poses, torso poses, head poses, and leg poses.
- The hand poses include a lift of the mobile phone by the left hand, a lift of the mobile phone by the right hand, and a lift of the mobile phone by both the left hand and the right hand.
- The arm poses include a raised-arm pose and a dropped-arm pose.
- The torso poses include an upright pose, a sitting pose, a squatting pose, and a lying pose, and further include a forward-leaning upright pose, an upright standing pose, a forward-leaning sitting pose, a back-leaning sitting pose, an upright sitting pose, a squatting pose, a facing-upward lying pose, and a facing-downward lying pose.
- The head poses include a looking-straight-ahead pose, a looking-left pose, a looking-right pose, a looking-up pose, and a looking-down pose.
- The leg poses include a walking pose, a travelling pose, a sitting pose, a standing pose, and a lying pose.
- The preferred ways of using the mobile phone include using the left hand as the dominant hand and using the right hand as the dominant hand.
- The plurality of reasoners include a reasoner for left-hand motion, a reasoner for right-hand motion, a reasoner for left-arm motion, a reasoner for right-arm motion, a reasoner for torso motion, a reasoner for head motion, a reasoner for left-leg motion and a reasoner for right-leg motion.
- To reason hand poses, the data set acquired by the plurality of sensors includes the horizontal and vertical displacement, the rotation angle of the mobile phone detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground detected by the gyroscope, positioning information, and status information about whether the screen of the mobile phone is clicked and whether the mobile phone experiences a certain amount of continuous vibration. Whether the left hand, the right hand or both hands are currently lifted is then reasoned by the corresponding preferred reasoner for left-hand motion and reasoner for right-hand motion.
- The reasoning process is performed as follows. Based on the information collected by the sensors on the mobile phone, namely: there is no horizontal or vertical displacement; as detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground is within ±15°; the mobile phone is not at sea or on an aircraft according to the positioning information; and the screen remains continuously lit without being touched while exhibiting a certain continuous vibration, it can be reasoned by the left-hand reasoner and the right-hand reasoner that the currently-raised hand is the non-dominant hand; otherwise, the currently-raised hand is the dominant hand.
- The threshold for the quantitative vibration is determined by an initialization step that learns the operational preferences of the user, during which the user holds the phone with the screen continuously lit so that the habitual shaking of the hands can be measured.
- In this state, the plurality of sensors of the mobile phone detect no touch operation, yet there is a minimal amount of micro-vibration that is not sensor noise.
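- One plausible realization of this initialization step, sketched below, records accelerometer magnitudes while the user holds the lit phone and places the threshold a few standard deviations above the habitual tremor; the statistic used and the constant k are assumptions, not prescribed by the disclosure.

```python
import statistics

def calibrate_vibration_threshold(accel_magnitudes: list[float], k: float = 3.0) -> float:
    # Learn a user-specific micro-vibration threshold from a hold-the-phone session.
    mu = statistics.mean(accel_magnitudes)
    sigma = statistics.stdev(accel_magnitudes)
    # Readings above this band are treated as deliberate motion rather than tremor.
    return mu + k * sigma
```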
- When the mobile phone is detected to be in a landscape mode, it is reasoned by the hand reasoner that both hands are raised to hold the mobile phone.
- When the mobile phone is detected to be in a portrait mode, it is reasoned by the hand reasoner that the single dominant hand is raised.
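- Taken together, the hand-pose rules above can be sketched as a rule-based reasoner such as the following; the extra fields (screen_plane_angle_deg, vibration_level, vibration_threshold, at_sea_or_airborne) extend the hypothetical SensorFrame above, and the exact predicates are assumptions drawn from the stated conditions.

```python
def reason_hand(frame, dominant: str = "right", landscape: bool = False) -> str:
    # Landscape orientation implies a two-handed hold.
    if landscape:
        return "both_hands"
    no_displacement = frame.speed_horizontal == 0.0
    near_vertical = abs(frame.screen_plane_angle_deg) <= 15.0  # within +/-15 deg of vertical
    lit_untouched = frame.in_active_use and not frame.screen_touched
    vibrating = frame.vibration_level >= frame.vibration_threshold
    # All conditions met: the phone is held passively, so the non-dominant hand holds it.
    if no_displacement and near_vertical and not frame.at_sea_or_airborne \
            and lit_untouched and vibrating:
        return "left_hand" if dominant == "right" else "right_hand"
    return dominant + "_hand"
```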
- To reason arm poses, the data set acquired by the plurality of sensors includes the horizontal and vertical displacement, the rotation angle of the mobile phone detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground detected by the gyroscope, positioning information, status information about whether the screen of the mobile phone is clicked and whether the mobile phone experiences a certain amount of continuous vibration, the pressure altitude of the mobile phone above the ground (an adult of normal height is taken as an example herein), and the angle between the normal of the screen plane and the ground. Whether the arm is currently raised or lowered is then reasoned by the corresponding preferred reasoner for left-arm motion and reasoner for right-arm motion.
- The reasoning process includes the following steps. If the angle between the normal of the screen plane and the ground is more than 105° (that is, the user is most likely looking down at the screen), it is reasoned by the reasoner for left-arm motion and/or the reasoner for right-arm motion that the arm is in a dropped pose. If the angle between the normal of the screen plane and the ground is less than 75°, it is reasoned by the reasoner for left-arm motion and/or the reasoner for right-arm motion that the arm is in a raised pose.
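- As a sketch, the two stated thresholds translate directly into a small classifier; note that the disclosure states no rule for the 75°-105° band, so it is left open here.

```python
def reason_arm(screen_normal_angle_deg: float) -> str:
    if screen_normal_angle_deg > 105.0:  # user most likely looking down at the screen
        return "dropped"
    if screen_normal_angle_deg < 75.0:
        return "raised"
    return "undetermined"                # 75-105 degrees: no rule stated in the text
```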
- To reason torso poses, the data set acquired by the plurality of sensors includes the horizontal and vertical displacement, the rotation angle of the mobile phone detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground detected by the gyroscope, positioning information, status information about whether the screen of the mobile phone is clicked and whether the mobile phone experiences a certain amount of continuous vibration, the pressure altitude of the mobile phone above the ground, the angle between the normal of the screen plane and the ground, and the orientation of the screen.
- The torso is first reasoned to be in an upright pose, a sitting pose or a squatting pose according to the angle between the normal of the screen plane and the ground, and is further reasoned by the corresponding preferred reasoner for torso motion to be in a forward-leaning upright pose, an upright standing pose, a forward-leaning sitting pose, a back-leaning sitting pose, an upright sitting pose, a squatting pose, a facing-upward lying pose or a facing-downward lying pose.
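- A two-stage torso reasoner along these lines might look as follows; the angle bands and altitude cut-offs are purely hypothetical, since the disclosure states only which signals drive the decision.

```python
def reason_torso(screen_normal_angle_deg: float, altitude_m: float,
                 screen_up: bool) -> str:
    # Lying poses: phone held very low, screen facing up or down (cut-off assumed).
    if altitude_m < 0.4:
        return "facing_upward_lying" if screen_up else "facing_downward_lying"
    # Coarse class from the phone's height above the ground (bands assumed).
    if altitude_m < 0.7:
        return "squatting"
    coarse = "standing" if altitude_m > 1.1 else "sitting"
    # Refine the lean from the screen-normal angle (thresholds assumed).
    if screen_normal_angle_deg > 100.0:
        return "forward_leaning_upright" if coarse == "standing" else "forward_leaning_sitting"
    if coarse == "sitting" and screen_normal_angle_deg < 60.0:
        return "back_leaning_sitting"
    return "upright_standing" if coarse == "standing" else "upright_sitting"
```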
- To reason head poses, the data set acquired by the plurality of sensors includes the horizontal and vertical displacement, the rotation angle of the mobile phone detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground detected by the gyroscope, positioning information, status information about whether the screen of the mobile phone is clicked and whether the mobile phone experiences a certain amount of continuous vibration, the pressure altitude of the mobile phone above the ground, the angle between the normal of the screen plane and the ground, and the orientation of the screen.
- Head poses consistent with the torso poses are inferred by the corresponding preferred reasoner for head motion. The initial pose of the head is a straight-ahead sight. The head pose also includes a changing pose, which depends on the direction opposite to the sliding of the screen: when the screen slides to the right, the sight turns towards the left. The head movement is taken to be synchronized with the eye movement, so a sight towards the left is equivalent to a head pose towards the left.
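- Because the gaze is taken to move opposite to the screen slide and the head to move with the gaze, the head reasoner reduces to a small lookup, sketched here with assumed direction labels.

```python
def reason_head(swipe_direction: str | None) -> str:
    # Gaze turns opposite to the slide; head is assumed synchronized with gaze.
    opposite = {"right": "left", "left": "right", "up": "down", "down": "up"}
    if swipe_direction is None:
        return "looking_straight_ahead"  # initial pose: straight sight
    return "looking_" + opposite[swipe_direction]
```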
- To reason leg poses, the data set acquired by the plurality of sensors includes the displacement variation, the displacement velocity, the vibration of the mobile phone and the pressure altitude of the mobile phone above the ground. The leg pose is then reasoned by the corresponding preferred leg reasoner.
- The predetermined human mechanics constraint is applied through the following steps.
- The natural human engineering mechanics conforms to the natural categories of physiological actions, and includes the physiological bending of the human body, the coherence and connection of the physiological structures of the human body, and joint bending.
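- One way to operationalize such a constraint, shown below, is to keep only candidate skeletons whose joint angles fall inside physiological bending ranges; the joint names and limits are illustrative placeholders for the corrected ergonomic tolerances described later.

```python
# Illustrative physiological bending ranges in degrees (placeholder values).
JOINT_LIMITS = {"elbow": (0.0, 150.0), "knee": (0.0, 140.0), "neck": (-60.0, 70.0)}

def satisfies_mechanics(joint_angles: dict) -> bool:
    # A candidate skeleton passes only if every constrained joint is in range.
    return all(lo <= joint_angles.get(joint, 0.0) <= hi
               for joint, (lo, hi) in JOINT_LIMITS.items())
```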
- Step (S5) is performed by the following step.
- The plurality of initial virtual skeletons are subjected to constraint combination and normalization to obtain the plurality of overall virtual skeletons conforming to the constraints.
- The screened overall virtual skeletons are weighted by the overall skeletal reasoner, and the one with the highest weight is selected as the twin virtual pose.
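- The screening-and-weighting step can be sketched as follows, assuming the overall skeletal reasoner exposes a function that maps a candidate skeleton to a probability-like weight; the top_k cut-off stands in for the "predetermined number" of screened skeletons.

```python
def select_twin_pose(candidates: list, weight_fn, top_k: int = 5):
    # Screen a predetermined number of constraint-satisfying skeletons...
    screened = sorted(candidates, key=weight_fn, reverse=True)[:top_k]
    # ...then keep the one the overall skeletal reasoner weights highest.
    return max(screened, key=weight_fn)
```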
- This application also provides a twin pose detection system based on interactive indirect inference, which includes a module M1, a module M2, a module M3, and a module M4.
- The module M1 is configured to construct a training set based on the data set and the poses of individual parts of the skeleton during human-computer interaction, where the plurality of sensors include a nine-axis gyroscope, an acceleration sensor, a speed sensor, an infrared distance sensor, a touch sensor and a sensor capable of obtaining program operation reference information of the mobile phone.
- The data set obtained by the sensors of the mobile phone includes: a rotation angle of the mobile phone obtained by the nine-axis gyroscope; a horizontal movement acceleration of the mobile phone obtained by the acceleration sensor; a horizontal movement speed of the mobile phone obtained by the speed sensor; an altitude of the mobile phone obtained by the infrared distance sensor; status information about whether the screen of the mobile phone is clicked, obtained by the touch sensor; and a state of use of the mobile phone discriminated by the sensor capable of obtaining the reference information about program operation (e.g., being used or watched without being clicked).
- The poses of individual parts of the human skeleton include hand poses, arm poses, torso poses, head poses, and leg poses.
- The hand poses include a raise of the mobile phone by the left hand, a raise of the mobile phone by the right hand, and a raise of the mobile phone by both the left hand and the right hand.
- The arm poses include a raised-arm pose and a dropped-arm pose.
- The torso poses include an upright pose, a sitting pose, a squatting pose, and a lying pose, and further include a forward-leaning upright pose, an upright standing pose, a forward-leaning sitting pose, a back-leaning sitting pose, an upright sitting pose, a squatting pose, a facing-upward lying pose, and a facing-downward lying pose.
- The head poses include a looking-straight-ahead pose, a looking-left pose, a looking-right pose, a looking-up pose, and a looking-down pose.
- The leg poses include a walking pose, a travelling pose, a sitting pose, a standing pose, and a lying pose.
- The module M2 is configured to label and classify the training set based on the preferred way, and to train the plurality of reasoners based on training sets of different multi-modal types.
- The preferred ways of using the mobile phone include using the left hand as the dominant hand and using the right hand as the dominant hand.
- The reasoners include a reasoner for left-hand motion, a reasoner for right-hand motion, a reasoner for left-arm motion, a reasoner for right-arm motion, a reasoner for torso motion, a reasoner for head motion, a reasoner for left-leg motion and a reasoner for right-leg motion.
- The module M3 is configured to train the overall skeletal reasoner during the human-computer interaction, with the overall skeletal reasoner producing the corresponding probabilities based on the input overall skeletal poses.
- The module M4 is configured to perform reasoning to obtain the poses of individual parts of the skeleton of an object using the plurality of reasoners for individual parts of the skeleton, according to the preferred way of the object for using the mobile phone and the data set acquired in real time by the plurality of sensors on the mobile phone.
- To reason hand poses, the data set acquired by the plurality of sensors includes the horizontal and vertical displacement, the rotation angle of the mobile phone detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground detected by the gyroscope, positioning information, and status information about whether the screen of the mobile phone is clicked and whether the mobile phone experiences a certain amount of continuous vibration. Whether the left hand, the right hand or both hands are currently raised is then reasoned by the corresponding reasoner for left-hand motion and reasoner for right-hand motion.
- The reasoning process is based on the following conditions: there is no horizontal or vertical displacement; the gyroscope detects that the angle between the screen display plane and the vertical plane of the ground is within ±15°; the mobile phone is not at sea or on an aircraft according to the positioning information; and the screen remains continuously lit without being touched, with a certain continuous vibration throughout the process.
- The threshold for the quantitative vibration is determined by an initialization step that learns the operational preferences of the user, during which the user holds the phone with the screen continuously lit so that the habitual shaking of the hands can be measured.
- In this state, the plurality of sensors of the mobile phone detect no touch operation, yet there is a minimal amount of micro-vibration that is not sensor noise.
- When the mobile phone is detected to be in a landscape mode, it is reasoned by the hand reasoner that both hands are raised to hold the mobile phone.
- When the mobile phone is detected to be in a portrait mode, it is reasoned by the hand reasoner that the single dominant hand is lifted.
- To reason arm poses, the data set acquired by the plurality of sensors includes the horizontal and vertical displacement, the rotation angle of the mobile phone detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground detected by the gyroscope, positioning information, status information about whether the screen of the mobile phone is clicked and whether the mobile phone experiences a certain amount of continuous vibration, the pressure altitude of the mobile phone above the ground (an adult of normal height is taken as an example herein), and the angle between the normal of the screen plane and the ground. Whether the arm is currently raised or lowered is then reasoned by the corresponding preferred reasoner for left-arm motion and reasoner for right-arm motion.
- The reasoning includes the following steps. If the angle between the normal of the screen plane and the ground is more than 105° (that is, the user is looking down at the screen), it is reasoned by the reasoner for left-arm motion and/or the reasoner for right-arm motion that the arm is in a dropped pose. If the angle between the normal of the screen plane and the ground is less than 75°, it is reasoned by the reasoner for left-arm motion and/or the reasoner for right-arm motion that the arm is in a raised pose.
- To reason torso poses, the data set acquired by the plurality of sensors includes the horizontal and vertical displacement, the rotation angle of the mobile phone detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground detected by the gyroscope, positioning information, status information about whether the screen of the mobile phone is clicked and whether the mobile phone experiences a certain amount of continuous vibration, the pressure altitude of the mobile phone above the ground, the angle between the normal of the screen plane and the ground, and the orientation of the screen.
- The torso is first reasoned to be in an upright pose, a sitting pose or a squatting pose according to the angle between the normal of the screen plane and the ground, and is further reasoned by the corresponding preferred reasoner for torso motion to be in a forward-leaning upright pose, an upright standing pose, a forward-leaning sitting pose, a back-leaning sitting pose, an upright sitting pose, a squatting pose, a facing-upward lying pose or a facing-downward lying pose.
- To reason head poses, the data set acquired by the plurality of sensors includes the horizontal and vertical displacement, the rotation angle of the mobile phone detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground detected by the gyroscope, positioning information, status information about whether the screen of the mobile phone is clicked and whether the mobile phone experiences a certain amount of continuous vibration, the pressure altitude of the mobile phone above the ground, the angle between the normal of the screen plane and the ground, and the orientation of the screen.
- Head poses consistent with the torso poses are inferred by the corresponding preferred reasoner for head motion. The initial head pose is a straight-ahead sight. The head pose also includes a changing pose, which depends on the direction opposite to the sliding of the screen: when the screen slides to the right, the sight turns towards the left. The head movement is taken to be synchronized with the eye movement, so a sight towards the left is equivalent to a head pose towards the left.
- To reason leg poses, the data set acquired by the plurality of sensors includes the displacement variation, the displacement velocity, the vibration of the mobile phone and the pressure altitude of the mobile phone above the ground. The leg pose is then reasoned by the corresponding preferred leg reasoner.
- According to the pressure altitude of the mobile phone above the ground, a standing pose is reasoned by the leg reasoner. If the displacement changes continuously without vibration, a travelling pose is reasoned by the leg reasoner. When the mobile phone is shaking and vibrating, and the displacement velocity is within the walking speed range, a walking pose is inferred by the leg reasoner.
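- These three rules map onto a small decision function such as the one below; the walking-speed band and the standing-height cut-off are assumed values, not figures from the disclosure.

```python
def reason_leg(altitude_m: float, displacement_rate: float, vibrating: bool) -> str:
    WALK_SPEED = (0.5, 2.0)  # m/s, assumed walking-speed range
    if vibrating and WALK_SPEED[0] <= displacement_rate <= WALK_SPEED[1]:
        return "walking"     # shaking phone moving at walking speed
    if displacement_rate > 0.0 and not vibrating:
        return "travelling"  # continuous displacement without vibration
    if altitude_m > 1.0:     # phone held at standing height (assumed cut-off)
        return "standing"
    return "sitting"         # fallback among the remaining leg poses
```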
- The module M5 is configured to merge the poses of individual parts of the skeleton to generate a plurality of initial virtual skeletons.
- The module M6 is configured to obtain, under a predetermined human mechanics constraint, a plurality of overall virtual skeletons satisfying the predetermined human mechanics constraint from the plurality of initial virtual skeletons.
- The predetermined human mechanics constraint is applied by a module M6.1, a module M6.2, and a module M6.3.
- The module M6.1 is configured to acquire a plurality of unconventional human pose images that meet predetermined requirements.
- The module M6.2 is configured to extract 3D coordinates and pose data of virtual skeletal location points in Euclidean space based on the plurality of unconventional human pose images.
- The origin of the 3D coordinate system of the virtual skeleton positioning points in Euclidean space is the position where the reasoning of the virtual skeleton starts, i.e., the 3D coordinate system of the virtual skeleton positioning points in Euclidean space is the reasoned 3D coordinate system of the virtual skeleton.
- The module M6.3 is configured to correct the constraint tolerance of the natural human engineering mechanics based on the 3D coordinates and pose data.
- The natural human engineering mechanics conforms to the natural categories of physiological actions, and includes physiological bending, the coherence and connection of physiological structures, and joint bending.
- The module M5 is also configured to subject the plurality of initial virtual skeletons to constraint combination and normalization to obtain the plurality of overall virtual skeletons conforming to the constraints.
- The screened overall virtual skeletons are weighted by the overall skeletal reasoner, and the one with the highest weight is selected as the twin virtual pose.
- The system, device and individual modules provided herein can be implemented purely in computer-readable program code. Moreover, the method steps can be logically programmed such that the system, device and individual modules are realized in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers and embedded microcontrollers. Therefore, the system, device and individual modules provided herein can be regarded as a hardware component, and the modules included therein for implementing the various programs can be regarded as structures within the hardware component.
- The modules for implementing the various functions can also be regarded both as software programs for implementing the method and as structures within the hardware component.
Abstract
A method and system for twin pose detection based on interactive indirect inference are provided. The method includes: acquiring, by sensors on a mobile phone, a data set in real time; obtaining poses of individual parts of a skeleton of an object by reasoning with reasoners on the individual parts based on the data set and a preferred way of the object for using the mobile phone; merging the poses to generate multiple initial virtual skeletons; obtaining, under a predetermined human mechanics constraint, multiple overall virtual skeletons satisfying the constraint from the initial virtual skeletons; and screening a predetermined number of overall virtual skeletons from the overall virtual skeletons, and obtaining a dynamic twin virtual pose in real time by reasoning on the screened skeletons using an overall skeleton reasoner.
Description
- This application claims the benefit of priority from Chinese Patent Application No. 202210715227.9, filed on Jun. 23, 2022. The content of the aforementioned application, including any intervening amendments thereto, is incorporated herein by reference in its entirety.
- This application relates to digital intelligence technology, and more particularly to a twin pose detection method and system based on interactive indirect inference. By means of the digital twin technology, the software empowerment and intelligent empowerment are achieved in the high-end manufacturing, and combined with the sensing device and intelligent device, there is no need to additionally provide collection and wearable devices, and the user can access virtual reality applications with physical interaction anywhere and anytime.
- Sensor and assistive peripherals (such as specific smart wearable products and head-mounted displays (HMDs)) have long been used to build a bridge between the physical world and the virtual digital for users. The improved sensory coherence is what users and the industry have always pursued and worked towards. However, the experience limitations of current sensor peripherals and the compatibility problems between devices exist. For example, sensor peripherals provided by various hardware manufacturers are not the facilities used by universal consumers such that the virtual reality is difficult to popularize and the experience consistency is low.
- The mobile smart terminals (e.g., mobile phones, tablets, smart wears) have a screen display, network communication and a certain degree of computing power characteristics, which are already popular and versatile. Providing a basic and universal method for detecting poses using these universal devices can greatly facilitate the versatility and universality of virtual reality. In this way, some specific sensor and assistive peripherals are not required, such as HMDs that have wearability and compatibility limitations, handles that occupy hands and have convenience limitations, additional camera image capture that rely on place and camera equipment, and wearable pose positioners that are not available anywhere and anytime, limited in space, specialist and expensive.
- The above objectively depicts the current situation of the popularity and real-time nature of virtual reality, and introduces a basic method for twin pose detection that can be built on mobile smart terminals. This application provides a method and system for detecting poses of a twin based on interactive indirect inference, which enables software empowerment and intelligent empowerment through digital intelligence technologies (digital twin technology) in high-end manufacturing, and enables ordinary users to access virtual reality applications anytime and anywhere through sensing devices and smart devices without external assistive acquisition and wearable devices.
- Chinese Patent Publication No. 113610969A (Application No. 202110974124.X) discloses a method for generating a three-dimensional (3D) human body model, including: acquiring to-be-detected images taken from a plurality of perspectives; detecting human body regions contained in the to-be-detected images, and detecting a data set of skeletal key points contained in the human body regions; constructing a fusion affinity matrix between the to-be-detected images by using the human body regions and the data set of skeletal key points; determining a matching relationship between the body regions by using the fusion affinity matrix; performing pose construction based on the matching relationship and the data set of skeletal key points to generate the 3D human body model. The method can analyze human pose from various perspectives, extract the data of body regions and skeletal key points from the to-be-detected images, generate a 3D human body model by using the matching relationship between body regions and the data set of skeletal key points, and thus can fully and effectively restore the 3D poses of human body. The method relies on sensors and multiple perspectives, which is different from the indirect inference of the method provided by the present disclosure.
- Chinese Patent Publication No. 111311714A (Application No. 202010244300.X) discloses a method and system for pose prediction. The method includes: acquiring pose information of a target character in one or more existing frames; and inputting the pose information into a trained pose prediction model to determine predicted pose information of the target character in subsequent frames, where the pose information includes skeletal rotation angle information and gait movement information. This method is different from the method of the present disclosure in objectives, and detection techniques.
- Chinese Patent Publication No. 112132955A (Application No. 202010902457.7) discloses a digital twin construction method for human skeleton. In this method, data at important locations of the human body is acquired via VR motion capture and sensor technology. Key data is obtained through data classification, screening, simplification and calculation via artificial intelligence. The spatial orientation and mechanical information of the target skeleton is obtained by solving the key data with human inverse dynamics and biomechanical algorithms. After fusing some of the sensor data with the computational results, simulation is performed on the target skeleton to obtain the biomechanical properties of the target skeleton and predict the biomechanical properties of the target skeleton in unknown poses using various prediction algorithms. Finally, the performance data is subjected to modelling and rendering to obtain a high-fidelity digital twin of the real skeleton, achieving a faithful twin mapping of the biomechanical properties of the skeleton. That publication uses sensors and external sensing of VR devices; these devices per se can directly acquire abundant sensing data to complete the pose detection, which is entirely different from the interactive indirect inference proposed in the present disclosure. What is more different is that, in the present disclosure, inference engines are designed according to different body parts to ensure targeted inference detection.
- Chinese Patent Publication No. 110495889A (Application No. 201910599978.7) discloses a method of pose assessment, an electronic device, a computer device and a storage medium. The method includes: obtaining to-be-tested images, where the to-be-tested images include a front full body image and a side full body image of a tester standing upright; extracting skeletal key points from the to-be-tested images; calculating, based on the skeletal key points, a pose vector of the tester; and obtaining a bending angle of the pose vector. This method relies on sensors and multiple perspectives, which is different from the indirect inference of the present disclosure.
- Chinese Patent Publication No. 113191324A (Application No. 202110565975.9) discloses a method for predicting pedestrian behavioral intention based on multi-task learning, including: constructing a training sample set; constructing a pedestrian behavioral intention prediction model using a base network, a pose detection network and an intention recognition network; and extracting image features to obtain a feature map with a single frame image of the training sample set as an input of the base network. An encoder of the pose detection network includes a part intensity field sub-network and a part association field sub-network. Pedestrian pose images are acquired by a decoder of the pose detection network according to the joint feature map and the bone feature map, where the feature map is set as an input of both the part intensity field sub-network and the part association field sub-network, and a joint feature map and a bone feature map are set as outputs of the part intensity field sub-network and the part association field sub-network, respectively. The feature map is set as an input of the intention recognition network, and the pedestrian behavioral intention image is set as an output of the intention recognition network. The pedestrian behavioral intention prediction model is trained and used to predict the pedestrian behavioral intention. This method relies on sensors and image feature extraction, which differs from the indirect inference of the present disclosure and produces different results.
- Chinese Patent Publication No. 112884780A (Application No. 202110165636.1) discloses a method and system for estimating poses of human body, including: inputting and training an image in a codec network having a four-layer coding and four-layer decoding structure, and outputting a semantic segmentation result; converting a semantic probability map of pixels obtained in the first two coding layers into edge activations using an energy function pixel map, where a pixel whose activation value is larger than the activation threshold is an edge pixel; obtaining an instance segmentation result by aggregating pixels belonging to the same instance based on the semantic labels in the semantic segmentation result, where the instance segmentation result includes a mask indicating the instance to which each pixel belongs; generating the human skeletal confidence map using the full convolutional network, and outputting the skeletal component labels to which each pixel belongs in each instance; and regressing locations of nodal points through the fully connected network to create a skeletal structure of the human body in each instance to obtain human pose information. This method relies on sensors and image feature extraction, which differs from the indirect inference of the present disclosure and produces different results.
- Chinese Patent No. 108830150B (Application No. 201810426144.1) discloses a method and device for three-dimensional (3D) pose estimation of human body, including: (S1) acquiring, by a monocular camera, depth images and red, green and blue (RGB) images of the human body at different angles; (S2) constructing a human skeletal key point detection neural network based on the RGB images to obtain a key point-annotated image; (S3) constructing a two-dimensional (2D)-3D mapping network of hand joint nodes; (S4) calibrating the depth image and the key point-annotated image of the human body at the same angle, and performing 3D point cloud coloring transformation on the corresponding depth image to obtain a coloring depth image; (S5) predicting, by a predefined learning network, the corresponding positions of the annotated human skeletal key points in the depth image based on the key point-annotated image and the coloring depth image; and (S6) combining the outputs of steps (S3) and (S5) to achieve refined 3D pose estimation of the human body. This method relies on sensors and image feature extraction, which differs from the indirect inference of the present disclosure and produces different results.
- Chinese Patent Publication No. 109885163A (Application No. 201910122971.6) discloses a method and system for multiplayer interactive collaboration in virtual reality. The system includes a motion acquisition device to acquire skeletal data of a user; a plurality of clients for data modeling to obtain pose data of the user based on the skeletal data and mapping it to initial joint position data of each joint point of the skeletal model, and a server used to bind the initial joint position data of the skeletal model to the scene character of the user and obtain and synchronously transmit the character position data to other scene characters. The clients are also used to update the initial joint position data of the scene character, and combine with the model animation of the virtual scene to form the skeletal animation of pose movement. Although this method involves interaction and action, it is different from the indirect inference method provided in the present disclosure in purposes of interaction, generation modes of actions, and sources of dependence of landing points.
- Chinese Patent Publication No. 112926550A (Application No. 202110406810.7) discloses a human-computer interaction method based on 3D image pose matching of human and an apparatus thereof. The method includes: initializing an interaction machine and storing a corresponding 3D image of a template pose into the interaction machine based on interaction requirements; acquiring a plurality of nodes based on a deep learning method and constructing a 3D skeleton model based on the plurality of nodes; obtaining skeleton information of the current to-be-interacted human and inputting it into the 3D skeleton model to obtain human pose features; calculating a loss function value between the human pose features and the interaction data set; and comparing the loss function value with the set threshold value to determine whether to carry out human-machine interaction. By using this method and apparatus, the user experience of the human-machine interaction function can be improved. Although this method involves interaction and action, it is different from the indirect inference method provided in the present disclosure in purposes of interaction, generation modes of actions, and sources of dependence of landing points.
- Chinese Patent Publication No. 110675474A (Application No. 201910758741.9) discloses a learning method of a virtual character model, an electronic device and a readable storage medium. The learning method for the virtual character model includes: obtaining first skeletal pose information corresponding to an action of a target character in a current video image frame; obtaining skeletal pose adjustment information of a virtual character model corresponding to the current video image frame based on the first skeletal pose information and second skeletal pose information, where the second skeletal pose information is the skeletal pose information of the virtual character model corresponding to a previous video image frame; and driving the virtual character model according to the skeletal pose adjustment information for the virtual character model to learn the action of the target character in the current video image frame, so that the learning process between the virtual character model and a person can be simulated to form interactive experiences between a person and a virtual character, such as training, education, and nurturing. Although this method involves interaction and action, it is different from the indirect inference method provided in the present disclosure in purposes of interaction, generation modes of actions, and sources of dependence of landing points.
- Chinese Patent Publication No. 113158459A (Application No. 202110422431.7) discloses a method for human pose estimation based on fusion of vision and inertial information. Since the human pose estimation method based on 3D vision sensors cannot provide three-degree-of-freedom rotation information, in this method, by using the complementary nature of visual and inertial information, a nonlinear optimization method is used to adaptively fuse vision information, inertial information and human pose priori information to obtain the rotation angle of a skeletal node and the global position of a root skeletal node at each moment, and complete real-time estimation for poses of the human body. Although this method involves interaction and action, it is different from the indirect inference method provided in the present disclosure in purposes of interaction, generation modes of actions, and sources of dependence of landing points.
- Chinese Patent No. 108876815B (Application No. 201810403604.9) discloses a skeletal pose calculation method, a virtual character model driving method, and a storage medium, where the skeletal pose calculation method is a key step of the virtual character model driving method, and includes an iterative calculation process for skeletal poses based on inverse kinematics. Based on inverse derivation, the joint angle change of the middle joint of the human skeletal chain is calculated from the change in pose information of the limb, so that the angle of each joint approaches the optimal value after each iteration, effectively ensuring a smooth gradation effect when simulating the limb action and thus meeting the application requirements of realistic simulation of limb action. In addition, multiple judgment mechanisms are adopted in the iterative calculation process, which can update the change in the angle of each joint and in the pose information of the limb in time for the next iteration, simplifying the judgment process and ensuring the effectiveness of the iterative cycle, facilitating the calculation speed of the system while ensuring correct calculation results, and enhancing the real-time nature of the limb movement capture process. Although this method involves pose prediction and action generation, it is different from the indirect inference method provided in the present disclosure in generation modes of actions, data sources, and sources of dependence of landing points. Moreover, this publication actually relates to an optimization method for fast skeleton calculation, which aims to improve the continuity and smoothness of the animation, and the virtual character is also presented as the landing point. By contrast, in the present disclosure, the rotation and movement of the intelligent device are what is visible, while the invisible twin human body serves as a constraint and mapping, forming the logical intermediate link through which the device motion is transformed into the indirect inference of the pose. Besides, when the final output is used in applications, it can be rendered on a virtual character or used for application simulation demonstration. Hence, the mapping relationship between the twin human body and the virtual human part is a visualization constraint.
- An object of the present disclosure is to provide a method and system for twin pose detection based on interactive indirect inference to overcome the deficiencies in the prior art.
- In a first aspect, this application provides a twin pose detection method based on interactive indirect inference, comprising:
-
- (S1) acquiring, by a plurality of sensors on a mobile phone, a data set in real time;
- and obtaining poses of individual parts of a skeleton of an object by reasoning using a plurality of reasoners on individual parts of the skeleton based on the data set and a preferred way of the object for using the mobile phone;
- (S2) merging the poses obtained in step (S1) to generate a plurality of initial virtual skeletons;
- (S3) under a predetermined human mechanics constraint, obtaining a plurality of overall virtual skeletons satisfying the predetermined human mechanics constraint from the plurality of initial virtual skeletons; and
- (S4) screening a predetermined number of overall virtual skeletons from the plurality of overall virtual skeletons, and obtaining a dynamic twin virtual pose in real time by reasoning on the predetermined number of overall virtual skeletons using an overall skeleton reasoner.
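- For orientation, steps (S1)-(S4) can be sketched as a short pipeline. The sketch below is illustrative only and is not the claimed implementation; all names (merge_part_poses, satisfies_mechanics, top_k, and the reasoner callables) are hypothetical.

```python
# Minimal sketch of steps (S1)-(S4); every name below is hypothetical.
import itertools
from typing import Callable, Dict, List

def merge_part_poses(part_poses: Dict[str, List[str]]) -> List[dict]:
    # (S2) every combination of per-part pose hypotheses is one initial virtual skeleton
    parts = list(part_poses)
    return [dict(zip(parts, combo))
            for combo in itertools.product(*(part_poses[p] for p in parts))]

def detect_twin_pose(sensor_data: dict, preferred_way: str,
                     part_reasoners: Dict[str, Callable],
                     satisfies_mechanics: Callable[[dict], bool],
                     overall_reasoner: Callable[[List[dict]], dict],
                     top_k: int = 5) -> dict:
    # (S1) reason pose hypotheses for each skeletal part from the sensor data set
    part_poses = {part: reasoner(sensor_data, preferred_way)
                  for part, reasoner in part_reasoners.items()}
    initial = merge_part_poses(part_poses)                     # (S2)
    feasible = [s for s in initial if satisfies_mechanics(s)]  # (S3)
    return overall_reasoner(feasible[:top_k])                  # (S4)
```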
- In some embodiments, the plurality of reasoners comprise a reasoner for left-hand motion, a reasoner for right-hand motion, a reasoner for left-arm motion, a reasoner for right-arm motion, a reasoner for torso motion, a reasoner for head motion, a reasoner for left-leg motion and a reasoner for right-leg motion.
- In some embodiments, the plurality of sensors comprise a nine-axis gyroscope, an acceleration sensor, a speed sensor, an infrared distance sensor, a touch sensor and a sensor capable of obtaining program operation reference information of the mobile phone;
-
- the nine-axis gyroscope is configured to acquire a rotation angle of the mobile phone;
- the acceleration sensor is configured to acquire a horizontal movement acceleration of the mobile phone;
- the speed sensor is configured to acquire a horizontal movement speed of the mobile phone;
- the infrared distance sensor is configured to acquire an altitude of the mobile phone;
- the touch sensor is configured to acquire status information on whether a screen of the mobile phone is clicked; and
- the sensor capable of obtaining the program operation reference information of the mobile phone is configured to determine a use state of the mobile phone.
- In some embodiments, in step (S3), the predetermined human mechanics constraint is performed through steps of:
-
- (S3.1) acquiring a predetermined number of unconventional human pose images that meet predetermined requirements;
- (S3.2) extracting three-dimensional (3D) coordinates and pose data of virtual skeletal location points in Euclidean space based on the predetermined number of unconventional human pose images; and
- (S3.3) correcting constraint tolerance of human natural engineering mechanics based on the 3D coordinates and pose data obtained in step (S3.2);
- wherein the human natural engineering mechanics is in accordance with the natural category of physiological actions, and comprises physiological bending of the human body, coherence and connection of physiological structures of the human body and bending of joints of the human body.
- In some embodiments, step (S3) is performed by a step of:
-
- subjecting the plurality of initial virtual skeletons to constraint combination and normalization to obtain the plurality of overall virtual skeletons conforming to the constraints.
- In a second aspect, this application provides a twin pose detection system based on interactive indirect inference, comprising:
-
- a first module;
- a second module;
- a third module; and
- a fourth module;
- wherein the first module is configured to perform reasoning to obtain poses of individual parts of a skeleton of an object using a plurality of reasoners on individual parts of the skeleton according to a preferred way of an object for using a mobile phone and a data set acquired in real time by a plurality of sensors on the mobile phone;
- the second module is configured to merge the poses of individual parts of the skeleton to generate a plurality of initial virtual skeletons;
- the third module is configured, under a predetermined human mechanics constraint, to obtain a plurality of overall virtual skeletons satisfying the predetermined human mechanics constraint from the plurality of initial virtual skeletons; and
- the fourth module is configured to screen a predetermined number of overall virtual skeletons from the plurality of overall virtual skeletons, and perform reasoning on the predetermined number of overall virtual skeletons by using an overall skeleton reasoner to obtain a dynamic twin virtual pose in real time;
- the first module comprises a first submodule and a second submodule;
- wherein the first submodule is configured to construct a training set based on the data set and the poses of individual parts of the skeleton during human-computer interaction; wherein the plurality of sensors comprise a nine-axis gyroscope, an acceleration sensor, a speed sensor, an infrared distance sensor, a touch sensor and a sensor capable of obtaining program operation reference information of the mobile phone; and
- the second submodule is configured to label and classify the training set based on the preferred way, and train the plurality of reasoners based on training sets in different multi-modal types; wherein the preferred way of the object for using the mobile phone comprises using left hand as a dominant hand and using right hand as a dominant hand.
- In some embodiments, the plurality of reasoners comprise a reasoner for left-hand motion, a reasoner for right-hand motion, a reasoner for left-arm motion, a reasoner for right-arm motion, a reasoner for torso motion, a reasoner for head motion, a reasoner for left-leg motion and a reasoner for right-leg motion.
- In some embodiments, the plurality of sensors comprise the nine-axis gyroscope, the acceleration sensor, the speed sensor, the infrared distance sensor, the touch sensor and the sensor capable of obtaining program operation reference information of the mobile phone;
-
- the nine-axis gyroscope is configured to acquire a rotation angle of the mobile phone;
- the acceleration sensor is configured to acquire a horizontal movement acceleration of the mobile phone;
- the speed sensor is configured to acquire a horizontal movement speed of the mobile phone;
- the infrared distance sensor is configured to acquire an altitude of the mobile phone;
- the touch sensor is configured to acquire status information on whether a screen of the mobile phone is clicked; and
- the sensor capable of obtaining the program operation reference information of the mobile phone is configured to determine a use state of the mobile phone.
- In some embodiments, the predetermined human mechanics constraint is performed by using a fifth module, a sixth module, and a seventh module;
- the fifth module is configured to acquire a plurality of unconventional human pose images that meet predetermined requirements;
-
- the sixth module is configured to extract 3D coordinates and pose data of virtual skeletal location points in Euclidean space based on the plurality of unconventional human pose images; and
- the seventh module is configured to correct constraint tolerance of human natural engineering mechanics based on the 3D coordinates and pose data;
- wherein the human natural engineering mechanics is in accordance with the natural category of physiological actions, and comprises physiological bending, coherence and connection of physiological structures and joint bending.
- In some embodiments, the third module is configured to perform normalization on the plurality of initial virtual skeletons under the predetermined human mechanics constraint to obtain the plurality of overall virtual skeletons satisfying the predetermined human mechanics constraint.
- Compared with the prior art, the beneficial effects of the present disclosure are described below.
-
- (1) By using the method provided herein, the change process of human body poses can be reasoned indirectly through the intrinsic sensors of the smart mobile device, by combining the pose inertia of the user and relying on the bodily relationship between the joints of the twin human body and the device.
- (2) In this application, additional helmets, handles, externally fitted sensors and independent external cameras are not required for pose detection and generation. Instead, the sensors on the mobile smart device, such as the gyroscope, acceleration sensor, level sensor, geomagnetic sensor and touch-screen sliding detection, are used to directly generate virtual poses based on the relative spatial relationship of the interaction of the user, and the physical poses are then indirectly detected.
- (3) Based on the data obtained from the intrinsic sensors of the mobile phone, reasoning is performed by using the reasoner for individual parts of the human skeleton, combined with human mechanics constraints and the overall skeletal reasoner such that the accuracy of reasoning is improved.
- (4) The constraint tolerance of the natural engineering mechanics of the human body is corrected based on unconventional poses, thus improving the accuracy of inference.
- (5) The corresponding preferred reasoner is trained by selecting the corresponding data set based on the personal preferred way of using the mobile phone, which improves the accuracy of reasoning.
- (6) The virtual pose detection results are transformed into a physique virtual skeleton, which is then provided for use in ecological applications.
- Other features, objects and advantages of the present disclosure will be more apparent according to the detailed description of non-limiting embodiments made with reference to the following accompanying drawings.
-
FIG. 1 is a flow chart of a twin pose detection method based on interactive indirect inference according to an embodiment of the present disclosure; -
FIG. 2 is a schematic diagram of rotation of a mobile intelligent terminal under normal gravity, where the limb behavior is a temporary skeletal extension; -
FIG. 3 schematically shows an inference of a pose of human skeleton according to an embodiment of the present disclosure; -
FIG. 4 schematically shows an inference of a pose of the human skeleton according to an embodiment of the present disclosure; -
FIG. 5 schematically shows an inference of a pose of the human skeleton according to an embodiment of the present disclosure; -
FIG. 6 schematically shows an inference of a pose of the human skeleton according to an embodiment of the present disclosure; -
FIG. 7 schematically shows an inference of a pose of the human skeleton according to an embodiment of the present disclosure; -
FIG. 8 schematically shows an inference of a pose of the human skeleton according to an embodiment of the present disclosure; and -
FIG. 9 schematically shows an inference of a pose of the human skeleton according to an embodiment of the present disclosure. - The present disclosure is described in detail below with reference to specific embodiments. The following embodiments can help those of skill in the art to further understand the present disclosure, but are not intended to limit the present disclosure in any way. It should be noted that one of ordinary skill in the art can make several variations and improvements without departing from the conception of the present disclosure, and these variations and improvements shall fall within the scope of protection of the present disclosure.
- In the prior art, sensors are used to obtain an accurate correspondence for judgement through direct acquisition; for example, the speed sensor is used to obtain speed. In the present disclosure, a set of relevant sensing information is used for indirect reasoning. Because human behavior and habitual use produce highly repetitive, indirectly generated sensing patterns, the basic sensing obtained differs across times, spaces and poses. The indirect reasoning acquires the most probable time-space and pose information from the basic sensors, and avoids the use of additional and unreachable sensing equipment.
- The rotational change of the device as shown in
FIG. 2 is not just its own rotation in the full 720° three-dimensional (3D) space; the change in pose of the device is caused by the skeletal linkage of the user. By using the method and system for twin pose detection based on interactive indirect inference provided in the present disclosure, this 3D spatial rotation is mapped into several body states of the twin being used. - Referring to
FIGS. 1-9 , this application provides a twin pose detection method based on interactive indirect inference, which includes the following steps. -
- (S1) During human-computer interaction, a training set is constructed based on the data set obtained by a plurality of sensors on a mobile phone and poses of individual parts of a skeleton.
- Specifically, the plurality of sensors on the mobile phone include a nine-axis gyroscope, an acceleration sensor, a speed sensor, an infrared distance sensor, a touch sensor and a sensor capable of obtaining program operation reference information of the mobile phone (e.g., a screen brightness acquisition sensor, a sensor for detecting masking of the light sensor, and a speaker).
- The data set obtained by the plurality of sensors on the mobile phone includes a rotation angle of the mobile phone obtained by the nine-axis gyroscope, a horizontal movement acceleration of the mobile phone obtained by the acceleration sensor, a horizontal movement speed of the mobile phone obtained by the speed sensor, an altitude of the mobile phone obtained by the infrared distance sensor, status information about whether a screen of the mobile phone is clicked obtained by the touch sensor, and a use state of the mobile phone discriminated by the sensor capable of obtaining the reference information about program operation (for example, not clicked while being used or watched).
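- For readability, one real-time record of this data set can be sketched as follows; the field names and units are assumptions, since the text only states which sensor supplies which quantity.

```python
# Sketch of one real-time sensor record; field names and units are assumed.
from dataclasses import dataclass
from typing import Tuple

@dataclass
class SensorRecord:
    rotation_angle: Tuple[float, float, float]  # nine-axis gyroscope, degrees per axis
    horizontal_acceleration: float              # acceleration sensor, m/s^2
    horizontal_speed: float                     # speed sensor, m/s
    altitude: float                             # infrared distance sensor, metres above ground
    screen_clicked: bool                        # touch sensor
    use_state: str                              # program operation reference, e.g. "watched_not_clicked"
```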
- Specifically, the poses of individual parts of human skeleton include hand poses, arm poses, torso poses, head poses, and leg poses.
- The hand poses include a lift of the mobile phone by the left hand, a lift of the mobile phone by the right hand, and a lift of the mobile phone by both the left hand and the right hand.
- The arm poses include a raised-arm pose and a dropped-arm pose.
- The torso poses include an upright pose, a sitting pose, a squatting pose, and a lying pose, and further include a forward leaning upright pose, an upright standing pose, a forward leaning sitting pose, a back leaning sitting pose, an upright sitting pose, a squatting pose, an upward facing lying pose, and a downward facing lying pose.
- The head poses include a looking-straight ahead pose, a looking-left pose, a looking-right pose, a looking-up pose, and a looking-down pose.
- The leg poses include a walking pose, a travelling pose, a sitting pose, a standing pose, and a lying pose.
-
- (S2) The training set is labeled and classified based on the preferred ways of using the mobile phone, and a plurality of reasoners for individual parts of the human skeleton are trained based on training sets in different multi-modal types.
- Specifically, the preferred ways of using the mobile phone include using the left hand as the dominant hand and using the right hand as the dominant hand.
- The plurality of reasoners include a reasoner for left-hand motion, a reasoner for right-hand motion, a reasoner for left-arm motion, a reasoner for right-arm motion, a reasoner for torso motion, a reasoner for head motion, a reasoner for left-leg motion and a reasoner for right-leg motion.
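- The text leaves the model family of these reasoners open. A minimal training sketch, assuming scikit-learn-style classifiers and one classifier per skeletal part, is given below; train_part_reasoners and its arguments are hypothetical names.

```python
# Sketch of step (S2): one reasoner per skeletal part; DecisionTreeClassifier is
# an arbitrary stand-in, since the text leaves the model family open.
from sklearn.tree import DecisionTreeClassifier

PARTS = ["left_hand", "right_hand", "left_arm", "right_arm",
         "torso", "head", "left_leg", "right_leg"]

def train_part_reasoners(features, labels_per_part):
    """features: sensor-derived rows already filtered to one preferred way
    (left- or right-dominant); labels_per_part: per-part pose labels."""
    reasoners = {}
    for part in PARTS:
        clf = DecisionTreeClassifier()
        clf.fit(features, labels_per_part[part])  # one trained reasoner per part
        reasoners[part] = clf
    return reasoners
```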
-
- (S3) During human-computer interaction, an overall skeletal reasoner is trained, and the corresponding weights are obtained by the overall skeletal reasoner based on input overall skeletal poses.
- (S4) The data set is acquired by the plurality of sensors on the mobile phone in real time. The reasoning is performed by the plurality of reasoners for individual parts of the human skeleton based on the preferred way of the object for using the mobile phone and the data set, to obtain preferred poses of the individual parts of the human skeleton.
- In an embodiment, as shown in
FIGS. 3 and 4 , the data set acquired by the plurality of sensors includes the horizontal and vertical displacement, the rotation angle of the mobile phone detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground detected by the gyroscope, positioning information, and status information about whether the screen of the mobile phone is clicked and whether the mobile phone has a certain amount of continuous vibration. Then whether the left hand, the right hand or both hands are currently lifted is reasoned by the corresponding preferred reasoner for left-hand motion and reasoner for right-hand motion. - Specifically, the reasoning process is performed as follows. Based on the information collected by the sensors on the mobile phone, including that there is no horizontal or vertical displacement; that, as detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground is within ±15°; that the mobile phone is not at sea or on an aircraft according to the positioning information; and that the screen of the mobile phone remains continuously lit without being touched while there is a certain continuous vibration, it can be reasoned by the reasoner for left-hand motion and the reasoner for right-hand motion that the currently-raised hand is the non-dominant hand; otherwise, the currently-raised hand is the dominant hand. The threshold for determining the quantitative vibration is determined by initially learning the operational preferences of the user, during which the user is allowed to hold the screen continuously so as to detect the difference in the habitual shaking of the hands: when the plurality of sensors of the mobile phone fail to detect a touch operation, there is a minimal amount of micro-vibration that is not sensor noise. When the mobile phone is detected to be in a landscape mode, it is reasoned by the hand reasoners that both hands are raised to hold the mobile phone. When the mobile phone is detected to be in a portrait mode, it is reasoned by the hand reasoners that the single dominant hand is raised.
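- Expressed as rules, the hand reasoning above can be sketched as follows. Only the ±15° angle test and the landscape/portrait branching are taken from the text; the argument encoding and the vibration threshold handling are assumptions.

```python
# Rule sketch of the hand reasoner; only the ±15° test and the
# landscape/portrait branching come from the text.
def reason_hands(no_displacement: bool, screen_to_vertical_deg: float,
                 at_sea_or_on_aircraft: bool, lit_and_untouched: bool,
                 vibration: float, vibration_threshold: float,
                 landscape: bool) -> str:
    if landscape:
        return "both_hands_raised"
    if (no_displacement and abs(screen_to_vertical_deg) <= 15
            and not at_sea_or_on_aircraft and lit_and_untouched
            and vibration >= vibration_threshold):
        return "non_dominant_hand_raised"
    return "dominant_hand_raised"  # portrait mode, conditions not met
```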
- In an embodiment, as shown in
FIGS. 4 and 5 , the data set acquired by the plurality of sensors includes the horizontal and vertical displacement, the rotation angle of the mobile phone detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground detected by the gyroscope, positioning information, status information about whether the screen of the mobile phone is clicked and whether the mobile phone has a certain amount of continuous vibration, the pressure altitude of the mobile phone from the ground (an adult of normal height is taken as an example herein), and the angle between the normal of the screen plane and the ground. Then whether the current arm is in a raised or lowered position is reasoned by the corresponding preferred reasoner for left-arm motion and reasoner for right-arm motion. - Specifically, the reasoning process includes the following steps. If the angle between the normal of the screen plane and the ground is more than 105° (that is, the user is most likely looking down at the screen), it is reasoned by the reasoner for left-arm motion and/or the reasoner for right-arm motion that the arm is in a dropped pose. If the angle between the normal of the screen plane and the ground is less than 75°, then it is reasoned by the reasoner for left-arm motion and/or the reasoner for right-arm motion that the arm is in a raised pose.
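- The two angle thresholds translate directly into a rule. In the sketch below, the 105° and 75° values come from the text, while the behaviour in the 75°-105° band, which the text leaves open, is reported as undetermined.

```python
# Sketch of the arm reasoner; the 105° and 75° thresholds come from the text.
def reason_arm(normal_to_ground_deg: float) -> str:
    if normal_to_ground_deg > 105:  # user most likely looking down at the screen
        return "dropped"
    if normal_to_ground_deg < 75:
        return "raised"
    return "undetermined"  # the 75°-105° band is left open in the text
```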
- In an embodiment, as shown in
FIGS. 5, 6 and 8 , the data set acquired by the plurality of sensors includes the horizontal and vertical displacement, the rotation angle of the mobile phone detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground detected by the gyroscope, positioning information, status information about whether the screen of the mobile phone is clicked and whether the mobile phone has a certain amount of continuous vibration, the pressure altitude of the mobile phone from the ground, the angle between the normal of the screen plane and the ground, and the orientation of the screen. Then the torso is reasoned by the corresponding preferred reasoner for torso motion to be in a forward leaning upright pose, an upright standing pose, a forward leaning sitting pose, a backward leaning sitting pose, an upright sitting pose, a squatting pose, an upward facing lying pose or a downward facing lying pose. - Specifically, based on this data set, the torso is first reasoned to be in an upright pose, a sitting pose or a squatting pose according to the angle between the normal of the screen plane and the ground, and is further reasoned to be in a forward leaning upright pose, an upright standing pose, a forward leaning sitting pose, a back leaning sitting pose, an upright sitting pose, a squatting pose, an upward facing lying pose or a downward facing lying pose.
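- The coarse-to-fine torso reasoning can be sketched as follows; the text gives no numeric thresholds for this reasoner, so every boundary value below is a hypothetical placeholder.

```python
# Sketch of the torso reasoner; all numeric boundaries are hypothetical.
def reason_torso(normal_to_ground_deg: float, altitude_m: float,
                 screen_facing_up: bool) -> str:
    if screen_facing_up and altitude_m < 0.5:      # phone held low and facing upward
        return "upward facing lying"
    if altitude_m < 0.8:                           # assumed squatting height band
        return "squatting"
    coarse_sitting = altitude_m < 1.2              # assumed sitting height band
    if normal_to_ground_deg > 100:                 # leaning over the screen
        return "forward leaning sitting" if coarse_sitting else "forward leaning upright"
    return "upright sitting" if coarse_sitting else "upright standing"
```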
- Specifically, as shown in
FIG. 9 , based on the same data set (including the angle between the normal of the screen plane and the ground and the orientation of the screen), the head poses consistent with the torso poses are inferred by the corresponding preferred reasoner for head motion. The initial head pose is looking straight ahead. The head pose also includes a changing pose, which depends on the opposite direction of the sliding on the screen: when the screen slides to the right, the sight turns towards the left. The head movement is currently considered to be synchronized with the eye movement, so the sight towards the left is equivalent to the head pose being towards the left.
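- The changing head pose reduces to a small lookup on the opposite of the slide direction; the direction vocabulary below is an assumption.

```python
# Sketch of the head reasoner's changing pose: the sight (and hence the head)
# turns opposite to the screen slide direction; the vocabulary is assumed.
from typing import Optional

def reason_head(slide_direction: Optional[str]) -> str:
    opposite = {"right": "left", "left": "right", "up": "down", "down": "up"}
    return opposite.get(slide_direction, "straight")  # no slide -> initial straight sight
```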
- Specifically, as shown in FIG. 7 , the data set acquired by the plurality of sensors includes the displacement variation, the displacement velocity, the vibration of the mobile phone and the pressure altitude of the mobile phone from the ground. Then the leg pose is reasoned by the corresponding preferred leg reasoner. - More specifically, according to the pressure altitude of the mobile phone with respect to the ground, it is reasoned that the user is in a standing pose. In the case of continuous displacement change without shaking or vibration, it is reasoned by the reasoner for leg motion that the user is in a travelling pose. When the mobile phone is under shaking and vibration, and the displacement velocity is within the walking speed range, it is inferred by the reasoner for leg motion that the user is in a walking pose.
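- As a rule sketch of the leg reasoner: the walking-speed band below is an assumed range, since the text refers to the walking speed range without giving numbers.

```python
# Sketch of the leg reasoner; the walking speed band is an assumed range.
WALKING_SPEED_MPS = (0.5, 2.0)  # hypothetical lower/upper bounds, m/s

def reason_legs(displacement_changing: bool, vibrating: bool,
                speed_mps: float) -> str:
    if displacement_changing and not vibrating:
        return "travelling"  # e.g. carried in a vehicle
    if vibrating and WALKING_SPEED_MPS[0] <= speed_mps <= WALKING_SPEED_MPS[1]:
        return "walking"
    return "standing"  # default per the altitude-based reasoning in the text
```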
-
- (S5) The preferred poses of individual parts of the human skeleton are merged to generate a plurality of initial virtual skeletons.
- (S6) The plurality of initial virtual skeletons are subjected to a predetermined human mechanics constraint to obtain a plurality of overall virtual skeletons satisfying the predetermined human mechanics constraint from the plurality of initial virtual skeletons.
- Specifically, the predetermined human mechanics constraint is performed through the following steps.
-
- (S6.1) A predetermined number of unconventional human pose images that meet predetermined requirements are acquired.
- (S6.2) 3D coordinates and pose data of virtual skeletal location points in Euclidean space are extracted based on the predetermined number of unconventional human pose images, where the origin of the 3D coordinates in Euclidean space of the virtual skeleton positioning points is the original position where the reasoning of the virtual skeleton is started, i.e., the 3D coordinate system in Euclidean space of the virtual skeleton positioning points is the reasoned 3D coordinate system of the virtual skeleton.
- (S6.3) Constraint tolerance of human natural engineering mechanics is corrected based on the 3D coordinates and pose data obtained in step (S6.2);
- The human natural engineering mechanics is in accordance with the natural category of physiological actions, and includes physiological bending of the human body, coherence and connection of physiological structures of the human body and joint bending.
- More specifically, step (S6) is performed by the following step. The plurality of initial virtual skeletons are subjected to constraint combination and normalization to obtain the plurality of overall virtual skeletons conforming to the constraints.
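- Steps (S6.1)-(S6.3) amount to maintaining a tolerance table of joint bending angles and normalizing candidate skeletons against it. The sketch below is one reading of that step; the tolerance entries are illustrative, whereas the disclosure corrects them from unconventional pose images.

```python
# Sketch of the human mechanics constraint and normalization; the tolerance
# table is illustrative, the disclosure corrects it from unconventional poses.
JOINT_TOLERANCE_DEG = {"elbow": (0.0, 150.0), "knee": (0.0, 140.0), "neck": (-60.0, 70.0)}

def satisfies_mechanics(skeleton: dict) -> bool:
    """skeleton maps joint names to bending angles in degrees."""
    return all(lo <= skeleton.get(joint, 0.0) <= hi
               for joint, (lo, hi) in JOINT_TOLERANCE_DEG.items())

def constrain_and_normalize(initial_skeletons: list) -> list:
    # clamp each joint angle into its tolerance band, so that every
    # returned skeleton satisfies the mechanics constraint
    normalized = []
    for skeleton in initial_skeletons:
        out = {}
        for joint, angle in skeleton.items():
            lo, hi = JOINT_TOLERANCE_DEG.get(joint, (angle, angle))
            out[joint] = min(max(angle, lo), hi)
        normalized.append(out)
    return normalized
```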
-
- (S7) The screened overall virtual skeletons are reasoned with the overall skeletal reasoner to obtain a dynamic twin virtual pose in real time. The dynamic twin virtual pose is presented in the form of a skeleton animation time-series collection.
- Specifically, the screened overall virtual skeletons are subjected to weighting by the overall skeletal reasoner, and the one with the highest weight is selected as the twin virtual pose.
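- The final selection is therefore an argmax over reasoner weights. In the sketch below, score stands in for the trained overall skeletal reasoner, whose concrete form the text leaves open.

```python
# Sketch of step (S7): weight each screened skeleton with the overall
# skeletal reasoner and keep the highest-weighted one as the twin virtual pose.
from typing import Callable, List

def select_twin_pose(screened_skeletons: List[dict],
                     score: Callable[[dict], float]) -> dict:
    return max(screened_skeletons, key=score)
```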
- This application also provides a twin pose detection system based on interactive indirect inference, which includes a module M1, a module M2, a module M3, a module M4, a module M5, a module M6 and a module M7.
- The module M1 is configured to construct a training set based on the data set and the poses of individual parts of the skeleton during human-computer interaction; where the plurality of sensors include a nine-axis gyroscope, an acceleration sensor, a speed sensor, an infrared distance sensor, a touch sensor and a sensor capable of obtaining program operation reference information of the mobile phone.
- The data set acquired by the sensors of the mobile phone includes a rotation angle of the mobile phone obtained by the nine-axis gyroscope, a horizontal movement acceleration of the mobile phone obtained by the acceleration sensor, a horizontal movement speed of the mobile phone obtained by the speed sensor, an altitude of the mobile phone obtained by the infrared distance sensor, status information about whether a screen of the mobile phone is clicked obtained by the touch sensor, and a use state of the mobile phone discriminated by the sensor capable of obtaining the reference information about program operation (for example, not clicked while being used or watched).
- Specifically, the poses of individual parts of human skeleton include hand poses, arm poses, torso poses, head poses, and leg poses.
- The hand poses include a raise of the mobile phone by the left hand, a raise of the mobile phone by the right hand, and a raise of the mobile phone by both the left hand and the right hand.
- The arm poses include a raised-arm pose and a dropped-arm pose.
- The torso poses include an upright pose, a sitting pose, a squatting pose, and a lying pose, and further include a forward leaning upright pose, an upright standing pose, a forward leaning sitting pose, a back leaning sitting pose, an upright sitting pose, a squatting pose, an upward facing lying pose, and a downward facing lying pose.
- The head poses include a looking-straight pose, a looking-left pose, a looking-right pose, a looking-up pose, and a looking-down pose.
- The leg poses include a walking pose, a travelling pose, a sitting pose, a standing pose, and a lying pose.
- The module M2 is configured to label and classify the training set based on the preferred way, and train the plurality of reasoners based on training sets in different multi-modal types.
- Specifically, the preferred ways of using the mobile phone include using the left hand as a dominant hand and using the right hand as a dominant hand.
- The reasoners include a reasoner for left-hand motion, a reasoner for right-hand motion, a reasoner for left-arm motion, a reasoner for right-arm motion, a reasoner for torso motion, a reasoner for head motion, a reasoner for left-leg motion and a reasoner for right-leg motion.
- The module M3 is configured to train the overall skeletal reasoner during the human-computer interaction, and obtain the corresponding weights by the overall skeletal reasoner based on the input overall skeletal poses.
- The module M4 is configured to perform reasoning to obtain poses of individual parts of a skeleton of an object using a plurality of reasoners on individual parts of the skeleton according to a preferred way of an object for using a mobile phone and a data set acquired in real time by a plurality of sensors on the mobile phone.
- In an embodiment, as shown in
FIGS. 3 and 4 , the data set acquired by the plurality of sensors includes the horizontal and vertical displacement, the rotation angle of the mobile phone detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground detected by the gyroscope, positioning information, and status information about whether the screen of the mobile phone is clicked and whether the mobile phone has a certain amount of continuous vibration. Then whether the left hand, the right hand or both hands are currently raised is reasoned by the corresponding reasoner for left-hand motion and reasoner for right-hand motion. - Specifically, the reasoning process includes the following steps. There is no horizontal or vertical displacement; the gyroscope detects that the angle between the screen display plane and the vertical plane of the ground is within ±15°; the mobile phone is not at sea or on an aircraft according to the positioning information; and the screen of the mobile phone remains continuously lit without being touched while there is a certain continuous vibration. Then the reasoner for left-hand motion and the reasoner for right-hand motion reason that the currently lifted hand is the non-dominant hand; otherwise, the currently lifted hand is the dominant hand. The threshold for determining the quantitative vibration is determined by initially learning the operational preferences of the user, during which the user is allowed to hold the screen continuously so as to detect the difference in the habitual shaking of the hands: when the plurality of sensors of the mobile phone fail to detect a touch operation, there is a minimal amount of micro-vibration that is not sensor noise. When the mobile phone is detected to be in a landscape mode, it is reasoned by the hand reasoners that both hands are raised to hold the mobile phone. When the mobile phone is detected to be in a portrait mode, it is reasoned by the hand reasoners that the single dominant hand is lifted.
- In an embodiment, as shown in
FIGS. 4 and 5 , the data set acquired by the plurality of sensors includes the horizontal and vertical displacement, the rotation angle of the mobile phone detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground detected by the gyroscope, positioning information, status information about whether the screen of the mobile phone is clicked and whether the mobile phone experiences certain continuous vibration, the pressure altitude of the mobile phone relative to the ground (an adult of normal height is taken as an example herein), and the angle between the normal of the screen plane and the ground. Then whether the current arm is in a raised or lowered position is reasoned by the corresponding preferred reasoner for left-arm motion and reasoner for right-arm motion. - Specifically, the reasoning includes the following steps. If the angle between the normal of the screen plane and the ground is more than 105°, and the user is looking down at the screen, it is reasoned by the reasoner for left-arm motion and/or the reasoner for right-arm motion that the arm is in a dropped pose. If the angle between the normal of the screen plane and the ground is less than 75°, it is reasoned by the reasoner for left-arm motion and/or the reasoner for right-arm motion that the arm is in a raised pose.
- In an embodiment, as shown in
FIGS. 5, 6 and 8 , the data set acquired by the plurality of sensors includes the horizontal and vertical displacement, the rotation angle of the mobile phone detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground detected by the gyroscope, positioning information, status information about whether the screen of the mobile phone is clicked and whether the mobile phone has a certain amount of continuous vibration, the pressure altitude of the mobile phone from the ground, the angle between the normal of the screen plane and the ground, and the orientation of the screen. Then the torso is reasoned by the corresponding preferred reasoner for torso motion to be in a forward leaning upright pose, an upright standing pose, a forward leaning sitting pose, a backward leaning sitting pose, an upright sitting pose, a squatting pose, an upward facing lying pose or a downward facing lying pose. - Specifically, based on this data set, the torso is first reasoned to be in an upright pose, a sitting pose or a squatting pose according to the angle between the normal of the screen plane and the ground, and is further reasoned to be in a forward leaning upright pose, an upright standing pose, a forward leaning sitting pose, a back leaning sitting pose, an upright sitting pose, a squatting pose, an upward facing lying pose or a downward facing lying pose.
- Specifically, as shown in
FIG. 9 , based on the same data set (including the angle between the normal of the screen plane and the ground and the orientation of the screen), the head poses consistent with the torso poses are inferred by the corresponding preferred reasoner for head motion. The initial head pose is looking straight ahead. The head pose also includes a changing pose, which depends on the opposite direction of the sliding on the screen: when the screen slides to the right, the sight turns towards the left. The head movement is currently considered to be synchronized with the eye movement, so the sight towards the left is equivalent to the head pose being towards the left. - Specifically, as shown in
FIG. 7 , the data set acquired by the plurality of sensors includes the displacement variation, the displacement velocity, the vibration of the mobile phone and the pressure altitude of the mobile phone from the ground. Then the leg pose is reasoned by the corresponding preferred leg reasoner. - More specifically, according to the pressure altitude of the mobile phone from the ground, a standing pose is reasoned. If the displacement changes continuously without vibration, the travelling pose is reasoned by the leg reasoner. When the mobile phone is shaking and vibrating, and the displacement velocity is within the walking speed range, the walking pose is inferred by the leg reasoner.
- The module M5 is configured to merge the poses of individual parts of the skeleton to generate a plurality of initial virtual skeletons.
- The module M6 is configured, under a predetermined human mechanics constraint, to obtain a plurality of overall virtual skeletons satisfying the predetermined human mechanics constraint from the plurality of initial virtual skeletons.
- The predetermined human mechanics constraint is performed by a module M6.1, a module M6.2, and a module M6.3.
- The module M6.1 is configured to acquire a plurality of unconventional human pose images that meet predetermined requirements.
- The module M6.2 is configured to extract 3D coordinates and pose data of virtual skeletal location points in Euclidean space based on the plurality of unconventional human pose images. The origin of the 3D coordinates in Euclidean space of the virtual skeleton positioning points is the original position where the reasoning of the virtual skeleton is started, i.e., the 3D coordinate system in Euclidean space of the virtual skeleton positioning points is the reasoned 3D coordinate system of the virtual skeleton.
- The module M6.3 is configured to correct constraint tolerance of human natural engineering mechanics based on the 3D coordinates and pose data.
- The human natural engineering mechanics is in accordance with the natural category of physiological actions, and includes physiological bending, coherence and connection of physiological structures and joint bending.
- More specifically, the module M6 is configured to subject the plurality of initial virtual skeletons to constraint combination and normalization to obtain the plurality of overall virtual skeletons conforming to the constraints.
-
- The module M7 is configured to perform reasoning on the screened overall virtual skeletons with the overall skeletal reasoner to obtain a dynamic twin virtual pose in real time.
- Specifically, the screened overall virtual skeletons are subjected to weighting by the overall skeletal reasoner, and the one with the highest weight is selected as the twin virtual pose.
- It is known to those skilled in the art that the system, device and individual modules provided herein can be implemented purely in computer-readable program code. Besides, it is possible to logically program the method steps such that the system, device and individual modules provided herein are implemented in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers and embedded microcontrollers. Therefore, the system, device and individual modules provided herein can be considered as a hardware component, and the modules included therein for implementing the various functions can be considered as structures within the hardware component. The modules for implementing the various functions can also be considered both as software programs for implementing the method and as structures within the hardware component.
- Described above are specific embodiments of the present disclosure. It should be understood that the disclosure is not limited to the particular embodiments described above, and various variations or modifications made by a person skilled in the art without departing from the spirit and scope of the disclosure shall fall within the scope of the disclosure defined by the appended claims. The embodiments of the present application and the features therein may be combined with each other in any way without contradiction.
Claims (10)
1. A twin pose detection method based on interactive indirect inference, comprising:
(S1) acquiring, by a plurality of sensors on a mobile phone, a data set in real time;
and obtaining poses of individual parts of a skeleton of an object by reasoning using a plurality of reasoners on individual parts of the skeleton based on the data set and a preferred way of the object for using the mobile phone;
(S2) merging the poses obtained in step (S1) to generate a plurality of initial virtual skeletons;
(S3) under a predetermined human mechanics constraint, obtaining a plurality of overall virtual skeletons satisfying the predetermined human mechanics constraint from the plurality of initial virtual skeletons; and
(S4) screening a predetermined number of overall virtual skeletons from the plurality of overall virtual skeletons, and obtaining a dynamic twin virtual pose in real time by reasoning on the predetermined number of overall virtual skeletons using an overall skeleton reasoner;
wherein step (S1) comprises:
(S1.1) during human-computer interaction, constructing a training set based on the data set and the poses of individual parts of the skeleton; wherein the plurality of sensors comprise a nine-axis gyroscope, an acceleration sensor, a speed sensor, an infrared distance sensor, a touch sensor and a sensor capable of obtaining program operation reference information of the mobile phone; and
(S1.2) labeling and classifying the training set based on the preferred way; training the plurality of reasoners based on training sets in different multi-modal types; wherein
the preferred way of the object for using the mobile phone comprises using a left hand as a dominant hand and using a right hand as a dominant hand.
2. The twin pose detection method of claim 1, wherein the plurality of reasoners comprise a reasoner for left-hand motion, a reasoner for right-hand motion, a reasoner for left-arm motion, a reasoner for right-arm motion, a reasoner for torso motion, a reasoner for head motion, a reasoner for left-leg motion and a reasoner for right-leg motion.
3. The twin pose detection method of claim 1, wherein the nine-axis gyroscope is configured to acquire a rotation angle of the mobile phone;
the acceleration sensor is configured to acquire a horizontal movement acceleration of the mobile phone;
the speed sensor is configured to acquire a horizontal movement speed of the mobile phone;
the infrared distance sensor is configured to acquire an altitude of the mobile phone;
the touch sensor is configured to acquire status information about whether a screen of the mobile phone is clicked; and
the sensor capable of obtaining the program operation reference information of the mobile phone is configured to determine a use state of the mobile phone.
4. The twin pose detection method of claim 1, wherein in step (S3), the predetermined human mechanics constraint is performed through steps of:
(S3.1) acquiring a predetermined number of unconventional human pose images that meet predetermined requirements;
(S3.2) extracting three-dimensional (3D) coordinates and pose data of virtual skeletal location points in Euclidean space based on the predetermined number of unconventional human pose images; and
(S3.3) correcting constraint tolerance of human natural engineering mechanics based on the 3D coordinates and pose data obtained in step (S3.2);
wherein the human natural engineering mechanics is in accordance with a natural category of physiological actions, and comprises physiological bending, coherence and connection of physiological structures, and joint bending.
5. The twin pose detection method of claim 4, wherein step (S3) is performed by a step of:
under the predetermined human mechanics constraint, subjecting the plurality of initial virtual skeletons to normalization to obtain the plurality of overall virtual skeletons satisfying the predetermined human mechanics constraint.
6. A twin pose detection system based on interactive indirect inference, comprising:
a first module;
a second module;
a third module; and
a fourth module;
wherein the first module is configured to perform reasoning to obtain poses of individual parts of a skeleton of an object using a plurality of reasoners on individual parts of the skeleton according to a preferred way of the object for using a mobile phone and a data set acquired in real time by a plurality of sensors on the mobile phone;
the second module is configured to merge the poses of individual parts of the skeleton to generate a plurality of initial virtual skeletons;
the third module is configured, under a predetermined human mechanics constraint, to obtain a plurality of overall virtual skeletons satisfying the predetermined human mechanics constraint from the plurality of initial virtual skeletons; and
the fourth module is configured to screen a predetermined number of overall virtual skeletons from the plurality of overall virtual skeletons, and perform reasoning on the predetermined number of overall virtual skeletons by using an overall skeleton reasoner to obtain a dynamic twin virtual pose in real time;
the first module comprises a first submodule and a second submodule;
wherein the first submodule is configured to construct a training set based on the data set and the poses of individual parts of the skeleton during human-computer interaction; wherein the plurality of sensors comprise a nine-axis gyroscope, an acceleration sensor, a speed sensor, an infrared distance sensor, a touch sensor and a sensor capable of obtaining program operation reference information of the mobile phone; and
the second submodule is configured to label and classify the training set based on the preferred way, and train the plurality of reasoners based on training sets in different multi-modal types; wherein the preferred way of the object for using the mobile phone comprises using a left hand as a dominant hand and using a right hand as a dominant hand.
7. The twin pose detection system of claim 6, wherein the plurality of reasoners comprise a reasoner for left-hand motion, a reasoner for right-hand motion, a reasoner for left-arm motion, a reasoner for right-arm motion, a reasoner for torso motion, a reasoner for head motion, a reasoner for left-leg motion and a reasoner for right-leg motion.
8. The twin pose detection system of claim 6, wherein the nine-axis gyroscope is configured to acquire a rotation angle of the mobile phone;
the acceleration sensor is configured to acquire a horizontal movement acceleration of the mobile phone;
the speed sensor is configured to acquire a horizontal movement speed of the mobile phone;
the infrared distance sensor is configured to acquire an altitude of the mobile phone;
the touch sensor is configured to acquire status information on whether a screen of the mobile phone is clicked; and
the sensor capable of obtaining the program operation reference information of the mobile phone is configured to determine a use state of the mobile phone.
9. The twin pose detection system of claim 6, wherein the predetermined human mechanics constraint is performed by using a fifth module, a sixth module, and a seventh module;
the fifth module is configured to acquire a plurality of unconventional human pose images that meet predetermined requirements;
the sixth module is configured to extract 3D coordinates and pose data of virtual skeletal location points in Euclidean space based on the plurality of unconventional human pose images; and
the seventh module is configured to correct constraint tolerance of human natural engineering mechanics based on the 3D coordinates and pose data;
wherein the human natural engineering mechanics is in accordance with a natural category of physiological actions, and comprises physiological bending, coherence and connection of physiological structures, and joint bending.
10. The twin pose detection system of claim 6, wherein the third module is configured to perform normalization on the plurality of initial virtual skeletons under the predetermined human mechanics constraint to obtain the plurality of overall virtual skeletons satisfying the predetermined human mechanics constraint.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210715227.9A CN114821006B (en) | 2022-06-23 | 2022-06-23 | Twin state detection method and system based on interactive indirect reasoning |
CN202210715227.9 | 2022-06-23 |
Publications (2)
Publication Number | Publication Date |
---|---|
US20230333633A1 (en) | 2023-10-19 |
US11809616B1 (en) | 2023-11-07 |
Family
ID=82522065
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/339,186 Active US11809616B1 (en) | 2022-06-23 | 2023-06-21 | Twin pose detection method and system based on interactive indirect inference |
Country Status (2)
Country | Link |
---|---|
US (1) | US11809616B1 (en) |
CN (1) | CN114821006B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116617663A (en) * | 2022-02-08 | 2023-08-22 | 腾讯科技(深圳)有限公司 | Action instruction generation method and device, storage medium and electronic equipment |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115590483B (en) * | 2022-10-12 | 2023-06-30 | 深圳市联代科技有限公司 | Smart phone with health measurement system |
CN117441980B (en) * | 2023-12-20 | 2024-03-22 | 武汉纺织大学 | Intelligent helmet system and method based on intelligent computation of multi-sensor information |
Citations (42)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100197390A1 (en) * | 2009-01-30 | 2010-08-05 | Microsoft Corporation | Pose tracking pipeline |
US20110085705A1 (en) * | 2009-05-01 | 2011-04-14 | Microsoft Corporation | Detection of body and props |
US20110304557A1 (en) * | 2010-06-09 | 2011-12-15 | Microsoft Corporation | Indirect User Interaction with Desktop using Touch-Sensitive Control Surface |
US20120225719A1 (en) * | 2011-03-04 | 2012-09-06 | Mirosoft Corporation | Gesture Detection and Recognition |
US20130028517A1 (en) * | 2011-07-27 | 2013-01-31 | Samsung Electronics Co., Ltd. | Apparatus, method, and medium detecting object pose |
US20130069931A1 (en) * | 2011-09-15 | 2013-03-21 | Microsoft Corporation | Correlating movement information received from different sources |
US20130077820A1 (en) * | 2011-09-26 | 2013-03-28 | Microsoft Corporation | Machine learning gesture detection |
US20130188081A1 (en) * | 2012-01-24 | 2013-07-25 | Charles J. Kulas | Handheld device with touch controls that reconfigure in response to the way a user operates the device |
US20130278501A1 (en) * | 2012-04-18 | 2013-10-24 | Arb Labs Inc. | Systems and methods of identifying a gesture using gesture data compressed by principal joint variable analysis |
US20140035805A1 (en) * | 2009-04-02 | 2014-02-06 | David MINNEN | Spatial operating environment (soe) with markerless gestural control |
US20140195936A1 (en) * | 2013-01-04 | 2014-07-10 | MoneyDesktop, Inc. a Delaware Corporation | Presently operating hand detector |
US20140325373A1 (en) * | 2009-04-02 | 2014-10-30 | Oblong Industries, Inc. | Operating environment with gestural control and multiple client devices, displays, and users |
US20150032408A1 (en) * | 2012-03-08 | 2015-01-29 | Commissariat Al'energie Atomique Et Aux Energies Alternatives | System for capturing movements of an articulated structure |
US20150077336A1 (en) * | 2013-09-13 | 2015-03-19 | Nod, Inc. | Methods and Apparatus for Using the Human Body as an Input Device |
US20150154447A1 (en) * | 2013-12-04 | 2015-06-04 | Microsoft Corporation | Fusing device and image motion for user identification, tracking and device association |
US9094576B1 (en) * | 2013-03-12 | 2015-07-28 | Amazon Technologies, Inc. | Rendered audiovisual communication |
US9144744B2 (en) * | 2013-06-10 | 2015-09-29 | Microsoft Corporation | Locating and orienting device in space |
US20150355462A1 (en) * | 2014-06-06 | 2015-12-10 | Seiko Epson Corporation | Head mounted display, detection device, control method for head mounted display, and computer program |
US20160195940A1 (en) * | 2015-01-02 | 2016-07-07 | Microsoft Technology Licensing, Llc | User-input control device toggled motion tracking |
US20170118318A1 (en) * | 2015-10-21 | 2017-04-27 | Le Holdings (Beijing) Co., Ltd. | Mobile Phone |
US20170273639A1 (en) * | 2014-12-05 | 2017-09-28 | Myfiziq Limited | Imaging a Body |
US20170308165A1 (en) * | 2016-04-21 | 2017-10-26 | ivSystems Ltd. | Devices for controlling computers based on motions and positions of hands |
US20180020978A1 (en) * | 2016-07-25 | 2018-01-25 | Patrick Kaifosh | System and method for measuring the movements of articulated rigid bodies |
US20190080252A1 (en) * | 2017-04-06 | 2019-03-14 | AIBrain Corporation | Intelligent robot software platform |
US20190114836A1 (en) * | 2017-10-13 | 2019-04-18 | Fyusion, Inc. | Skeleton-based effects and background replacement |
US20190167059A1 (en) * | 2017-12-06 | 2019-06-06 | Bissell Inc. | Method and system for manual control of autonomous floor cleaner |
US20190197852A1 (en) * | 2017-12-27 | 2019-06-27 | Kerloss Sadek | Smart entry point spatial security system |
US10416755B1 (en) * | 2018-06-01 | 2019-09-17 | Finch Technologies Ltd. | Motion predictions of overlapping kinematic chains of a skeleton model used to control a computer system |
US20190339766A1 (en) * | 2018-05-07 | 2019-11-07 | Finch Technologies Ltd. | Tracking User Movements to Control a Skeleton Model in a Computer System |
US10796104B1 (en) * | 2019-07-03 | 2020-10-06 | Clinc, Inc. | Systems and methods for constructing an artificially diverse corpus of training data samples for training a contextually-biased model for a machine learning-based dialogue system |
US20210072548A1 (en) * | 2019-09-10 | 2021-03-11 | Seiko Epson Corporation | Display system, control program for information processing device, method for controlling information processing device, and display device |
US20210233273A1 (en) * | 2020-01-24 | 2021-07-29 | Nvidia Corporation | Determining a 3-d hand pose from a 2-d image using machine learning |
US20210241529A1 (en) * | 2020-02-05 | 2021-08-05 | Snap Inc. | Augmented reality session creation using skeleton tracking |
US20210271863A1 (en) * | 2020-02-28 | 2021-09-02 | Fujitsu Limited | Behavior recognition method, behavior recognition device, and computer-readable recording medium |
US11210834B1 (en) * | 2015-09-21 | 2021-12-28 | TuringSense Inc. | Article of clothing facilitating capture of motions |
US20210402942A1 (en) * | 2020-06-29 | 2021-12-30 | Nvidia Corporation | In-cabin hazard prevention and safety control system for autonomous machine applications |
US11232294B1 (en) * | 2017-09-27 | 2022-01-25 | Amazon Technologies, Inc. | Generating tracklets from digital imagery |
US11249556B1 (en) * | 2020-11-30 | 2022-02-15 | Microsoft Technology Licensing, Llc | Single-handed microgesture inputs |
US20220245812A1 (en) * | 2019-08-06 | 2022-08-04 | The Johns Hopkins University | Platform to detect patient health condition based on images of physiological activity of a patient |
US20220258049A1 (en) * | 2021-02-16 | 2022-08-18 | Pritesh KANANI | System and method for real-time calibration of virtual apparel using stateful neural network inferences and interactive body measurements |
US20220410000A1 (en) * | 2019-07-09 | 2022-12-29 | Sony Interactive Entertainment Inc. | Skeleton model updating apparatus, skeleton model updating method, and program |
US20230127549A1 (en) * | 2020-06-25 | 2023-04-27 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Method, mobile device, head-mounted display, and system for estimating hand pose |
Family Cites Families (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10565777B2 (en) * | 2016-09-30 | 2020-02-18 | Sony Interactive Entertainment Inc. | Field of view (FOV) throttling of virtual reality (VR) content in a head mounted display |
FR3068236B1 (en) * | 2017-06-29 | 2019-07-26 | Wandercraft | METHOD FOR SETTING UP AN EXOSQUELET |
US11281293B1 (en) * | 2019-04-30 | 2022-03-22 | Facebook Technologies, Llc | Systems and methods for improving handstate representation model estimates |
CN108876815B (en) | 2018-04-28 | 2021-03-30 | 深圳市瑞立视多媒体科技有限公司 | Skeleton posture calculation method, character virtual model driving method and storage medium |
CN108830150B (en) | 2018-05-07 | 2019-05-28 | 山东师范大学 | One kind being based on 3 D human body Attitude estimation method and device |
CN109885163A (en) | 2019-02-18 | 2019-06-14 | 广州卓远虚拟现实科技有限公司 | A kind of more people's interactive cooperation method and systems of virtual reality |
CN110472481B (en) * | 2019-07-01 | 2024-01-05 | 华南师范大学 | Sleeping gesture detection method, device and equipment |
CN110495889B (en) | 2019-07-04 | 2022-05-27 | 平安科技(深圳)有限公司 | Posture evaluation method, electronic device, computer device, and storage medium |
CN110502980B (en) * | 2019-07-11 | 2021-12-03 | 武汉大学 | Method for identifying scene behaviors of pedestrians playing mobile phones while crossing roads |
CN110675474B (en) | 2019-08-16 | 2023-05-02 | 咪咕动漫有限公司 | Learning method for virtual character model, electronic device, and readable storage medium |
CN111311714A (en) | 2020-03-31 | 2020-06-19 | 北京慧夜科技有限公司 | Attitude prediction method and system for three-dimensional animation |
CN112132955B (en) * | 2020-09-01 | 2024-02-06 | 大连理工大学 | Method for constructing digital twin body of human skeleton |
EP4224368A4 (en) * | 2020-09-29 | 2024-05-22 | Sony Semiconductor Solutions Corporation | Information processing system, and information processing method |
CN112884780A (en) | 2021-02-06 | 2021-06-01 | 罗普特科技集团股份有限公司 | Estimation method and system for human body posture |
CN112926550A (en) | 2021-04-15 | 2021-06-08 | 南京蓝镜数字科技有限公司 | Human-computer interaction method and device based on three-dimensional image human body posture matching |
CN113158459A (en) | 2021-04-20 | 2021-07-23 | 浙江工业大学 | Human body posture estimation method based on visual and inertial information fusion |
CN113191324A (en) | 2021-05-24 | 2021-07-30 | 清华大学 | Pedestrian behavior intention prediction method based on multi-task learning |
CN113610969B (en) | 2021-08-24 | 2024-03-08 | 国网浙江省电力有限公司双创中心 | Three-dimensional human body model generation method and device, electronic equipment and storage medium |
- 2022-06-23: CN CN202210715227.9A patent/CN114821006B/en active Active
- 2023-06-21: US US18/339,186 patent/US11809616B1/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN114821006A (en) | 2022-07-29 |
CN114821006B (en) | 2022-09-20 |
US11809616B1 (en) | 2023-11-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11809616B1 (en) | Twin pose detection method and system based on interactive indirect inference | |
CN110930483B (en) | Role control method, model training method and related device | |
JP7178396B2 (en) | Method and computer system for generating data for estimating 3D pose of object included in input image | |
CN112906604B (en) | Behavior recognition method, device and system based on skeleton and RGB frame fusion | |
CN114399826A (en) | Image processing method and apparatus, image device, and storage medium | |
KR20220025023A (en) | Animation processing method and apparatus, computer storage medium, and electronic device | |
CN107688391A (en) | A kind of gesture identification method and device based on monocular vision | |
CN113496507A (en) | Human body three-dimensional model reconstruction method | |
CN110135249A (en) | Human bodys' response method based on time attention mechanism and LSTM | |
CN115933868B (en) | Three-dimensional comprehensive teaching field system of turnover platform and working method thereof | |
CN101520902A (en) | System and method for low cost motion capture and demonstration | |
CN107392131A (en) | A kind of action identification method based on skeleton nodal distance | |
US10970849B2 (en) | Pose estimation and body tracking using an artificial neural network | |
CN1648840A (en) | Head carried stereo vision hand gesture identifying device | |
CN107621880A (en) | A kind of robot wheel chair interaction control method based on improvement head orientation estimation method | |
CN113255514B (en) | Behavior identification method based on local scene perception graph convolutional network | |
Huang et al. | A review of 3D human body pose estimation and mesh recovery | |
CN115798042A (en) | Escalator passenger abnormal behavior data construction method based on digital twins | |
Zhang et al. | Emotion recognition from body movements with as-lstm | |
CN116449947B (en) | Automobile cabin domain gesture recognition system and method based on TOF camera | |
CN113673494B (en) | Human body posture standard motion behavior matching method and system | |
CN116310102A (en) | Three-dimensional reconstruction method, terminal and medium of transparent object image based on deep learning | |
CN114202606A (en) | Image processing method, electronic device, storage medium, and computer program product | |
Liang et al. | Interactive Experience Design of Traditional Dance in New Media Era Based on Action Detection | |
Gao | The Application of Virtual Technology Based on Posture Recognition in Art Design Teaching |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: MICROENTITY |
| FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO MICRO (ORIGINAL EVENT CODE: MICR); ENTITY STATUS OF PATENT OWNER: MICROENTITY |
| STCF | Information on status: patent grant | Free format text: PATENTED CASE |