
US20230333633A1 - Twin pose detection method and system based on interactive indirect inference - Google Patents

Twin pose detection method and system based on interactive indirect inference

Info

Publication number
US20230333633A1
Authority
US
United States
Prior art keywords
pose
mobile phone
reasoner
human
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US18/339,186
Other versions
US11809616B1
Inventor
Qing Zhang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Publication of US20230333633A1
Application granted granted Critical
Publication of US11809616B1
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M1/00 - Substation equipment, e.g. for use by subscribers
    • H04M1/72 - Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 - User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72448 - User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions
    • H04M1/72454 - User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions according to context-related or environment-related conditions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 - Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 - Manipulating 3D models or images for computer graphics
    • G06T19/006 - Mixed reality
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03 - Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/0304 - Detection arrangements using opto-electronic means
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03 - Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/033 - Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor
    • G06F3/0346 - Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor with detection of the device orientation or free movement in a 3D space, e.g. 3D mice, 6-DOF [six degrees of freedom] pointers using gyroscopes, accelerometers or tilt-sensors
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 - Computing arrangements using knowledge-based models
    • G06N5/04 - Inference or reasoning models
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/70 - Determining position or orientation of objects or cameras
    • G06T7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/20 - Scenes; Scene-specific elements in augmented reality scenes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30196 - Human being; Person
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03 - Recognition of patterns in medical or anatomical images
    • G06V2201/033 - Recognition of patterns in medical or anatomical images of skeletal patterns
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M1/00 - Substation equipment, e.g. for use by subscribers
    • H04M1/72 - Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 - User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72448 - User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions

Definitions

  • This application relates to digital intelligence technology, and more particularly to a twin pose detection method and system based on interactive indirect inference.
  • Through digital twin technology, software empowerment and intelligent empowerment are achieved in high-end manufacturing; combined with sensing devices and intelligent devices, no additional acquisition or wearable devices are needed, and the user can access virtual reality applications with physical interaction anywhere and anytime.
  • Sensors and assistive peripherals have long been used to build a bridge between the physical world and the virtual digital world for users.
  • Improved sensory coherence is what users and the industry have always pursued and worked towards.
  • However, sensor peripherals provided by various hardware manufacturers are not facilities available to ordinary consumers, so virtual reality is difficult to popularize and experience consistency is low.
  • Mobile smart terminals (e.g., mobile phones, tablets, and smart wearables) have screen displays, network communication, and a certain degree of computing power, and are already popular and versatile.
  • Providing a basic and universal method for detecting poses using these universal devices can greatly facilitate the versatility and universality of virtual reality.
  • In this way, specific sensors and assistive peripherals are not required, such as HMDs with wearability and compatibility limitations, handles that occupy the hands and limit convenience, additional camera capture that depends on place and equipment, and wearable pose positioners that are not available anywhere and anytime, are limited in space, and are specialized and expensive.
  • This application provides a method and system for detecting poses of a twin based on interactive indirect inference, which enables software empowerment and intelligent empowerment through digital intelligence technologies (digital twin technology) in high-end manufacturing, and enables ordinary users to access virtual reality applications anytime and anywhere through sensing devices and smart devices without external assistive acquisition or wearable devices.
  • Chinese Patent Publication No. 113610969A (Application No. 202110974124.X) discloses a method for generating a three-dimensional (3D) human body model, including: acquiring to-be-detected images taken from a plurality of perspectives; detecting human body regions contained in the to-be-detected images, and detecting a data set of skeletal key points contained in the human body regions; constructing a fusion affinity matrix between the to-be-detected images by using the human body regions and the data set of skeletal key points; determining a matching relationship between the body regions by using the fusion affinity matrix; and performing pose construction based on the matching relationship and the data set of skeletal key points to generate the 3D human body model.
  • The method can analyze human poses from various perspectives, extract the data of body regions and skeletal key points from the to-be-detected images, and generate a 3D human body model by using the matching relationship between body regions and the data set of skeletal key points, and thus can fully and effectively restore the 3D poses of the human body.
  • However, the method relies on sensors and multiple perspectives, which is different from the indirect inference of the method provided by the present disclosure.
  • Chinese Patent Publication No. 111311714A (Application No. 202010244300.X) discloses a method and system for pose prediction.
  • The method includes: acquiring pose information of a target character in one or more existing frames; and inputting the pose information into a trained pose prediction model to determine predicted pose information of the target character in subsequent frames, where the pose information includes skeletal rotation angle information and gait movement information.
  • This method is different from the method of the present disclosure in objectives and detection techniques.
  • Chinese Patent Publication No. 112132955A (Application No. 202010902457.7) discloses a digital twin construction method for human skeleton.
  • In this method, data at important locations of the human body is acquired via VR motion capture and sensor technology.
  • Key data is obtained through data classification, screening, simplification, and calculation via artificial intelligence.
  • The spatial orientation and mechanical information of the target skeleton is obtained by solving the key data with human inverse dynamics and biomechanical algorithms. After fusing some of the sensor data with the computational results, simulation is performed on the target skeleton to obtain the biomechanical properties of the target skeleton and predict the biomechanical properties of the target skeleton in unknown poses using various prediction algorithms.
  • The performance data is then subjected to modelling and rendering to obtain a high-fidelity digital twin of the real skeleton, achieving a faithful twin mapping of the biomechanical properties of the skeleton.
  • In that disclosure, sensors and external sensing of VR devices are used; these devices per se can directly obtain sensing data to complete the pose detection, which is entirely different from the interactive indirect inference proposed in the present disclosure. A further difference is that in the present disclosure, inference engines are designed according to different body parts to ensure targeted inference detection.
  • Chinese Patent Publication No. 110495889A (Application No. 201910599978.7) discloses a method of pose assessment, an electronic device, a computer device and a storage medium.
  • The method includes: obtaining to-be-tested images, where the to-be-tested images include a front full body image and a side full body image of a tester standing upright; extracting skeletal key points from the to-be-tested images; calculating, based on the skeletal key points, a pose vector of the tester; and obtaining a bending angle of the pose vector.
  • This method relies on sensors and multiple perspectives, which is different from the indirect inference of the present disclosure.
  • Chinese Patent Publication No. 113191324A (Application No. 202110565975.9) discloses a method for predicting pedestrian behavioral intention based on multi-task learning, including: constructing a training sample set; constructing a pedestrian behavioral intention prediction model using a base network, a pose detection network and an intention recognition network; and extracting image features to obtain a feature map with a single frame image of the training sample set as an input of the base network.
  • An encoder of the pose detection network includes a part intensity field sub-network and a part association field sub-network.
  • Pedestrian pose images are acquired by a decoder of the pose detection network according to the joint feature map and the bone feature map, where the feature map is set as an input of both the part intensity field sub-network and the part association field sub-network, a joint feature map and a bone feature map are set as an output of the part intensity field sub-network and the part association field sub-network, respectively.
  • The feature map is set as an input of the intention recognition network, and the pedestrian behavioral intention image is set as an output of the intention recognition network.
  • The pedestrian behavioral intention prediction model is trained and used to predict the pedestrian behavioral intention. This method relies on sensors and image feature extraction, which differs from the indirect inference of the present disclosure and produces different results.
  • Chinese Patent Publication No. 112884780A (Application No. 202110165636.1) discloses a method and system for estimating poses of human body, including: inputting and training an image in a codec network having a four-layer coding layer and a four-layer decoding layer structure, and outputting a semantic segmentation result; converting a semantic probability map of a pixel obtained in the former two coding layers into an edge activation using an energy function pixel map, where the activation value responding to the pixel is larger than the activation value threshold, and the pixel is an edge pixel; obtaining an instance segmentation result by aggregating pixels belonging to the same instance based on the semantic labels in the semantic segmentation result, where the instance segmentation result includes a mask indicating the instance to which each pixel belongs; generating the human skeletal confidence map using the full convolutional network, and outputting the skeletal component labels to which each pixel belongs in each instance; and regressing locations of nodal points through the fully connected network to create a skeletal structure.
  • Chinese Patent No. 108830150B (Application No. 201810426144.1) discloses a method and device for three-dimensional (3D) pose estimation of human body, including: (S1) acquiring, by a monocular camera, depth images and red, green and blue (RGB) images of the human body at different angles; (S2) constructing a human skeletal key point detection neural network based on the RGB images to obtain a key point-annotated image; (S3) constructing a two-dimensional (2D)-3D mapping network of hand joint nodes; (S4) calibrating the depth image and the key point-annotated image of the human body at the same angle, and performing 3D point cloud coloring transformation on the corresponding depth image to obtain a coloring depth image; (S5) predicting, by a predefined learning network, the corresponding position of the annotated human skeleton key points in the depth image based on the key point-annotated image and the coloring depth image; and (S6) combining the outputs of steps (S3) and (S5) to achieve refined estimation of the 3D pose of the human body.
  • Chinese Patent Publication No. 109885163A (Application No. 201910122971.6) discloses a method and system for multiplayer interactive collaboration in virtual reality.
  • The system includes a motion acquisition device to acquire skeletal data of a user; a plurality of clients for data modeling to obtain pose data of the user based on the skeletal data and map it to initial joint position data of each joint point of the skeletal model; and a server used to bind the initial joint position data of the skeletal model to the scene character of the user and to obtain and synchronously transmit the character position data to other scene characters.
  • The clients are also used to update the initial joint position data of the scene character, and to combine it with the model animation of the virtual scene to form the skeletal animation of pose movement.
  • Chinese Patent Publication No. 112926550A (Application No. 202110406810.7) discloses a human-computer interaction method based on 3D image pose matching of a human body, and an apparatus thereof.
  • The method includes: initializing an interaction machine and storing a corresponding 3D image of a template pose into the interaction machine based on interaction requirements; acquiring a plurality of nodes based on a deep learning method and constructing a 3D skeleton model based on the plurality of nodes; obtaining skeleton information of the current to-be-interacted human and inputting it into the 3D skeleton model to obtain human pose features; calculating a loss function value between the human pose features and the interaction data set; and comparing the loss function value with the set threshold value to determine whether to carry out human-machine interaction.
  • Although this method involves interaction and action, it is different from the indirect inference method provided in the present disclosure in the purposes of interaction, the generation modes of actions, and the sources of dependence of landing points.
  • Chinese Patent Publication No. 110675474A (Application No. 201910758741.9) discloses a learning method of a virtual character model, an electronic device and a readable storage medium.
  • The learning method for the virtual character model includes: obtaining first skeletal pose information corresponding to an action of a target character in a current video image frame; obtaining skeletal pose adjustment information of a virtual character model corresponding to the current video image frame based on the first skeletal pose information and second skeletal pose information, where the second skeletal pose information is the skeletal pose information of the virtual character model corresponding to a previous video image frame; and driving the virtual character model according to the skeletal pose adjustment information for the virtual character model to learn the action of the target character in the current video image frame, so that the learning process between the virtual character model and a person can be simulated to form interactive experiences between a person and a virtual character, such as training, education, and nurturing.
  • Although this method involves interaction and action, it is different from the indirect inference method provided in the present disclosure in the purposes of interaction.
  • Chinese Patent Publication No. 113158459A (Application No. 202110422431.7) discloses a method for human pose estimation based on fusion of vision and inertial information. Since the human pose estimation method based on 3D vision sensors cannot provide three-degree-of-freedom rotation information, in this method, by using the complementary nature of visual and inertial information, a nonlinear optimization method is used to adaptively fuse vision information, inertial information and human pose priori information to obtain the rotation angle of a skeletal node and the global position of a root skeletal node at each moment, and complete real-time estimation for poses of the human body. Although this method involves interaction and action, it is different from the indirect inference method provided in the present disclosure in purposes of interaction, generation modes of actions, and sources of dependence of landing points.
  • Chinese Patent No. 108876815B (Application No. 201810403604.9) discloses a skeletal pose calculation method, a virtual character model driving method, and a storage medium, where the skeletal pose calculation method is a key step of the virtual character model driving method, which includes an iterative calculation process for skeletal poses based on inverse kinematics. Based on inverse derivation, the joint angle change of the middle joint of the human skeletal chain is calculated based on the change of pose information of the limb, so that the joint angle of each joint is close to the optimal value after each iteration, effectively ensuring the smooth gradation effect when simulating the limb action and thus meeting the application requirements of realistic simulation of limb action.
  • In the present disclosure, the rotation and movement of the intelligent device are visible, while the invisible twin human body is used as a constraint and mapping to solve the logical intermediate link, which is transformed into the indirect inference of the pose setting.
  • The mapping relationship between the twin human body and the virtual human part is a visualization constraint.
  • An object of the present disclosure is to provide a method and system for twin pose detection based on interactive indirect inference to overcome the deficiencies in the prior art.
  • This application provides a twin pose detection method based on interactive indirect inference, comprising:
  • The plurality of reasoners comprise a reasoner for left-hand motion, a reasoner for right-hand motion, a reasoner for left-arm motion, a reasoner for right-arm motion, a reasoner for torso motion, a reasoner for head motion, a reasoner for left-leg motion, and a reasoner for right-leg motion.
  • The plurality of sensors comprise a nine-axis gyroscope, an acceleration sensor, a speed sensor, an infrared distance sensor, a touch sensor, and a sensor capable of obtaining reference information about program operation of the mobile phone.
  • In step (S3), the predetermined body mechanics constraint is applied through the following steps:
  • In an embodiment, step (S3) is performed through the following step:
  • This application further provides a twin pose detection system based on interactive indirect inference, comprising:
  • The plurality of reasoners comprise a reasoner for left-hand motion, a reasoner for right-hand motion, a reasoner for left-arm motion, a reasoner for right-arm motion, a reasoner for torso motion, a reasoner for head motion, a reasoner for left-leg motion, and a reasoner for right-leg motion.
  • The plurality of sensors comprise a nine-axis gyroscope, an acceleration sensor, a speed sensor, an infrared distance sensor, a touch sensor, and a sensor capable of obtaining program operation reference information of the mobile phone.
  • The predetermined body mechanics constraint is applied by using a fifth module, a sixth module, and a seventh module.
  • The fifth module is configured to acquire a plurality of unconventional human pose images that meet predetermined requirements.
  • The third module is configured to perform normalization on the plurality of initial virtual skeletons under the predetermined human mechanics constraint to obtain the plurality of overall virtual skeletons satisfying the predetermined human mechanics constraint.
  • FIG. 1 is a flow chart of a twin pose detection method based on interactive indirect inference according to an embodiment of the present disclosure.
  • FIG. 2 is a schematic diagram of rotation of a mobile intelligent terminal under normal gravity, where a limb behavior is temporary skeletal extension.
  • FIG. 3 schematically shows an inference of a pose of the human skeleton according to an embodiment of the present disclosure.
  • FIG. 4 schematically shows an inference of a pose of the human skeleton according to an embodiment of the present disclosure.
  • FIG. 5 schematically shows an inference of a pose of the human skeleton according to an embodiment of the present disclosure.
  • FIG. 6 schematically shows an inference of a pose of the human skeleton according to an embodiment of the present disclosure.
  • FIG. 7 schematically shows an inference of a pose of the human skeleton according to an embodiment of the present disclosure.
  • FIG. 8 schematically shows an inference of a pose of the human skeleton according to an embodiment of the present disclosure.
  • FIG. 9 schematically shows an inference of a pose of the human skeleton according to an embodiment of the present disclosure.
  • Conventionally, sensors are used to obtain an accurate correspondence for judgement through direct acquisition; for example, a speed sensor is used to obtain speed.
  • In the present disclosure, a set of relevant sensing information is used for indirect reasoning.
  • The indirect reasoning acquires the most probable information about time, space, and pose through basic sensors, and avoids the use of additional and inaccessible sensing equipment.
  • The rotational change of the device as shown in FIG. 2 is not just its own rotation in the full 720° three-dimensional (3D) space; the change in pose of the device is caused by the skeletal linkage of the user.
  • This 3D spatial rotation is mapped into several body states of the twin being used.
  • This application provides a twin pose detection method based on interactive indirect inference, which includes the following steps.
  • The plurality of sensors on the mobile phone include a nine-axis gyroscope, an acceleration sensor, a speed sensor, an infrared distance sensor, a touch sensor, and a sensor capable of obtaining program operation reference information of the mobile phone (e.g., a screen brightness acquisition sensor, a sensor for acquisition of masking light sensing, and a speaker).
  • The data set obtained by the plurality of sensors on the mobile phone includes a rotation angle of the mobile phone obtained by the nine-axis gyroscope, a horizontal movement acceleration of the mobile phone obtained by the acceleration sensor, a horizontal movement speed of the mobile phone obtained by the speed sensor, an altitude of the mobile phone obtained by the infrared distance sensor, status information about whether a screen of the mobile phone is clicked obtained by the touch sensor, and a state of use of the mobile phone discriminated by the sensor capable of obtaining the reference information about program operation (such as non-clicked while being used or watched). A minimal software representation of such a reading is sketched below.
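  • The following is a hedged sketch of one way such a real-time reading could be held in code; the class and field names are illustrative assumptions introduced for the example, not part of the disclosure.

```python
from dataclasses import dataclass

@dataclass
class SensorSample:
    """One real-time reading from the phone's sensors (illustrative names)."""
    rotation_deg: tuple       # phone rotation angles from the nine-axis gyroscope
    horizontal_accel: float   # horizontal movement acceleration, from the acceleration sensor
    horizontal_speed: float   # horizontal movement speed, from the speed sensor
    altitude_m: float         # height of the phone above the ground, infrared distance sensor
    screen_touched: bool      # whether the screen is currently clicked, from the touch sensor
    in_active_use: bool       # e.g. lit screen being used or watched without clicks
```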
  • The poses of individual parts of the human skeleton include hand poses, arm poses, torso poses, head poses, and leg poses.
  • The hand poses include a lift of the mobile phone by the left hand, a lift of the mobile phone by the right hand, and a lift of the mobile phone by both the left hand and the right hand.
  • The arm poses include a raised-arm pose and a dropped-arm pose.
  • The torso poses include an upright pose, a sitting pose, a squatting pose, and a lying pose, and further include a forward leaning upright pose, an upright standing pose, a forward leaning sitting pose, a back leaning sitting pose, an upright sitting pose, a squatting pose, a facing-upward lying pose, and a facing-downward lying pose.
  • The head poses include a looking-straight-ahead pose, a looking-left pose, a looking-right pose, a looking-up pose, and a looking-down pose.
  • The leg poses include a walking pose, a travelling pose, a sitting pose, a standing pose, and a lying pose.
  • The preferred ways of using the mobile phone include using the left hand as the dominant hand and using the right hand as the dominant hand.
  • The plurality of reasoners include a reasoner for left-hand motion, a reasoner for right-hand motion, a reasoner for left-arm motion, a reasoner for right-arm motion, a reasoner for torso motion, a reasoner for head motion, a reasoner for left-leg motion, and a reasoner for right-leg motion.
  • The data set acquired by the plurality of sensors includes the horizontal and vertical displacement, the rotation angle of the mobile phone detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground detected by the gyroscope, positioning information, and status information about whether the screen of the mobile phone is clicked and whether the mobile phone has a certain amount of continuous vibration. Then whether the left hand, the right hand, or both hands are currently lifted is reasoned by the corresponding preferred reasoner for left-hand motion and reasoner for right-hand motion.
  • The reasoning process is performed as follows. Based on the information collected by the sensors on the mobile phone, including: there is no horizontal or vertical displacement; as detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground is at most 15°; the mobile phone is not at sea or on an aircraft according to the positioning information; and the screen of the mobile phone is continuously lit without being touched while there is a certain continuous vibration, it can be reasoned by the left-hand reasoner and the right-hand reasoner that the currently raised hand is the non-dominant hand; otherwise, the currently raised hand is the dominant hand.
  • The threshold for determining the quantitative vibration is determined by initializing the learning of the operational preferences of the user, during which the user is allowed to hold the screen continuously so as to detect the difference in the habitual shaking of the hands.
  • When the plurality of sensors of the mobile phone fail to detect a touch operation, there is a minimal amount of micro-vibration, and it is not sensor noise.
  • When the mobile phone is detected to be in a landscape mode, it is reasoned by the hand reasoner that both hands are raised to hold the mobile phone.
  • When the mobile phone is detected to be in a portrait mode, it is reasoned by the hand reasoner that the single dominant hand is raised. These rules are illustrated in the sketch below.
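  • As a hedged illustration only, the following sketch encodes the hand-reasoning rules above as plain conditionals; the function name, parameters, and the calibrated vibration threshold are assumptions introduced for the example.

```python
def reason_hands(landscape: bool, displaced: bool, screen_tilt_from_vertical_deg: float,
                 lit_and_untouched: bool, vibration: float, vibration_threshold: float,
                 dominant_hand: str = "right") -> str:
    """Infer which hand(s) hold the phone from the rules described above (illustrative)."""
    if landscape:
        return "both"  # landscape mode implies two-handed holding
    if (not displaced and screen_tilt_from_vertical_deg <= 15
            and lit_and_untouched and vibration >= vibration_threshold):
        # A lit, untouched, near-vertical, gently shaking phone suggests the
        # non-dominant hand is holding it while the dominant hand is free.
        return "left" if dominant_hand == "right" else "right"
    return dominant_hand  # otherwise the dominant hand is raised
```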
  • The data set acquired by the plurality of sensors includes the horizontal and vertical displacement, the rotation angle of the mobile phone detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground detected by the gyroscope, positioning information, status information about whether the screen of the mobile phone is clicked and whether the mobile phone has a certain amount of continuous vibration, the pressure altitude of the mobile phone from the ground (an adult of normal height is taken as an example herein), and the angle between the normal of the screen plane and the ground. Then whether the current arm is in a raised or lowered position is reasoned by the corresponding preferred reasoner for left-arm motion and reasoner for right-arm motion.
  • The reasoning process includes the following steps. If the angle between the normal of the screen plane and the ground is more than 105° (that is, the user is most likely looking down at the screen), it is reasoned by the reasoner for left-arm motion and/or the reasoner for right-arm motion that the arm is in a dropped pose. If the angle between the normal of the screen plane and the ground is less than 75°, then it is reasoned by the reasoner for left-arm motion and/or the reasoner for right-arm motion that the arm is in a raised pose. A sketch of this rule follows.
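  • A minimal sketch of the arm rule; the disclosure does not specify the behavior between 75° and 105°, so leaving that band undecided is an assumption here.

```python
from typing import Optional

def reason_arm(screen_normal_ground_angle_deg: float) -> Optional[str]:
    """Arm pose from the angle between the screen-plane normal and the ground (illustrative)."""
    if screen_normal_ground_angle_deg > 105:
        return "dropped"   # user is most likely looking down at the screen
    if screen_normal_ground_angle_deg < 75:
        return "raised"
    return None            # 75-105 degrees: ambiguous; left to other evidence (assumption)
```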
  • The data set acquired by the plurality of sensors includes the horizontal and vertical displacement, the rotation angle of the mobile phone detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground detected by the gyroscope, positioning information, status information about whether the screen of the mobile phone is clicked and whether the mobile phone has a certain amount of continuous vibration, the pressure altitude of the mobile phone from the ground, the angle between the normal of the screen plane and the ground, and the orientation of the screen.
  • The torso is reasoned by the corresponding preferred reasoner for torso motion to be in a forward leaning upright pose, an upright standing pose, a forward leaning sitting pose, a backward leaning sitting pose, an upright sitting pose, a squatting pose, an upward facing lying pose, or a downward facing lying pose.
  • The data set acquired by the plurality of sensors includes the horizontal and vertical displacement, the rotation angle of the mobile phone detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground detected by the gyroscope, positioning information, status information about whether the screen of the mobile phone is clicked and whether the mobile phone has a certain amount of continuous vibration, the pressure altitude of the mobile phone from the ground, the angle between the normal of the screen plane and the ground, and the orientation of the screen.
  • The torso is reasoned to be in an upright pose, a sitting pose, or a squatting pose according to the angle between the normal of the screen plane and the ground, and is further reasoned to be in a forward leaning upright pose, an upright standing pose, a forward leaning sitting pose, a back leaning sitting pose, an upright sitting pose, a squatting pose, an upward facing lying pose, or a downward facing lying pose, as sketched below.
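  • The disclosure gives no numeric thresholds for the torso; the cut-offs and feature names below are purely illustrative assumptions showing how such a coarse-to-fine rule could look.

```python
def reason_torso(screen_normal_ground_angle_deg: float, altitude_m: float,
                 screen_facing_up: bool) -> str:
    """Coarse torso pose from screen-normal angle, phone altitude, and screen orientation.
    All thresholds are assumptions; the disclosure does not specify them."""
    if altitude_m < 0.3:
        # Phone held very low: likely lying; screen orientation separates the two cases.
        return "facing-upward lying" if screen_facing_up else "facing-downward lying"
    if altitude_m < 0.8:
        return "squatting" if screen_normal_ground_angle_deg > 90 else "upright sitting"
    return "upright standing"
```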
  • The data set acquired by the plurality of sensors includes the horizontal and vertical displacement, the rotation angle of the mobile phone detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground detected by the gyroscope, positioning information, status information about whether the screen of the mobile phone is clicked and whether the mobile phone has a certain amount of continuous vibration, the pressure altitude of the mobile phone from the ground, the angle between the normal of the screen plane and the ground, and the orientation of the screen.
  • The head poses consistent with the torso poses are inferred by the corresponding preferred reasoner for head motion.
  • The initial pose of the head is looking straight ahead.
  • The head pose also includes a changing pose, whose direction is opposite to the direction in which the screen is slid. When the screen slides to the right, the sight turns towards the left.
  • The head movement is currently considered to be synchronized with the eye movement, so the sight towards the left is equivalent to the head pose being towards the left. A sketch of this swipe-to-gaze mapping follows.
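  • A one-function sketch of the swipe-to-gaze rule; only the left/right case is stated in the disclosure, and the up/down mapping here is an added assumption.

```python
from typing import Optional

def reason_head(swipe_direction: Optional[str]) -> str:
    """Gaze turns opposite to the screen swipe (e.g. swipe right -> look left); illustrative."""
    opposite = {"right": "looking-left", "left": "looking-right",
                "up": "looking-down", "down": "looking-up"}  # up/down mapping is an assumption
    # No swipe: the initial head pose is looking straight ahead, consistent with the torso.
    return opposite.get(swipe_direction, "looking-straight-ahead")
```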
  • The data set acquired by the plurality of sensors includes the displacement variation, the displacement velocity, the vibration of the mobile phone, and the pressure altitude of the mobile phone from the ground. Then the leg pose is reasoned by the corresponding preferred leg reasoner.
  • The predetermined human mechanics constraint is applied through the following steps.
  • The human natural engineering mechanics is in accordance with the natural category of physiological actions, and includes physiological bending of the human body, coherence and connection of physiological structures of the human body, and joint bending.
  • Step (S5) is performed through the following step.
  • The plurality of initial virtual skeletons are subjected to constraint combination and normalization to obtain the plurality of overall virtual skeletons conforming to the constraints.
  • The screened overall virtual skeletons are weighted by the overall skeletal reasoner, and the one with the highest weight is selected as the twin virtual pose, as sketched below.
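  • The following sketch shows one plausible shape for the merge-constrain-weight pipeline just described; the function names, the candidate representation, and the top-k screening size are assumptions, not the disclosure's implementation.

```python
from itertools import product

def detect_twin_pose(part_candidates, satisfies_mechanics, overall_reasoner, keep_top=5):
    """Merge per-part poses into candidate skeletons, keep those satisfying the human
    mechanics constraint, and select the highest-weight one (illustrative)."""
    # part_candidates: dict such as {"hands": [...], "arms": [...], "torso": [...], ...}
    parts = list(part_candidates)
    skeletons = [dict(zip(parts, combo)) for combo in product(*part_candidates.values())]
    feasible = [s for s in skeletons if satisfies_mechanics(s)]       # mechanics screening
    screened = sorted(feasible, key=overall_reasoner, reverse=True)[:keep_top]
    return screened[0] if screened else None  # highest weight becomes the twin virtual pose
```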
  • This application also provides a twin pose detection system based on interactive indirect inference, which includes a module M1, a module M2, a module M3, and a module M4.
  • The module M1 is configured to construct a training set based on the data set and the poses of individual parts of the skeleton during human-computer interaction, where the plurality of sensors include a nine-axis gyroscope, an acceleration sensor, a speed sensor, an infrared distance sensor, a touch sensor, and a sensor capable of obtaining program operation reference information of the mobile phone.
  • The sensors of the mobile phone include the nine-axis gyroscope, the acceleration sensor, the speed sensor, the infrared distance sensor, the touch sensor, and the sensor capable of obtaining program operation reference information of the mobile phone.
  • The data set obtained by the sensors of the mobile phone includes a rotation angle of the mobile phone obtained by the nine-axis gyroscope, a horizontal movement acceleration of the mobile phone obtained by the acceleration sensor, a horizontal movement speed of the mobile phone obtained by the speed sensor, an altitude of the mobile phone obtained by the infrared distance sensor, status information about whether a screen of the mobile phone is clicked obtained by the touch sensor, and a state of use of the mobile phone discriminated by the sensor capable of obtaining the reference information about program operation (such as non-clicked while being used or watched).
  • The poses of individual parts of the human skeleton include hand poses, arm poses, torso poses, head poses, and leg poses.
  • The hand poses include a raise of the mobile phone by the left hand, a raise of the mobile phone by the right hand, and a raise of the mobile phone by both the left hand and the right hand.
  • The arm poses include a raised-arm pose and a dropped-arm pose.
  • The torso poses include an upright pose, a sitting pose, a squatting pose, and a lying pose, and further include a forward leaning upright pose, an upright standing pose, a forward leaning sitting pose, a back leaning sitting pose, an upright sitting pose, a squatting pose, an upward facing lying pose, and a downward facing lying pose.
  • The head poses include a looking-straight-ahead pose, a looking-left pose, a looking-right pose, a looking-up pose, and a looking-down pose.
  • The leg poses include a walking pose, a travelling pose, a sitting pose, a standing pose, and a lying pose.
  • The module M2 is configured to label and classify the training set based on the preferred way, and to train the plurality of reasoners based on training sets of different multi-modal types.
  • The preferred ways of using the mobile phone include using the left hand as the dominant hand and using the right hand as the dominant hand.
  • The reasoners include a reasoner for left-hand motion, a reasoner for right-hand motion, a reasoner for left-arm motion, a reasoner for right-arm motion, a reasoner for torso motion, a reasoner for head motion, a reasoner for left-leg motion, and a reasoner for right-leg motion.
  • The module M3 is configured to train the overall skeletal reasoner during the human-computer interaction, and to obtain the corresponding probabilities by the overall skeletal reasoner based on the input overall skeletal pose.
  • The module M4 is configured to perform reasoning to obtain poses of individual parts of a skeleton of an object using a plurality of reasoners for individual parts of the skeleton according to a preferred way of the object for using a mobile phone and a data set acquired in real time by a plurality of sensors on the mobile phone.
  • The data set acquired by the plurality of sensors includes the horizontal and vertical displacement, the rotation angle of the mobile phone detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground detected by the gyroscope, positioning information, and status information about whether the screen of the mobile phone is clicked and whether the mobile phone has a certain amount of continuous vibration. Then whether the left hand, the right hand, or both hands are currently raised is reasoned by the corresponding reasoner for left-hand motion and reasoner for right-hand motion.
  • The reasoning process includes the following steps. There is no horizontal or vertical displacement.
  • The gyroscope detects that the angle between the screen display plane and the vertical plane of the ground is at most 15°.
  • The mobile phone is not at sea or on an aircraft according to the positioning information.
  • The screen of the mobile phone is continuously lit without being touched, and there is a certain amount of continuous vibration during the process.
  • The threshold for determining the quantitative vibration is determined by initializing the learning of the operational preferences of the user, during which the user is allowed to hold the screen continuously so as to detect the difference in the habitual shaking of the hands.
  • When the plurality of sensors of the mobile phone fail to detect a touch operation, there is a minimal amount of micro-vibration, and it is not sensor noise.
  • When the mobile phone is detected to be in a landscape mode, it is reasoned by the hand reasoner that both hands are raised to hold the mobile phone.
  • When the mobile phone is detected to be in a portrait mode, it is reasoned by the hand reasoner that the single dominant hand is lifted.
  • The data set acquired by the plurality of sensors includes the horizontal and vertical displacement, the rotation angle of the mobile phone detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground detected by the gyroscope, positioning information, status information about whether the screen of the mobile phone is clicked and whether the mobile phone experiences a certain continuous vibration, the pressure altitude of the mobile phone relative to the ground (an adult of normal height is taken as an example herein), and the angle between the normal of the screen plane and the ground. Then whether the current arm is in a raised or lowered position is reasoned by the corresponding preferred reasoner for left-arm motion and reasoner for right-arm motion.
  • The reasoning includes the following steps. If the angle between the normal of the screen plane and the ground is more than 105° (that is, the user is most likely looking down at the screen), it is reasoned by the reasoner for left-arm motion and/or the reasoner for right-arm motion that the arm is in a dropped pose. If the angle between the normal of the screen plane and the ground is less than 75°, it is reasoned by the reasoner for left-arm motion and/or the reasoner for right-arm motion that the arm is in a raised pose.
  • The data set acquired by the plurality of sensors includes the horizontal and vertical displacement, the rotation angle of the mobile phone detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground detected by the gyroscope, positioning information, status information about whether the screen of the mobile phone is clicked and whether the mobile phone has a certain amount of continuous vibration, the pressure altitude of the mobile phone from the ground, the angle between the normal of the screen plane and the ground, and the orientation of the screen.
  • The torso is reasoned by the corresponding preferred reasoner for torso motion to be in a forward leaning upright pose, an upright standing pose, a forward leaning sitting pose, a backward leaning sitting pose, an upright sitting pose, a squatting pose, an upward facing lying pose, or a downward facing lying pose.
  • The data set acquired by the plurality of sensors includes the horizontal and vertical displacement, the rotation angle of the mobile phone detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground detected by the gyroscope, positioning information, status information about whether the screen of the mobile phone is clicked and whether the mobile phone has a certain amount of continuous vibration, the pressure altitude of the mobile phone from the ground, the angle between the normal of the screen plane and the ground, and the orientation of the screen.
  • The torso is reasoned to be in an upright pose, a sitting pose, or a squatting pose according to the angle between the normal of the screen plane and the ground, and is further reasoned to be in a forward leaning upright pose, an upright standing pose, a forward leaning sitting pose, a back leaning sitting pose, an upright sitting pose, a squatting pose, an upward facing lying pose, or a downward facing lying pose.
  • The data set acquired by the plurality of sensors includes the horizontal and vertical displacement, the rotation angle of the mobile phone detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground detected by the gyroscope, positioning information, status information about whether the screen of the mobile phone is clicked and whether the mobile phone has a certain amount of continuous vibration, the pressure altitude of the mobile phone from the ground, the angle between the normal of the screen plane and the ground, and the orientation of the screen.
  • The head poses consistent with the torso poses are inferred by the corresponding preferred reasoner for head motion.
  • The initial head pose is looking straight ahead.
  • The head pose also includes a changing pose, whose direction is opposite to the direction in which the screen is slid. When the screen slides to the right, the sight turns towards the left.
  • The head movement is currently considered to be synchronized with the eye movement, so the sight towards the left is equivalent to the head pose being towards the left.
  • The data set acquired by the plurality of sensors includes the displacement variation, the displacement velocity, the vibration of the mobile phone, and the pressure altitude of the mobile phone from the ground. Then the leg pose is reasoned by the corresponding preferred leg reasoner.
  • According to the pressure altitude of the mobile phone from the ground, a standing pose is reasoned by the leg reasoner. If the displacement changes continuously without vibration, a travelling pose is reasoned by the leg reasoner. When the mobile phone is shaking and vibrating, and the displacement velocity is within the walking speed range, a walking pose is inferred by the leg reasoner. These rules are sketched below.
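  • A hedged sketch of the leg rules above; the altitude cut-off and the walking speed range are illustrative assumptions, since the disclosure states the rules only qualitatively.

```python
def reason_legs(altitude_m: float, displacement_rate: float, vibrating: bool,
                walking_speed_range=(0.5, 2.0)) -> str:
    """Leg pose from phone altitude, displacement, and vibration (illustrative thresholds)."""
    if vibrating and walking_speed_range[0] <= displacement_rate <= walking_speed_range[1]:
        return "walking"      # shaking phone moving at walking-range speed
    if displacement_rate > 0 and not vibrating:
        return "travelling"   # smooth continuous displacement, e.g. riding a vehicle
    return "standing" if altitude_m > 0.9 else "sitting"  # altitude cut-off is an assumption
```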
  • The module M5 is configured to merge the poses of individual parts of the skeleton to generate a plurality of initial virtual skeletons.
  • The module M6 is configured to obtain, under a predetermined human mechanics constraint, a plurality of overall virtual skeletons satisfying the predetermined human mechanics constraint from the plurality of initial virtual skeletons.
  • The predetermined body mechanics constraint is applied by a module M6.1, a module M6.2, and a module M6.3.
  • The module M6.1 is configured to acquire a plurality of unconventional human pose images that meet predetermined requirements.
  • The module M6.2 is configured to extract 3D coordinates and pose data of virtual skeletal location points in Euclidean space based on the plurality of unconventional human pose images.
  • The origin of the 3D coordinates of the virtual skeleton positioning points in Euclidean space is the original position where the reasoning of the virtual skeleton is started, i.e., the 3D coordinate system of the virtual skeleton positioning points in Euclidean space is the reasoned 3D coordinate system of the virtual skeleton.
  • The module M6.3 is configured to correct the constraint tolerance of human natural engineering mechanics based on the 3D coordinates and pose data.
  • The human natural engineering mechanics is in accordance with the natural category of physiological actions, and includes physiological bending, coherence and connection of physiological structures, and joint bending.
  • The module M5 is configured to allow the plurality of initial virtual skeletons to be subjected to constraint combination and normalization to obtain the plurality of overall virtual skeletons conforming to the constraints.
  • The screened overall virtual skeletons are weighted by the overall skeletal reasoner, and the one with the highest weight is selected as the twin virtual pose.
  • The system, device, and individual modules provided herein can be implemented purely in computer-readable program code. Besides, it is possible to logically program the method steps such that the system, device, and individual modules provided herein can be implemented in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, and embedded microcontrollers. Therefore, the system, device, and individual modules provided herein can be considered as a hardware component, and the modules included therein for implementing the various programs can be considered as structures within the hardware component.
  • The modules for implementing the various functions can also be considered as structures that can be both software programs for implementing the method and structures within the hardware component.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Computer Graphics (AREA)
  • Environmental & Geological Engineering (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Hardware Design (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Artificial Intelligence (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A method and system for twin pose detection based on interactive indirect inference. The method includes: acquiring, by sensors on a mobile phone, a data set in real time; obtaining poses of individual parts of a skeleton of an object by reasoning using reasoners for individual parts based on the data set and a preferred way of the object for using the mobile phone; merging the poses to generate multiple initial virtual skeletons; under a predetermined human mechanics constraint, obtaining multiple overall virtual skeletons satisfying the predetermined human mechanics constraint from the initial virtual skeletons; and screening a predetermined number of overall virtual skeletons from the overall virtual skeletons, and obtaining a dynamic twin virtual pose in real time by reasoning on the predetermined number of overall virtual skeletons using an overall skeleton reasoner.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application claims the benefit of priority from Chinese Patent Application No. 202210715227.9, filed on Jun. 23, 2022. The content of the aforementioned application, including any intervening amendments thereto, is incorporated herein by reference in its entirety.
  • TECHNICAL FIELD
  • This application relates to digital intelligence technology, and more particularly to a twin pose detection method and system based on interactive indirect inference. By means of digital twin technology, software empowerment and intelligent empowerment are achieved in high-end manufacturing; combined with sensing devices and intelligent devices, no additional acquisition or wearable devices are needed, and the user can access virtual reality applications with physical interaction anywhere and anytime.
  • BACKGROUND
  • Sensors and assistive peripherals (such as specific smart wearable products and head-mounted displays (HMDs)) have long been used to build a bridge between the physical world and the virtual digital world for users. Improved sensory coherence is what users and the industry have always pursued and worked towards. However, current sensor peripherals have experience limitations, and compatibility problems exist between devices. For example, sensor peripherals provided by various hardware manufacturers are not facilities available to ordinary consumers, so virtual reality is difficult to popularize and experience consistency is low.
  • Mobile smart terminals (e.g., mobile phones, tablets, and smart wearables) have screen displays, network communication, and a certain degree of computing power, and are already popular and versatile. Providing a basic and universal method for detecting poses using these universal devices can greatly facilitate the versatility and universality of virtual reality. In this way, specific sensors and assistive peripherals are not required, such as HMDs with wearability and compatibility limitations, handles that occupy the hands and limit convenience, additional camera capture that depends on place and equipment, and wearable pose positioners that are not available anywhere and anytime, are limited in space, and are specialized and expensive.
  • The above objectively depicts the current situation of the popularity and real-time nature of virtual reality and introduces a basic method for pose detection of a twin that can be built through mobile smart terminals. This application provides a method and system for detecting poses of a twin based on interactive indirect inference, which enables software empowerment and intelligent empowerment through digital intelligence technologies (digital twin technology) in high-end manufacturing, and enables ordinary users to access virtual reality applications anytime and anywhere through sensing devices and smart devices without external assistive acquisition or wearable devices.
  • Chinese Patent Publication No. 113610969A (Application No. 202110974124.X) discloses a method for generating a three-dimensional (3D) human body model, including: acquiring to-be-detected images taken from a plurality of perspectives; detecting human body regions contained in the to-be-detected images, and detecting a data set of skeletal key points contained in the human body regions; constructing a fusion affinity matrix between the to-be-detected images by using the human body regions and the data set of skeletal key points; determining a matching relationship between the body regions by using the fusion affinity matrix; and performing pose construction based on the matching relationship and the data set of skeletal key points to generate the 3D human body model. The method can analyze human poses from various perspectives, extract the data of body regions and skeletal key points from the to-be-detected images, and generate a 3D human body model by using the matching relationship between body regions and the data set of skeletal key points, and thus can fully and effectively restore the 3D poses of the human body. However, the method relies on sensors and multiple perspectives, which is different from the indirect inference of the method provided by the present disclosure.
  • Chinese Patent Publication No. 111311714A (Application No. 202010244300.X) discloses a method and system for pose prediction. The method includes: acquiring pose information of a target character in one or more existing frames; and inputting the pose information into a trained pose prediction model to determine predicted pose information of the target character in subsequent frames, where the pose information includes skeletal rotation angle information and gait movement information. This method is different from the method of the present disclosure in objectives and detection techniques.
  • Chinese Patent Publication No. 112132955A (Application No. 202010902457.7) discloses a digital twin construction method for the human skeleton. In this method, data at important locations of the human body is acquired via VR motion capture and sensor technology. Key data is obtained through data classification, screening, simplification and calculation via artificial intelligence. The spatial orientation and mechanical information of the target skeleton are obtained by solving the key data with human inverse dynamics and biomechanical algorithms. After some of the sensor data is fused with the computational results, simulation is performed on the target skeleton to obtain its biomechanical properties and to predict the biomechanical properties of the target skeleton in unknown poses using various prediction algorithms. Finally, the performance data is subjected to modelling and rendering to obtain a high-fidelity digital twin of the real skeleton, achieving a faithful twin mapping of the biomechanical properties of the skeleton. In that method, sensors and the external sensing of VR devices are used, and these devices can directly acquire sensing data to complete the pose detection, which is entirely different from the interactive indirect inference proposed in the present disclosure. A further difference is that in the present disclosure, inference engines are designed for different body parts to ensure targeted inference detection.
  • Chinese Patent Publication No. 110495889A (Application No. 201910599978.7) discloses a method of pose assessment, an electronic device, a computer device and a storage medium. The method includes: obtaining to-be-tested images, where the to-be-tested images include a front full body image and a side full body image of a tester standing upright; extracting skeletal key points from the to-be-tested images; calculating, based on the skeletal key points, a pose vector of the tester; and obtaining a bending angle of the pose vector. This method relies on sensors and multiple perspectives, which is different from the indirect inference of the present disclosure.
  • Chinese Patent Publication No. 113191324A (Application No. 202110565975.9) discloses a method for predicting pedestrian behavioral intention based on multi-task learning, including: constructing a training sample set; constructing a pedestrian behavioral intention prediction model using a base network, a pose detection network and an intention recognition network; and extracting image features to obtain a feature map with a single frame image of the training sample set as an input of the base network. An encoder of the pose detection network includes a part intensity field sub-network and a part association field sub-network. Pedestrian pose images are acquired by a decoder of the pose detection network according to the joint feature map and the bone feature map, where the feature map is set as an input of both the part intensity field sub-network and the part association field sub-network, and a joint feature map and a bone feature map are set as the outputs of the part intensity field sub-network and the part association field sub-network, respectively. The feature map is set as an input of the intention recognition network, and the pedestrian behavioral intention image is set as an output of the intention recognition network. The pedestrian behavioral intention prediction model is trained and used to predict the pedestrian behavioral intention. This method relies on sensors and image feature extraction, which differs from the indirect inference of the present disclosure and produces different results.
  • Chinese Patent Publication No. 112884780A (Application No. 202110165636.1) discloses a method and system for estimating human poses, including: inputting and training an image in a codec network having a four-layer coding and four-layer decoding structure, and outputting a semantic segmentation result; converting the semantic probability map of pixels obtained in the first two coding layers into edge activations using an energy function pixel map, where a pixel whose activation value is larger than the activation threshold is an edge pixel; obtaining an instance segmentation result by aggregating pixels belonging to the same instance based on the semantic labels in the semantic segmentation result, where the instance segmentation result includes a mask indicating the instance to which each pixel belongs; generating a human skeletal confidence map using a fully convolutional network, and outputting the skeletal component label to which each pixel belongs in each instance; and regressing the locations of nodal points through a fully connected network to create a skeletal structure of the human body in each instance to obtain human pose information. This method relies on sensors and image feature extraction, which differs from the indirect inference of the present disclosure and produces different results.
  • Chinese Patent No. 108830150B (Application No. 201810426144.1) discloses a method and device for three-dimensional (3D) human pose estimation, including: (S1) acquiring, by a monocular camera, depth images and red, green and blue (RGB) images of the human body at different angles; (S2) constructing a human skeletal key point detection neural network based on the RGB images to obtain a key point-annotated image; (S3) constructing a two-dimensional (2D)-3D mapping network of hand joint nodes; (S4) calibrating the depth image and the key point-annotated image of the human body at the same angle, and performing 3D point cloud coloring transformation on the corresponding depth image to obtain a coloring depth image; (S5) predicting, by a predefined learning network, the corresponding positions of the annotated human skeletal key points in the depth image based on the key point-annotated image and the coloring depth image; and (S6) combining the outputs of steps (S3) and (S5) to achieve refined 3D pose estimation of the human body. This method relies on sensors and image feature extraction, which differs from the indirect inference of the present disclosure and produces different results.
  • Chinese Patent Publication No. 109885163A (Application No. 201910122971.6) discloses a method and system for multiplayer interactive collaboration in virtual reality. The system includes a motion acquisition device to acquire skeletal data of a user; a plurality of clients for data modeling to obtain pose data of the user based on the skeletal data and mapping it to initial joint position data of each joint point of the skeletal model, and a server used to bind the initial joint position data of the skeletal model to the scene character of the user and obtain and synchronously transmit the character position data to other scene characters. The clients are also used to update the initial joint position data of the scene character, and combine with the model animation of the virtual scene to form the skeletal animation of pose movement. Although this method involves interaction and action, it is different from the indirect inference method provided in the present disclosure in purposes of interaction, generation modes of actions, and sources of dependence of landing points.
  • Chinese Patent Publication No. 112926550A (Application No. 202110406810.7) discloses a human-computer interaction method based on 3D image pose matching of a human, and an apparatus thereof. The method includes: initializing an interaction machine and storing a corresponding 3D image of a template pose in the interaction machine based on interaction requirements; acquiring a plurality of nodes based on a deep learning method and constructing a 3D skeleton model based on the plurality of nodes; obtaining skeleton information of the current to-be-interacted human and inputting it into the 3D skeleton model to obtain human pose features; calculating a loss function value between the human pose features and the interaction data set; and comparing the loss function value with a set threshold value to determine whether to carry out human-machine interaction. By using this method and apparatus, the user experience of the human-machine interaction function can be improved. Although this method involves interaction and action, it is different from the indirect inference method provided in the present disclosure in purposes of interaction, generation modes of actions, and sources of dependence of landing points.
  • Chinese Patent Publication No. 110675474A (Application No. 201910758741.9) discloses a learning method of a virtual character model, an electronic device and a readable storage medium. The learning method includes: obtaining first skeletal pose information corresponding to an action of a target character in a current video image frame; obtaining skeletal pose adjustment information of a virtual character model corresponding to the current video image frame based on the first skeletal pose information and second skeletal pose information, where the second skeletal pose information is the skeletal pose information of the virtual character model corresponding to a previous video image frame; and driving the virtual character model according to the skeletal pose adjustment information so that the virtual character model learns the action of the target character in the current video image frame, whereby the learning process between the virtual character model and a person can be simulated to form interactive experiences between a person and a virtual character, such as training, education, and nurturing. Although this method involves interaction and action, it is different from the indirect inference method provided in the present disclosure in purposes of interaction, generation modes of actions, and sources of dependence of landing points.
  • Chinese Patent Publication No. 113158459A (Application No. 202110422431.7) discloses a method for human pose estimation based on the fusion of visual and inertial information. Since human pose estimation methods based on 3D vision sensors cannot provide three-degree-of-freedom rotation information, this method exploits the complementary nature of visual and inertial information: a nonlinear optimization method is used to adaptively fuse visual information, inertial information and human pose prior information to obtain the rotation angle of each skeletal node and the global position of a root skeletal node at each moment, completing real-time estimation of human poses. Although this method involves interaction and action, it is different from the indirect inference method provided in the present disclosure in purposes of interaction, generation modes of actions, and sources of dependence of landing points.
  • Chinese Patent No. 108876815B (Application No. 201810403604.9) discloses a skeletal pose calculation method, a virtual character model driving method, and a storage medium, where the skeletal pose calculation method is a key step of the virtual character model driving method and includes an iterative calculation process for skeletal poses based on inverse kinematics. Based on inverse derivation, the joint angle change of the middle joints of the human skeletal chain is calculated from the change of the pose information of the limb, so that the joint angle of each joint approaches the optimal value after each iteration, effectively ensuring a smooth gradation effect when simulating the limb action and thus meeting the application requirements of realistic simulation of limb action. In addition, multiple judgment mechanisms are adopted in the iterative calculation process, which can update the change in the angle of each joint and in the pose information of the limb in time for the next iteration, simplifying the judgment process, ensuring the effectiveness of the iterative cycle, facilitating the calculation speed of the system while ensuring correct calculation results, and enhancing the real-time nature of the limb movement capture process. Although this method involves pose prediction and action generation, it is different from the indirect inference method provided in the present disclosure in generation modes of actions, data sources, and sources of dependence of landing points. Moreover, this publication actually relates to an optimization method for fast skeleton calculation, which aims to improve the smooth continuity of the animation, and the virtual character is also presented as the landing point. By contrast, in the present disclosure, the rotation and movement of the intelligent device are what is observable, while the invisible twin human body serves as a constraint and mapping, solving the logical intermediate link and transforming it into the indirect inference of the pose. Besides, when the final output is used in applications, it can be used for a virtual character in a rendering form or for application simulation demonstration. Hence, the mapping relationship between the twin human body and the virtual human part is a visualization constraint.
  • SUMMARY
  • An object of the present disclosure is to provide a method and system for twin pose detection based on interactive indirect inference to overcome the deficiencies in the prior art.
  • In a first aspect, this application provides a twin pose detection method based on interactive indirect inference, comprising:
      • (S1) acquiring, by a plurality of sensors on a mobile phone, a data set in real time;
      • and obtaining poses of individual parts of a skeleton of an object by reasoning using a plurality of reasoners on individual parts of the skeleton based on the data set and a preferred way of the object for using the mobile phone;
      • (S2) merging the poses obtained in step (S1) to generate a plurality of initial virtual skeletons;
      • (S3) under a predetermined human mechanics constraint, obtaining a plurality of overall virtual skeletons satisfying the predetermined human mechanics constraint from the plurality of initial virtual skeletons; and
      • (S4) screening a predetermined number of overall virtual skeletons from the plurality of overall virtual skeletons, and obtaining a dynamic twin virtual pose in real time by reasoning on the predetermined number of overall virtual skeletons using an overall skeleton reasoner.
  • In some embodiments, the plurality of reasoners comprise a reasoner for left-hand motion, a reasoner for right-hand motion, a reasoner for left-arm motion, a reasoner for right-arm motion, a reasoner for torso motion, a reasoner for head motion, a reasoner for left-leg motion and a reasoner for right-leg motion.
  • In some embodiments, the plurality of sensors comprise a nine-axis gyroscope, an acceleration sensor, a speed sensor, an infrared distance sensor, a touch sensor and a sensor capable of obtaining reference information about program operation of the mobile phone;
      • the nine-axis gyroscope is configured to acquire a rotation angle of the mobile phone;
      • the acceleration sensor is configured to acquire a horizontal movement acceleration of the mobile phone;
      • the speed sensor is configured to acquire a horizontal movement speed of the mobile phone;
      • the infrared distance sensor is configured to acquire an altitude of the mobile phone;
      • the touch sensor is configured to acquire status information on whether a screen of the mobile phone is clicked; and
      • the sensor capable of obtaining the program operation reference information of the mobile phone is configured to determine a use state of the mobile phone.
  • In some embodiments, in step (S3), the predetermined human mechanics constraint is performed through steps of:
      • (S3.1) acquiring a predetermined number of unconventional human pose images that meet predetermined requirements;
      • (S3.2) extracting three-dimensional (3D) coordinates and pose data of virtual skeletal location points in Euclidean space based on the predetermined number of unconventional human pose images; and
      • (S3.3) correcting constraint tolerance of human natural engineering mechanics based on the 3D coordinates and pose data obtained in step (S3.2);
      • wherein the human natural engineering mechanics is in accordance with the natural category of physiological movements, and comprises physiological bending of the human body, coherence and connection of physiological structures of the human body, and bending of joints of the human body.
  • In some embodiments, step (S3) is performed by a step of:
      • subjecting the plurality of initial virtual skeletons to constraint combination and normalization to obtain the plurality of overall virtual skeletons conforming to the constraints.
  • In a second aspect, this application provides a twin pose detection system based on interactive indirect inference, comprising:
      • a first module;
      • a second module;
      • a third module; and
      • a fourth module;
      • wherein the first module is configured to perform reasoning to obtain poses of individual parts of a skeleton of an object using a plurality of reasoners on individual parts of the skeleton according to a preferred way of an object for using a mobile phone and a data set acquired in real time by a plurality of sensors on the mobile phone;
      • the second module is configured to merge the poses of individual parts of the skeleton to generate a plurality of initial virtual skeletons;
      • the third module is configured, under a predetermined human mechanics constraint, to obtain a plurality of overall virtual skeletons satisfying the predetermined human mechanics constraint from the plurality of initial virtual skeletons; and
      • the fourth module is configured to screen a predetermined number of overall virtual skeletons from the plurality of overall virtual skeletons, and perform reasoning on the predetermined number of overall virtual skeletons by using an overall skeleton reasoner to obtain a dynamic twin virtual pose in real time;
      • the first module comprises a first submodule and a second submodule;
      • wherein the first submodule is configured to construct a training set based on the data set and the poses of individual parts of the skeleton during human-computer interaction; wherein the plurality of sensors comprise a nine-axis gyroscope, an acceleration sensor, a speed sensor, an infrared distance sensor, a touch sensor and a sensor capable of obtaining program operation reference information of the mobile phone; and
      • the second submodule is configured to label and classify the training set based on the preferred way, and train the plurality of reasoners based on training sets in different multi-modal types; wherein the preferred way of the object for using the mobile phone comprises using left hand as a dominant hand and using right hand as a dominant hand.
  • In some embodiments, the plurality of reasoners comprise a reasoner for left-hand motion, a reasoner for right-hand motion, a reasoner for left-arm motion, a reasoner for right-arm motion, a reasoner for torso motion, a reasoner for head motion, a reasoner for left-leg motion and a reasoner for right-leg motion.
  • In some embodiments, the plurality of sensors comprise the nine-axis gyroscope, the acceleration sensor, the speed sensor, the infrared distance sensor, the touch sensor and the sensor capable of obtaining program operation reference information of the mobile phone;
      • the nine-axis gyroscope is configured to acquire a rotation angle of the mobile phone;
      • the acceleration sensor is configured to acquire a horizontal movement acceleration of the mobile phone;
      • the speed sensor is configured to acquire a horizontal movement speed of the mobile phone;
      • the infrared distance sensor is configured to acquire an altitude of the mobile phone;
      • the touch sensor is configured to acquire status information on whether a screen of the mobile phone is clicked; and
      • the sensor capable of obtaining the program operation reference information of the mobile phone is configured to determine a use state of the mobile phone.
  • In some embodiments, the predetermined human mechanics constraint is performed by using a fifth module, a sixth module, and a seventh module;
  • the fifth module is configured to acquire a plurality of unconventional human pose images that meet predetermined requirements;
      • the sixth module is configured to extract 3D coordinates and pose data of virtual skeletal location points in Euclidean space based on the plurality of unconventional human pose images; and the seventh module is configured to correct constraint tolerance of human natural engineering mechanics based on the 3D coordinates and pose data;
      • wherein the human natural engineering mechanics is in accordance with the natural category of physiological actions, and comprises physiological bending, coherence and connection of physiological structures, and joint bending.
  • In some embodiments, the third module is configured to perform normalization on the plurality of initial virtual skeletons under the predetermined human mechanics constraint to obtain the plurality of overall virtual skeletons satisfying the predetermined human mechanics constraint.
  • Compared with the prior art, the beneficial effects of the present disclosure are described below.
      • (1) By using the method provided herein, the change process of human body poses can be reasoned indirectly through the intrinsic sensors of the smart mobile device, in combination with the pose inertia of the user and relying on the bodily relation between the joints of the twin human and the device.
      • (2) In this application, additional helmets, handles, externally fitted sensors and independent external cameras are not required for pose detection and generation. Instead, the sensors on the mobile smart device, such as the gyroscope, acceleration, level, geomagnetic and touch-screen sliding sensors, are used to directly generate virtual poses based on the relative spatial relationship of the interaction performed by the user, and then the physical poses are indirectly detected.
      • (3) Based on the data obtained from the intrinsic sensors of the mobile phone, reasoning is performed by the reasoners for individual parts of the human skeleton, combined with the human mechanics constraints and the overall skeletal reasoner, such that the accuracy of reasoning is improved.
      • (4) The constraint tolerance of the natural engineering mechanics of the human body is corrected based on unconventional poses, thus improving the accuracy of inference.
      • (5) The corresponding preferred reasoner is trained by selecting the corresponding data set based on the personal preferred way of using the mobile phone, which improves the accuracy of reasoning.
      • (6) The virtual pose detection results are transformed into a physique virtual skeleton, which is then provided for use in ecological applications.
    BRIEF DESCRIPTION OF THE DRAWINGS
  • Other features, objects and advantages of the present disclosure will be more apparent according to the detailed description of non-limiting embodiments made with reference to the following accompanying drawings.
  • FIG. 1 is a flow chart of a twin pose detection method based on interactive indirect inference according to an embodiment of the present disclosure;
  • FIG. 2 is a schematic diagram of rotation of a mobile intelligent terminal under normal gravity, where the limb behavior is a temporary skeletal extension;
  • FIG. 3 schematically shows an inference of a pose of human skeleton according to an embodiment of the present disclosure;
  • FIG. 4 schematically shows an inference of a pose of the human skeleton according to an embodiment of the present disclosure;
  • FIG. 5 schematically shows an inference of a pose of the human skeleton according to an embodiment of the present disclosure;
  • FIG. 6 schematically shows an inference of a pose of the human skeleton according to an embodiment of the present disclosure;
  • FIG. 7 schematically shows an inference of a pose of the human skeleton according to an embodiment of the present disclosure;
  • FIG. 8 schematically shows an inference of a pose of the human skeleton according to an embodiment of the present disclosure; and
  • FIG. 9 schematically shows an inference of a pose of the human skeleton according to an embodiment of the present disclosure.
  • DETAILED DESCRIPTION OF EMBODIMENTS
  • The present disclosure is described in detail below with reference to specific embodiments. The following embodiments can help those of skill in the art to further understand the present disclosure, but are not intended to limit the present disclosure in any way. It should be noted that one of ordinary skill in the art can make several variations and improvements without departing from the conception of the present disclosure, and these variations and improvements shall fall within the scope of protection of the present disclosure.
  • In the prior art, sensors are used to obtain an accurate correspondence through direct acquisition; for example, a speed sensor is used to obtain speed. In the present disclosure, a set of relevant sensing information is used for indirect reasoning. Because human behavior and device use produce highly repetitive and indirectly generated sensing, the basic sensing obtained differs across times, spaces and poses. The indirect reasoning acquires the most probable information about time, space and pose through the basic sensors and avoids the use of additional and unreachable sensing equipment.
  • Embodiment 1
  • The rotational change of the device shown in FIG. 2 is not just its own rotation in the full 720° three-dimensional (3D) space; the change in the pose of the device is caused by the skeletal linkage of the user. By using the method and system for twin pose detection based on interactive indirect inference provided in the present disclosure, this 3D spatial rotation is mapped into several twin body states of the twin in use.
  • Referring to FIGS. 1-9 , this application provides a twin pose detection method based on interactive indirect inference, which includes the following steps.
      • (S1) During human-computer interaction, a training set is constructed based on the data set obtained by a plurality of sensors on a mobile phone and poses of individual parts of a skeleton.
  • Specifically, the plurality of sensors on the mobile phone include a nine-axis gyroscope, an acceleration sensor, a speed sensor, an infrared distance sensor, a touch sensor and a sensor capable of obtaining program operation reference information of the mobile phone (e.g., a screen brightness acquisition sensor, a light sensor capable of detecting masking, and a speaker).
  • The data set obtained by the plurality of sensors on the mobile phone includes a rotation angle of the mobile phone obtained by the nine-axis gyroscope, a horizontal movement acceleration of the mobile phone obtained by the acceleration sensor, a horizontal movement speed of the mobile phone obtained by the speed sensor, an altitude of the mobile phone obtained by the infrared distance sensor, status information about whether a screen of the mobile phone is clicked obtained by the touch sensor, and a use state of the mobile phone discriminated by the sensor capable of obtaining the reference information about program operation (such as not being clicked while being used or watched).
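  • For concreteness, the following minimal Python sketch shows one way such a real-time multi-sensor sample could be represented; all field names and types are illustrative assumptions, not part of the disclosure.

```python
# A minimal sketch of one real-time multi-sensor sample as described above.
# All field names and types are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class SensorSample:
    rotation_angle: tuple[float, float, float]  # nine-axis gyroscope: rotation of the phone
    horizontal_acceleration: float              # acceleration sensor
    horizontal_speed: float                     # speed sensor
    altitude: float                             # infrared distance sensor: height of the phone
    screen_clicked: bool                        # touch sensor: whether the screen is clicked
    in_active_use: bool                         # program-operation reference information
```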
  • Specifically, the poses of individual parts of human skeleton include hand poses, arm poses, torso poses, head poses, and leg poses.
  • The hand poses include a lift of the mobile phone by the left hand, a lift of the mobile phone by the right hand, and a lift of the mobile phone by both the left hand and the right hand.
  • The arm poses include a raised-arm pose and a dropped-arm pose.
  • The torso poses include an upright pose, a sitting pose, a squatting pose, and a lying pose, and further include a forward leaning upright pose, an upright standing pose, a forward leaning sitting pose, a back leaning sitting pose, an upright sitting pose, a squatting pose, an upward-facing lying pose, and a downward-facing lying pose.
  • The head poses include a looking-straight ahead pose, a looking-left pose, a looking-right pose, a looking-up pose, and a looking-down pose.
  • The leg poses include a walking pose, a travelling pose, a sitting pose, a standing pose, and a lying pose.
      • (S2) The training set is labeled and classified based on the preferred ways of using the mobile phone, and a plurality of reasoners for individual parts of the human skeleton are trained based on training sets in different multi-modal types.
  • Specifically, the preferred ways of using the mobile phone include using the left hand as the dominant hand and using the right hand as the dominant hand.
  • The plurality of reasoners include a reasoner for left-hand motion, a reasoner for right-hand motion, a reasoner for left-arm motion, a reasoner for right-arm motion, a reasoner for torso motion, a reasoner for head motion, a reasoner for left-leg motion and a reasoner for right-leg motion.
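  • A hedged sketch of how such per-part reasoners might be trained from the labeled training sets is given below; the classifier choice (a decision tree) and the helper names (PARTS, features, labels) are assumptions for illustration only, not the disclosed training procedure.

```python
# Hedged sketch: one reasoner per skeleton part, trained on the training set
# labeled and classified by the user's preferred (dominant) hand.
from sklearn.tree import DecisionTreeClassifier

PARTS = ["left_hand", "right_hand", "left_arm", "right_arm",
         "torso", "head", "left_leg", "right_leg"]

def train_part_reasoners(features, labels, dominant_hand):
    """features: per-sample sensor vectors; labels[part]: pose label per sample."""
    reasoners = {}
    for part in PARTS:
        clf = DecisionTreeClassifier()
        clf.fit(features, labels[part])          # one multi-modal training set per part
        reasoners[(part, dominant_hand)] = clf   # keyed by part and preferred way
    return reasoners
```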
      • (S3) During human-computer interaction, an overall skeletal reasoner is trained, and the corresponding weights are obtained by the overall skeletal reasoner based on input overall skeletal poses.
      • (S4) The data set is acquired by the plurality of sensors on the mobile phone in real time, and reasoning is performed by the plurality of reasoners for individual parts of the human skeleton based on the preferred way of the object for using the mobile phone and the data set, so as to obtain preferred poses of the individual parts of the human skeleton.
  • In an embodiment, as shown in FIGS. 3 and 4, the data set acquired by the plurality of sensors includes the horizontal and vertical displacement, the rotation angle of the mobile phone detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground detected by the gyroscope, positioning information, and status information about whether the screen of the mobile phone is clicked and whether the mobile phone has a certain amount of continuous vibration. Then whether the left hand, the right hand or both hands are currently raised is reasoned by the corresponding preferred reasoner for left-hand motion and reasoner for right-hand motion.
  • Specifically, the reasoning process is performed as follows, based on the information collected by the sensors on the mobile phone: there is no horizontal or vertical displacement; as detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground is within ±15°; the positioning information indicates that the mobile phone is not at sea or on an aircraft; and the screen of the mobile phone remains lit without being touched while there is a certain continuous vibration. Under these conditions, it can be reasoned by the left-hand reasoner and the right-hand reasoner that the currently raised hand is the non-dominant hand; otherwise, the currently raised hand is the dominant hand. The threshold for determining the quantitative vibration is determined by an initial learning of the operational preferences of the user, during which the user is allowed to hold the phone with the screen lit continuously so as to detect the difference in habitual shaking between the hands; even when the sensors of the mobile phone detect no touch operation, a minimal amount of micro-vibration remains, and this micro-vibration is not sensor noise. When the mobile phone is detected in a landscape mode, it is reasoned by the hand reasoner that both hands are raised to hold the mobile phone. When the mobile phone is detected in a portrait mode, it is reasoned by the hand reasoner that the single commonly used hand is raised.
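  • The rule-level sketch below illustrates one possible encoding of this hand-reasoning logic; every attribute of the sample s and the vibration threshold are hypothetical interfaces introduced here for illustration, not the actual reasoner.

```python
# A rule-level sketch of the hand reasoner described above, assuming the
# attributes of the sample s mirror the sensor readings in the text.
def reason_hands(s, vibration_threshold):
    if s.landscape_mode:
        return "both_hands"                       # landscape: both hands hold the phone
    stationary = s.horizontal_displacement == 0 and s.vertical_displacement == 0
    near_vertical = abs(s.screen_plane_angle) <= 15.0    # screen plane vs. vertical plane
    plausible_place = not (s.at_sea or s.on_aircraft)    # from positioning information
    lit_untouched = s.screen_lit and not s.screen_clicked
    if (stationary and near_vertical and plausible_place
            and lit_untouched and s.vibration > vibration_threshold):
        return "non_dominant_hand"                # watching without operating
    return "dominant_hand"                        # portrait and operating
```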
  • In an embodiment, as shown in FIGS. 4 and 5, the data set acquired by the plurality of sensors includes the horizontal and vertical displacement, the rotation angle of the mobile phone detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground detected by the gyroscope, positioning information, status information about whether the screen of the mobile phone is clicked and whether the mobile phone has a certain amount of continuous vibration, the pressure altitude of the mobile phone from the ground (an adult of normal height is taken as an example herein), and the angle between the normal of the screen plane and the ground. Then whether the arm is currently in a raised or lowered position is reasoned by the corresponding preferred reasoner for left-arm motion and reasoner for right-arm motion.
  • Specifically, the reasoning process includes the following steps. If the angle between the normal of the screen plane and the ground is more than 105° (that is, the user is most likely looking down at the screen), it is reasoned by the reasoner for left-arm motion and/or the reasoner for right-arm motion that the arm is in a dropped pose. If the angle between the normal of the screen plane and the ground is less than 75°, it is reasoned by the reasoner for left-arm motion and/or the reasoner for right-arm motion that the arm is in a raised pose.
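  • The following sketch captures the two angle thresholds above; how the unspecified 75°-105° band is handled is an assumption, since the embodiment does not define it.

```python
# A sketch of the arm reasoner's two angle thresholds described above.
def reason_arm(normal_to_ground_deg):
    if normal_to_ground_deg > 105.0:
        return "dropped"      # user is most likely looking down at the screen
    if normal_to_ground_deg < 75.0:
        return "raised"
    return "unchanged"        # assumed: keep the previous arm pose in between
```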
  • In an embodiment, as shown in FIGS. 5, 6 and 8, the data set acquired by the plurality of sensors includes the horizontal and vertical displacement, the rotation angle of the mobile phone detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground detected by the gyroscope, positioning information, status information about whether the screen of the mobile phone is clicked and whether the mobile phone has a certain amount of continuous vibration, the pressure altitude of the mobile phone from the ground, the angle between the normal of the screen plane and the ground, and the orientation of the screen. Then the torso is reasoned by the corresponding preferred reasoner for torso motion to be in a forward leaning upright pose, an upright standing pose, a forward leaning sitting pose, a backward leaning sitting pose, an upright sitting pose, a squatting pose, an upward-facing lying pose or a downward-facing lying pose.
  • Specifically, based on this data set, the torso is first reasoned to be in an upright pose, a sitting pose or a squatting pose according to the angle between the normal of the screen plane and the ground, and is further reasoned to be in a forward leaning upright pose, an upright standing pose, a forward leaning sitting pose, a back leaning sitting pose, an upright sitting pose, a squatting pose, an upward-facing lying pose or a downward-facing lying pose.
  • Specifically, as shown in FIG. 9, based on the same data set, head poses consistent with the torso poses are inferred by the corresponding preferred reasoner for head motion. The initial pose of the head is a straight sight. The head pose also includes changing poses, which depend on the direction opposite to the sliding on the screen: when the screen slides to the right, the sight turns towards the left. The head movement is currently considered to be synchronized with the eye movement, so the sight towards the left is equivalent to the head pose being towards the left.
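  • A minimal sketch of this head-pose rule follows; the function name and label strings are illustrative assumptions.

```python
# Sketch of the head-pose rule: the sight (and thus the head, taken to follow
# the eyes) turns opposite to the screen-slide direction.
def reason_head(swipe_direction=None):
    opposite = {"right": "left", "left": "right", "up": "down", "down": "up"}
    if swipe_direction is None:
        return "looking_straight"                  # initial pose: straight sight
    return "looking_" + opposite[swipe_direction]  # sight turns opposite to the slide
```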
  • Specifically, as shown in FIG. 7 , the data set acquired by the plurality of sensors includes displacement variation, displacement velocity, vibration of the mobile phone and the pressure altitude of the mobile phone from the ground. Then the leg pose is reasoned by the corresponding preferred leg reasoner.
  • More specifically, according to the pressure altitude of the mobile phone with respect to the ground, it is reasoned that the user is in a standing pose. In the case of continuous displacement change without shaking or vibration, it is reasoned by the reasoner for leg motion that the user is in a travelling pose. When the mobile phone is shaking and vibrating, and the displacement velocity is within the walking speed range, it is inferred by the reasoner for leg motion that the user is in a walking pose.
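  • One possible encoding of this leg-reasoning rule is sketched below; the walking-speed band and the attribute names of the sample s are assumed for illustration and are not specified by the disclosure.

```python
# A rule-level sketch of the leg reasoner in this embodiment.
def reason_legs(s, walking_speed=(0.5, 2.0)):     # m/s, assumed calibration
    if s.continuous_displacement and not s.vibrating:
        return "travelling"                       # smooth motion, e.g., in a vehicle
    if s.vibrating and walking_speed[0] <= s.displacement_speed <= walking_speed[1]:
        return "walking"
    return "standing"                             # from the pressure altitude alone
```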
      • (S5) The preferred poses of individual parts of the human skeleton are merged to generate a plurality of initial virtual skeletons.
      • (S6) The plurality of initial virtual skeletons are subjected to a predetermined human mechanics constraint to obtain a plurality of overall virtual skeletons satisfying the predetermined human mechanics constraint from the plurality of initial virtual skeletons.
  • Specifically, the predetermined human mechanics constraint is performed through the following steps.
      • (S6.1) A predetermined number of unconventional human pose images that meet predetermined requirements are acquired.
      • (S6.2) 3D coordinates and pose data of virtual skeletal location points in Euclidean space are extracted based on the predetermined number of unconventional human pose images, where the origin of the 3D coordinates of the virtual skeletal location points in Euclidean space is the original position where the reasoning of the virtual skeleton starts; that is, the 3D coordinate system of the virtual skeletal location points in Euclidean space is the reasoned 3D coordinate system of the virtual skeleton.
      • (S6.3) The constraint tolerance of human natural engineering mechanics is corrected based on the 3D coordinates and pose data obtained in step (S6.2).
  • The human natural engineering mechanics is in accordance with the natural category of physiological actions, and includes physiological bending of the human body, coherence and connection of physiological structures of the human body, and joint bending.
  • More specifically, step (S6) is performed by the following step: the plurality of initial virtual skeletons are subjected to constraint combination and normalization to obtain the plurality of overall virtual skeletons conforming to the constraints.
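  • A minimal sketch of this constraint-and-normalization step is given below, assuming joint_limits holds the tolerance ranges corrected in steps (S6.1)-(S6.3); the skeleton methods are hypothetical.

```python
# Sketch of the constraint step: keep only the initial skeletons whose joint
# angles fall within the corrected human-mechanics tolerance, then normalize.
def constrain_and_normalize(initial_skeletons, joint_limits):
    feasible = []
    for skel in initial_skeletons:
        within = all(lo <= skel.joint_angle(j) <= hi      # bending/coherence tolerance
                     for j, (lo, hi) in joint_limits.items())
        if within:
            feasible.append(skel.normalized())            # normalization after combination
    return feasible                                       # overall virtual skeletons
```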
      • (S7) The screened overall virtual skeletons are reasoned with the overall skeletal reasoner to obtain a dynamic twin virtual pose in real time. The dynamic twin virtual pose is presented in the form of a time-series collection of skeleton animations.
  • Specifically, the screened overall virtual skeletons are subjected to weighting by the overall skeletal reasoner, and the one with the highest weight is selected as the twin virtual pose.
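  • This weighted selection can be sketched in a few lines; overall_reasoner.weight is a hypothetical interface standing in for the trained overall skeletal reasoner.

```python
# Sketch of the final selection: the overall skeletal reasoner weights each
# screened skeleton and the highest-weighted one becomes the twin virtual pose.
def select_twin_pose(screened_skeletons, overall_reasoner):
    return max(screened_skeletons, key=overall_reasoner.weight)
```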
  • This application also provides a twin pose detection system based on interactive indirect inference, which includes a module M1, a module M2, a module M3, a module M4, a module M5, a module M6, and a module M7.
  • The module M1 is configured to construct a training set based on the data set and the poses of individual parts of the skeleton during human-computer interaction; where the plurality of sensors include a nine-axis gyroscope, an acceleration sensor, a speed sensor, an infrared distance sensor, a touch sensor and a sensor capable of obtaining program operation reference information of the mobile phone.
  • The data set obtained by the sensors of the mobile phone includes a rotation angle of the mobile phone obtained by the nine-axis gyroscope, a horizontal movement acceleration obtained by the acceleration sensor, a horizontal movement speed obtained by the speed sensor, an altitude obtained by the infrared distance sensor, status information about whether the screen of the mobile phone is clicked obtained by the touch sensor, and a use state of the mobile phone discriminated by the sensor capable of obtaining the reference information about program operation (such as not being clicked while being used or watched).
  • Specifically, the poses of individual parts of human skeleton include hand poses, arm poses, torso poses, head poses, and leg poses.
  • The hand poses include a raise of the mobile phone by the left hand, a raise of the mobile phone by the right hand, and a raise of the mobile phone by both the left hand and the right hand.
  • The arm poses include a raised-arm pose and a dropped-arm pose.
  • The torso poses include an upright pose, a sitting pose, a squatting pose, and a lying pose, and further include a forward leaning upright pose, an upright standing pose, a forward leaning sitting pose, a back leaning sitting pose, an upright sitting pose, a squatting pose, an upward-facing lying pose, and a downward-facing lying pose.
  • The head poses include a looking-straight pose, a looking-left pose, a looking-right pose, a looking-up pose, and a looking-down pose.
  • The leg poses include a walking pose, a travelling pose, a sitting pose, a standing pose, and a lying pose.
  • The module M2 is configured to label and classify the training set based on the preferred way, and train the plurality of reasoners based on training sets in different multi-modal types.
  • Specifically, the preferred ways of using the mobile phone include using the left hand as the dominant hand and using the right hand as the dominant hand.
  • The reasoners include a reasoner for left-hand motion, a reasoner for right-hand motion, a reasoner for left-arm motion, a reasoner for right-arm motion, a reasoner for torso motion, a reasoner for head motion, a reasoner for left-leg motion and a reasoner for right-leg motion.
  • The module M3 is configured to train the overall skeletal reasoner during the human-computer interaction, and obtain the corresponding probabilities by the overall skeletal reasoner based on the input overall skeletal pose.
  • The module M4 is configured to perform reasoning to obtain poses of individual parts of a skeleton of an object using a plurality of reasoners on individual parts of the skeleton according to a preferred way of an object for using a mobile phone and a data set acquired in real time by a plurality of sensors on the mobile phone.
  • In an embodiment, as shown in FIGS. 3 and 4, the data set acquired by the plurality of sensors includes the horizontal and vertical displacement, the rotation angle of the mobile phone detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground detected by the gyroscope, positioning information, and status information about whether the screen of the mobile phone is clicked and whether the mobile phone has a certain amount of continuous vibration. Then whether the left hand, the right hand or both hands are currently raised is reasoned by the corresponding reasoner for left-hand motion and reasoner for right-hand motion.
  • Specifically, the reasoning process includes the following steps. There is no horizontal or vertical displacement; the gyroscope detects that the angle between the screen display plane and the vertical plane of the ground is within ±15°; the positioning information indicates that the mobile phone is not at sea or on an aircraft; and the screen of the mobile phone remains lit without being touched while there is a certain continuous vibration throughout the process. Under these conditions, the reasoner for left-hand motion and the reasoner for right-hand motion reason that the currently raised hand is the non-dominant hand; otherwise, the currently raised hand is the dominant hand. The threshold for determining the quantitative vibration is determined by an initial learning of the operational preferences of the user, during which the user is allowed to hold the phone with the screen lit continuously so as to detect the difference in habitual shaking between the hands; even when the sensors of the mobile phone detect no touch operation, a minimal amount of micro-vibration remains, and this micro-vibration is not sensor noise. When the mobile phone is detected in a landscape mode, it is reasoned by the hand reasoner that both hands are raised to hold the mobile phone. When the mobile phone is detected in a portrait mode, it is reasoned by the hand reasoner that the single commonly used hand is raised.
  • In an embodiment, as shown in FIGS. 4 and 5, the data set acquired by the plurality of sensors includes the horizontal and vertical displacement, the rotation angle of the mobile phone detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground detected by the gyroscope, positioning information, status information about whether the screen of the mobile phone is clicked and whether the mobile phone experiences a certain continuous vibration, the pressure altitude of the mobile phone from the ground (an adult of normal height is taken as an example herein), and the angle between the normal of the screen plane and the ground. Then whether the arm is currently in a raised or lowered position is reasoned by the corresponding preferred reasoner for left-arm motion and reasoner for right-arm motion.
  • Specifically, the reasoning includes the following steps. If the angle between the normal of the screen plane and the ground is more than 105° (that is, the user is looking down at the screen), it is reasoned by the reasoner for left-arm motion and/or the reasoner for right-arm motion that the arm is in a dropped pose. If the angle between the normal of the screen plane and the ground is less than 75°, it is reasoned by the reasoner for left-arm motion and/or the reasoner for right-arm motion that the arm is in a raised pose.
  • In an embodiment, as shown in FIGS. 5, 6 and 8, the data set acquired by the plurality of sensors includes the horizontal and vertical displacement, the rotation angle of the mobile phone detected by the gyroscope, the angle between the screen display plane and the vertical plane of the ground detected by the gyroscope, positioning information, status information about whether the screen of the mobile phone is clicked and whether the mobile phone has a certain amount of continuous vibration, the pressure altitude of the mobile phone from the ground, the angle between the normal of the screen plane and the ground, and the orientation of the screen. Then the torso is reasoned by the corresponding preferred reasoner for torso motion to be in a forward leaning upright pose, an upright standing pose, a forward leaning sitting pose, a backward leaning sitting pose, an upright sitting pose, a squatting pose, an upward-facing lying pose or a downward-facing lying pose.
  • Specifically, based on this data set, the torso is first reasoned to be in an upright pose, a sitting pose or a squatting pose according to the angle between the normal of the screen plane and the ground, and is further reasoned to be in a forward leaning upright pose, an upright standing pose, a forward leaning sitting pose, a back leaning sitting pose, an upright sitting pose, a squatting pose, an upward-facing lying pose or a downward-facing lying pose.
  • Specifically, as shown in FIG. 9, based on the same data set, head poses consistent with the torso poses are inferred by the corresponding preferred reasoner for head motion. The initial head pose is a straight sight. The head pose also includes changing poses, which depend on the direction opposite to the sliding on the screen: when the screen slides to the right, the sight turns towards the left. The head movement is currently considered to be synchronized with the eye movement, so the sight towards the left is equivalent to the head pose being towards the left.
  • Specifically, as shown in FIG. 7 , the data set acquired by the plurality of sensors includes displacement variation, displacement velocity, vibration of the mobile phone and the pressure altitude of the mobile phone from the ground. Then the leg pose is reasoned by the corresponding preferred leg reasoner.
  • More specifically, a standing pose is reasoned according to the pressure altitude of the mobile phone from the ground. If the displacement changes continuously without shaking or vibration, a travelling pose is reasoned by the leg reasoner. When the mobile phone is shaking and vibrating, and the displacement velocity is within the walking speed range, a walking pose is inferred by the leg reasoner.
  • The module M5 is configured to merge the poses of individual parts of the skeleton to generate a plurality of initial virtual skeletons.
  • The module M6 is configured, under a predetermined human mechanics constraint, to obtain a plurality of overall virtual skeletons satisfying the predetermined human mechanics constraint from the plurality of initial virtual skeletons.
  • The predetermined body mechanics constraint is performed by a module M6.1, a module M6.2, and a module M6.3.
  • The module M6.1 is configured to acquire a plurality of unconventional human pose images that meet predetermined requirements.
  • The module M6.2 is configured to extract 3D coordinates and pose data of virtual skeletal location points in Euclidean space based on the plurality of unconventional human pose images. The origin of the 3D coordinates in Euclidean space of the virtual skeleton positioning points is the original position where the reasoning of the virtual skeleton is started, i.e., the 3D coordinate system in Euclidean space of the virtual skeleton positioning points is the reasoned 3D coordinate system of the virtual skeleton.
  • The module M6.3 is configured to correct constraint tolerance of human natural engineering mechanics based on the 3D coordinates and pose data.
  • The human natural engineering mechanics is in accordance with the natural category of physiological actions, and includes physiological bending, coherence and connection of physiological structures, and joint bending.
  • More specifically, the module M6 is configured to subject the plurality of initial virtual skeletons to constraint combination and normalization to obtain the plurality of overall virtual skeletons conforming to the constraints.
  • The module M7 is configured to reason the screened overall virtual skeletons with the overall skeletal reasoner to obtain a dynamic twin virtual pose in real time.
  • Specifically, the screened overall virtual skeletons are subjected to weighting by the overall skeletal reasoner, and the one with the highest weight is selected as the twin virtual pose.
  • It is known to those skilled in the art that the system, device and individual modules provided herein can be implemented purely in computer-readable program code. Besides, it is possible to logically program the method steps such that the system, device and individual modules provided herein are implemented in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers and embedded microcontrollers. Therefore, the system, device and individual modules provided herein can be considered as a hardware component, and the modules included therein for implementing the various programs can be considered as structures within the hardware component. The modules for implementing the various functions can also be considered both as software programs for implementing the method and as structures within the hardware component.
  • Described above are specific embodiments of the present disclosure. It should be understood that the disclosure is not limited to the particular embodiments described above, and various variations or modifications made by a person skilled in the art without departing from the spirit and scope of the disclosure shall fall within the scope of the disclosure defined by the appended claims. The embodiments of the present application and the features therein may be combined with each other in any way without contradiction.

Claims (10)

What is claimed is:
1. A twin pose detection method based on interactive indirect inference, comprising:
(S1) acquiring, by a plurality of sensors on a mobile phone, a data set in real time;
and obtaining poses of individual parts of a skeleton of an object by reasoning using a plurality of reasoners on individual parts of the skeleton based on the data set and a preferred way of the object for using the mobile phone;
(S2) merging the poses obtained in step (S1) to generate a plurality of initial virtual skeletons;
(S3) under a predetermined human mechanics constraint, obtaining a plurality of overall virtual skeletons satisfying the predetermined human mechanics constraint from the plurality of initial virtual skeletons; and
(S4) screening a predetermined number of overall virtual skeletons from the plurality of overall virtual skeletons, and obtaining a dynamic twin virtual pose in real time by reasoning on the predetermined number of overall virtual skeletons using an overall skeleton reasoner;
wherein step (S1) comprises:
(S1.1) during human-computer interaction, constructing a training set based on the data set and the poses of individual parts of the skeleton; wherein the plurality of sensors comprise a nine-axis gyroscope, an acceleration sensor, a speed sensor, an infrared distance sensor, a touch sensor and a sensor capable of obtaining program operation reference information of the mobile phone; and
(S1.2) labeling and classifying the training set based on the preferred way; training the plurality of reasoners based on training sets in different multi-modal types; wherein
the preferred way of the object for using the mobile phone comprises using left hand as a dominant hand and using right hand as a dominant hand.
2. The twin pose detection method of claim 1, wherein the plurality of reasoners comprise a reasoner for left-hand motion, a reasoner for right-hand motion, a reasoner for left-arm motion, a reasoner for right-arm motion, a reasoner for torso motion, a reasoner for head motion, a reasoner for left-leg motion and a reasoner for right-leg motion.
3. The twin pose detection method of claim 1, wherein the nine-axis gyroscope is configured to acquire a rotation angle of the mobile phone;
the acceleration sensor is configured to acquire a horizontal movement acceleration of the mobile phone;
the speed sensor is configured to acquire a horizontal movement speed of the mobile phone;
the infrared distance sensor is configured to acquire an altitude of the mobile phone;
the touch sensor is configured to acquire status information about whether a screen of the mobile phone is clicked; and
the sensor capable of obtaining the program operation reference information of the mobile phone is configured to determine a use state of the mobile phone.
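As a non-authoritative sketch of the sensor roles recited in claim 3, the readings could be gathered into a single feature record as below; every field name and the app-state encoding are hypothetical.

```python
# Hypothetical assembly of one multi-sensor sample into a feature vector;
# field names mirror the sensor roles recited in claim 3.
from __future__ import annotations
from dataclasses import dataclass

APP_STATE_CODES = {"idle": 0.0, "video": 1.0, "game": 2.0}  # assumed codes

@dataclass
class PhoneSample:
    rotation_deg: tuple[float, float, float]  # nine-axis gyroscope
    accel_mps2: float                         # horizontal acceleration
    speed_mps: float                          # horizontal speed
    altitude_m: float                         # infrared distance sensor
    screen_touched: bool                      # touch sensor
    app_state: str                            # program operation reference

def to_feature_vector(s: PhoneSample) -> list[float]:
    """Flatten one sample into the numeric form a part reasoner consumes."""
    return [*s.rotation_deg, s.accel_mps2, s.speed_mps, s.altitude_m,
            float(s.screen_touched), APP_STATE_CODES.get(s.app_state, -1.0)]
```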
4. The twin pose detection method of claim 1, wherein in step (S3), the predetermined human mechanics constraint is established through steps of:
(S3.1) acquiring a predetermined number of unconventional human pose images that meet predetermined requirements;
(S3.2) extracting three-dimensional (3D) coordinates and pose data of virtual skeletal location points in Euclidean space based on the predetermined number of unconventional human pose images; and
(S3.3) correcting constraint tolerance of human natural engineering mechanics based on the 3D coordinates and pose data obtained in step (S3.2);
wherein the human natural engineering mechanics is in accordance with a natural category of physiological actions, and comprises physiological bending, coherence and connection of physiological structures, and joint bending.
5. The twin pose detection method of claim 4, wherein step (S3) is performed by a step of:
under the predetermined human mechanics constraint, subjecting the plurality of initial virtual skeletons to normalization to obtain the plurality of overall virtual skeletons satisfying the predetermined human mechanics constraint.
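Under the simplifying assumption that the constraint of claims 4 and 5 reduces to tolerance-widened per-joint angle ranges, it can be imagined as follows; the joint names, limits and default tolerance are placeholders, not disclosed values.

```python
# Minimal sketch of a human mechanics check with corrected tolerance.
# Joint limits below are illustrative placeholders only.
from __future__ import annotations

JOINT_LIMITS_DEG = {"elbow": (0.0, 150.0), "knee": (0.0, 140.0)}

def satisfies_constraint(angles_deg: dict[str, float],
                         tolerance_deg: float = 5.0) -> bool:
    """True if every joint angle lies within its tolerance-widened range.

    tolerance_deg stands in for the constraint tolerance corrected in
    step (S3.3) from the unconventional human pose images.
    """
    for joint, angle in angles_deg.items():
        lo, hi = JOINT_LIMITS_DEG.get(joint, (-180.0, 180.0))
        if not (lo - tolerance_deg <= angle <= hi + tolerance_deg):
            return False
    return True
```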
6. A twin pose detection system based on interactive indirect inference, comprising:
a first module;
a second module;
a third module; and
a fourth module;
wherein the first module is configured to perform reasoning to obtain poses of individual parts of a skeleton of an object using a plurality of reasoners on individual parts of the skeleton according to a preferred way of the object for using a mobile phone and a data set acquired in real time by a plurality of sensors on the mobile phone;
the second module is configured to merge the poses of individual parts of the skeleton to generate a plurality of initial virtual skeletons;
the third module is configured, under a predetermined human mechanics constraint, to obtain a plurality of overall virtual skeletons satisfying the predetermined human mechanics constraint from the plurality of initial virtual skeletons; and
the fourth module is configured to screen a predetermined number of overall virtual skeletons from the plurality of overall virtual skeletons, and perform reasoning on the predetermined number of overall virtual skeletons by using an overall skeleton reasoner to obtain a dynamic twin virtual pose in real time;
the first module comprises a first submodule and a second submodule;
wherein the first submodule is configured to construct a training set based on the data set and the poses of individual parts of the skeleton during human-computer interaction; wherein the plurality of sensors comprise a nine-axis gyroscope, an acceleration sensor, a speed sensor, an infrared distance sensor, a touch sensor and a sensor capable of obtaining program operation reference information of the mobile phone; and
the second submodule is configured to label and classify the training set based on the preferred way, and train the plurality of reasoners based on training sets of different multi-modal types; wherein the preferred way of the object for using the mobile phone comprises using a left hand as a dominant hand and using a right hand as a dominant hand.
7. The twin pose detection system of claim 6, wherein the plurality of reasoners comprise a reasoner for left-hand motion, a reasoner for right-hand motion, a reasoner for left-arm motion, a reasoner for right-arm motion, a reasoner for torso motion, a reasoner for head motion, a reasoner for left-leg motion and a reasoner for right-leg motion.
8. The twin pose detection system of claim 6, wherein the nine-axis gyroscope is configured to acquire a rotation angle of the mobile phone;
the acceleration sensor is configured to acquire a horizontal movement acceleration of the mobile phone;
the speed sensor is configured to acquire a horizontal movement speed of the mobile phone;
the infrared distance sensor is configured to acquire an altitude of the mobile phone;
the touch sensor is configured to acquire status information about whether a screen of the mobile phone is clicked; and
the sensor capable of obtaining the program operation reference information of the mobile phone is configured to determine a use state of the mobile phone.
9. The twin pose detection system of claim 6, wherein the predetermined human mechanics constraint is established by a fifth module, a sixth module, and a seventh module;
the fifth module is configured to acquire a plurality of unconventional human pose images that meet predetermined requirements;
the sixth module is configured to extract 3D coordinates and pose data of virtual skeletal location points in Euclidean space based on the plurality of unconventional human pose images; and
the seventh module is configured to correct constraint tolerance of human natural engineering mechanics based on the 3D coordinates and pose data;
wherein the human natural engineering mechanics is in accordance with a natural category of physiological actions, and comprises physiological bending, coherence and connection of physiological structures, and joint bending.
10. The twin pose detection system of claim 6, wherein the third module is configured to perform normalization on the plurality of initial virtual skeletons under the predetermined human mechanics constraint to obtain the plurality of overall virtual skeletons satisfying the predetermined human mechanics constraint.
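For orientation only, the four claimed stages (S1) to (S4) (equivalently, the first to fourth modules) can be strung together as in the sketch below; every interface here is an assumption, since the claims do not fix an API.

```python
# Assumed end-to-end pipeline: per-part reasoning, merging, constraint
# filtering, screening, and final selection by the overall reasoner.
from itertools import product
from typing import Callable, Dict, List

PartPose = float                 # stand-in for a per-part pose estimate
Skeleton = Dict[str, PartPose]   # part name -> pose

def detect_twin_pose(
    sample: List[float],
    part_reasoners: Dict[str, Callable[[List[float]], List[PartPose]]],
    constraint: Callable[[Skeleton], bool],
    overall_reasoner: Callable[[Skeleton], float],
    top_k: int = 5,
) -> Skeleton:
    # (S1) per-part reasoning on the multi-sensor sample
    part_poses = {part: r(sample) for part, r in part_reasoners.items()}
    # (S2) merge part poses into candidate initial virtual skeletons
    parts = list(part_poses)
    initial = [dict(zip(parts, combo))
               for combo in product(*(part_poses[p] for p in parts))]
    # (S3) keep only skeletons satisfying the human mechanics constraint
    feasible = [s for s in initial if constraint(s)]
    # (S4) screen a predetermined number, then let the overall skeleton
    # reasoner pick the highest-weighted one (assumes feasible is non-empty)
    screened = feasible[:top_k]  # the screening rule itself is an assumption
    return max(screened, key=overall_reasoner)
```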
US18/339,186 2022-06-23 2023-06-21 Twin pose detection method and system based on interactive indirect inference Active US11809616B1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210715227.9A CN114821006B (en) 2022-06-23 2022-06-23 Twin state detection method and system based on interactive indirect reasoning
CN202210715227.9 2022-06-23

Publications (2)

Publication Number Publication Date
US20230333633A1 true US20230333633A1 (en) 2023-10-19
US11809616B1 US11809616B1 (en) 2023-11-07

Family

ID=82522065

Family Applications (1)

Application Number Title Priority Date Filing Date
US18/339,186 Active US11809616B1 (en) 2022-06-23 2023-06-21 Twin pose detection method and system based on interactive indirect inference

Country Status (2)

Country Link
US (1) US11809616B1 (en)
CN (1) CN114821006B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116617663A (en) * 2022-02-08 2023-08-22 腾讯科技(深圳)有限公司 Action instruction generation method and device, storage medium and electronic equipment

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115590483B (en) * 2022-10-12 2023-06-30 深圳市联代科技有限公司 Smart phone with health measurement system
CN117441980B (en) * 2023-12-20 2024-03-22 武汉纺织大学 Intelligent helmet system and method based on intelligent computation of multi-sensor information

Citations (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100197390A1 (en) * 2009-01-30 2010-08-05 Microsoft Corporation Pose tracking pipeline
US20110085705A1 (en) * 2009-05-01 2011-04-14 Microsoft Corporation Detection of body and props
US20110304557A1 (en) * 2010-06-09 2011-12-15 Microsoft Corporation Indirect User Interaction with Desktop using Touch-Sensitive Control Surface
US20120225719A1 (en) * 2011-03-04 2012-09-06 Microsoft Corporation Gesture Detection and Recognition
US20130028517A1 (en) * 2011-07-27 2013-01-31 Samsung Electronics Co., Ltd. Apparatus, method, and medium detecting object pose
US20130069931A1 (en) * 2011-09-15 2013-03-21 Microsoft Corporation Correlating movement information received from different sources
US20130077820A1 (en) * 2011-09-26 2013-03-28 Microsoft Corporation Machine learning gesture detection
US20130188081A1 (en) * 2012-01-24 2013-07-25 Charles J. Kulas Handheld device with touch controls that reconfigure in response to the way a user operates the device
US20130278501A1 (en) * 2012-04-18 2013-10-24 Arb Labs Inc. Systems and methods of identifying a gesture using gesture data compressed by principal joint variable analysis
US20140035805A1 (en) * 2009-04-02 2014-02-06 David MINNEN Spatial operating environment (soe) with markerless gestural control
US20140195936A1 (en) * 2013-01-04 2014-07-10 MoneyDesktop, Inc. a Delaware Corporation Presently operating hand detector
US20140325373A1 (en) * 2009-04-02 2014-10-30 Oblong Industries, Inc. Operating environment with gestural control and multiple client devices, displays, and users
US20150032408A1 (en) * 2012-03-08 2015-01-29 Commissariat A L'Energie Atomique Et Aux Energies Alternatives System for capturing movements of an articulated structure
US20150077336A1 (en) * 2013-09-13 2015-03-19 Nod, Inc. Methods and Apparatus for Using the Human Body as an Input Device
US20150154447A1 (en) * 2013-12-04 2015-06-04 Microsoft Corporation Fusing device and image motion for user identification, tracking and device association
US9094576B1 (en) * 2013-03-12 2015-07-28 Amazon Technologies, Inc. Rendered audiovisual communication
US9144744B2 (en) * 2013-06-10 2015-09-29 Microsoft Corporation Locating and orienting device in space
US20150355462A1 (en) * 2014-06-06 2015-12-10 Seiko Epson Corporation Head mounted display, detection device, control method for head mounted display, and computer program
US20160195940A1 (en) * 2015-01-02 2016-07-07 Microsoft Technology Licensing, Llc User-input control device toggled motion tracking
US20170118318A1 (en) * 2015-10-21 2017-04-27 Le Holdings (Beijing) Co., Ltd. Mobile Phone
US20170273639A1 (en) * 2014-12-05 2017-09-28 Myfiziq Limited Imaging a Body
US20170308165A1 (en) * 2016-04-21 2017-10-26 ivSystems Ltd. Devices for controlling computers based on motions and positions of hands
US20180020978A1 (en) * 2016-07-25 2018-01-25 Patrick Kaifosh System and method for measuring the movements of articulated rigid bodies
US20190080252A1 (en) * 2017-04-06 2019-03-14 AIBrain Corporation Intelligent robot software platform
US20190114836A1 (en) * 2017-10-13 2019-04-18 Fyusion, Inc. Skeleton-based effects and background replacement
US20190167059A1 (en) * 2017-12-06 2019-06-06 Bissell Inc. Method and system for manual control of autonomous floor cleaner
US20190197852A1 (en) * 2017-12-27 2019-06-27 Kerloss Sadek Smart entry point spatial security system
US10416755B1 (en) * 2018-06-01 2019-09-17 Finch Technologies Ltd. Motion predictions of overlapping kinematic chains of a skeleton model used to control a computer system
US20190339766A1 (en) * 2018-05-07 2019-11-07 Finch Technologies Ltd. Tracking User Movements to Control a Skeleton Model in a Computer System
US10796104B1 (en) * 2019-07-03 2020-10-06 Clinc, Inc. Systems and methods for constructing an artificially diverse corpus of training data samples for training a contextually-biased model for a machine learning-based dialogue system
US20210072548A1 (en) * 2019-09-10 2021-03-11 Seiko Epson Corporation Display system, control program for information processing device, method for controlling information processing device, and display device
US20210233273A1 (en) * 2020-01-24 2021-07-29 Nvidia Corporation Determining a 3-d hand pose from a 2-d image using machine learning
US20210241529A1 (en) * 2020-02-05 2021-08-05 Snap Inc. Augmented reality session creation using skeleton tracking
US20210271863A1 (en) * 2020-02-28 2021-09-02 Fujitsu Limited Behavior recognition method, behavior recognition device, and computer-readable recording medium
US11210834B1 (en) * 2015-09-21 2021-12-28 TuringSense Inc. Article of clothing facilitating capture of motions
US20210402942A1 (en) * 2020-06-29 2021-12-30 Nvidia Corporation In-cabin hazard prevention and safety control system for autonomous machine applications
US11232294B1 (en) * 2017-09-27 2022-01-25 Amazon Technologies, Inc. Generating tracklets from digital imagery
US11249556B1 (en) * 2020-11-30 2022-02-15 Microsoft Technology Licensing, Llc Single-handed microgesture inputs
US20220245812A1 (en) * 2019-08-06 2022-08-04 The Johns Hopkins University Platform to detect patient health condition based on images of physiological activity of a patient
US20220258049A1 (en) * 2021-02-16 2022-08-18 Pritesh KANANI System and method for real-time calibration of virtual apparel using stateful neural network inferences and interactive body measurements
US20220410000A1 (en) * 2019-07-09 2022-12-29 Sony Interactive Entertainment Inc. Skeleton model updating apparatus, skeleton model updating method, and program
US20230127549A1 (en) * 2020-06-25 2023-04-27 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method, mobile device, head-mounted display, and system for estimating hand pose

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10565777B2 (en) * 2016-09-30 2020-02-18 Sony Interactive Entertainment Inc. Field of view (FOV) throttling of virtual reality (VR) content in a head mounted display
FR3068236B1 (en) * 2017-06-29 2019-07-26 Wandercraft METHOD FOR SETTING UP AN EXOSQUELET
US11281293B1 (en) * 2019-04-30 2022-03-22 Facebook Technologies, Llc Systems and methods for improving handstate representation model estimates
CN108876815B (en) 2018-04-28 2021-03-30 深圳市瑞立视多媒体科技有限公司 Skeleton posture calculation method, character virtual model driving method and storage medium
CN108830150B (en) 2018-05-07 2019-05-28 山东师范大学 One kind being based on 3 D human body Attitude estimation method and device
CN109885163A (en) 2019-02-18 2019-06-14 广州卓远虚拟现实科技有限公司 A kind of more people's interactive cooperation method and systems of virtual reality
CN110472481B (en) * 2019-07-01 2024-01-05 华南师范大学 Sleeping gesture detection method, device and equipment
CN110495889B (en) 2019-07-04 2022-05-27 平安科技(深圳)有限公司 Posture evaluation method, electronic device, computer device, and storage medium
CN110502980B (en) * 2019-07-11 2021-12-03 武汉大学 Method for identifying scene behaviors of pedestrians playing mobile phones while crossing roads
CN110675474B (en) 2019-08-16 2023-05-02 咪咕动漫有限公司 Learning method for virtual character model, electronic device, and readable storage medium
CN111311714A (en) 2020-03-31 2020-06-19 北京慧夜科技有限公司 Attitude prediction method and system for three-dimensional animation
CN112132955B (en) * 2020-09-01 2024-02-06 大连理工大学 Method for constructing digital twin body of human skeleton
EP4224368A4 (en) * 2020-09-29 2024-05-22 Sony Semiconductor Solutions Corporation Information processing system, and information processing method
CN112884780A (en) 2021-02-06 2021-06-01 罗普特科技集团股份有限公司 Estimation method and system for human body posture
CN112926550A (en) 2021-04-15 2021-06-08 南京蓝镜数字科技有限公司 Human-computer interaction method and device based on three-dimensional image human body posture matching
CN113158459A (en) 2021-04-20 2021-07-23 浙江工业大学 Human body posture estimation method based on visual and inertial information fusion
CN113191324A (en) 2021-05-24 2021-07-30 清华大学 Pedestrian behavior intention prediction method based on multi-task learning
CN113610969B (en) 2021-08-24 2024-03-08 国网浙江省电力有限公司双创中心 Three-dimensional human body model generation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114821006A (en) 2022-07-29
CN114821006B (en) 2022-09-20
US11809616B1 (en) 2023-11-07

Similar Documents

Publication Publication Date Title
US11809616B1 (en) Twin pose detection method and system based on interactive indirect inference
CN110930483B (en) Role control method, model training method and related device
JP7178396B2 (en) Method and computer system for generating data for estimating 3D pose of object included in input image
CN112906604B (en) Behavior recognition method, device and system based on skeleton and RGB frame fusion
CN114399826A (en) Image processing method and apparatus, image device, and storage medium
KR20220025023A (en) Animation processing method and apparatus, computer storage medium, and electronic device
CN107688391A (en) A kind of gesture identification method and device based on monocular vision
CN113496507A (en) Human body three-dimensional model reconstruction method
CN110135249A (en) Human bodys' response method based on time attention mechanism and LSTM
CN115933868B (en) Three-dimensional comprehensive teaching field system of turnover platform and working method thereof
CN101520902A (en) System and method for low cost motion capture and demonstration
CN107392131A (en) A kind of action identification method based on skeleton nodal distance
US10970849B2 (en) Pose estimation and body tracking using an artificial neural network
CN1648840A (en) Head carried stereo vision hand gesture identifying device
CN107621880A (en) A kind of robot wheel chair interaction control method based on improvement head orientation estimation method
CN113255514B (en) Behavior identification method based on local scene perception graph convolutional network
Huang et al. A review of 3D human body pose estimation and mesh recovery
CN115798042A (en) Escalator passenger abnormal behavior data construction method based on digital twins
Zhang et al. Emotion recognition from body movements with as-lstm
CN116449947B (en) Automobile cabin domain gesture recognition system and method based on TOF camera
CN113673494B (en) Human body posture standard motion behavior matching method and system
CN116310102A (en) Three-dimensional reconstruction method, terminal and medium of transparent object image based on deep learning
CN114202606A (en) Image processing method, electronic device, storage medium, and computer program product
Liang et al. Interactive Experience Design of Traditional Dance in New Media Era Based on Action Detection
Gao The Application of Virtual Technology Based on Posture Recognition in Art Design Teaching

Legal Events

Date Code Title Description
FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: MICROENTITY

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO MICRO (ORIGINAL EVENT CODE: MICR); ENTITY STATUS OF PATENT OWNER: MICROENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE