
WO2018058419A1 - Two-dimensional image based human body joint point positioning model construction method, and positioning method - Google Patents

Two-dimensional image based human body joint point positioning model construction method, and positioning method

Info

Publication number
WO2018058419A1
WO2018058419A1 (PCT/CN2016/100763, CN2016100763W)
Authority
WO
WIPO (PCT)
Prior art keywords
component
human body
model
image
sample set
Prior art date
Application number
PCT/CN2016/100763
Other languages
French (fr)
Chinese (zh)
Inventor
黄凯奇
张俊格
付连锐
Original Assignee
中国科学院自动化研究所 (Institute of Automation, Chinese Academy of Sciences)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院自动化研究所 (Institute of Automation, Chinese Academy of Sciences)
Priority to PCT/CN2016/100763
Publication of WO2018058419A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion


Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A method for constructing a human body joint point positioning model for two-dimensional images, and a positioning method based on the construction method. The construction method comprises: using color images annotated with human body joint point position coordinates and occlusion states to construct a human body part local feature training sample set and a human body part global configuration sample set (S100); constructing a deep convolutional neural network and training it with the human body part local feature training sample set to obtain a human body part local appearance model (S110); obtaining an occlusion relationship graph model from the human body part local appearance model and the human body part global configuration sample set (S120); and determining the human body part local appearance model and the occlusion relationship graph model as the two-dimensional image human body joint point positioning model (S130). The method solves the technical problem of how to accurately and robustly position human body joint points in a two-dimensional image.

Description

Two-dimensional image based human body joint point positioning model construction method, and positioning method
Technical Field
The present invention relates to the field of image processing and pattern recognition, and in particular to a method for constructing a human body joint point positioning model for two-dimensional images and to a positioning method based on that construction method.
Background Art
In fields such as video surveillance, sign language recognition, smart homes, human-computer interaction, augmented reality, image retrieval, and robotics, it is often necessary to estimate the position coordinates of each human body joint point from a two-dimensional image. Two-dimensional image human body joint point positioning plays a key role in these applications and carries great application value. In practice, the difficulties in locating human body joint points include large-scale deformation, viewpoint changes, occlusion, and complex backgrounds.
At present, two-dimensional image human body joint point positioning methods fall into two broad categories: joint point regression and part detection.
A two-dimensional image joint point regression method first uses a human body detector to determine the position and size of the region containing the person, then extracts image features within that region and predicts the coordinates of the human body joint points by regression. See Document 1 and Document 2 for related work.
Joint point regression methods are easy to implement, but they have two drawbacks. First, because they require the rectangular box produced by a human body detector as input, large body movements can cause the detector to produce false detections, which in turn makes the subsequent joint point regression fail. Second, because the positions of terminal joints such as the wrists and ankles vary greatly while those of joints such as the head and shoulders vary little, global regression over the image region tends to under-fit the terminal joint points and degrades their positioning accuracy. To mitigate the second drawback, Document 3 divides the human body into upper, middle, and lower regions and regresses the joint points of each region separately, but it ignores the first drawback.
A two-dimensional image human body part detection method first extracts local image features with a sliding-window scan and classifies them into parts, then uses a structural model to constrain the relative positions between parts, so that the optimal human body part configuration is detected and the region of each part and the position coordinates of the corresponding joint points are obtained. Part detection methods involve two key technologies: the local feature representation of the parts, and the structural modeling of the human body.
For the local feature representation of parts, existing methods mainly use hand-crafted features or learned features. Document 4 uses histograms of oriented gradients to describe the local features of parts, and Document 5 uses shape context features. Hand-crafted features require no training and are simple and fast, but their expressive power is weak and they are not robust to noise. Document 6 proposes extracting features from the local region of each part with a convolutional neural network, which strengthens the representation of part features under different poses and improves robustness to noise. However, Document 6 only considers the case where a part is not occluded, so its positioning accuracy for occluded joint points is poor.
For the structural modeling of the human body, the model structures in use include tree-structured models and loopy graph models. Most existing structural modeling methods adopt a tree-structured model; see Document 4 and Document 6. Although a tree-structured model is simple and allows fast inference, it has difficulty modeling complex occlusion relationships, especially self-occlusion. Compared with a tree-structured model, the main difference of a loopy graph model is that loops are introduced into the model structure; Document 7 and Document 8, for example, use loopy graph models. Although a loopy graph model improves the expressive power of the model and its robustness to occlusion, its inference complexity is high, which limits its application to human body structural modeling.
In view of this, the present invention is proposed.
The related documents mentioned above are listed below:
Document 1: Alexander Toshev and Christian Szegedy. DeepPose: Human pose estimation via deep neural networks. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1653–1660, 2014;
Document 2: US7778446B2, FAST HUMAN POSE ESTIMATION USING APPEARANCE AND MOTION VIA MULTI-DIMENSIONAL BOOSTING REGRESSION;
Document 3: Vasileios Belagiannis, Christian Rupprecht, Gustavo Carneiro, and Nassir Navab. Robust optimization for deep regression. In International Conference on Computer Vision, pages 2830–2838, 2015;
Document 4: Y. Yang and D. Ramanan. Articulated pose estimation with flexible mixtures-of-parts. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1385–1392, 2011;
Document 5: US7925081B2, SYSTEMS AND METHODS FOR HUMAN BODY POSE ESTIMATION;
Document 6: Xianjie Chen and Alan L. Yuille. Articulated pose estimation by a graphical model with image dependent pairwise relations. In Advances in Neural Information Processing Systems, pages 1736–1744, 2014;
Document 7: Mykhaylo Andriluka, Stefan Roth, and Bernt Schiele. Discriminative appearance models for pictorial structures. International Journal of Computer Vision, 99(3):259–280, 2012;
Document 8: Leonid Sigal and Michael J. Black. Measure locally, reason globally: Occlusion-sensitive articulated pose estimation. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2041–2048, 2006.
Summary of the Invention
In order to solve the above problems in the prior art, namely the technical problem of how to accurately and robustly locate human body joint points in a two-dimensional image, a method for constructing a two-dimensional image human body joint point positioning model is provided. In addition, a positioning method based on this construction method is also provided.
In order to achieve the above object, in one aspect, the following technical solution is provided:
A method for constructing a two-dimensional image human body joint point positioning model, the construction method comprising:
constructing a human body part local feature training sample set and a human body part global configuration sample set from color images annotated with human body joint point position coordinates and occlusion states;
constructing a deep convolutional neural network, and training the deep convolutional neural network with the human body part local feature training sample set to obtain a human body part local appearance model;
obtaining an occlusion relationship graph model from the human body part local appearance model and the human body part global configuration sample set;
determining the human body part local appearance model and the occlusion relationship graph model as the two-dimensional image human body joint point positioning model.
Preferably, constructing the human body part local feature training sample set may specifically comprise:
calculating the relative position of each human body part with respect to its parent part;
clustering these relative positions over all the color images;
constructing the human body part local feature training sample set from the image region in which each human body part is located and the category obtained by the clustering.
Preferably, constructing the human body part global configuration sample set may specifically comprise:
determining the sample labels of the human body parts;
determining the image regions corresponding to all the human body parts;
forming the human body part global configuration sample set from the sample labels and the image regions.
Preferably, constructing the deep convolutional neural network may specifically comprise:
determining the basic units of the deep convolutional neural network as 5 convolutional layers and 3 fully connected layers;
using the image region in which a part is located as the input of the deep convolutional neural network.
Preferably, obtaining the occlusion relationship graph model from the human body part local appearance model and the human body part global configuration sample set may specifically comprise:
establishing a connection relationship with loops between the parts of the human body;
based on the connection relationship with loops between the parts of the human body, using the human body part local appearance model and a structured support vector machine, and applying the dual coordinate descent method on the human body part global configuration sample set, training to obtain the weight corresponding to the relative position between any two human body parts that have a constraint relationship and the appearance feature weight coefficient of each human body part, thereby obtaining the occlusion relationship graph model.
In order to achieve the above object, in another aspect, a two-dimensional image human body joint point positioning method based on the above construction method is also provided, the positioning method comprising:
acquiring an image to be detected;
extracting local appearance features of the image to be detected by using the human body part local appearance model;
based on the local appearance features of the image to be detected, using the occlusion relationship graph model and obtaining the optimal human body part configuration according to the following formula:
(xi*, yi*, oi*, ti*) = argmax ( Σ γij · Δij + Σ ωi · pi );
where xi denotes the abscissa of part i; yi denotes the ordinate of part i; oi denotes the occlusion state of part i; ti denotes the category of part i; part j is the parent part of part i; Δij denotes the relative position between parts i and j; γij denotes the weight corresponding to the relative position Δij; ωi denotes the appearance feature weight coefficient of part i; pi denotes the local appearance feature of part i; and i and j are positive integers;
determining the center position of each human body part region in the optimal human body part configuration as the joint point position of that human body part.
Preferably, extracting the local appearance features of the image to be detected by using the human body part local appearance model may specifically comprise:
dividing the image to be detected into a plurality of local image regions;
using each local image region as the input of the human body part local appearance model to obtain the local appearance features of the image to be detected.
Embodiments of the present invention provide a method for constructing a two-dimensional image human body joint point positioning model and a two-dimensional image human body joint point positioning method based on that construction method. The construction method may comprise: constructing a human body part local feature training sample set and a human body part global configuration sample set from color images annotated with human body joint point position coordinates and occlusion states; constructing a deep convolutional neural network and training it with the human body part local feature training sample set to obtain a human body part local appearance model; obtaining an occlusion relationship graph model from the human body part local appearance model and the human body part global configuration sample set; and determining the human body part local appearance model and the occlusion relationship graph model as the two-dimensional image human body joint point positioning model. The invention can thus model self-occlusion and occlusion by other objects at the same time, and can learn the occlusion relationships between human body parts as well as between parts and the background. By fusing deep convolutional neural network feature extraction with a graph model structure, the invention achieves robust positioning of human body joint points under large motion poses and partial occlusion. The model structure adopted by the invention can model not only the relationships between physically connected parts but also the spatial context relationships between left and right limb parts that are not directly connected, which further enhances robustness to occlusion. By tightly combining the human body part local appearance model with the graph structure model, the invention effectively overcomes the adverse effects of large movements and partial occlusion and improves the robustness of human body joint point positioning in two-dimensional images.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a method for constructing a two-dimensional image human body joint point positioning model according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of constructing a human body part local feature training sample set according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of constructing a human body part global configuration sample set according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a deep convolutional neural network constructed according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an occlusion relationship graph model according to an embodiment of the present invention;
FIG. 6 is a schematic flowchart of a two-dimensional image human body joint point positioning method according to an embodiment of the present invention.
Detailed Description
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art should understand that these embodiments are only intended to explain the technical principles of the present invention and are not intended to limit its scope of protection.
The basic idea of the embodiments of the present invention is to model the occlusion relationships of human body parts both in the local feature representation of the parts and in the structural modeling of the human body.
In practical applications, prior art such as the application entitled "一种人体姿态估计方法" ("A human body pose estimation method"), application number 201510792096.4, discloses a similar human body joint point positioning algorithm: its inputs are a color image and a depth image, its local features are histogram-of-oriented-gradients features, and its structural model is a tree structure. However, that method cannot handle mutual occlusion between human body parts.
To this end, an embodiment of the present invention provides a method for constructing a two-dimensional image human body joint point positioning model. As shown in FIG. 1, the construction method may be implemented through steps S100 to S130, in which:
S100: Construct a human body part local feature training sample set and a human body part global configuration sample set from color images annotated with human body joint point position coordinates and occlusion states.
In some embodiments, as shown in FIG. 2, the process of constructing the human body part local feature training sample set may be implemented in the following preferred way:
S101: Calculate the relative position of each human body part with respect to its parent part.
S102: Cluster these relative positions over all the color images.
S103: Construct the human body part local feature training sample set from the image region in which each human body part is located and the category obtained by the clustering.
The process of constructing the part local feature training sample set is described in detail below with a preferred embodiment (a code sketch of these steps follows step c).
Step a: Calculate the relative position Δij of the i-th part with respect to its parent part j, where i and j are positive integers.
Step b: Cluster the relative positions Δij over all images using k-means.
In implementation, the number of clusters may be set to 13.
Step c: Construct the human body part local feature training sample set from the image region Ii in which the i-th part is located and the category ti obtained by clustering the i-th part (ti is the category of part i).
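As a concrete illustration of steps a to c, the Python sketch below clusters the parent-relative offsets of one part with k-means and pairs each part crop with its cluster index as the category ti. It is a minimal sketch rather than the patent's implementation: the annotation format, the parent map, and all function and variable names are assumptions; only the parent-relative offsets, the k-means clustering, and the choice of 13 clusters come from the text.

```python
# Minimal sketch of steps a-c; annotation format and names are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

NUM_TYPES = 13                       # number of clusters suggested in the text
PARENT = {1: 0, 2: 1, 3: 2}          # hypothetical parent map: part index -> parent part index

def build_local_feature_samples(annotations, crops, part_id):
    """annotations: list of dicts mapping part index -> (x, y) joint coordinates.
    crops: list of dicts mapping part index -> image patch around that joint.
    Returns (patch, category ti) pairs for one part over the whole training set."""
    parent = PARENT[part_id]
    # Step a: relative position of part i with respect to its parent part j.
    deltas = np.array([
        [ann[part_id][0] - ann[parent][0], ann[part_id][1] - ann[parent][1]]
        for ann in annotations
    ])
    # Step b: cluster the relative positions of all images with k-means.
    types = KMeans(n_clusters=NUM_TYPES, n_init=10, random_state=0).fit_predict(deltas)
    # Step c: pair the image region of part i with its clustered category ti.
    return [(crops[n][part_id], int(types[n])) for n in range(len(annotations))]
```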
In some embodiments, as shown in FIG. 3, the process of constructing the human body part global configuration sample set may be implemented in the following preferred way:
S105: Determine the sample labels of the human body parts.
S106: Determine the image regions corresponding to all the human body parts.
S107: Form the human body part global configuration sample set from the sample labels and the image regions.
The process of constructing the human body part global configuration sample set is described in detail below with a preferred embodiment (an illustrative data layout follows step f).
Step d: Determine the sample label of the i-th part as (xi, yi, oi, ti), where xi denotes the abscissa of part i; yi denotes the ordinate of part i; oi denotes the occlusion state of part i and takes the values 0, 1, and 2, where 0 means visible, 1 means occluded by another human body part, and 2 means occluded by the background; and ti denotes the category of part i.
Step e: Determine the image regions corresponding to all parts.
Step f: Form the human body part global configuration sample set from the sample labels and the image regions.
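For illustration, the sample label of steps d to f can be captured with a small container type. The class and field names below are assumptions; the (xi, yi, oi, ti) fields and the meaning of the occlusion states 0/1/2 follow the text.

```python
# Illustrative container for the (xi, yi, oi, ti) sample label of steps d-f.
from dataclasses import dataclass
from typing import List, Tuple
import numpy as np

@dataclass
class PartLabel:
    x: float          # xi: abscissa of part i
    y: float          # yi: ordinate of part i
    occlusion: int    # oi: 0 = visible, 1 = occluded by another body part, 2 = occluded by background
    category: int     # ti: clustered category of part i

# One global-configuration sample (step f): labels of all parts plus their image regions.
GlobalConfigurationSample = Tuple[List[PartLabel], List[np.ndarray]]
```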
S110: Construct a deep convolutional neural network and train it with the human body part local feature training sample set to obtain the human body part local appearance model.
Prior art (for example: Yoshua Bengio, Yann LeCun, Craig R. Nohl, Christopher J. C. Burges. LeRec: a NN/HMM hybrid for on-line handwriting recognition. Neural Computation 7(6):1289–1303, 1995) uses the LeNet network structure for training. The input of the LeNet structure is a grayscale image, and its basic units are 3 convolutional layers and 2 fully connected layers.
Embodiments of the present invention improve on this prior art. In some embodiments, constructing the deep convolutional neural network in this step may be implemented in the following preferred way: the basic units of the deep convolutional neural network are determined as 5 convolutional layers and 3 fully connected layers, and the image region in which a part is located (i.e., a color local region image) is used as the input of the network. Constructed in this way, the deep convolutional neural network outputs part-category probabilities, where the probability of a part category indicates the probability that the image region belongs to part i. FIG. 4 exemplarily shows a schematic diagram of the deep convolutional neural network constructed by an embodiment of the present invention.
In some embodiments, the training process in this step may include forward propagation and backward propagation. The forward pass applies convolution operations and matrix multiplications, layer by layer, to the color image region in which a part is located; the backward pass propagates the error between the prediction and the sample label back through the layers with gradient descent and updates the parameters of the fully connected layers and the convolutional layers.
In a specific implementation, to facilitate processing, the color local region image of a part may be scaled to 36×36 pixels as the input of the deep convolutional neural network.
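The network just described can be sketched as follows. Only the 5-convolutional-layer / 3-fully-connected-layer structure, the 36×36 color input, and the part-category probability output come from the text; the channel widths, kernel sizes, and pooling pattern are not specified in the patent and are chosen here purely for illustration, with PyTorch as one possible implementation.

```python
# Illustrative 5-conv + 3-FC network for 36x36 color part crops.
# Channel widths, kernel sizes and pooling are assumptions; only the layer
# counts, the 36x36 color input and the part-category probabilities are from the text.
import torch
import torch.nn as nn

class PartAppearanceNet(nn.Module):
    def __init__(self, num_categories: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 36 -> 18
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 18 -> 9
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 9 -> 4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 4 * 4, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, num_categories),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, 3, 36, 36) color part crops; returns per-crop part-category probabilities.
        return torch.softmax(self.classifier(self.features(x)), dim=1)
```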
The parameters of the human body part local appearance model in this step may be the parameters of the convolutional-layer and fully-connected-layer neurons of the deep convolutional neural network.
Since the deep convolutional neural network is a supervised learning algorithm, the human body part local appearance model is obtained by supervised learning from the training samples, so no manual intervention is required.
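A minimal supervised training loop over the local feature training sample set, reusing the PartAppearanceNet sketch above, might look as follows. The optimizer choice, learning rate, epoch count, and data-loading interface are assumptions; only the forward/backward scheme with gradient descent on 36×36 part crops and their category labels comes from the text.

```python
# Minimal supervised training loop for the local appearance model (illustrative only).
import torch
import torch.nn as nn

def train_appearance_model(net, loader, epochs: int = 10, lr: float = 0.01):
    """net: a PartAppearanceNet as sketched above (exposes .features and .classifier).
    loader yields (crops, labels): crops are (N, 3, 36, 36) scaled part regions,
    labels are the clustered part categories ti."""
    optimizer = torch.optim.SGD(net.parameters(), lr=lr)   # gradient descent as in the text
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for crops, labels in loader:
            logits = net.classifier(net.features(crops))    # forward pass (pre-softmax logits)
            loss = loss_fn(logits, labels)                  # error w.r.t. the sample label
            optimizer.zero_grad()
            loss.backward()                                 # backward pass
            optimizer.step()                                # update conv and FC parameters
    return net
```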
Moreover, since the human body part local appearance model is realized with a deep convolutional neural network, it can make full use of a large number of training samples to fit highly varied appearance features, and it also makes the extracted part features more robust.
Fusing deep convolutional neural network feature extraction with the graph model structure enables robust positioning of human body joint points under large motion poses and partial occlusion.
S120: Obtain the occlusion relationship graph model from the human body part local appearance model and the human body part global configuration sample set.
In some embodiments, this step may specifically include:
S121: Establish a connection relationship with loops between the parts of the human body.
By setting the connections between the parts of the human body to a connection relationship with loops, both the occlusion relationships between human body parts and the occlusion relationships between human body parts and the background can be modeled.
S122: Based on the connection relationship with loops between the parts, use the human body part local appearance model and a structured support vector machine, and apply the dual coordinate descent method on the human body part global configuration sample set to train the weight corresponding to the relative position between any two human body parts that have a constraint relationship and the appearance feature weight coefficient of each part, thereby obtaining the occlusion relationship graph model.
FIG. 5 exemplarily shows a schematic diagram of the occlusion relationship graph model. The circles represent the 14 joint point parts of the human body, and the edges represent the connection relationships between the parts. Compared with the tree-structured model of the prior art (for example, Document 4), the connection relationships of the occlusion relationship graph model constructed in the embodiment of the present invention contain loops, i.e., it is a loopy graph model.
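For illustration, such a connection structure can be written down as an edge list over the 14 parts. The patent does not enumerate the edges of FIG. 5, so the part names and the concrete edge list below are purely hypothetical; they only show how adding left/right limb context edges to a kinematic tree produces the loops described above.

```python
# Hypothetical loopy connection structure over the 14 joint-point parts of FIG. 5.
# Part names and the concrete edge list are illustrative; only "14 parts" and
# "connections with loops" are taken from the text.
PARTS = [
    "head", "neck", "l_shoulder", "r_shoulder", "l_elbow", "r_elbow",
    "l_wrist", "r_wrist", "l_hip", "r_hip", "l_knee", "r_knee",
    "l_ankle", "r_ankle",
]
KINEMATIC_EDGES = [
    ("head", "neck"), ("neck", "l_shoulder"), ("neck", "r_shoulder"),
    ("l_shoulder", "l_elbow"), ("l_elbow", "l_wrist"),
    ("r_shoulder", "r_elbow"), ("r_elbow", "r_wrist"),
    ("neck", "l_hip"), ("neck", "r_hip"),
    ("l_hip", "l_knee"), ("l_knee", "l_ankle"),
    ("r_hip", "r_knee"), ("r_knee", "r_ankle"),
]
# Context edges between left/right limb parts that are not physically connected;
# adding them to the kinematic tree introduces loops.
CONTEXT_EDGES = [
    ("l_shoulder", "r_shoulder"), ("l_elbow", "r_elbow"), ("l_wrist", "r_wrist"),
    ("l_hip", "r_hip"), ("l_knee", "r_knee"), ("l_ankle", "r_ankle"),
]
EDGES = KINEMATIC_EDGES + CONTEXT_EDGES
```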
The process of obtaining the occlusion relationship graph model and its parameters is described below with a preferred embodiment.
The parameters of the graph structure model include the weight γij corresponding to the relative position Δij between parts i and j that have a constraint relationship, and the appearance feature weight coefficient ωi of part i. Using the structured support vector machine of Document 9 (Ioannis Tsochantaridis, Thorsten Joachims, Thomas Hofmann and Yasemin Altun (2005), Large Margin Methods for Structured and Interdependent Output Variables, JMLR, Vol. 6, pages 1453–1484), the structural model parameters γij and ωi are trained on the human body part global configuration sample set with the dual coordinate descent method described in that document. If the occlusion state oi of part i is 2, ωi is set to 0; in this case part i is occluded by the background.
The occlusion relationship graph model constructed in the embodiment of the present invention can both express occlusion relationships and retain an inference complexity close to that of a tree-structured model. Moreover, since it is obtained by supervised learning from the training samples, no manual intervention is required.
S130: Determine the human body part local appearance model and the occlusion relationship graph model as the two-dimensional image human body joint point positioning model.
On the basis of the above embodiments, an embodiment of the present invention further provides a two-dimensional image human body joint point positioning method. As shown in FIG. 6, the positioning method may be implemented through steps S140 to S170, in which:
S140: Acquire the image to be detected.
S150: Extract local appearance features of the image to be detected by using the human body part local appearance model.
Specifically, this step may include:
S151: Divide the image to be detected into local image regions.
S152: Use each local image region as the input of the human body part local appearance model to obtain the local appearance features of the image to be detected.
The process of extracting the local appearance features of the image to be detected is described below with a specific example:
The image to be detected is divided into local image regions, each local image region is scaled to 36×36 pixels, and the scaled image is fed into the human body part local appearance model (i.e., the trained deep convolutional neural network). After the 5 convolutional layers and 3 fully connected layers, the probability pi that the local image region looks like part i is obtained; a larger pi indicates that the local image region looks more like part i. The probability pi obtained in this embodiment can be used as a local appearance feature of the image to be detected for subsequent processing.
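A minimal sketch of this extraction step, reusing the PartAppearanceNet sketch above, is given below. The window and stride sizes are assumptions; the 36×36 rescaling and the per-part probabilities pi follow the text.

```python
# Illustrative dense extraction of the local appearance features pi (per-part probabilities).
import torch
import torch.nn.functional as F

def extract_local_appearance(image: torch.Tensor, net, window: int = 36, stride: int = 4):
    """image: (3, H, W) color image to be detected.
    Returns a tensor of shape (num_windows, num_categories) holding, for each local
    image region, the probability that it looks like each part category."""
    patches = []
    _, h, w = image.shape
    for top in range(0, h - window + 1, stride):
        for left in range(0, w - window + 1, stride):
            patch = image[:, top:top + window, left:left + window]
            # Scale every local region to 36x36 pixels before feeding the network.
            patch = F.interpolate(patch.unsqueeze(0), size=(36, 36),
                                  mode="bilinear", align_corners=False)
            patches.append(patch)
    with torch.no_grad():
        return net(torch.cat(patches, dim=0))   # pi for every window and part category
```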
S160: Based on the local appearance features of the image to be detected, use the occlusion relationship graph model and obtain the optimal human body part configuration according to the following formula:
(xi*, yi*, oi*, ti*) = argmax ( Σ γij · Δij + Σ ωi · pi )    (1)
where xi denotes the abscissa of part i; yi denotes the ordinate of part i; oi denotes the occlusion state of part i; ti denotes the category of part i; part j is the parent part of part i; Δij denotes the relative position between parts i and j; γij denotes the weight corresponding to the relative position Δij; ωi denotes the appearance feature weight coefficient of part i; pi denotes the local appearance feature of part i, for example the probability that a local image region looks like part i; and i and j are positive integers.
Formula (1) yields the predicted joint point position (xi*, yi*) at part i; (xi*, yi*) is the joint point of part i located by this embodiment.
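The score maximized in formula (1) can be written directly as a function of a candidate part configuration. The sketch below only evaluates that score for one candidate; the actual maximization over a loopy graph requires an approximate inference procedure that the patent does not spell out, so it is not reproduced here. The data structures, the two-dimensional form of the γij weights, and all names are illustrative assumptions.

```python
# Illustrative evaluation of the formula-(1) score for one candidate part configuration.
def configuration_score(config, parents, gamma, omega, appearance):
    """config: dict part -> (x, y, o, t) candidate state of each part.
    parents: dict part -> parent part (the edges carrying the relative-position terms).
    gamma:   dict (part, parent) -> (wx, wy) weights for the relative position Δij
             (a simplified two-dimensional form of γij).
    omega:   dict part -> appearance feature weight coefficient ωi.
    appearance: function (part, x, y, t) -> local appearance feature pi at that location."""
    score = 0.0
    for i, j in parents.items():
        xi, yi, _, _ = config[i]
        xj, yj, _, _ = config[j]
        dx, dy = xi - xj, yi - yj                      # Δij: relative position of i w.r.t. j
        wx, wy = gamma[(i, j)]
        score += wx * dx + wy * dy                     # Σ γij · Δij
    for i, (xi, yi, oi, ti) in config.items():
        wi = 0.0 if oi == 2 else omega[i]              # ωi is zeroed when occluded by background
        score += wi * appearance(i, xi, yi, ti)        # Σ ωi · pi
    return score
```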
S170: Determine the center position of each human body part region in the optimal human body part configuration as the joint point position of that human body part.
Although the operations of the method of the present invention are described in a specific order in the accompanying drawings, this does not require or imply that the operations must be performed in that specific order, or that all of the illustrated operations must be performed to achieve the desired result. Additionally or alternatively, some steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
It should be understood that the number of any element in the drawings is illustrative rather than limiting, and that any naming is only for distinction and carries no limiting meaning.
The technical solutions of the present invention have thus been described with reference to the preferred embodiments shown in the drawings. However, those skilled in the art will readily appreciate that the scope of protection of the present invention is obviously not limited to these specific embodiments. Without departing from the principles of the present invention, those skilled in the art may make equivalent changes or substitutions to the relevant technical features, and the technical solutions resulting from such changes or substitutions will fall within the scope of protection of the present invention.

Claims (7)

  1. A method for constructing a two-dimensional image human body joint point positioning model, characterized in that the construction method comprises:
    constructing a human body component local feature training sample set and a human body component global configuration sample set by using color images in which the position coordinates and occlusion states of the human body joint points have been annotated;
    constructing a deep convolutional neural network, and training the deep convolutional neural network with the human body component local feature training sample set to obtain a human body component local appearance model;
    obtaining an occlusion relationship graph model by using the human body component local appearance model and the human body component global configuration sample set;
    determining the human body component local appearance model and the occlusion relationship graph model as the two-dimensional image human body joint point positioning model.
  2. The construction method according to claim 1, characterized in that constructing the human body component local feature training sample set specifically comprises:
    calculating the relative position of any one of the human body components with respect to its parent node;
    clustering the relative positions over all of the color images;
    constructing the human body component local feature training sample set by using the image regions in which the human body components are located and the categories obtained by the clustering.
  3. The construction method according to claim 1, characterized in that constructing the human body component global configuration sample set specifically comprises:
    determining sample labels of the human body components;
    determining the image regions corresponding to all of the human body components;
    composing the human body component global configuration sample set from the sample labels and the image regions.
  4. The construction method according to claim 2 or 3, characterized in that constructing the deep convolutional neural network specifically comprises:
    determining the basic units of the deep convolutional neural network as 5 convolutional layers and 3 fully connected layers;
    taking the image regions in which the components are located as the input of the deep convolutional neural network.
  5. The construction method according to claim 1, characterized in that obtaining the occlusion relationship graph model by using the human body component local appearance model and the human body component global configuration sample set specifically comprises:
    establishing connection relationships with loops between the components of the human body;
    based on the connection relationships with loops between the components of the human body, using the human body component local appearance model and a structured support vector machine, and applying a dual coordinate descent method on the human body component global configuration sample set, training to obtain the weights corresponding to the relative positions between any two of the human body components having a constraint relationship and the appearance-feature weight coefficient of any human body component, thereby obtaining the occlusion relationship graph model.
  6. A two-dimensional image human body joint point positioning method based on the construction method according to any one of claims 1, 2, 3 and 5, characterized in that the positioning method comprises:
    acquiring an image to be detected;
    extracting local appearance features of the image to be detected by using the human body component local appearance model;
    based on the local appearance features of the image to be detected, using the occlusion relationship graph model and obtaining the optimal human body component configuration according to the following formula:
    (x_i*, y_i*, o_i*, t_i*) = argmax( Σ_ij γ_ij · Δ_ij + Σ_i ω_i · p_i );
    where x_i denotes the abscissa of component i; y_i denotes the ordinate of component i; o_i denotes the occlusion state of component i; t_i denotes the type of component i; component j is the parent-node component of component i; Δ_ij denotes the relative position between components i and j; γ_ij denotes the weight corresponding to the relative position Δ_ij; ω_i denotes the appearance-feature weight coefficient of component i; p_i denotes the local appearance feature of component i; i and j are positive integers;
    determining the center position of each human body component region in the optimal human body component configuration as the joint point position of that component.
  7. The positioning method according to claim 6, characterized in that extracting the local appearance features of the image to be detected by using the human body component local appearance model specifically comprises:
    dividing the image to be detected into a plurality of local image regions;
    taking each of the local image regions as the input of the human body component local appearance model to obtain the local appearance features of the image to be detected.
PCT/CN2016/100763 2016-09-29 2016-09-29 Two-dimensional image based human body joint point positioning model construction method, and positioning method WO2018058419A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/100763 WO2018058419A1 (en) 2016-09-29 2016-09-29 Two-dimensional image based human body joint point positioning model construction method, and positioning method

Publications (1)

Publication Number Publication Date
WO2018058419A1 true WO2018058419A1 (en) 2018-04-05

Family

ID=61763085

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/100763 WO2018058419A1 (en) 2016-09-29 2016-09-29 Two-dimensional image based human body joint point positioning model construction method, and positioning method

Country Status (1)

Country Link
WO (1) WO2018058419A1 (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080137956A1 (en) * 2006-12-06 2008-06-12 Honda Motor Co., Ltd. Fast Human Pose Estimation Using Appearance And Motion Via Multi-Dimensional Boosting Regression
US20090154796A1 (en) * 2007-12-12 2009-06-18 Fuji Xerox Co., Ltd. Systems and methods for human body pose estimation
CN103246884A (en) * 2013-05-22 2013-08-14 清华大学 Real-time human body action recognizing method and device based on depth image sequence
CN105069413A (en) * 2015-07-27 2015-11-18 电子科技大学 Human body gesture identification method based on depth convolution neural network
CN105117694A (en) * 2015-08-16 2015-12-02 北京航空航天大学 A single-picture human body posture estimation method utilizing rotation invariance characteristics
CN105389569A (en) * 2015-11-17 2016-03-09 北京工业大学 Human body posture estimation method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ALEXANDER TOSHEV ET AL.: "DeepPose: Human Pose Estimation via Deep Neural Networks", 2014 IEEE Conference on Computer Vision and Pattern Recognition, 23 June 2014, pages 1653-1660, XP032649134, DOI: 10.1109/CVPR.2014.214 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109241853A (en) * 2018-08-10 2019-01-18 平安科技(深圳)有限公司 Pedestrian's method for collecting characteristics, device, computer equipment and storage medium
CN109241853B (en) * 2018-08-10 2023-11-24 平安科技(深圳)有限公司 Pedestrian characteristic acquisition method and device, computer equipment and storage medium
CN111291593B (en) * 2018-12-06 2023-04-18 成都品果科技有限公司 Method for detecting human body posture
CN111291593A (en) * 2018-12-06 2020-06-16 成都品果科技有限公司 Method for detecting human body posture
CN109712234B (en) * 2018-12-29 2023-04-07 北京卡路里信息技术有限公司 Three-dimensional human body model generation method, device, equipment and storage medium
CN109712234A (en) * 2018-12-29 2019-05-03 北京卡路里信息技术有限公司 Generation method, device, equipment and the storage medium of three-dimensional (3 D) manikin
CN110457999B (en) * 2019-06-27 2022-11-04 广东工业大学 A method for animal pose behavior estimation and mood recognition based on deep learning and SVM
CN110457999A (en) * 2019-06-27 2019-11-15 广东工业大学 A method for animal pose behavior estimation and mood recognition based on deep learning and SVM
CN113496176A (en) * 2020-04-07 2021-10-12 深圳爱根斯通科技有限公司 Motion recognition method and device and electronic equipment
CN113496176B (en) * 2020-04-07 2024-05-14 深圳爱根斯通科技有限公司 Action recognition method and device and electronic equipment
CN113012229A (en) * 2021-03-26 2021-06-22 北京华捷艾米科技有限公司 Method and device for positioning human body joint points
EP4276742A4 (en) * 2021-09-29 2024-04-24 NEC Corporation LEARNING DEVICE, ESTIMATION DEVICE, LEARNING METHOD, ESTIMATION METHOD AND PROGRAM
CN114926594A (en) * 2022-06-17 2022-08-19 东南大学 Single-view occluded human motion reconstruction method based on self-supervised spatiotemporal motion priors

Similar Documents

Publication Publication Date Title
WO2018058419A1 (en) Two-dimensional image based human body joint point positioning model construction method, and positioning method
CN109325398B (en) Human face attribute analysis method based on transfer learning
CN106548194B (en) Construction method and positioning method of two-dimensional image human joint point positioning model
Li et al. Robust visual tracking based on convolutional features with illumination and occlusion handing
CN106055091B (en) A Hand Pose Estimation Method Based on Depth Information and Correction Method
CN102075686B (en) Robust real-time on-line camera tracking method
CN105069413A (en) Human body gesture identification method based on depth convolution neural network
Chen et al. Learning a deep network with spherical part model for 3D hand pose estimation
CN104616319B (en) Multiple features selection method for tracking target based on support vector machines
Sedai et al. A Gaussian process guided particle filter for tracking 3D human pose in video
Raskin et al. Dimensionality reduction using a Gaussian process annealed particle filter for tracking and classification of articulated body motions
CN109003291A (en) Method for tracking target and device
CN103077535A (en) Target tracking method on basis of multitask combined sparse representation
Lee et al. Human pose tracking using multi-level structured models
CN114049541A (en) Visual scene recognition method based on structural information characteristic decoupling and knowledge migration
CN107330363B (en) Rapid internet billboard detection method
Ikram et al. Real time hand gesture recognition using leap motion controller based on cnn-svm architechture
Pateraki et al. Visual human-robot communication in social settings
CN106127806B (en) RGB-D video target tracking methods based on depth Boltzmann machine cross-module formula feature learning
Lin et al. Robot grasping based on object shape approximation and LightGBM
CN107798329A (en) Adaptive particle filter method for tracking target based on CNN
Yang et al. An efficient tracking system by orthogonalized templates
Kim et al. Human Activity Recognition as Time‐Series Analysis
Singh et al. Simultaneous tracking and action recognition for single actor human actions
Keskin et al. STARS: Sign tracking and recognition system using input–output HMMs

Legal Events

Code  Description
121   EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 16917173; Country of ref document: EP; Kind code of ref document: A1)
NENP  Non-entry into the national phase (Ref country code: DE)
122   EP: PCT application non-entry in European phase (Ref document number: 16917173; Country of ref document: EP; Kind code of ref document: A1)