CN108764050A - Skeleton Activity recognition method, system and equipment based on angle independence - Google Patents
Skeleton activity recognition method, system and equipment based on angle independence
- Publication number
- CN108764050A (application number CN201810398601.0A)
- Authority
- CN
- China
- Prior art keywords
- skeleton
- behavior recognition
- attention
- angle
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
Abstract
The invention relates to the field of human behavior recognition, and in particular to a skeleton behavior recognition method, system and device based on angle independence, aiming to improve the accuracy of angle-independent skeleton behavior recognition. The angle-independent skeleton behavior recognition method of the invention includes: designing a view-specific subnetwork for the skeleton sequence of each view, focusing on key joints and key frames through spatial and temporal attention modules respectively, and learning the discriminative features of each view sequence through a multi-layer long short-term memory network; concatenating the output features of the view-specific subnetworks as the input of a common subnetwork, further learning angle-independent features through a bidirectional long short-term memory network, and focusing on key views through a view attention module; and proposing a regularized cross-entropy loss function to drive the joint learning of the multiple modules of the network. The invention effectively improves recognition accuracy and can automatically focus on learning the features of the more informative views.
Description
Technical field
The invention relates to the field of human behavior recognition, and in particular to a skeleton behavior recognition method, system and device based on angle independence.
Background art
As an important research area of computer vision, human behavior recognition classifies human actions from input data. From the point of view of system input and output, the input is one or more kinds of data related to human behavior, sampled by different sensors at a certain frequency to form time series, and the output is the recognition and classification result of the human behavior. Generally, the input of a human behavior recognition system takes one of four forms: RGB time series, skeleton time series, depth-map video and infrared video. With the rapid development of depth sensors, skeleton data has become increasingly convenient to acquire, so skeleton-based human behavior recognition has received more and more attention. Complex data variation is a major challenge in behavior recognition research. Skeleton data is more robust than traditional RGB data, but with respect to viewpoint changes, angle-independent skeleton behavior recognition is just as challenging as angle-independent RGB behavior recognition.
The key to angle-independent skeleton behavior recognition lies in three parts: how to extract highly discriminative features, how to reduce the influence of angle changes on behavior recognition, and how to exploit temporal correlation to model the dynamics of actions. According to the modeling approach, human behavior recognition can be divided into traditional modeling and deep-learning modeling. The traditional approach consists of two stages: feature representation, and action recognition and understanding. Traditional feature extraction methods include HOG (Histogram of Oriented Gradients) and SIFT (Scale-Invariant Feature Transform); the extracted features are usually classified with common classifiers such as the Support Vector Machine (SVM). With the introduction and development of deep learning theory, deep-learning algorithms have been applied more and more to human behavior recognition research. The Recurrent Neural Network (RNN) is dedicated to modeling video information, feeding the hidden-layer outputs of previous time steps into the current time step so that information along the time dimension is preserved. The Long Short-Term Memory (LSTM) recurrent neural network is an extension of the plain RNN that mainly addresses the vanishing gradient problem. Therefore, most recent work on skeleton-based human behavior recognition uses LSTM-based RNN models from deep learning.
The main problems of current LSTM-based angle-independent skeleton behavior recognition methods are that they do not fully exploit all the information of a given sequence and that their recognition accuracy still needs improvement. Specifically, the LSTM extracts the discriminative features of the video sequence of a single view, ignoring the relation between videos of the same action under multiple views. At the same time, each joint, each frame and each view of the multi-view skeleton data contributes differently to angle-independent skeleton behavior recognition, yet with LSTM modeling alone the different components of the skeleton data contribute equally, which limits the accuracy of angle-independent skeleton behavior recognition.
Summary of the invention
In order to solve the above problems in the prior art, the present invention proposes a skeleton behavior recognition method, system and device based on angle independence, which improves the accuracy of angle-independent skeleton behavior recognition.
In one aspect of the present invention, an angle-independent skeleton behavior recognition method is proposed, comprising:
inputting the skeleton time series to be recognized, separated by view, into a trained skeleton behavior recognition model;
using the trained skeleton behavior recognition model to calculate the behavior-category probabilities of the skeleton time series to be recognized;
wherein,
the skeleton behavior recognition model comprises a preset number of view-specific subnetworks and a common subnetwork;
the training method of the skeleton behavior recognition model comprises the following steps:
step S1: for each view-specific subnetwork, inputting one frame of training data corresponding to that view, calculating the spatial attention weights and the temporal attention weights respectively, and then calculating the discriminative features of that view-specific subnetwork;
step S2: concatenating the discriminative features of the view-specific subnetworks into a view sequence as the input of the common subnetwork, calculating the angle-independent features and the view attention weights, and then calculating the probability of the behavior category of the training data;
step S3: judging whether all training data have been input; if so, going to step S4, otherwise going to step S1;
step S4: calculating the loss function;
step S5: judging whether the loss function has converged; if so, ending training, otherwise going to step S6;
step S6: adjusting the parameters of the skeleton behavior recognition model and going to step S1.
Preferably, the view-specific subnetwork comprises a spatial attention module, a temporal attention module and a discriminative feature extraction module;
the common subnetwork comprises a bidirectional long short-term memory network, a view attention module and a probability calculation module.
Preferably, in step S1, "calculating the spatial attention weights and the temporal attention weights, and then calculating the discriminative features of that view-specific subnetwork" specifically comprises:
assigning an attention weight to each joint through the spatial attention module;
assigning a temporal attention weight to each frame through the temporal attention module;
extracting, according to the training data and the spatial attention weights, the discriminative features of the training data on that view through the discriminative feature extraction module;
outputting the discriminative features of that view-specific subnetwork according to the temporal attention weights and the discriminative features on that view.
Preferably, in step S2, "calculating the angle-independent features and the view attention weights, and then calculating the probability of the behavior category of the training data" specifically comprises:
outputting the angle-independent features through the bidirectional long short-term memory network;
assigning a different view attention weight to each view through the view attention module;
obtaining the probability of the behavior category of the training data through the probability calculation module according to the angle-independent features and the view attention weights.
Preferably, the loss function is
L = L_CE + λ1·R_S + λ2·R_T + λ3·||W_sv||_1,
wherein,
the first term, L_CE = -Σ_{i=1..C} y_i·log(ŷ_i), is the cross-entropy loss of the whole network; y_i is the true label of the training data; ŷ_i is the probability, predicted by the common subnetwork, that the training data belongs to the i-th behavior category; C is the number of behavior categories;
λ1, λ2 and λ3 are parameters that balance the whole network;
the second term, R_S, is the regularization term of the spatial attention module; K is the number of joints; v is the number of views; T is the number of frames of the input training data; α^j_{t,k} is the spatial attention weight of the k-th joint in the t-th frame under the j-th view;
the third term, R_T, is the regularization term of the temporal attention module; β^j_t is the temporal attention weight of the t-th frame under the j-th view;
the fourth term is the regularization term of the parameters; W_sv is the connection matrix of the network, and the L1 norm is used to prevent the whole network from overfitting.
Preferably, the spatial attention module consists of an LSTM layer, two fully connected layers and a tanh activation unit;
correspondingly, the method for calculating the spatial attention weights comprises:
passing the input data through the spatial attention module to obtain the scores of the K joints in the t-th frame:
s^j_t = W_es·tanh(W_xs·x^j_t + W_hs·h^s_{t-1} + b_s) + b_es,
and normalizing the obtained scores to obtain the spatial attention weight of each joint:
α^j_{t,k} = exp(s^j_{t,k}) / Σ_{l=1..K} exp(s^j_{t,l}),
wherein,
W_es, W_xs and W_hs are parameter matrices to be learned; x^j_t is the input data of the t-th frame; h^s_{t-1} is the spatial hidden output of the input data of the (t-1)-th frame after the LSTM layer; b_s and b_es are bias terms;
α^j_{t,k} is the spatial attention weight of the k-th joint in the t-th frame under the j-th view; s^j_{t,k} is the score of the k-th joint in the t-th frame under the j-th view; s^j_{t,l} is the score of the l-th joint in the t-th frame under the j-th view.
Preferably, the temporal attention module consists of an LSTM layer, one fully connected layer and a ReLU activation unit;
correspondingly, the temporal attention weight is calculated as:
β^j_t = ReLU(W_e1·x^j_t + W_e2·h^e_{t-1} + b_e),
wherein,
β^j_t is the temporal attention weight of the t-th frame; W_e1 and W_e2 are parameter matrices to be learned; h^e_{t-1} is the temporal hidden output of the input data of the (t-1)-th frame after the LSTM layer; b_e is a bias term.
Preferably, the discriminative feature extraction module consists of 3 LSTM layers;
the input of the discriminative feature extraction module is obtained by the element-wise product of the spatial attention weights and the input data:
x̃^j_{t,k} = α^j_{t,k} · x^j_{t,k},
wherein α^j_{t,k} and x^j_{t,k} are respectively the spatial attention weight and the input data of the k-th joint in the t-th frame under the j-th view;
the output of the discriminative feature extraction module is dot-multiplied with the temporal attention weight to give the j-th element of the input data of the common subnetwork.
Preferably, the training method of the skeleton behavior recognition model further comprises, before step S1, a step of preprocessing the training data:
step S0: grouping by view the skeleton sequences corresponding to the same behavior of the same subject in the same environment; keeping the coordinates of each joint to four decimal places; and taking the first 100 frames of the skeleton sequence of each view, padding sequences shorter than 100 frames with the data of their last frame.
In a second aspect of the present invention, a storage device is proposed, in which a program is stored, the program being adapted to be loaded and executed by a processor to implement the angle-independence-based skeleton behavior recognition method described above.
In a third aspect of the present invention, a processing device is proposed, comprising a processor and a storage device, wherein the processor is adapted to execute a program, the storage device is adapted to store the program, and the program is adapted to be loaded and executed by the processor to implement the angle-independence-based skeleton behavior recognition method described above.
In a fourth aspect of the present invention, an angle-independence-based skeleton behavior recognition system is proposed, comprising a control unit and a skeleton behavior recognition model;
the control unit is configured to train the skeleton behavior recognition model and to use the trained skeleton behavior recognition model to calculate the behavior-category probabilities of the skeleton time series to be recognized;
the skeleton behavior recognition model comprises a preset number of view-specific subnetworks and a common subnetwork;
wherein,
the view-specific subnetwork comprises a spatial attention module, a temporal attention module and a discriminative feature extraction module;
the common subnetwork comprises a bidirectional long short-term memory network, a view attention module and a probability calculation module.
Beneficial effects of the present invention:
The present invention designs a view-specific subnetwork for the skeleton sequence of each view, focuses on key joints and key frames through spatial and temporal attention modules respectively, and learns the discriminative features of each view sequence through a multi-layer long short-term memory network; it concatenates the output features of the view-specific subnetworks as the input of a common subnetwork, further learns angle-independent features through a bidirectional long short-term memory network, and focuses on key views through a view attention module; and it proposes a regularized cross-entropy loss function to drive the joint learning of the multiple modules of the network. By fusing view-specific subnetworks and a common subnetwork, the invention fully exploits all the information of a given multi-view sequence, and by adding spatio-temporal attention and view attention modules it effectively improves the accuracy of behavior recognition and can automatically focus on learning the features of the more informative views.
Description of the drawings
Fig. 1 is a schematic flow chart of an embodiment of the angle-independence-based skeleton behavior recognition method of the present invention;
Fig. 2 is a schematic flow chart of an embodiment of the training method of the skeleton behavior recognition model of the present invention;
Fig. 3 is a schematic diagram of the signal flow of an embodiment of the skeleton behavior recognition model of the present invention;
Fig. 4 is a schematic diagram of the composition of a view-specific subnetwork in an embodiment of the skeleton behavior recognition model of the present invention;
Fig. 5 is a schematic diagram of the composition of the common subnetwork in an embodiment of the skeleton behavior recognition model of the present invention;
Fig. 6 is a schematic diagram of the composition of an embodiment of the angle-independence-based skeleton behavior recognition system of the present invention.
Detailed description of the embodiments
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art should understand that these embodiments are only used to explain the technical principles of the present invention and are not intended to limit its scope of protection.
To solve the problems that existing skeleton behavior recognition techniques do not fully exploit all the information of a given sequence and that their recognition accuracy needs improvement, the present invention proposes an angle-independent skeleton behavior recognition method based on a spatio-temporal-view attention deep network, which fuses view-specific subnetworks and a common subnetwork to fully exploit all the information of a given multi-view sequence and adds spatio-temporal attention and view attention to improve recognition accuracy. The design idea of the method is as follows: (1) design a view-specific subnetwork for the skeleton sequence of each view, focus on key joints and key frames through spatial and temporal attention modules respectively, and learn the discriminative features of each view sequence through a discriminative feature extraction module; (2) feed the output features of the view-specific subnetworks into a common subnetwork, further learn angle-independent features through a bidirectional long short-term memory network (Bi-LSTM), and focus on key views through a view attention module; (3) propose a regularized cross-entropy loss function to drive the joint learning of the multiple modules of the network.
Fig. 1 is a schematic flow chart of an embodiment of the angle-independence-based skeleton behavior recognition method of the present invention. As shown in Fig. 1, the recognition method of this embodiment comprises the following steps:
step A1: inputting the skeleton time series to be recognized, separated by view, into a trained skeleton behavior recognition model;
step A2: using the trained skeleton behavior recognition model to calculate the behavior-category probabilities of the skeleton time series to be recognized.
The skeleton behavior recognition model comprises a preset number of view-specific subnetworks and a common subnetwork.
Fig. 2 is a schematic flow chart of an embodiment of the training method of the skeleton behavior recognition model of the present invention. Fig. 3 is a schematic diagram of the signal flow of an embodiment of the skeleton behavior recognition model of the present invention. As shown in Fig. 2, the training method of this embodiment comprises steps S0-S6.
In step S0, the training data is preprocessed: the skeleton sequences corresponding to the same behavior of the same subject in the same environment are grouped by view; the coordinates of each joint are kept to four decimal places; and the first 100 frames of the skeleton sequence of each view are taken (if a sample has fewer than 100 frames, it is padded with the data of its last frame; if a sample has more than 100 frames, the first 100 frames are taken).
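The following is a minimal sketch of step S0 in Python/NumPy. It assumes each raw single-view sample is an array of shape (num_frames, K, 3) holding 3-D joint coordinates; the fixed frame count of 100 and the four decimal places follow the description above, while the function names and everything else are illustrative assumptions.

```python
import numpy as np

T_FIXED = 100  # every view sequence is normalized to 100 frames (step S0)

def preprocess_view_sequence(seq, t_fixed=T_FIXED, decimals=4):
    """seq: (num_frames, K, 3) skeleton sequence of one view of one action."""
    seq = np.round(np.asarray(seq, dtype=np.float32), decimals)   # keep 4 decimal places
    if len(seq) >= t_fixed:
        return seq[:t_fixed]                                       # take the first 100 frames
    pad = np.repeat(seq[-1:], t_fixed - len(seq), axis=0)          # pad with the last frame
    return np.concatenate([seq, pad], axis=0)

def preprocess_sample(view_sequences):
    """view_sequences: the v single-view sequences of one action instance,
    already grouped by view; returns v arrays of shape (100, K, 3)."""
    return [preprocess_view_sequence(s) for s in view_sequences]
```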
As shown in Fig. 3, the training data set is divided into v groups, which are fed into v view-specific subnetworks respectively. Each view-specific subnetwork comprises a spatial attention module, a temporal attention module and a discriminative feature extraction module; the outputs of the v view-specific subnetworks are concatenated and fed into the common subnetwork, which comprises a bidirectional long short-term memory network, a view attention module and a probability calculation module.
In the j-th view-specific subnetwork, the input data are the K joints of the given action in the t-th frame under the j-th view, as shown in formula (1):
X^j_t = {x^j_{t,1}, x^j_{t,2}, ..., x^j_{t,K}}   (1)
where K is the number of joints in a frame.
In step S1, for each view-specific subnetwork, one frame of training data corresponding to that view is input, the spatial attention weights and the temporal attention weights are calculated respectively, and the discriminative features of that view-specific subnetwork are then calculated. This step is divided into steps S11-S14.
In step S11, an attention weight is assigned to each joint through the Spatial Attention Module (SAM).
Fig. 4 is a schematic diagram of the composition of a view-specific subnetwork in an embodiment of the skeleton behavior recognition model of the present invention. As shown in Fig. 4, the spatial attention module consists of an LSTM layer, two fully connected layers (FC) and a tanh activation unit. The hidden output h^s_{t-1}, obtained by passing the input data of the previous frame through the LSTM layer, is taken as the historical information of the current frame; this historical information and the input data x^j_t of the current frame pass through the fully connected layers and the nonlinear activation to give the scores of the K joints in the current t-th frame, as shown in formulas (2) and (3):
s^j_t = W_es·tanh(W_xs·x^j_t + W_hs·h^s_{t-1} + b_s) + b_es   (2)-(3)
The obtained scores correspond to the individual joints and indicate how important each joint is to the model. The scores are then normalized to give the joint selection gate of each joint, i.e. the spatial attention weight; for the k-th joint, the joint selection gate is shown in formula (4):
α^j_{t,k} = exp(s^j_{t,k}) / Σ_{l=1..K} exp(s^j_{t,l})   (4)
wherein W_es, W_xs and W_hs are parameter matrices to be learned; x^j_t is the input data of the t-th frame; h^s_{t-1} is the spatial hidden output of the input data of the (t-1)-th frame after the LSTM layer; b_s and b_es are bias terms; α^j_{t,k} is the spatial attention weight of the k-th joint in the t-th frame under the j-th view; s^j_{t,k} and s^j_{t,l} are the scores of the k-th and l-th joints in the t-th frame under the j-th view.
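A sketch of the spatial attention module in tf.keras is given below. The per-view input is assumed to be a tensor of shape (T, K*3) with the joint coordinates of each frame flattened; the hidden size of 128 is an illustrative assumption, and the LSTM output at step t is used in place of the exact one-step-shifted hidden state h^s_{t-1} for brevity.

```python
from tensorflow.keras import layers

def spatial_attention(x, num_joints, hidden=128):
    """x: (batch, T, K*3) -> alpha: (batch, T, K), one weight per joint and frame."""
    h_s = layers.LSTM(hidden, return_sequences=True)(x)                  # spatial hidden output of the LSTM layer
    mixed = layers.Add()([layers.Dense(hidden)(x),                       # W_xs * x_t
                          layers.Dense(hidden)(h_s)])                    # W_hs * h_{t-1} (approximated by h_t)
    scores = layers.Dense(num_joints)(layers.Activation('tanh')(mixed))  # joint scores, formulas (2)-(3)
    return layers.Softmax(axis=-1)(scores)                               # normalization over joints, formula (4)
```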
In step S12, a temporal attention weight is assigned to each frame through the Temporal Attention Module (TAM).
As can also be seen from Fig. 4, the temporal attention module consists of an LSTM layer, one fully connected layer and a ReLU activation unit. The temporal attention weight is calculated as shown in formula (5):
β^j_t = ReLU(W_e1·x^j_t + W_e2·h^e_{t-1} + b_e)   (5)
wherein β^j_t is the calculated temporal attention weight of the t-th frame; W_e1 and W_e2 are parameter matrices to be learned; h^e_{t-1} is the temporal hidden output of the input data of the (t-1)-th frame after the LSTM layer; b_e is a bias term.
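A corresponding sketch of the temporal attention module follows, under the same assumptions as the spatial attention sketch above (flattened (T, K*3) input, LSTM output used as the history term):

```python
from tensorflow.keras import layers

def temporal_attention(x, hidden=128):
    """x: (batch, T, K*3) -> beta: (batch, T, 1), one weight per frame, formula (5)."""
    h_e = layers.LSTM(hidden, return_sequences=True)(x)   # temporal hidden output of the LSTM layer
    score = layers.Add()([layers.Dense(1)(x),              # W_e1 * x_t
                          layers.Dense(1)(h_e)])           # W_e2 * h_{t-1} (approximated by h_t)
    return layers.Activation('relu')(score)                # ReLU activation
```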
In step S13, according to the training data and the spatial attention weights, the discriminative feature extraction module extracts the discriminative features of the training data on that view.
As can also be seen from Fig. 4, the discriminative feature extraction module consists of 3 LSTM layers. The input of this module, shown in formulas (6) and (7), is obtained by the element-wise product of the spatial attention weights and the input data:
x̃^j_t = {x̃^j_{t,1}, x̃^j_{t,2}, ..., x̃^j_{t,K}}   (6)
x̃^j_{t,k} = α^j_{t,k} · x^j_{t,k}   (7)
where α^j_{t,k} and x^j_{t,k} are respectively the spatial attention weight and the input data of the k-th joint in the t-th frame under the j-th view. In this step, the spatial attention weights are applied to the input data of the discriminative feature extraction module, so that the network can automatically and selectively learn the key joints.
In step S14, the discriminative features of that view-specific subnetwork are output according to the temporal attention weights and the discriminative features on that view.
The output of the discriminative feature extraction module is dot-multiplied with the temporal attention weight to give the discriminative feature of that view-specific subnetwork, which serves as the j-th element of the input data of the common subnetwork, as shown in formula (8).
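Putting the pieces together, the sketch below assembles one view-specific subnetwork: the spatial weights gate the joints (formula (7)), a 3-layer LSTM extracts the discriminative feature, and the temporal weights gate the frames (formula (8)). It reuses spatial_attention and temporal_attention from the sketches above; the final average pooling over time, the layer sizes and the fixed length of 100 frames are assumptions made for illustration, not details fixed by the patent.

```python
from tensorflow.keras import layers

def view_specific_subnet(x_joints, num_joints, t_frames=100, hidden=128):
    """x_joints: (batch, T, K, 3) -> one feature vector per view, (batch, hidden)."""
    x_flat = layers.Reshape((t_frames, num_joints * 3))(x_joints)
    alpha = spatial_attention(x_flat, num_joints)                      # (batch, T, K)
    beta = temporal_attention(x_flat)                                  # (batch, T, 1)

    gated = layers.Multiply()([x_joints,
                               layers.Reshape((t_frames, num_joints, 1))(alpha)])  # formula (7)
    gated = layers.Reshape((t_frames, num_joints * 3))(gated)

    h = gated
    for _ in range(3):                                                 # 3-layer LSTM feature extractor
        h = layers.LSTM(hidden, return_sequences=True)(h)

    weighted = layers.Multiply()([h, beta])                            # frame-level gating, formula (8)
    return layers.GlobalAveragePooling1D()(weighted)                   # collapse the time axis (an assumption)
```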
In step S2, the discriminative features of the view-specific subnetworks are concatenated into a view sequence as the input of the common subnetwork, the angle-independent features and the view attention weights are calculated, and the probability of the behavior category of the training data is then calculated. This step is divided into steps S21-S23.
In step S21, the angle-independent features are output through the bidirectional long short-term memory network Bi-LSTM.
The outputs of the preset number (v) of view-specific subnetworks are concatenated into a view sequence as the input of the bidirectional long short-term memory network in the common subnetwork, as shown in formula (9):
z = [α_1, α_2, ..., α_v]   (9)
Fig. 5 is a schematic diagram of the composition of the common subnetwork in an embodiment of the skeleton behavior recognition model of the present invention. As shown in Fig. 5, the Bi-LSTM learns the latent features shared by the same behavior under multiple views, i.e. the angle-independent features: according to the context information of the j-th view, it calculates the forward and backward hidden states, as shown in formulas (10) and (11).
The hidden states of the two directions are then concatenated to form a hidden state h_j, where W_j denotes the weight parameters to be learned in the bidirectional LSTM.
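A sketch of the Bi-LSTM part of the common subnetwork follows: the v view features are stacked into a length-v sequence z (formula (9)) and a bidirectional LSTM yields one hidden state h_j per view in which the forward and backward states are concatenated. The hidden size is again an illustrative assumption.

```python
from tensorflow.keras import layers

def view_sequence_encoder(view_features, hidden=128):
    """view_features: list of v tensors of shape (batch, d) -> H: (batch, v, 2*hidden)."""
    expanded = [layers.Reshape((1, -1))(f) for f in view_features]   # each view becomes one step of the sequence
    z = layers.Concatenate(axis=1)(expanded)                         # z = [a_1, a_2, ..., a_v], formula (9)
    return layers.Bidirectional(layers.LSTM(hidden, return_sequences=True))(z)  # h_j = [forward; backward]
```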
In step S22, the outputs of the Bi-LSTM for the individual views form a sequence that serves as the input of the VAM, as indicated by the dotted arrow in Fig. 5; the v hidden states form a hidden-state set, as shown in formula (12):
H = (h_1, h_2, ..., h_v)   (12)
Then a different view attention weight is assigned to each view through the View Attention Module (VAM). As can be seen from Fig. 5, in this embodiment the view attention module comprises two fully connected layers (FC) and a tanh activation layer; the calculated view attention weights, shown in formula (13), assign one weight value to each view:
β = (β_1, β_2, ..., β_V)   (13)
In step S23, the probability of the behavior category of the training data is obtained through the probability calculation module according to the angle-independent features and the view attention weights. As can be seen from Fig. 5, in this embodiment the probability calculation module comprises a fully connected layer and a Softmax layer.
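A sketch of the view attention module and the probability calculation module: two fully connected layers with a tanh activation produce one score per view, the scores are turned into view attention weights (formula (13)), the weighted hidden states are fused, and a fully connected layer followed by Softmax outputs the class probabilities. The softmax normalization of the view scores and the weighted-sum fusion are assumptions, since the text only states that one weight is assigned per view.

```python
from tensorflow.keras import layers

def view_attention_classifier(H, num_classes, hidden=64):
    """H: (batch, v, d) Bi-LSTM hidden states -> (batch, C) class probabilities."""
    scores = layers.Dense(1)(layers.Activation('tanh')(layers.Dense(hidden)(H)))  # two FC layers + tanh
    beta = layers.Softmax(axis=1)(scores)                                   # one weight per view, formula (13)
    fused = layers.GlobalAveragePooling1D()(layers.Multiply()([H, beta]))   # attention-weighted fusion over views
    return layers.Dense(num_classes, activation='softmax')(fused)           # FC + Softmax probability module
```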
In step S3, it is judged whether all T frames of training data have been input; if so, go to step S4, otherwise go to step S1.
In step S4, the regularized cross-entropy loss function is calculated, as shown in formula (14):
L = L_CE + λ1·R_S + λ2·R_T + λ3·||W_sv||_1   (14)
wherein,
the first term, L_CE = -Σ_{i=1..C} y_i·log(ŷ_i), is the cross-entropy loss of the whole network; y_i is the true label of the training data; ŷ_i is the probability, predicted by the common subnetwork, that the training data belongs to the i-th behavior category; C is the number of behavior categories;
λ1, λ2 and λ3 are parameters that balance the whole network;
the second term, R_S, is the regularization term of the spatial attention module, which allows the skeleton behavior recognition model to dynamically focus on the key joints of each frame within the sequence of each view; K is the number of joints; v is the number of views; T is the number of frames of the input training data; α^j_{t,k} is the spatial attention weight of the k-th joint in the t-th frame under the j-th view;
the third term, R_T, is the regularization term of the temporal attention module, which allows the skeleton behavior recognition model to dynamically focus on the key frames; β^j_t is the temporal attention weight of the t-th frame under the j-th view;
the fourth term is the regularization term of the parameters; W_sv is the connection matrix of the network, and the L1 norm is used to prevent the whole network from overfitting.
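The sketch below implements a regularized loss of the form of formula (14) in TensorFlow. The exact functional forms of the spatial and temporal regularization terms are not spelled out in the text, so the sketch uses one common choice (the spatial term encourages each joint's weights to sum to roughly one over time, the temporal term penalizes the squared frame weights); the λ values are illustrative.

```python
import tensorflow as tf

def regularized_loss(y_true, y_pred, alpha, beta, connection_weights,
                     lam1=1e-4, lam2=1e-4, lam3=1e-5):
    """y_true, y_pred: (batch, C); alpha: (v, T, K) spatial weights;
    beta: (v, T) temporal weights; connection_weights: list of W_sv matrices."""
    ce = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(y_true, y_pred))  # first term
    t = tf.cast(tf.shape(alpha)[1], tf.float32)
    spatial_reg = tf.reduce_sum((1.0 - tf.reduce_sum(alpha, axis=1) / t) ** 2)     # second term (assumed form)
    temporal_reg = tf.reduce_sum(beta ** 2)                                        # third term (assumed form)
    l1_reg = tf.add_n([tf.reduce_sum(tf.abs(w)) for w in connection_weights])      # fourth term: L1 norm of W_sv
    return ce + lam1 * spatial_reg + lam2 * temporal_reg + lam3 * l1_reg
```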
In step S5, it is judged whether the loss function has converged; if so, training ends, otherwise go to step S6.
In step S6, the parameters of the skeleton behavior recognition model are adjusted, and the procedure goes to step S1.
In an embodiment of a storage device of the present invention, a program is stored therein, the program being adapted to be loaded and executed by a processor to implement the angle-independence-based skeleton behavior recognition method described above.
A processing device of the present invention comprises a processor and a memory, wherein the processor is adapted to execute a program, the memory is adapted to store the program, and the program is adapted to be loaded and executed by the processor to implement the angle-independence-based skeleton behavior recognition method described above.
In this embodiment, the hardware and programming environment are as follows: the experiments were run on Ubuntu 14.04 LTS, on a server configured with a Xeon E5-2630 V4 2.2 GHz processor, 128 GB of memory and four NVIDIA Titan-X GPUs with 12 GB of video memory each. The experiments used the Keras deep learning framework with the TensorFlow backend, PyCharm as the integrated development environment, and the stochastic gradient descent (SGD) algorithm to train the network.
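Under the stated environment (Keras with the TensorFlow backend, SGD), assembling and training the whole network could look roughly like the sketch below, which reuses the module sketches above. The learning rate, momentum, batch size and number of epochs are illustrative assumptions, and train_views and train_labels stand for the preprocessed training data.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_model(num_views, num_joints, num_classes, t_frames=100):
    inputs = [layers.Input((t_frames, num_joints, 3)) for _ in range(num_views)]
    view_feats = [view_specific_subnet(x, num_joints, t_frames) for x in inputs]  # view-specific subnetworks
    H = view_sequence_encoder(view_feats)                                          # common Bi-LSTM
    probs = view_attention_classifier(H, num_classes)                              # view attention + Softmax
    return Model(inputs, probs)

model = build_model(num_views=3, num_joints=25, num_classes=60)
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
              loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(train_views, train_labels, batch_size=64, epochs=100)  # train_* are the preprocessed data
```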
In this embodiment, the currently largest public multi-view skeleton data set, the NTU RGB+D data set, is selected as the training and test data. The data set contains 56,880 video samples, 60 behavior classes and 40 subjects, and the human body in each frame is represented by the coordinates of 25 joints. The standard cross-subject protocol is adopted: the videos performed by 20 of the subjects are used as the training set and the rest as the test set.
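A sketch of the cross-subject split follows: samples whose performer belongs to the designated set of training subjects form the training set and the rest form the test set. The subject IDs are passed in as a parameter rather than hard-coded, since the concrete 20-subject list is defined by the NTU RGB+D protocol.

```python
def cross_subject_split(samples, labels, subject_ids, train_subjects):
    """samples[i] was performed by subject_ids[i]; train_subjects is the set of
    subject IDs reserved for training by the cross-subject protocol."""
    train_idx = [i for i, s in enumerate(subject_ids) if s in train_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s not in train_subjects]
    train = ([samples[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([samples[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test
```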
Fig. 6 is a schematic diagram of the composition of an embodiment of the angle-independence-based skeleton behavior recognition system of the present invention. As shown in Fig. 6, the skeleton behavior recognition system of this embodiment comprises a control unit 100 and a skeleton behavior recognition model 200.
The control unit 100 is configured to train the skeleton behavior recognition model 200 and to use the trained skeleton behavior recognition model 200 to calculate the behavior-category probabilities of the skeleton time series to be recognized.
The skeleton behavior recognition model 200 comprises a preset number of view-specific subnetworks 210 and a common subnetwork 220, wherein the view-specific subnetwork 210 comprises a spatial attention module 211, a temporal attention module 212 and a discriminative feature extraction module 213, and the common subnetwork 220 comprises a bidirectional long short-term memory network 221, a view attention module 222 and a probability calculation module 223.
For the functional configuration of the control unit 100, refer to the description of steps A1-A2 and steps S1-S6; for the structure and functions of the spatial attention module 211, the temporal attention module 212, the discriminative feature extraction module 213, the bidirectional long short-term memory network 221, the view attention module 222 and the probability calculation module 223, please also refer to the relevant description above, which is not repeated here.
Those skilled in the art should realize that the method steps, modules and units of the examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. In order to clearly illustrate the interchangeability of electronic hardware and software, the composition and steps of each example have been described above generally in terms of their functions. Whether these functions are performed in electronic hardware or in software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered as going beyond the scope of the present invention.
So far, the technical solutions of the present invention have been described in conjunction with the preferred embodiments shown in the accompanying drawings. However, those skilled in the art will readily understand that the scope of protection of the present invention is obviously not limited to these specific embodiments. Without departing from the principles of the present invention, those skilled in the art can make equivalent changes or substitutions to the relevant technical features, and the technical solutions after such changes or substitutions will all fall within the scope of protection of the present invention.
Claims (12)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810398601.0A CN108764050B (en) | 2018-04-28 | 2018-04-28 | Method, system and equipment for recognizing skeleton behavior based on angle independence |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810398601.0A CN108764050B (en) | 2018-04-28 | 2018-04-28 | Method, system and equipment for recognizing skeleton behavior based on angle independence |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108764050A true CN108764050A (en) | 2018-11-06 |
CN108764050B CN108764050B (en) | 2021-02-26 |
Family
ID=64012136
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810398601.0A Active CN108764050B (en) | 2018-04-28 | 2018-04-28 | Method, system and equipment for recognizing skeleton behavior based on angle independence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108764050B (en) |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109376720A (en) * | 2018-12-19 | 2019-02-22 | 杭州电子科技大学 | Action classification method based on joint spatiotemporal simple recurrent network and attention mechanism |
CN109558811A (en) * | 2018-11-12 | 2019-04-02 | 中山大学 | A kind of action identification method based on sport foreground concern and non-supervisory key-frame extraction |
CN109614874A (en) * | 2018-11-16 | 2019-04-12 | 深圳市感动智能科技有限公司 | A human action recognition method and system based on attention perception and tree skeleton point structure |
CN109740419A (en) * | 2018-11-22 | 2019-05-10 | 东南大学 | A Video Action Recognition Method Based on Attention-LSTM Network |
CN109919358A (en) * | 2019-01-31 | 2019-06-21 | 中国科学院软件研究所 | A real-time site traffic prediction method based on neural network spatiotemporal attention mechanism |
CN110210372A (en) * | 2019-05-29 | 2019-09-06 | 中国科学院自动化研究所 | Based on skeleton Activity recognition method, the system for paying attention to enhancing figure convolutional network |
CN110348572A (en) * | 2019-07-09 | 2019-10-18 | 上海商汤智能科技有限公司 | The processing method and processing device of neural network model, electronic equipment, storage medium |
CN111199202A (en) * | 2019-12-30 | 2020-05-26 | 南京师范大学 | Human Action Recognition Method and Recognition Device Based on Recurrent Attention Network |
CN111340067A (en) * | 2020-02-10 | 2020-06-26 | 天津大学 | Redistribution method for multi-view classification |
CN111368810A (en) * | 2020-05-26 | 2020-07-03 | 西南交通大学 | Sit-up detection system and method based on human body and skeleton key point recognition |
CN111401270A (en) * | 2020-03-19 | 2020-07-10 | 南京未艾信息科技有限公司 | Human motion posture recognition and evaluation method and system |
CN111832351A (en) * | 2019-04-18 | 2020-10-27 | 杭州海康威视数字技术股份有限公司 | Event detection method and device and computer equipment |
CN112966628A (en) * | 2021-03-17 | 2021-06-15 | 广东工业大学 | Visual angle self-adaptive multi-target tumble detection method based on graph convolution neural network |
CN113158983A (en) * | 2021-05-18 | 2021-07-23 | 南京航空航天大学 | Airport scene activity behavior recognition method based on infrared video sequence image |
CN113255408A (en) * | 2020-02-11 | 2021-08-13 | 深圳绿米联创科技有限公司 | Behavior recognition method and device, electronic equipment and storage medium |
CN113408349A (en) * | 2021-05-17 | 2021-09-17 | 浙江大华技术股份有限公司 | Training method of motion evaluation model, motion evaluation method and related equipment |
CN114973403A (en) * | 2022-05-06 | 2022-08-30 | 广州紫为云科技有限公司 | Efficient behavior prediction method based on space-time dual-dimension feature depth network |
CN116402811A (en) * | 2023-06-05 | 2023-07-07 | 长沙海信智能系统研究院有限公司 | Fighting behavior identification method and electronic equipment |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109722466B (en) * | 2019-01-30 | 2022-03-25 | 华南农业大学 | Method for rapidly detecting strain producing carbapenemase by using AI model |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102034096A (en) * | 2010-12-08 | 2011-04-27 | 中国科学院自动化研究所 | Video event recognition method based on top-down motion attention mechanism |
CN103839047A (en) * | 2013-12-30 | 2014-06-04 | 华为技术有限公司 | Human motion track recognition method and device |
US20160042227A1 (en) * | 2014-08-06 | 2016-02-11 | BAE Systems Information and Electronic Systems Integraton Inc. | System and method for determining view invariant spatial-temporal descriptors for motion detection and analysis |
CN106203283A (en) * | 2016-06-30 | 2016-12-07 | 重庆理工大学 | Based on Three dimensional convolution deep neural network and the action identification method of deep video |
US9600717B1 (en) * | 2016-02-25 | 2017-03-21 | Zepp Labs, Inc. | Real-time single-view action recognition based on key pose analysis for sports videos |
CN106909938A (en) * | 2017-02-16 | 2017-06-30 | 青岛科技大学 | Perspective-independent Behavior Recognition Method Based on Deep Learning Network |
CN107330362A (en) * | 2017-05-25 | 2017-11-07 | 北京大学 | A kind of video classification methods based on space-time notice |
CN107679522A (en) * | 2017-10-31 | 2018-02-09 | 内江师范学院 | Action identification method based on multithread LSTM |
-
2018
- 2018-04-28 CN CN201810398601.0A patent/CN108764050B/en active Active
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102034096A (en) * | 2010-12-08 | 2011-04-27 | 中国科学院自动化研究所 | Video event recognition method based on top-down motion attention mechanism |
CN103839047A (en) * | 2013-12-30 | 2014-06-04 | 华为技术有限公司 | Human motion track recognition method and device |
US20160042227A1 (en) * | 2014-08-06 | 2016-02-11 | BAE Systems Information and Electronic Systems Integraton Inc. | System and method for determining view invariant spatial-temporal descriptors for motion detection and analysis |
US9600717B1 (en) * | 2016-02-25 | 2017-03-21 | Zepp Labs, Inc. | Real-time single-view action recognition based on key pose analysis for sports videos |
CN106203283A (en) * | 2016-06-30 | 2016-12-07 | 重庆理工大学 | Based on Three dimensional convolution deep neural network and the action identification method of deep video |
CN106909938A (en) * | 2017-02-16 | 2017-06-30 | 青岛科技大学 | Perspective-independent Behavior Recognition Method Based on Deep Learning Network |
CN107330362A (en) * | 2017-05-25 | 2017-11-07 | 北京大学 | A kind of video classification methods based on space-time notice |
CN107679522A (en) * | 2017-10-31 | 2018-02-09 | 内江师范学院 | Action identification method based on multithread LSTM |
Non-Patent Citations (2)
Title |
---|
ZHIHENG HUANG等: "Bidirectional LSTM-CRF Models for Sequence Tagging", 《ARXIV:1508.01991V1》 * |
裴晓敏: "时空特征融合深度学习网络人体行为识别方法", 《红外与激光工程》 * |
Cited By (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109558811A (en) * | 2018-11-12 | 2019-04-02 | 中山大学 | A kind of action identification method based on sport foreground concern and non-supervisory key-frame extraction |
CN109614874A (en) * | 2018-11-16 | 2019-04-12 | 深圳市感动智能科技有限公司 | A human action recognition method and system based on attention perception and tree skeleton point structure |
CN109614874B (en) * | 2018-11-16 | 2023-06-30 | 深圳市感动智能科技有限公司 | A human behavior recognition method and system based on attention perception and tree-shaped skeleton point structure |
CN109740419A (en) * | 2018-11-22 | 2019-05-10 | 东南大学 | A Video Action Recognition Method Based on Attention-LSTM Network |
CN109376720B (en) * | 2018-12-19 | 2022-01-18 | 杭州电子科技大学 | Action classification method based on joint point space-time simple cycle network and attention mechanism |
CN109376720A (en) * | 2018-12-19 | 2019-02-22 | 杭州电子科技大学 | Action classification method based on joint spatiotemporal simple recurrent network and attention mechanism |
CN109919358A (en) * | 2019-01-31 | 2019-06-21 | 中国科学院软件研究所 | A real-time site traffic prediction method based on neural network spatiotemporal attention mechanism |
CN109919358B (en) * | 2019-01-31 | 2021-03-02 | 中国科学院软件研究所 | Real-time station flow prediction method based on neural network space-time attention mechanism |
CN111832351A (en) * | 2019-04-18 | 2020-10-27 | 杭州海康威视数字技术股份有限公司 | Event detection method and device and computer equipment |
CN110210372A (en) * | 2019-05-29 | 2019-09-06 | 中国科学院自动化研究所 | Based on skeleton Activity recognition method, the system for paying attention to enhancing figure convolutional network |
CN110348572A (en) * | 2019-07-09 | 2019-10-18 | 上海商汤智能科技有限公司 | The processing method and processing device of neural network model, electronic equipment, storage medium |
CN111199202B (en) * | 2019-12-30 | 2024-04-26 | 南京师范大学 | Human body action recognition method and recognition device based on circulating attention network |
CN111199202A (en) * | 2019-12-30 | 2020-05-26 | 南京师范大学 | Human Action Recognition Method and Recognition Device Based on Recurrent Attention Network |
CN111340067A (en) * | 2020-02-10 | 2020-06-26 | 天津大学 | Redistribution method for multi-view classification |
CN111340067B (en) * | 2020-02-10 | 2022-07-08 | 天津大学 | A redistribution method for multi-view classification |
CN113255408A (en) * | 2020-02-11 | 2021-08-13 | 深圳绿米联创科技有限公司 | Behavior recognition method and device, electronic equipment and storage medium |
CN113255408B (en) * | 2020-02-11 | 2024-03-29 | 深圳绿米联创科技有限公司 | Behavior recognition method, behavior recognition device, electronic equipment and storage medium |
CN111401270A (en) * | 2020-03-19 | 2020-07-10 | 南京未艾信息科技有限公司 | Human motion posture recognition and evaluation method and system |
WO2021184619A1 (en) * | 2020-03-19 | 2021-09-23 | 南京未艾信息科技有限公司 | Human body motion attitude identification and evaluation method and system therefor |
CN111368810B (en) * | 2020-05-26 | 2020-08-25 | 西南交通大学 | Sit-up detection system and method based on human body and skeleton key point identification |
CN111368810A (en) * | 2020-05-26 | 2020-07-03 | 西南交通大学 | Sit-up detection system and method based on human body and skeleton key point recognition |
CN112966628A (en) * | 2021-03-17 | 2021-06-15 | 广东工业大学 | Visual angle self-adaptive multi-target tumble detection method based on graph convolution neural network |
CN113408349A (en) * | 2021-05-17 | 2021-09-17 | 浙江大华技术股份有限公司 | Training method of motion evaluation model, motion evaluation method and related equipment |
CN113158983A (en) * | 2021-05-18 | 2021-07-23 | 南京航空航天大学 | Airport scene activity behavior recognition method based on infrared video sequence image |
CN114973403A (en) * | 2022-05-06 | 2022-08-30 | 广州紫为云科技有限公司 | Efficient behavior prediction method based on space-time dual-dimension feature depth network |
CN114973403B (en) * | 2022-05-06 | 2023-11-03 | 广州紫为云科技有限公司 | Behavior prediction method based on space-time double-dimension feature depth network |
CN116402811A (en) * | 2023-06-05 | 2023-07-07 | 长沙海信智能系统研究院有限公司 | Fighting behavior identification method and electronic equipment |
CN116402811B (en) * | 2023-06-05 | 2023-08-18 | 长沙海信智能系统研究院有限公司 | Fighting behavior identification method and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN108764050B (en) | 2021-02-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108764050B (en) | Method, system and equipment for recognizing skeleton behavior based on angle independence | |
CN109902546B (en) | Face recognition method, device and computer readable medium | |
CN112446398B (en) | Image classification method and device | |
CN108932500B (en) | A kind of dynamic gesture identification method and system based on deep neural network | |
WO2021022521A1 (en) | Method for processing data, and method and device for training neural network model | |
Alani et al. | Hand gesture recognition using an adapted convolutional neural network with data augmentation | |
US11042775B1 (en) | Apparatus and methods for temporal proximity detection | |
CN111368972B (en) | Convolutional layer quantization method and device | |
CN110287844B (en) | Traffic police gesture recognition method based on convolution gesture machine and long-short-term memory network | |
WO2021043168A1 (en) | Person re-identification network training method and person re-identification method and apparatus | |
CN111291809B (en) | Processing device, method and storage medium | |
Obinata et al. | Temporal extension module for skeleton-based action recognition | |
CN112446476A (en) | Neural network model compression method, device, storage medium and chip | |
CN110222718B (en) | Image processing method and device | |
CN113516227B (en) | Neural network training method and device based on federal learning | |
WO2021047587A1 (en) | Gesture recognition method, electronic device, computer-readable storage medium, and chip | |
CN112487217A (en) | Cross-modal retrieval method, device, equipment and computer-readable storage medium | |
CN113065460A (en) | Establishment method of pig face facial expression recognition framework based on multitask cascade | |
WO2021136058A1 (en) | Video processing method and device | |
CN112149616A (en) | A method of character interaction behavior recognition based on dynamic information | |
CN116524593A (en) | A dynamic gesture recognition method, system, device and medium | |
CN109063626A (en) | Dynamic human face recognition methods and device | |
CN112464930A (en) | Target detection network construction method, target detection method, device and storage medium | |
Das et al. | Deep-temporal lstm for daily living action recognition | |
John et al. | Real-time hand posture and gesture-based touchless automotive user interface using deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |