CN111340090B - Image feature comparison method and device, equipment and computer readable storage medium - Google Patents
- Publication number: CN111340090B (application CN202010107969.4A)
- Authority: CN (China)
- Legal status: Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/23—Recognition of whole body movements, e.g. for sport training
- G06V40/25—Recognition of walking or running movements, e.g. gait recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Description
Technical Field
The present application relates to the field of computer technology, and in particular to an image feature comparison method, a gait recognition method, and related devices.
Background
In video gait recognition and other image processing technologies, image feature comparison is required, that is, the similarity between two images/videos to be compared is determined from the extracted image features.
Taking video gait recognition as an example, the distance (i.e., similarity) between two videos is calculated by comparing image features between the videos, and it is then judged whether the targets (usually pedestrians) in the two videos are the same target.
In one existing implementation, the Euclidean distance of each same image feature between the videos is calculated, and the sum of the Euclidean distances over all image features is used as the distance between the two videos. However, which image features are robust differs across image variables. For example, if the pedestrian in one video wears a thick coat while the pedestrian in the other wears a thin coat, the image features corresponding to the upper-garment region are less robust, and extracting and comparing image features according to the above implementation leads to lower accuracy of the gait recognition result.
To solve the problems caused by extracting image features with poor robustness, in another existing implementation, different image features are extracted for different image-variable conditions. For example, if the pedestrian in one video wears a thick coat while the pedestrian in the other wears a thin coat, the image features corresponding to the upper-garment region are not extracted. Although this implementation can to some extent avoid extracting image features with poor robustness, it requires extracting different image features for each image-variable condition and training a separate image feature extraction neural network for each.
Summary of the Invention
To solve the above technical problems, an image feature comparison method, a gait recognition method, and a device are proposed.
In a first aspect, an embodiment of the present application provides an image feature comparison method, the method comprising:
acquiring a plurality of image features of a first video and of a second video, respectively, using an image feature extraction neural network;
determining, respectively, the distance of each same image feature between the first video and the second video;
calculating, using a weight calculation module, the weight of each same image feature between the first video and the second video, wherein the weight calculation module is obtained through joint training with the image feature extraction neural network and is configured to calculate the weights of the image features according to target image variables of the first video and the second video;
weighting the plurality of image features with their respective weights, and determining the distance between the first video and the second video according to the weighting result, wherein the distance between the first video and the second video reflects the similarity between the target image of the first video and the target image of the second video.
In a second aspect, an embodiment of the present application provides a computer device, including a processor and a memory;
the memory is configured to store a program for executing the method described in the first aspect;
the processor is configured to execute the program stored in the memory.
In a third aspect, an embodiment of the present application provides a gait recognition method, the method comprising:
acquiring a plurality of image features of a first video, and a plurality of image features of each of a plurality of second videos, using an image feature extraction neural network;
determining, respectively, the distance of each same image feature between the first video and each second video;
calculating, using a weight calculation module, the weight of each same image feature between the first video and each second video, wherein the weight calculation module is obtained through joint training with the image feature extraction neural network and is configured to calculate the weights of the image features according to target image variables of the first video and the second video;
for each second video, weighting the plurality of image features with their respective weights and determining the distance between the first video and that second video according to the weighting result, wherein the distance between the first video and a second video reflects the gait similarity between the target image of the first video and the target image of that second video;
selecting, according to the distances between the first video and the respective second videos, the target corresponding to at least one second video as the gait recognition result of the first video.
In a fourth aspect, an embodiment of the present application provides a computer device, including a processor and a memory;
the memory is configured to store a program for executing the method described in the third aspect;
the processor is configured to execute the program stored in the memory.
The technical solutions provided by the embodiments of the present application have at least the following technical effects or advantages:
The weight calculation module obtained through joint training with the image feature extraction neural network is used to calculate the weight of each image feature, and the image features are weighted accordingly to obtain the distance between the compared videos. Because the weights of the image features are calculated according to the target image variables, the importance of each image feature under different target image variables is taken into account: the same image features can still be extracted under different target image variable conditions and then weighted with the calculated weights. This improves the robustness of the important image features and, when applied to gait recognition, improves the accuracy of gait recognition.
Brief Description of the Drawings
Fig. 1 is a flowchart of the image feature comparison method provided by an embodiment of the present application;
Fig. 2 is a flowchart of the gait recognition method provided by an embodiment of the present application.
Detailed Description
To make the purpose, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present application, not to limit it.
Some of the processes described in the specification, the claims, and the above drawings contain multiple operations that appear in a specific order, but it should be clearly understood that these processes may include more or fewer operations, and that these operations may be performed sequentially or in parallel.
An embodiment of the present invention provides an image feature comparison method. As shown in Fig. 1, the method includes the following operations:
Step 101: acquire a plurality of image features of the first video and of the second video, respectively, using an image feature extraction neural network.
Here, an image feature specifically refers to a feature vector of the target image in a video.
In the embodiment of the present invention, the first video and the second video may be original videos captured by a camera, or person-silhouette videos obtained after processing. A person-silhouette video is a video in which every frame is a person-silhouette image.
The method provided by the embodiment of the present invention is applicable to the training stage of the image feature extraction neural network, in which case the first video and the second video are both sample videos for training the network. The method is also applicable to the application stage after training is completed, for example a gait recognition process, in which case the first video is a video to be processed (e.g., to be recognized) and the second video may be another video to be processed, or a video with a known recognition result (e.g., a known gait video of a specific person).
Step 102: determine, respectively, the distance of each same image feature between the first video and the second video.
By way of example and not limitation, the Euclidean distance of each same image feature is determined. Of course, in practical applications, other known distances may be used instead of the Euclidean distance, which is not limited in this embodiment of the present invention.
Step 103: calculate, using a weight calculation module, the weight of each same image feature between the first video and the second video. The weight calculation module is obtained through joint training with the image feature extraction neural network, and is configured to calculate the weights of the image features according to the target image variables of the first video and the second video.
In the embodiment of the present invention, the target image in a video is the image of a pedestrian in the video (which may be, but is not limited to, a silhouette image). The target image variable includes at least one of: the shooting angle of the target image, and the upper-garment type of the target image.
In the embodiment of the present invention, the image variables of a video are also referred to as the conditions of the video.
The shooting angle may be a precise value or a numerical interval. If it is a precise value, a number of shooting angles may be predetermined in practical applications, for example 0°, 30°, 60°, and so on; the actual shooting angle of the target image is determined and then matched to the closest predetermined shooting angle. For example, if the actual shooting angle of the target image is 35°, the shooting angle used as the target image variable is 30°.
By way of example and not limitation, upper-garment types include thin coats, thick coats, long coats, short coats, etc.; in practical applications they may be determined according to the needs of the scene.
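By way of example and not limitation, the matching of an actual shooting angle to the closest predetermined angle described above may be sketched as follows (the list of predetermined angles is illustrative only and is not specified by the application):

```python
# Illustrative sketch only: snap a measured shooting angle to the nearest
# predetermined angle, as in the 35° -> 30° example above.
# The candidate angle list is an assumption, not part of the described method.
PREDETERMINED_ANGLES = [0, 30, 60, 90, 120, 150, 180]

def quantize_angle(actual_angle):
    """Return the predetermined angle closest to the measured angle."""
    return min(PREDETERMINED_ANGLES, key=lambda a: abs(a - actual_angle))

print(quantize_angle(35))  # 30, matching the example in the text
```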
Step 104: weight the plurality of image features with their respective weights, and determine the distance between the first video and the second video according to the weighting result. The distance between the first video and the second video reflects the similarity between the target image of the first video and the target image of the second video.
By way of example and not limitation, the sum of the weighted results is used as the distance between the first video and the second video.
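By way of example and not limitation, the weighted sum of step 104 may be sketched as follows; the distances and weights are illustrative stand-ins for the outputs of steps 102 and 103:

```python
def weighted_video_distance(feature_distances, weights):
    """Sum of per-feature distances, each scaled by its weight.

    feature_distances: M distances, one per image feature (from step 102).
    weights: M weights from the weight calculation module (from step 103).
    """
    assert len(feature_distances) == len(weights)
    return sum(w * d for w, d in zip(weights, feature_distances))

# e.g. M = 3 image features; the numbers are illustrative only
print(weighted_video_distance([0.5, 1.2, 0.3], [0.2, 0.3, 0.5]))  # 0.61
```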
In practical applications, the specific content of the image feature comparison is determined by the application scenario of the comparison result. For example, in a gait recognition scenario, what is compared is the gait similarity of the target images between videos.
In the method provided by the embodiment of the present invention, the weight calculation module obtained through joint training with the image feature extraction neural network calculates the weight of each image feature, and the image features are weighted to obtain the distance between the compared videos. Because the weights are calculated according to the target image variables, the importance of each image feature under different target image variables is taken into account: the same image features can still be extracted under different target image variable conditions and then weighted with the calculated weights. This improves the robustness of the important image features and, when applied to gait recognition, improves the accuracy of gait recognition.
The implementation of each step of the image feature comparison method provided by the embodiment of the present invention is further described below.
There are many ways to implement step 101. In the training stage of the image feature extraction neural network, one implementation is to input the first video and the second video into the image feature extraction neural network separately, and obtain the plurality of image features of each video directly from the network output. Another implementation is to input the first video and the second video into the network separately to obtain their feature maps, split each feature map according to a predetermined rule, and perform a pooling operation on each resulting sub-feature map, obtaining the image feature corresponding to each sub-feature map of the first video and of the second video.
In the application stage of the image feature extraction neural network, one implementation is to input the first video into the network to obtain its plurality of image features from the network output, and to read the predetermined plurality of image features of the second video from a predetermined storage location. Another implementation is to input the first video into the network to obtain its feature map, split the feature map according to a predetermined rule, perform a pooling operation on each resulting sub-feature map to obtain the image feature corresponding to each sub-feature map of the first video, and read the predetermined plurality of image features of the second video from a predetermined storage location. The image features of a second video may be determined in the same way as those of the first video.
It can be seen that, whatever the application scenario, image feature extraction is implemented with the image feature extraction neural network. For example, the first video is input into the image feature extraction neural network to obtain the feature map of the first video output by the network; the size of this feature map is N×C×H×W, where N is the number of frames of the first video, C is the number of channels of a single frame, H is the height of a single frame image, and W is the width of a single frame image. The feature map of the first video is split along the horizontal direction (the H dimension) into M sub-feature maps, and a pooling operation is performed on each sub-feature map, yielding the M image features of the first video, where the size of a single image feature is N×C×1.
Likewise, the second video is input into the image feature extraction neural network to obtain its N×C×H×W feature map (with N, C, H, and W defined for the second video as above), the feature map is split along the horizontal direction (the H dimension) into M sub-feature maps, and each sub-feature map is pooled, yielding the M image features of the second video, each of size N×C×1.
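By way of example and not limitation, the horizontal split-and-pool procedure described above may be sketched as follows. Max pooling over the spatial dimensions is an assumption (the pooling operation is expressly not limited), and the tensor sizes are illustrative:

```python
import numpy as np

# Illustrative sketch: an N x C x H x W feature map is cut into M strips
# along H, and each strip is pooled over its spatial dimensions into an
# N x C x 1 image feature, as described in the text.
def split_and_pool(feature_map, m, pool="max"):
    n, c, h, w = feature_map.shape
    assert h % m == 0, "this sketch assumes H divides evenly into M strips"
    strips = np.split(feature_map, m, axis=2)   # M maps of N x C x (H/M) x W
    pool_fn = np.max if pool == "max" else np.mean
    # pool each strip over H and W, then append the trailing singleton axis
    return [pool_fn(s, axis=(2, 3))[..., np.newaxis] for s in strips]

fmap = np.random.rand(30, 64, 16, 11)           # N=30 frames, C=64, H=16, W=11
features = split_and_pool(fmap, m=4)
print(len(features), features[0].shape)         # 4 (30, 64, 1)
```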
In a specific embodiment, the image feature extraction neural network includes 6 convolutional layers and 2 pooling layers.
The embodiment of the present invention does not limit the pooling operation on the sub-feature maps; for example, max pooling or average pooling may be used.
In the embodiment of the present invention, the feature map may be divided evenly into M sub-feature maps, or split according to a predetermined splitting rule, which is not limited in this embodiment.
It should be noted that, in practical applications, different feature map splitting methods may be used depending on the conditions of the image features, which is not limited in this embodiment of the present invention.
There are many ways to implement step 102. Taking the Euclidean distance of each same image feature as an example, an existing method of calculating Euclidean distance may be used to calculate the Euclidean distance of the same image feature between the videos.
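By way of example and not limitation, the per-feature Euclidean distance of step 102 may be sketched as follows; the feature vectors are illustrative, and features are flattened before the distance is taken since their exact layout (e.g., N×C×1) depends on the implementation:

```python
import numpy as np

# Illustrative sketch: for two videos, each represented by M image
# features, compute the Euclidean distance of each same image feature.
def per_feature_distances(feats_a, feats_b):
    """Return M Euclidean distances, one per same image feature."""
    return [float(np.linalg.norm(np.ravel(a) - np.ravel(b)))
            for a, b in zip(feats_a, feats_b)]

a = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
b = [np.array([3.0, 4.0]), np.array([1.0, 1.0])]
print(per_feature_distances(a, b))  # [5.0, 0.0]
```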
There are many ways to implement step 103. In a preferred implementation, the gait energy image of the first video and the gait energy image of the second video are acquired; the two gait energy images are input into the weight calculation module, and the weights of the same image features between the first video and the second video output by the module are obtained. An existing method may be used to acquire the gait energy images, which is not limited in this embodiment of the present invention. In another implementation, the first video and the second video themselves are input into the weight calculation module, and the weights of the same image features between them output by the module are obtained. In yet another implementation, the plurality of image features of the first video and the plurality of image features of the second video are input into the weight calculation module, and the weights of the same image features between the first video and the second video output by the module are obtained.
For example, if there are M image features, the weight calculation module outputs M weights, each corresponding to one image feature.
There are many specific implementations of the weight calculation module, which are not limited in this embodiment of the present invention. By way of example and not limitation, a mapping from image-variable comparison results to weights is established in advance through training. During application, the image variables of the target images in the two videos (or gait energy images, or image features) are first compared, for example the shooting angle and/or upper-garment type, and the set of weights corresponding to the comparison result is looked up. The comparison of the image variables may be implemented by first identifying the image variables of the target image in each video (e.g., identifying the shooting angle and the upper-garment type) and then comparing them; existing image processing techniques may be used for these identifications.
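By way of example and not limitation, the lookup-table variant described above may be sketched as follows. The table entries, feature count, and key names are invented for illustration; in the described method such weights result from joint training, not hand-tuning:

```python
# Illustrative sketch: a trained mapping from the image-variable comparison
# result to one weight per image feature. All values below are assumptions.
M = 4  # number of image features (illustrative)

WEIGHT_TABLE = {
    ("same_angle", "same_coat"):      [0.25, 0.25, 0.25, 0.25],
    ("same_angle", "different_coat"): [0.40, 0.10, 0.10, 0.40],  # down-weight torso
    ("diff_angle", "same_coat"):      [0.35, 0.15, 0.15, 0.35],  # favour head/feet
    ("diff_angle", "different_coat"): [0.40, 0.10, 0.10, 0.40],
}

def feature_weights(angle_a, angle_b, coat_a, coat_b):
    """Compare the image variables of the two videos, then look up weights."""
    key = ("same_angle" if angle_a == angle_b else "diff_angle",
           "same_coat" if coat_a == coat_b else "different_coat")
    return WEIGHT_TABLE[key]

print(feature_weights(30, 30, "thin", "thick"))
```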
In the embodiment of the present invention, the joint training of the weight calculation module and the image feature extraction neural network is as follows:
a first loss function is determined using first distances between sample videos, and the image feature extraction neural network is trained using the first loss function and the sample videos, wherein the first distance between two sample videos is calculated from the distances of the same image features between them;
a second loss function is determined using second distances between sample videos; the weight calculation module is trained using the second loss function and the sample videos, or both the weight calculation module and the image feature extraction neural network are trained using the second loss function and the sample videos; the second distance between two sample videos is calculated after weighting the distance of each image feature by the weight of that same image feature between the sample videos.
Through the above joint training, more accurate weight results can be obtained.
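By way of example and not limitation, the two training distances described above may be sketched as follows. The application does not specify the loss form, so a contrastive-style margin loss is used here purely as a placeholder:

```python
# Illustrative sketch of the two training distances. Loss form and
# numbers are assumptions, not part of the described method.
def first_distance(dists):
    """Unweighted sum of per-feature distances (trains the extractor)."""
    return float(sum(dists))

def second_distance(dists, weights):
    """Weighted sum (trains the weight module, optionally both networks)."""
    return float(sum(w * d for w, d in zip(weights, dists)))

def margin_loss(distance, same_target, margin=1.0):
    """Pull same-target pairs together; push different pairs past the margin."""
    return distance if same_target else max(0.0, margin - distance)

dists, weights = [0.5, 1.2, 0.3], [0.2, 0.3, 0.5]
print(first_distance(dists))                  # 2.0
print(second_distance(dists, weights))        # 0.61
print(margin_loss(0.61, same_target=False))   # 0.39
```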
It should be noted that the implementations of the above steps may be combined arbitrarily to obtain new embodiments.
On the basis of any of the above method embodiments, the target image variable includes the shooting angle of the target image. When the shooting angles of the target images of the first video and the second video are the same, the weight of the image feature corresponding to the shoulder image region between the two videos is greater than the weights of the other image features; when the shooting angles are different, the weights of the image features corresponding to the head and foot image regions are greater than the weights of the other image features.
On the basis of any of the above method embodiments, the target image variable includes the upper-garment type of the target image. When the upper-garment types of the target images of the first video and the second video are different, the weight of the image feature corresponding to the upper-garment image region between the two videos is smaller than the weights of the other image features.
Based on the same inventive concept, an embodiment of the present invention further provides a computer device, including a processor and a memory; the memory is configured to store a program for executing the method of any of the above embodiments, and the processor is configured to execute the program stored in the memory.
本发明实施例还提供一种步态识别方法,如图2所示,所述方法包括:The embodiment of the present invention also provides a gait recognition method, as shown in Figure 2, the method includes:
步骤201、利用图像特征提取神经网络获取第一视频的多个图像特征,并获取多个第二视频各自的多个图像特征。Step 201, using the image feature extraction neural network to acquire multiple image features of the first video, and acquire multiple image features of multiple second videos respectively.
在本实施例中,第一视频为待识别的视频,多个第二视频为已知识别结果的视频(如样本视频)。In this embodiment, the first video is a video to be recognized, and the plurality of second videos are videos of known recognition results (such as sample videos).
The plurality of image features of the first video are acquired as described in the above method embodiments, which is not repeated here. The plurality of image features of each second video are predetermined; in this step, the predetermined image features of the second videos are read.
Step 202: determine, for each second video, the distance between the first video and that second video for each shared image feature.
Assume there are X second videos (V1, V2, …, VX) and M image features. The distances between the first video and each second video for the same image features are determined, yielding X groups of distances. Each group of distances corresponds to one second video and contains M distances, one for each image feature.
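A minimal sketch of Step 202, assuming each image feature is a fixed-length vector and using Euclidean distance (the embodiment does not fix a particular metric; `feature_distance` and `pairwise_feature_distances` are illustrative names):

```python
import math

def feature_distance(f1, f2):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))

def pairwise_feature_distances(first_feats, second_feats_list):
    """For each of the X second videos, compute the M per-feature
    distances to the first video (one group of M distances per video)."""
    return [
        [feature_distance(f1, f2) for f1, f2 in zip(first_feats, feats)]
        for feats in second_feats_list
    ]

# Toy example: M = 2 features, X = 2 second videos.
first = [[0.0, 0.0], [1.0, 1.0]]
seconds = [
    [[3.0, 4.0], [1.0, 1.0]],  # V1
    [[0.0, 0.0], [4.0, 5.0]],  # V2
]
dists = pairwise_feature_distances(first, seconds)
# dists == [[5.0, 0.0], [0.0, 5.0]]: X groups of M distances each
```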
Step 203: calculate, using a weight calculation module, the weight of each shared image feature between the first video and each second video. The weight calculation module is obtained through joint training with the image feature extraction neural network, and calculates the weights of the image features according to the target image variables of the first video and the second video.
Assume there are X second videos (V1, V2, …, VX) and M image features. Each time, the first video and one second video are input to the weight calculation module as a pair, yielding X groups of output results. Each group contains M weights, one for each image feature.
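The weight calculation module itself is a jointly trained network; as a stand-in, the sketch below only fixes the interface suggested here — a pair of inputs goes in, M normalized weights come out. The `WeightModule` class and its injected scoring function are hypothetical:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

class WeightModule:
    """Hypothetical stand-in for the jointly trained weight module.

    A real implementation would be a neural network mapping a pair of
    inputs (e.g. the two videos' gait energy maps) to one raw score per
    image feature; here the scoring function is injected for illustration.
    """
    def __init__(self, score_fn):
        self.score_fn = score_fn

    def __call__(self, first_input, second_input):
        raw = self.score_fn(first_input, second_input)
        return softmax(raw)  # M weights, one per feature, summing to 1

# Toy scorer: favour features where the two inputs agree.
module = WeightModule(lambda a, b: [-abs(x - y) for x, y in zip(a, b)])
weights = module([0.2, 0.9, 0.5], [0.2, 0.1, 0.5])
# features 0 and 2 (identical values) receive larger weights than feature 1
```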
For the specific implementation, reference may be made to the description of the above method embodiments, which is not repeated here.
Step 204: for each second video, weight the plurality of image features using their respective weights, and determine the distance between the first video and that second video from the weighted result. The distance between the first video and a second video reflects the gait similarity between the target image of the first video and the target image of the second video.
The distance between the first video and each second video is calculated in turn; the calculation may be implemented as described in the above method embodiments, which is not repeated here.
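Step 204 then reduces each group of M per-feature distances to a single video-to-video distance. A minimal sketch, assuming a weighted sum (the `weighted_video_distance` name is illustrative, not from the embodiment):

```python
def weighted_video_distance(feature_dists, weights):
    """Aggregate M per-feature distances into one video-to-video
    distance by weighting each feature's distance (Step 204)."""
    assert len(feature_dists) == len(weights)
    return sum(w * d for w, d in zip(weights, feature_dists))

d = weighted_video_distance([5.0, 0.0, 2.0], [0.2, 0.5, 0.3])
# 0.2*5.0 + 0.5*0.0 + 0.3*2.0 = 1.6
```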
Step 205: according to the distances between the first video and the second videos, select the target corresponding to at least one second video as the gait recognition result for the first video.
Preferably, the target corresponding to the second video with the smallest distance is taken as the gait recognition result for the first video.
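The preferred selection is a nearest-neighbour lookup over the gallery of second videos; a sketch with hypothetical names:

```python
def identify(video_distances, gallery_labels):
    """Pick the label of the second video closest to the first video
    (Step 205); ties resolve to the earliest gallery entry."""
    best = min(range(len(video_distances)), key=video_distances.__getitem__)
    return gallery_labels[best]

label = identify([1.6, 0.4, 2.3], ["alice", "bob", "carol"])
# "bob" — smallest distance is 0.4
```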
Here, a target refers to the person corresponding to a target object.
In the method provided by this embodiment of the present invention, a weight calculation module obtained through joint training with the image feature extraction neural network calculates the weight of each image feature, the image features are weighted accordingly, and the distance between the compared videos is obtained. Because the weights are calculated from the target image variables, the relative importance of each image feature under different target image variables is taken into account: the same image features can still be extracted under different target image variable conditions, and weighting them with the calculated weights improves the robustness of the important image features. Applied to gait recognition, this improves its precision and accuracy.
The calculating, using the weight calculation module, of the weight of each shared image feature between the first video and each second video includes:
acquiring the gait energy map of the first video and the gait energy map of each second video; and
inputting the gait energy map of the first video and the gait energy map of a single second video into the weight calculation module, and obtaining, as output from the weight calculation module, the weights of the shared image features between the first video and that second video.
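The embodiment does not spell out how a gait energy map is computed, but a common construction averages aligned binary silhouette frames over a gait cycle. A sketch under that assumption, with plain nested lists standing in for image arrays:

```python
def gait_energy_image(silhouettes):
    """Average a sequence of aligned binary silhouette frames into a
    single gait energy image (pixel values in [0, 1])."""
    n = len(silhouettes)
    h, w = len(silhouettes[0]), len(silhouettes[0][0])
    return [
        [sum(frame[i][j] for frame in silhouettes) / n for j in range(w)]
        for i in range(h)
    ]

# Two 2x2 binary frames of a toy "gait cycle".
frames = [
    [[1, 0], [0, 1]],
    [[1, 1], [0, 0]],
]
gei = gait_energy_image(frames)
# gei == [[1.0, 0.5], [0.0, 0.5]]
```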
Based on the same inventive concept, an embodiment of the present invention provides a computer device including a processor and a memory. The memory is configured to store a program for executing the method described in any of the above gait recognition method embodiments, and the processor is configured to execute the program stored in the memory.
A person of ordinary skill in the art will understand that all or part of the steps in the various methods of the above embodiments can be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
The embodiments described above are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Claims (9)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010107969.4A CN111340090B (en) | 2020-02-21 | 2020-02-21 | Image feature comparison method and device, equipment and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111340090A CN111340090A (en) | 2020-06-26 |
CN111340090B true CN111340090B (en) | 2023-08-01 |
Family
ID=71185381
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | |
GR01 | Patent grant | |

CB02 details — Applicant after: Daily interactive Co.,Ltd.; ZHEJIANG University. Address after: 310012 Room 418, West District, Building A, 525 Xixi Road, Xihu District, Hangzhou City, Zhejiang Province. Applicant before: ZHEJIANG MEIRI INTERDYNAMIC NETWORK TECHNOLOGY Co.,Ltd.; ZHEJIANG University. Address before: same address.