
CN114841257B - A small sample target detection method based on self-supervised contrast constraints - Google Patents

A small-sample target detection method based on self-supervised contrastive constraints

Info

Publication number
CN114841257B
Authority
CN
China
Prior art keywords
sample
training
model
network
target detection
Prior art date
Legal status
Active
Application number
CN202210421310.5A
Other languages
Chinese (zh)
Other versions
CN114841257A (en)
Inventor
邢薇薇
姚杰
刘渭滨
张顺利
魏翔
Current Assignee
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date
Filing date
Publication date
Application filed by Beijing Jiaotong University
Priority to CN202210421310.5A
Publication of CN114841257A
Application granted
Publication of CN114841257B
Legal status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415: Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/243: Classification techniques relating to the number of classes
    • G06F18/2431: Multiple classes
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements using pattern recognition or machine learning
    • G06V10/764: Arrangements using classification, e.g. of video objects
    • G06V10/82: Arrangements using neural networks
    • G06V2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07: Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a small-sample target detection method based on self-supervised contrastive constraints. The method comprises: modeling the small-sample target detection problem as a mathematical optimization problem based on self-supervised learning and constructing a small-sample target detection model that is sensitive to perturbations of the input data; designing an optimization objective function for the model; and, based on this objective function, training the model with a deep-learning update procedure to obtain a trained small-sample target detection model, which is then used to perform target detection on the small samples to be detected. The invention builds on a two-stage learning process: transfer learning is used to acquire domain knowledge, and the model is then fine-tuned on a small-sample dataset. Experimental results show that the invention achieves good performance on the public PASCAL VOC dataset, can effectively improve model performance on small-sample target detection, and has strong practical significance.

Description

A small-sample target detection method based on self-supervised contrastive constraints

Technical Field

The present invention relates to the technical field of target detection, and in particular to a small-sample target detection method based on self-supervised contrastive constraints.

Background Art

In recent years, with the development of deep convolutional neural networks, target detection has made significant progress. However, existing detection methods rely heavily on large amounts of annotated data; when annotated data become scarce, deep neural networks tend to overfit severely and fail to generalize. In practice there are many object categories with scarce examples, as well as data, such as medical images, for which object bounding-box annotations are difficult to obtain. Few-shot learning aims to train models from only a few provided examples; most existing few-shot work focuses on image classification, and only a little addresses small-sample target detection. Because target detection requires not only predicting the category but also localizing the target, it is much harder than few-shot classification. Concretely, a small-sample target detection method under self-supervised contrastive constraints seeks, with little available training data, to maximize the differences between objects of different categories and minimize the differences between objects of the same category, so that category prediction and localization achieve the best possible results; this is a mathematical optimization problem based on self-supervised learning.

To enhance category prediction and localization for small-sample objects, a well-designed detection method is needed. Methods based on two-stage fine-tuning currently show great promise for improving small-sample target detection. The two stages are: first, train the base classes on large-scale data; second, freeze all parameters trained on the base classes and fine-tune the classifier and bounding-box regressor with a small amount of novel data. However, such methods still have problems: after the model is fine-tuned on novel data, target objects are often mislabeled as other, easily confused categories.

Summary of the Invention

Embodiments of the present invention provide a small-sample target detection method based on self-supervised contrastive constraints, so as to achieve effective target detection on small samples.

To achieve the above object, the present invention adopts the following technical solution.

A small-sample target detection method based on self-supervised contrastive constraints, comprising:

modeling the small-sample target detection problem as a mathematical optimization problem based on self-supervised learning, and constructing a small-sample target detection model that is sensitive to perturbations of the input data;

designing an optimization objective function for the small-sample target detection model;

training the small-sample target detection model with a deep-learning update procedure based on the optimization objective function to obtain a trained small-sample target detection model, and using the trained model to perform target detection on the small samples to be detected.

Preferably, modeling the small-sample target detection problem as a mathematical optimization problem based on self-supervised learning and constructing a small-sample target detection model that is sensitive to perturbations of the input data comprises:

(1) In the first stage, construct the dataset D_train, which contains all training data of the base classes;

(2) In the second stage, construct the base dataset D_base; its category information is the same as that of D_train, and the number of training samples per category equals that of the small-sample target dataset D_novel;

(3) In the second stage, construct the small-sample target dataset D_novel, whose category information differs from both the first-stage dataset D_train and the second-stage base dataset D_base, and whose number of training samples per category equals that of D_base;

(4) Use a contrastive loss to impose feature-consistency constraints, and propose a contrastive loss based on the predicted distribution to impose consistency constraints on the samples' predicted distributions, by constructing positive and negative sample pairs. Here a denotes a sample pair, a_p a positive pair, a_n a negative pair, and y_a the label of the pair; S, S+, S- denote the sample features used to construct the pairs: S is the feature of the reference sample, S+ the feature of a sample of the same category as the reference sample with the largest IoU value, and S- the feature of a sample of a different category, i.e., a_p = {S, S+}, a_n = {S, S-}.
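As a purely illustrative aid (the patent text carries no code, so this and the later sketches are not part of the claimed method), the pair construction of item (4) might look as follows; all function and variable names are hypothetical:

```python
import torch

def build_pairs(features, labels, ious):
    """Build positive/negative pairs as described in item (4).

    features: (N, D) RoI feature vectors; labels: (N,) class ids;
    ious: (N,) IoU of each RoI with its matched ground-truth box.
    Returns (anchor_idx, partner_idx, y_a) triples, with y_a = 1 for a
    positive pair (same class, largest IoU) and 0 for a negative pair.
    """
    pairs = []
    n = features.size(0)
    for i in range(n):
        same = [(float(ious[j]), j) for j in range(n)
                if j != i and labels[j] == labels[i]]
        diff = [j for j in range(n) if labels[j] != labels[i]]
        if same:
            _, j_pos = max(same)           # S+: same class, largest IoU
            pairs.append((i, j_pos, 1))
        if diff:
            pairs.append((i, diff[0], 0))  # S-: any different-class sample
    return pairs
```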

Preferably, designing the optimization objective function of the small-sample target detection model comprises:

The optimization objective function of the small-sample target detection model comprises the base-class training network objective L_base = L_rpn + L_cls + L_reg and the fine-tuning network objective L_fine_tune = L_rpn + L_cls + L_reg + L_contrastive + L_contrastive-JS; the fine-tuning network adds the contrastive optimization objectives on top of the base-class training network;

1: L_rpn is the region proposal network (RPN) loss, computed as shown in formula (1):

$$L_{rpn} = \frac{1}{N_{rpn\_cls}}\sum_{i} L_{rpn\_cls}(p_i, p_i^*) + \lambda \frac{1}{N_{rpn\_reg}}\sum_{i} p_i^{*}\, L_{rpn\_reg}(t_i, t_i^*) \quad (1)$$

The RPN loss is divided into a classification loss L_rpn_cls and a bounding-box regression loss L_rpn_reg. L_rpn_cls is used to train the classification of anchors as positive or negative samples; its complete description is given in formula (2). L_rpn_reg is used to train the bounding-box regression; its complete description is given in formula (3). Here N_rpn_cls denotes the mini-batch size of training samples in the RPN, N_rpn_reg the number of anchors generated by the RPN, p_i^* the ground-truth classification probability of the i-th anchor, and λ a balancing weight:

$$L_{rpn\_cls}(p_i, p_i^*) = -\left[\, p_i^{*}\log p_i + (1-p_i^{*})\log(1-p_i) \,\right] \quad (2)$$

$$L_{rpn\_reg}(t_i, t_i^*) = \mathrm{smooth}_{L_1}(t_i - t_i^*) \quad (3)$$

L_rpn_cls uses cross-entropy to compute the loss of whether an anchor contains a target; it is a binary classification loss. p_i denotes the predicted classification probability of the i-th anchor and p_i^* the corresponding ground-truth probability. This function is used to judge whether the extracted image region contains an object;

The general form of smooth_L1 is given in formula (4); t_i = {t_x, t_y, t_w, t_h} denotes the predicted bounding-box regression parameters of the i-th anchor, and t_i^* the regression parameters of the ground-truth box corresponding to the i-th anchor. t_i and t_i^* are computed as in formulas (5) and (6).

$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5\,x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases} \quad (4)$$

$$t_x = (x - x_{anchor})/w_{anchor},\quad t_y = (y - y_{anchor})/h_{anchor},\quad t_w = \log(w/w_{anchor}),\quad t_h = \log(h/h_{anchor}) \quad (5)$$

$$t_x^* = (x^* - x_{anchor})/w_{anchor},\quad t_y^* = (y^* - y_{anchor})/h_{anchor},\quad t_w^* = \log(w^*/w_{anchor}),\quad t_h^* = \log(h^*/h_{anchor}) \quad (6)$$

x, y denote the center coordinates of the predicted bounding box, and w, h its width and height; x_anchor, y_anchor denote the center coordinates of the current anchor box, and w_anchor, h_anchor its width and height.

x*, y* denote the center coordinates of the object's ground-truth bounding box in the image, and w*, h* its width and height;
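For illustration, a minimal sketch of the box parameterization of formulas (5) and (6); the function name and the center-size box format are assumptions:

```python
import math

def encode_box(box, anchor):
    """Encode a box (x, y, w, h center-size format) relative to an anchor,
    following formulas (5)-(6); a sketch, not the patent's own code."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    tx = (x - xa) / wa
    ty = (y - ya) / ha
    tw = math.log(w / wa)
    th = math.log(h / ha)
    return tx, ty, tw, th

# applying the same encoding to the ground-truth box yields t_i^* of formula (6)
print(encode_box((50, 60, 100, 80), (48, 62, 90, 90)))
```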

2: The classification loss L_cls is computed as follows:

Cross-entropy is used as the classification loss in the target detection network:

$$L_{cls} = -\frac{1}{N_{cls}}\sum_{i}\log p_i\big[y_i^*\big] \quad (7)$$

where s_i denotes the i-th detection box, p_i the predicted class distribution of the i-th detection box, and y_i^* its ground-truth class. This function provides the basis for the network's classification behavior: it judges whether the network classifies the object in the detection region correctly and, for inaccurately classified objects, updates the model through the computed loss;

3: The bounding-box regression loss L_reg is computed as follows:

$$L_{reg} = \sum_{i} \mathrm{smooth}_{L_1}(t_i - t_i^*) \quad (8)$$

t_i and t_i^* respectively denote the predicted and ground-truth parameterized bounding-box coordinates of the i-th detection box, and smooth_L1 is the smoothing loss of formula (4), through which the position information of the detection region is further adjusted;
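A short sketch of formulas (4) and (8), assuming PyTorch tensors; this is an illustration rather than the patent's own implementation:

```python
import torch

def smooth_l1(x: torch.Tensor) -> torch.Tensor:
    """Smooth-L1 of formula (4): quadratic near zero, linear elsewhere."""
    absx = x.abs()
    return torch.where(absx < 1, 0.5 * x ** 2, absx - 0.5)

def l_reg(t_pred: torch.Tensor, t_true: torch.Tensor) -> torch.Tensor:
    """Regression loss of formula (8), summed over detection boxes."""
    return smooth_l1(t_pred - t_true).sum()
```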

4: The contrastive loss L_contrastive is computed as follows:

Construct the sample features S, S+, S-, the positive pair a_p = {S, S+}, and the negative pair a_n = {S, S-}:

$$L_{contrastive} = \frac{1}{N}\sum_{a}\left[\, y_a\, D_a^2 + (1-y_a)\,\max(0,\; m - D_a)^2 \,\right] \quad (9)$$

D_a denotes the Euclidean distance within a positive pair a_p or a negative pair a_n, and y_a the label of pair a, with y_a = 1 for a positive pair and y_a = 0 for a negative pair. When the current pair is a positive pair a_p, the model is updated to minimize the distance between the sample and the positive sample. m is an upper bound on the pair distance: when the distance between a sample and a negative sample is greater than m, the loss equals 0 and the model is not updated; otherwise the model is updated until the negative-pair distance reaches m;
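The contrastive loss of formula (9) could be sketched as follows; the margin value and tensor shapes are assumptions:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(f1, f2, y, m=1.0):
    """Contrastive loss of formula (9); a sketch under stated assumptions.

    f1, f2: (N, D) features of the two members of each pair;
    y: (N,) pair labels (1 = positive pair, 0 = negative pair);
    m: margin, the upper bound on negative-pair distances.
    """
    d = F.pairwise_distance(f1, f2)          # Euclidean distance D_a
    pos = y * d.pow(2)                       # pull positive pairs together
    neg = (1 - y) * F.relu(m - d).pow(2)     # push negative pairs past m
    return (pos + neg).mean()
```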

5: The Contrastive-JS loss L_contrastive-JS is computed as follows:

$$L_{contrastive\text{-}JS} = \frac{1}{N}\sum_{a}\left[\, y_a\, \mathrm{JS}\!\left(p_a[0] \,\Vert\, p_a[1]\right) + (1-y_a)\,\max\!\left(0,\; m' - \mathrm{JS}\!\left(p_a[0] \,\Vert\, p_a[1]\right)\right) \right] \quad (10)$$

where p_a is the predicted distribution of sample pair a, y_a the label of the current pair, p_a[i] the i-th predicted distribution in the pair, and m' an upper bound on the pair distance, with the same meaning as m in formula (9).
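Assuming the Contrastive-JS loss mirrors formula (9) with the Jensen-Shannon divergence in place of the Euclidean distance, a sketch might read:

```python
import torch
import torch.nn.functional as F

def js_divergence(p, q, eps=1e-8):
    """Jensen-Shannon divergence between two categorical distributions."""
    m = 0.5 * (p + q)
    kl = lambda a, b: (a * ((a + eps) / (b + eps)).log()).sum(dim=-1)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def contrastive_js_loss(p1, p2, y, m_prime=0.5):
    """Sketch of the Contrastive-JS loss of formula (10).

    p1, p2: (N, C) predicted class distributions of the pair members;
    y: (N,) pair labels; m_prime: distance upper bound for negative pairs.
    """
    js = js_divergence(p1, p2)
    return (y * js + (1 - y) * F.relu(m_prime - js)).mean()
```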

Preferably, training the small-sample target detection model with a deep-learning update procedure based on the optimization objective function to obtain a trained small-sample target detection model comprises:

The small-sample target detection model is trained with the optimization objective function through a two-stage deep-learning model update procedure. The two-stage process consists of a data-training stage and a small-sample fine-tuning stage. In the first stage, the training samples are used to train the whole detection framework, yielding the model parameters on the base samples. In the second stage, the network is first initialized with the first-stage parameters and the parameters of the feature-extraction module are fixed; the model parameters are then fine-tuned on the small-sample dataset. In this second stage a consistency strategy based on self-supervised learning is introduced to constrain the feature representations and distribution representations of the samples, finally completing the training of the small-sample target detection model and yielding the trained model.

Preferably, training the small-sample target detection model with a deep-learning update procedure based on the optimization objective function to obtain a trained small-sample target detection model comprises:

Step 3-1: Generate the datasets D_train, D_base, and D_novel from the PASCAL VOC dataset. PASCAL VOC has 20 categories, which are divided into 15 base categories and 5 novel categories. D_train is built from all instances of the base categories, and K = 1, 2, 3, 5, 10 instances are randomly sampled from the novel and base categories as the K-shot D_base and D_novel;
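A sketch of the split construction of step 3-1; the annotation format (a list of image/class pairs) is an assumption:

```python
import random
from collections import defaultdict

def build_splits(annotations, base_classes, novel_classes, k):
    """Build D_train / D_base / D_novel as described in step 3-1."""
    by_class = defaultdict(list)
    for img, cls in annotations:
        by_class[cls].append((img, cls))

    # D_train: all instances of the base categories
    d_train = [a for c in base_classes for a in by_class[c]]
    # D_base / D_novel: K randomly sampled instances per category
    d_base = [a for c in base_classes for a in random.sample(by_class[c], k)]
    d_novel = [a for c in novel_classes for a in random.sample(by_class[c], k)]
    return d_train, d_base, d_novel
```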

Step 3-2: Create the base-class training network with Faster R-CNN as the basic framework, select ResNet-101 with a feature pyramid as the feature-extraction network, initialize the model parameters, and set the hyperparameters; the standard batch size is 16. Create a standard SGD optimizer with momentum 0.9 and weight decay 1e-4;
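A possible realization of step 3-2 with torchvision (version 0.13 or later assumed); the learning rate is not specified in the text and is a placeholder:

```python
import torch
import torchvision

# Faster R-CNN with a ResNet-101 FPN backbone; the exact construction in
# the patent may differ from this sketch.
backbone = torchvision.models.detection.backbone_utils.resnet_fpn_backbone(
    backbone_name="resnet101", weights=None)
model = torchvision.models.detection.FasterRCNN(
    backbone, num_classes=16)  # assumption: 15 base classes + background

optimizer = torch.optim.SGD(model.parameters(), lr=0.02,  # lr is a placeholder
                            momentum=0.9, weight_decay=1e-4)
```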

Step 3-3: Build the D_train data loader and apply data augmentation to the raw input;

Step 3-4: Train the base-class training network: compute the output for each base-class training sample, compute the loss L_base, and update the network parameters with the gradient descent algorithm;

Step 3-5: If the model has converged or the required number of training steps has been reached, end the base-class training process and save the model parameters; otherwise, return to step 3-4;

Step 3-6: Build the D_base and D_novel data loaders, create the fine-tuning network model, initialize the network with the model parameters obtained from the base-class training network, and create the optimizer;

Step 3-7: Train the fine-tuning network. On top of the base-class training network, obtain the proposal feature maps produced for each training sample after the region-of-interest pooling operation. Traverse the list of proposal feature maps and, for each one, match a proposal feature map of the same category as a positive example and one of a different category as a negative example; select two positive examples to form 2 positive pairs with the current sample, and two negative examples to form 2 negative pairs. Compute L_contrastive on the proposal feature maps of the obtained positive and negative pairs. Obtain the class probability distributions produced for the training samples after the classification operation, and compute L_contrastive-JS on the class probability distributions of the positive and negative pairs. Compute the output for each training sample, compute the loss L_fine_tune, and update the network parameters with the gradient descent algorithm;
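A sketch of how the fine-tuning loss of step 3-7 could be assembled, reusing the contrastive_loss and contrastive_js_loss sketches given earlier; all shapes and names are assumptions:

```python
import torch

def fine_tune_loss(l_rpn, l_cls, l_reg, feats, probs, pair_idx, pair_labels):
    """Assemble L_fine_tune = L_rpn + L_cls + L_reg + L_contrastive
    + L_contrastive-JS for one mini-batch (a sketch).

    feats: (N, D) RoI features; probs: (N, C) class distributions;
    pair_idx: two index tensors selecting pair members; pair_labels: (P,)
    with 1 for positive and 0 for negative pairs.
    """
    i, j = pair_idx
    y = pair_labels
    l_con = contrastive_loss(feats[i], feats[j], y)      # on RoI features
    l_js = contrastive_js_loss(probs[i], probs[j], y)    # on class distributions
    return l_rpn + l_cls + l_reg + l_con + l_js
```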

Step 3-8: On the PASCAL VOC 2007 test set, use AP50 on the novel classes (nAP50) as the model performance metric and monitor convergence. If the model has converged or the required number of training steps has been reached, end the fine-tuning training process; otherwise, return to step 3-7.

As can be seen from the technical solutions provided by the above embodiments of the present invention, the proposed method is based on a two-stage learning process: transfer learning is used to learn domain knowledge, and the model is fine-tuned on a small-sample dataset. Experimental results show that the proposed method achieves good performance on the public PASCAL VOC dataset, can effectively improve model performance on small-sample target detection, and has strong practical significance.

Additional aspects and advantages of the invention will be set forth in part in the description that follows; they will become apparent from the description or may be learned through practice of the invention.

Brief Description of the Drawings

To explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Figure 1 is a schematic diagram of a base-class training network provided by an embodiment of the present invention.

Figure 2 is a schematic diagram of a fine-tuning network provided by an embodiment of the present invention.

Detailed Description of the Embodiments

Embodiments of the present invention are described in detail below; examples of the embodiments are illustrated in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements, or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain the present invention, and are not to be construed as limiting it.

Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said", and "the" used herein may also include the plural forms. It should further be understood that the word "comprising" used in the specification of the present invention refers to the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It should be understood that when an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may be present. Furthermore, "connected" or "coupled" as used herein may include wireless connection or coupling. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by those of ordinary skill in the art to which the present invention belongs. It should also be understood that terms such as those defined in general dictionaries are to be understood as having meanings consistent with their meanings in the context of the prior art and, unless defined as herein, are not to be interpreted in an idealized or overly formal sense.

To facilitate understanding of the embodiments of the present invention, several specific embodiments are further explained below with reference to the accompanying drawings; the individual embodiments do not limit the embodiments of the present invention.

Self-supervised learning is a class of unsupervised learning methods: it mainly uses auxiliary tasks to mine supervisory signals from large-scale unlabeled data and trains the network on this constructed supervision, thereby learning general-purpose feature representations for downstream tasks. Self-supervised learning based on contrastive constraints mainly learns how to build representations of similar and dissimilar things, so that the distance between a sample and its positive sample is far smaller than the distance between the sample and its negative sample; that is, self-supervised learning is achieved by constructing positive and negative samples and measuring the distances between them.

A small-sample target detection method based on self-supervised contrastive constraints provided by an embodiment of the present invention includes the following steps:

Step S1: Based on the characteristics of the small-sample target detection problem, model it as a mathematical optimization problem based on self-supervised learning and construct a small-sample target detection model that is sensitive to perturbations of the input data.

Step S2: Set the optimization objective function of the small-sample target detection model.

Step S3: Train the small-sample target detection model with a deep-learning update procedure based on the optimization objective function to obtain a trained small-sample target detection model, and use the trained model to perform target detection on the small samples to be detected.

Specifically, the above step S1 includes:

The above small-sample target detection model is specified as follows:

(1) In the first stage, construct the dataset D_train, which contains all training data of the base classes. The core of this stage is to provide initialization parameters for the second-stage training, and at the same time to train the feature-extraction module used for feature extraction in the second stage. To obtain a good feature-extraction module, data sufficiency must be guaranteed in the first stage; D_train therefore contains all category information of the base classes and relatively complete data;

(2) In the second stage, construct the base dataset D_base; its category information is the same as that of D_train, and the number of training samples per category equals that of the small-sample target dataset D_novel. The second stage is the fine-tuning stage, whose main goal is a model that performs well on the small-sample dataset. After the first-stage training on D_train, a certain amount of base data, namely D_base, must be used in the second stage to balance the target dataset against the base dataset. Here D_base has the same category information as D_train, but the amount of image data is the same as in the final small-sample dataset. In summary, D_base and D_train come from the same source; D_base is auxiliary data constructed so that the model performs well on the small-sample target data;

(3) In the second stage, construct the small-sample target dataset D_novel, whose category information differs from the first-stage training set and the second-stage base dataset, and whose number of samples per category equals that of D_base. D_novel is the core data for evaluating the model on the small-sample target detection problem. To reflect the few-shot setting, the number of samples in this dataset is very limited and differs across evaluation settings;

(4) The biggest characteristic of the small-sample problem is insufficient training data. To make full use of the limited data, the present invention proposes a contrastive loss for feature-consistency constraints and a contrastive loss based on the predicted distribution for consistency constraints on the samples' predicted distributions. By constructing positive and negative sample pairs, the model can fully learn the consistent features within a class during training and effectively distinguish the features of different classes. Here a denotes a sample pair, a_p a positive pair, a_n a negative pair, and y_a the label of the pair; S, S+, S- denote the sample features used to construct the pairs: S is the feature of the reference sample, S+ the feature of a sample of the same category as the reference sample with the largest IoU value, and S- the feature of a sample of a different category, i.e., a_p = {S, S+}, a_n = {S, S-}.

The specific forms of the above feature-consistency constraints are given in formulas (9) and (10) below.

Specifically, the above step S2 includes:

Set the optimization objective function of the small-sample target detection model. The present invention adopts a two-stage network training process: first, the base-class training network is trained on a large base-class dataset; then the network is fine-tuned on a balanced dataset. The optimization objective is therefore divided into the base-class training objective L_base = L_rpn + L_cls + L_reg and the fine-tuning objective L_fine_tune = L_rpn + L_cls + L_reg + L_contrastive + L_contrastive-JS. The goal of the fine-tuning network is, with little available training data, to maximize the differences between objects of different categories and minimize the differences between objects of the same category; concretely, the fine-tuning network adds the contrastive optimization objectives on top of the base-class training network.

(1) Region proposal network loss

The role of the region proposal network is to filter out anchors that may contain targets. Specifically, the RPN implements two functions. 1) It judges whether an anchor covers an object or background: a specified number of anchors is selected with NMS (non-maximum suppression), and IoU thresholds are set; anchors whose IoU exceeds the given upper threshold are taken to contain a target and are positive samples, anchors whose IoU falls below the given lower threshold are background and are negative samples, and the remaining anchors do not participate in training (this labelling rule is sketched below). 2) It performs coordinate correction, a regression problem: finding the mapping from anchor box to ground-truth box, which can be realized by translation and scaling. When the anchor box and the ground-truth box are close, the transformation between the predicted box and the ground-truth box can be regarded as linear, so a linear regression model can be used to fine-tune the bounding-box parameter coordinates. After the correction parameters of each anchor are obtained, precise anchor parameter coordinates can be computed. The complete RPN loss is given in formula (1).
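A sketch of the IoU-threshold labelling rule just described; the threshold values are assumptions, since the text only specifies that an upper and a lower bound are set:

```python
import torch

def label_anchors(ious, hi=0.7, lo=0.3):
    """Label anchors by IoU: above `hi` positive (1), below `lo`
    negative (0), the rest ignored (-1)."""
    labels = torch.full_like(ious, -1.0)
    labels[ious >= hi] = 1.0   # contains a target: positive sample
    labels[ious <= lo] = 0.0   # background: negative sample
    return labels
```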

The RPN loss is divided into a classification loss L_rpn_cls and a bounding-box regression loss L_rpn_reg. L_rpn_cls is used to train the classification of anchors as positive or negative samples; its complete description is given in formula (2). L_rpn_reg is used to train the bounding-box regression; its complete description is given in formula (3). Here N_rpn_cls denotes the mini-batch size of training samples in the RPN, N_rpn_reg the number of anchors generated by the RPN, p_i^* the ground-truth classification probability of the i-th anchor, and λ a weight-balancing parameter.

L_rpn_cls uses cross-entropy to compute the loss of whether an anchor contains a target; it is a binary classification loss. p_i denotes the predicted classification probability of the i-th anchor and p_i^* the corresponding ground-truth probability. This function judges whether the extracted image region contains an object and is mainly used to distinguish foreground from background.

The purpose of L_rpn_reg is to adjust the position information of the proposed regions so as to guide the RPN toward more accurate object locations. It uses smooth_L1 to compute the gap between the predicted and ground-truth bounding boxes; the general form of smooth_L1 is given in formula (4). t_i = {t_x, t_y, t_w, t_h} denotes the predicted bounding-box regression parameters of the i-th anchor and t_i^* the regression parameters of the ground-truth box corresponding to the i-th anchor; t_i and t_i^* are computed as in formulas (5) and (6).

$$t_x = (x - x_{anchor})/w_{anchor},\quad t_y = (y - y_{anchor})/h_{anchor},\quad t_w = \log(w/w_{anchor}),\quad t_h = \log(h/h_{anchor}) \quad (5)$$

$$t_x^* = (x^* - x_{anchor})/w_{anchor},\quad t_y^* = (y^* - y_{anchor})/h_{anchor},\quad t_w^* = \log(w^*/w_{anchor}),\quad t_h^* = \log(h^*/h_{anchor}) \quad (6)$$

x, y denote the center coordinates of the predicted bounding box, and w, h its width and height; x_anchor, y_anchor denote the center coordinates of the current anchor box, and w_anchor, h_anchor its width and height.

x*, y* denote the center coordinates of the object's ground-truth bounding box in the image, and w*, h* its width and height.

(2) Classification loss

Cross-entropy is used as the classification loss in the target detection network, as in formula (7). s_i denotes the i-th detection box, p_i the predicted class distribution of the i-th detection box, and y_i^* its ground-truth class. This function provides the basis for the network's classification behavior: through it one can judge whether the network classifies the objects in the detection regions correctly and update the model via the computed loss for inaccurately classified objects.

(3) Bounding-box regression loss

The bounding-box regression loss used in the target detection network has the same form as formula (3), where t_i and t_i^* respectively denote the predicted and ground-truth parameterized bounding-box coordinates of the i-th detection box, and smooth_L1 is the smoothing loss. Through this function the position information of the detection region can be further adjusted.

(4) Contrastive loss

Since detection boxes can be regarded as perturbed variants of the true target, the contrastive loss constructs positive pairs a_p and negative pairs a_n of detection boxes, shrinking the distance within positive pairs and enlarging the distance within negative pairs. By controlling the feature representations of sample pairs during training, the feature representations of objects of the same category become closer in the model while the differences between the representations of different categories become more pronounced, achieving better learning of detection-box feature representations.

Construct the sample features S, S+, S-, the positive pair a_p = {S, S+}, and the negative pair a_n = {S, S-}, as in formula (9). D_a denotes the Euclidean distance within a positive pair a_p or a negative pair a_n, and y_a the label of pair a, with y_a = 1 for a positive pair and y_a = 0 for a negative pair; when the current pair is a positive pair a_p, the model is updated to minimize the distance between the sample and the positive sample. m is an upper bound on the pair distance: when the distance between a sample and a negative sample is greater than m, the loss equals 0 and the model is not updated; otherwise the model is updated until the negative-pair distance reaches m.

(5) Contrastive-JS loss

To extend the effect of the contrastive constraints, besides using the contrastive loss to constrain the feature-learning process, the present invention also proposes the Contrastive-JS loss to provide guidance for the predicted distributions, imposing a consistency constraint on the distributions produced by the classifier and making the model more sensitive to object category information; its specific form is given in formula (10).

Here p_a is the predicted distribution of sample pair a, y_a the label of the current pair, p_a[i] the i-th predicted distribution in the pair, and m' an upper bound on the pair distance, with the same meaning as m in formula (9).

Specifically, the above step S3 includes:

For the small-sample target detection problem, the present invention constructs a two-stage deep-learning model update procedure and trains the small-sample target detection model with the optimization objective function. The two-stage learning process consists of training on sufficient data and fine-tuning on small-sample data. In the first stage, sufficient training samples are used to train the whole detection framework, yielding the model parameters on the base samples. In the second stage, the network is first initialized with the first-stage parameters, the parameters of the feature-extraction module are fixed, and the model parameters are fine-tuned on the small-sample dataset. In addition, the consistency strategy based on self-supervised learning proposed by the present invention is introduced in the second stage to constrain the feature representations and distribution representations of the samples, finally completing the model training.
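A sketch of the second-stage initialization and freezing; the checkpoint path and attribute names are assumptions, and `model` refers to the detector sketched after step 3-2:

```python
import torch

# Load the first-stage parameters and freeze the feature-extraction module.
model.load_state_dict(torch.load("stage1.pth"))
for param in model.backbone.parameters():
    param.requires_grad = False          # feature extractor stays fixed

# Only the remaining (unfrozen) parameters are fine-tuned; lr is a placeholder.
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad],
    lr=0.001, momentum=0.9, weight_decay=1e-4)
```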

The specific process is as follows:

Step 3-1: Generate the datasets D_train, D_base, and D_novel from the PASCAL VOC dataset. PASCAL VOC has 20 categories, which are divided into 15 base categories and 5 novel categories. D_train is built from all instances of the base categories; K = 1, 2, 3, 5, 10 instances are randomly sampled from the novel and base categories as the K-shot D_base and D_novel. Three different random partitions of PASCAL VOC are used, called Split1, Split2, and Split3.

Step 3-2: Create the base-class training network with Faster R-CNN as the basic framework, select ResNet-101 with a feature pyramid as the feature-extraction network, initialize the model parameters, and set the hyperparameters; the standard batch size is 16. Create a standard SGD optimizer with momentum 0.9 and weight decay 1e-4.

Step 3-3: Build the D_train data loader and apply data augmentation to the raw input.

Step 3-4: Train the base-class training network: compute the output for each base-class training sample, compute the loss L_base, and update the network parameters with the gradient descent algorithm.

Step 3-5: If the model has converged or the required number of training steps has been reached, end the base-class training process and save the model parameters; otherwise, return to step 3-4.

Step 3-6: Build the D_base and D_novel data loaders, create the fine-tuning network model, initialize the network with the model parameters obtained from the base-class training network, and create the optimizer.

Step 3-7: Train the fine-tuning network. On top of the base-class training network, obtain the proposal feature maps produced for each training sample after the region-of-interest pooling operation. Traverse the list of proposal feature maps and, for each one, match a proposal feature map of the same category as a positive example and one of a different category as a negative example; select two positive examples to form 2 positive pairs with the current sample, and two negative examples to form 2 negative pairs. Compute L_contrastive on the proposal feature maps of the obtained positive and negative pairs. Obtain the class probability distributions produced for the training samples after the classification operation, and compute L_contrastive-JS on the class probability distributions of the positive and negative pairs. Compute the output for each training sample, compute the loss L_fine_tune, and update the network parameters with the gradient descent algorithm.

Step 3-8: On the PASCAL VOC 2007 test set, use AP50 as the model performance metric to evaluate performance on the novel categories and monitor convergence. If the model has converged or the required number of training steps has been reached, end the fine-tuning training process; otherwise, return to step 3-7.

Experimental Results

Table 1 compares the method designed in the present invention with previous small-sample target detection algorithms. As the table shows, the present invention achieves the highest accuracy under all base-class/novel-class configurations, i.e., all dataset splits; when the number of instances per novel class is 1, detection accuracy improves by up to 7.0% over the better existing small-sample target detection algorithms.

Table 1. Comparative experimental results of the present invention under different data splits

In summary, the small-sample target detection method based on self-supervised contrastive constraints provided by the present invention uses self-supervised contrastive constraints to enhance small-sample detection. Unlike traditional algorithms that indirectly adjust the region proposal network and feature pyramid parameters through fully connected layers, the present invention directly influences feature extraction; specifically, it directly constrains the parameter updates of the region proposal network and the feature pyramid, introduces no new parameters into the network, and adds no extra computation.

The present invention enhances the target detection network with self-supervised contrastive constraints, demonstrating the research value of self-supervised learning for small-sample target detection. Compared with traditional algorithms, the present invention uses the contrastive loss to strengthen the classification and localization of small-sample targets and stably improves detection across categories with different numbers of instances.

Those of ordinary skill in the art will understand that the accompanying drawing is only a schematic diagram of one embodiment, and the modules or processes in the drawing are not necessarily required for implementing the present invention.

From the above description of the embodiments, those skilled in the art can clearly understand that the present invention can be implemented by software plus the necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, magnetic disk, or optical disk, and includes a number of instructions to cause a computer device (a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments, or in certain parts of the embodiments, of the present invention.

The embodiments in this specification are described in a progressive manner; identical or similar parts of the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, the device and system embodiments are described simply because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments. The device and system embodiments described above are only illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiments. Those of ordinary skill in the art can understand and implement this without creative effort.

The above are only preferred specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any changes or substitutions that a person familiar with the technical field could easily conceive within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (3)

1. A small sample target detection method based on self-supervised contrast constraints, characterized by comprising the following steps:
modeling the small sample target detection problem as a mathematical optimization problem based on self-supervised learning, and constructing a small sample target detection model that is sensitive to perturbations of the input data, which specifically comprises:
(1) in the first stage, constructing a dataset $D_{train}$ containing all training data of the base classes;
(2) in the second stage, constructing a basic dataset $D_{base}$, whose category information is the same as that of $D_{train}$ and whose number of training samples per class is the same as that of the small sample target dataset $D_{novel}$;
(3) in the second stage, constructing the small sample target dataset $D_{novel}$, whose category information differs from both the first-stage dataset $D_{train}$ and the second-stage basic dataset $D_{base}$, and whose number of samples per class is the same as that of $D_{base}$;
(4) applying a feature consistency constraint with a contrastive loss function, and applying a consistency constraint on the prediction distributions of the samples with a contrastive loss function based on prediction distributions; constructing positive and negative sample pairs $a = \{a_p, a_n\}$ with corresponding labels $y_a$ and a feature set $\{S, S^+, S^-\}$, where $a$ denotes a sample pair, $a_p$ a positive pair, $a_n$ a negative pair, and $y_a$ the label of the pair; $S$, $S^+$, $S^-$ denote the sample features used to construct the pairs: $S$ is the feature of the reference sample, $S^+$ is a sample feature of the same class as the reference sample with the maximum IoU value, and $S^-$ is a sample feature of a class different from the reference sample, i.e., $a_p = \{S, S^+\}$ and $a_n = \{S, S^-\}$ (an illustrative sketch of this pairing follows the claim);
designing an optimization objective function for the small sample target detection model;
training the small sample target detection model with a deep-learning update process based on the optimization objective function to obtain a trained small sample target detection model, which specifically comprises: training the model with the optimization objective function through a two-stage deep-learning model update process, the two stages consisting of base-class data training and small-sample data fine-tuning; in the first stage, training the whole detection framework with the training samples to obtain model parameters on the base-class samples; in the second stage, first initializing the network parameters with the first-stage model parameters and fixing the parameters of the feature extraction module, then fine-tuning the model parameters with the small sample dataset, introducing a consistency strategy based on self-supervised learning to constrain the feature representation and the distribution representation of the samples, and finally completing the training of the small sample target detection model (an illustrative initialization sketch also follows this claim);
and carrying out target detection on the small sample to be detected by using the trained small sample target detection model.
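Purely as an illustrative aid to step (4) above, and not as part of the claim, the positive/negative pair construction can be sketched in PyTorch roughly as follows. The helper `build_pairs` and its tensor inputs (`features` for the RoI features $S$, `labels`, and `ious` against the matched ground truth) are hypothetical names of ours, assuming per-proposal features are already available:

```python
import torch

def build_pairs(features, labels, ious):
    # Sketch of step (4): for each reference proposal S, take as S+ the
    # same-class proposal with the maximum IoU against its ground truth,
    # and as S- any proposal of a different class.
    # features: (N, D) RoI features; labels: (N,); ious: (N,)
    pos_pairs, neg_pairs = [], []
    for i in range(features.size(0)):
        same = (labels == labels[i]).nonzero(as_tuple=True)[0]
        same = same[same != i]                      # exclude the reference itself
        diff = (labels != labels[i]).nonzero(as_tuple=True)[0]
        if len(same) == 0 or len(diff) == 0:
            continue                                # no valid partner for this sample
        j = same[ious[same].argmax()]               # S+: same class, max IoU
        k = diff[torch.randint(len(diff), (1,))]    # S-: any different class
        pos_pairs.append((i, int(j)))               # a_p = {S, S+}, label y_a = 1
        neg_pairs.append((i, int(k)))               # a_n = {S, S-}, label y_a = 0
    return pos_pairs, neg_pairs
```

Choosing the positive partner by maximum IoU among same-class proposals mirrors the definition of $S^+$ in step (4); any different-class proposal can serve as $S^-$.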
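The stage-two initialization described in the claim (load stage-one parameters, freeze the feature extraction module, then fine-tune) can also be sketched. This is a minimal sketch under stated assumptions: torchvision ships a ResNet-50 Faster R-CNN variant, whereas the patent uses ResNet101 with a feature pyramid, so the model constructor and the weights path `base_stage.pth` are stand-ins of ours:

```python
import torch
import torchvision

def init_fine_tune(weights_path="base_stage.pth"):
    # Stage-two sketch: load stage-one parameters, then fix the feature
    # extraction module (backbone) so only the detection heads are fine-tuned.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=21)
    state = torch.load(weights_path)
    model.load_state_dict(state)
    for p in model.backbone.parameters():
        p.requires_grad = False     # feature extraction module is frozen
    return model
```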
2. The method of claim 1, wherein designing the optimization objective function of the small sample target detection model comprises:
setting the optimization objective functions of the small sample target detection model to comprise the base-class training network objective $L_{base} = L_{rpn} + L_{cls} + L_{reg}$ and the fine-tuning network objective $L_{fine\_tune} = L_{rpn} + L_{cls} + L_{reg} + L_{contrastive} + L_{contrastive\text{-}JS}$, the fine-tuning network adding contrastive objectives on top of the base-class training network;
1: $L_{rpn}$ is the loss function of the region proposal network, computed as in formula (1):

$$L_{rpn} = L_{rpn\_cls} + \lambda L_{rpn\_reg} \qquad (1)$$

the loss of the region proposal network is divided into a classification loss $L_{rpn\_cls}$ and a bounding-box regression loss $L_{rpn\_reg}$; $L_{rpn\_cls}$ trains the classification of anchor boxes into positive and negative samples, with the complete description given in formula (2), and $L_{rpn\_reg}$ trains the bounding-box regression, with the complete description given in formula (3):

$$L_{rpn\_cls} = -\frac{1}{N_{rpn\_cls}} \sum_i \big[ p_i^* \log p_i + (1 - p_i^*) \log(1 - p_i) \big] \qquad (2)$$

$$L_{rpn\_reg} = \frac{1}{N_{rpn\_reg}} \sum_i p_i^* \, \mathrm{smooth}_{L_1}(t_i - t_i^*) \qquad (3)$$

where $N_{rpn\_cls}$ is the batch size of training samples in the region proposal network, $N_{rpn\_reg}$ is the number of anchor boxes generated by the region proposal network, $p_i$ is the predicted classification probability of the $i$-th anchor box, $p_i^*$ is the true classification probability corresponding to the $i$-th anchor box, and $\lambda$ is a weight balance parameter; $L_{rpn\_cls}$ uses cross entropy to compute a two-class loss on whether an anchor box contains an object, and thus judges whether the extracted image region contains an object;

the generalized representation of $\mathrm{smooth}_{L_1}$ is given in formula (4):

$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5\,x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases} \qquad (4)$$

$t_i = \{t_x, t_y, t_w, t_h\}$ are the predicted bounding-box regression parameters of the $i$-th anchor box, and $t_i^*$ are the regression parameters of the ground-truth box corresponding to the $i$-th anchor box; $t_i$ and $t_i^*$ are computed as in formula (5) and formula (6);
$$t_x = (x - x_{anchor})/w_{anchor}, \quad t_y = (y - y_{anchor})/h_{anchor},$$
$$t_w = \log(w / w_{anchor}), \quad t_h = \log(h / h_{anchor}) \qquad (5)$$

$$t_x^* = (x^* - x_{anchor})/w_{anchor}, \quad t_y^* = (y^* - y_{anchor})/h_{anchor},$$
$$t_w^* = \log(w^* / w_{anchor}), \quad t_h^* = \log(h^* / h_{anchor}) \qquad (6)$$

where $x, y$ are the center-point coordinates of the predicted bounding box and $w, h$ its width and height; $x_{anchor}, y_{anchor}$ are the center-point coordinates of the current anchor box and $w_{anchor}, h_{anchor}$ its width and height; $x^*, y^*$ are the center-point coordinates of the ground-truth bounding box of an object in the image and $w^*, h^*$ its width and height (an illustrative code sketch of this parameterization follows claim 2);
2: the classification loss function $L_{cls}$ is computed as in formula (7):

$$L_{cls} = -\sum_i p_i^* \log p_i \qquad (7)$$

cross entropy is used as the classification loss function in the target detection network, where $s_i$ denotes the $i$-th detection box, $p_i$ the predicted classification probability of the $i$-th detection box, and $p_i^*$ the classification ground truth of the $i$-th detection box; this function provides the basis for the classification behavior of the network: it judges whether the object class of a detection region is classified accurately, and the model is updated through the computed loss value for inaccurately classified objects;
3: the bounding-box regression loss function $L_{reg}$ is computed as in formula (8):

$$L_{reg} = \sum_i \mathrm{smooth}_{L_1}(t_i - t_i^*) \qquad (8)$$

where $t_i$ and $t_i^*$ are the predicted and ground-truth values of the parameterized bounding-box coordinates of the $i$-th detection box, and $\mathrm{smooth}_{L_1}$ is the smooth loss of formula (4); this function further adjusts the position information of the detection region;
4: the contrastive loss function $L_{contrastive}$ is computed as in formula (9):

$$L_{contrastive} = y_a D_a^2 + (1 - y_a) \max(0,\, m - D_a)^2 \qquad (9)$$

the sample features $S$, $S^+$, $S^-$ are constructed, yielding the positive sample pair $a_p = \{S, S^+\}$ and the negative sample pair $a_n = \{S, S^-\}$; $D_a$ is the Euclidean distance within the positive pair $a_p$ or the negative pair $a_n$, and $y_a$ is the label of sample pair $a$: when the current pair is a positive pair $a_p$, $y_a = 1$ and the model updates so as to minimize the distance between the sample and the positive sample; when the current pair is a negative pair, $y_a = 0$ and $m$ is the upper boundary of the pair distance: when the distance between the sample and the negative sample exceeds $m$, the loss value equals 0 and the model is not updated; otherwise, the model is updated until the distance of the negative pair reaches $m$;
5: the contrastive-JS loss function $L_{contrastive\text{-}JS}$ is computed as in formula (10):

$$L_{contrastive\text{-}JS} = y_a D_{JS} + (1 - y_a) \max(0,\, m' - D_{JS}), \quad D_{JS} = JS\big(p_a[1] \,\|\, p_a[2]\big) \qquad (10)$$

where $p_a$ is the pair of prediction distributions of sample pair $a$, $y_a$ is the label of the current pair, $p_a[i]$ denotes the $i$-th prediction distribution in the pair, and $m'$ is the upper bound of the sample-pair distance, with the same meaning as $m$ in formula (9) (illustrative code sketches of formulas (4), (9) and (10) follow this claim).
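For illustration only, the box parameterization of formulas (5) and (6) translates directly into code. This is a minimal sketch, assuming boxes are given in center format `(x, y, w, h)`; the function name `encode_box` is ours, not the patent's:

```python
import math

def encode_box(box, anchor):
    # Formulas (5)/(6): parameterize a (predicted or ground-truth) box
    # relative to an anchor. box, anchor: (x, y, w, h) in center format.
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    tx = (x - xa) / wa        # t_x = (x - x_anchor) / w_anchor
    ty = (y - ya) / ha        # t_y = (y - y_anchor) / h_anchor
    tw = math.log(w / wa)     # t_w = log(w / w_anchor)
    th = math.log(h / ha)     # t_h = log(h / h_anchor)
    return tx, ty, tw, th

# The same mapping yields t_i from a predicted box (formula (5)) and
# t_i* from the ground-truth box (formula (6)), e.g.:
t_pred = encode_box((52.0, 48.0, 100.0, 80.0), (50.0, 50.0, 90.0, 90.0))
t_gt   = encode_box((55.0, 47.0, 110.0, 78.0), (50.0, 50.0, 90.0, 90.0))
```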
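Likewise, a minimal PyTorch sketch of formulas (4), (9) and (10) as reconstructed above; the batch reduction by mean and the exact JS implementation are our assumptions, since the patent's original equation images are not reproduced here:

```python
import torch
import torch.nn.functional as F

def smooth_l1(x):
    # Formula (4): 0.5 * x^2 if |x| < 1, else |x| - 0.5 (elementwise)
    ax = x.abs()
    return torch.where(ax < 1, 0.5 * ax.pow(2), ax - 0.5)

def contrastive_loss(s1, s2, y, m=1.0):
    # Formula (9): y = 1 pulls a positive pair together; y = 0 pushes a
    # negative pair apart until the Euclidean distance D_a exceeds margin m.
    d = F.pairwise_distance(s1, s2)                       # D_a
    return (y * d.pow(2) + (1 - y) * torch.clamp(m - d, min=0).pow(2)).mean()

def js_divergence(p, q, eps=1e-8):
    # Jensen-Shannon divergence between two class-probability distributions.
    mix = 0.5 * (p + q)
    kl = lambda a, b: (a * (torch.log(a + eps) - torch.log(b + eps))).sum(-1)
    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)

def contrastive_js_loss(p1, p2, y, m_prime=0.5):
    # Formula (10): same hinge structure as (9), with the JS divergence of
    # the pair's prediction distributions in place of the Euclidean distance.
    d = js_divergence(p1, p2)
    return (y * d + (1 - y) * torch.clamp(m_prime - d, min=0)).mean()
```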
3. The method of claim 1, wherein training the small sample target detection model based on the optimization objective function using a deep-learning update process to obtain a trained small sample target detection model comprises:
step 3-1: generating the datasets $D_{train}$, $D_{base}$ and $D_{novel}$ from the PASCAL VOC dataset; the 20 categories of the PASCAL VOC dataset are divided into 15 base categories and 5 novel categories; $D_{train}$ is built with all instances of the base categories, and K = 1, 2, 3, 5, 10 instances are randomly sampled from the novel and base categories as the K-shot $D_{base}$ and $D_{novel}$ (a sampling sketch follows this claim);
step 3-2: establishing a base-class training network with Faster-RCNN as the basic framework, selecting ResNet101 with a feature pyramid as the feature extraction network, initializing the model parameters, setting the hyperparameters, and creating a standard SGD optimizer with a batch size of 16, a momentum of 0.9, and a weight decay of 1e-4;
step 3-3: constructing the $D_{train}$ data loader, which performs data enhancement on the original input;
step 3-4: training the base-class training network, calculating the output value of each base-class training sample, calculating the loss $L_{base}$, and updating the network parameters using a gradient descent algorithm;
step 3-5: if the model converges or reaches the required number of training steps, ending the base-class training process and saving the model parameters; otherwise, returning to step 3-4;
step 3-6: constructing the $D_{base}$ and $D_{novel}$ data loaders, creating the fine-tuning network model, initializing the network with the model parameters obtained by the base-class training network, and creating the optimizer;
step 3-7: training the fine-tuning network: on the basis of the base-class training network, obtaining the candidate-box feature maps generated by the training samples after the region-of-interest pooling operation; traversing the candidate-box feature-map list and, for each candidate-box feature map, matching one candidate-box feature map of the same category as a positive example and one of a different category as a negative example; selecting two positive examples to form 2 positive sample pairs with the current sample and two negative examples to form 2 negative sample pairs, and calculating $L_{contrastive}$ on the candidate-box feature maps of the positive and negative sample pairs; obtaining the class probability distributions generated by the training samples after the classification operation, and calculating $L_{contrastive\text{-}JS}$ on the class probability distributions of the positive and negative sample pairs; calculating the output value of each training sample, calculating the loss $L_{fine\_tune}$, and updating the network parameters using a gradient descent algorithm;
step 3-8: using AP50 on the PASCAL VOC 2007 test set as the model performance evaluation index and observing model convergence; if the model converges or reaches the required number of training steps, ending the fine-tuning network training process; otherwise, returning to step 3-7.
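Finally, step 3-1's K-shot split can be sketched as follows. The helper `make_kshot_split` and the `instances_by_class` mapping are hypothetical, assuming per-class instance lists extracted from the PASCAL VOC annotations are already available:

```python
import random

def make_kshot_split(instances_by_class, base_classes, novel_classes, k, seed=0):
    # Step 3-1 sketch: D_train takes all base-class instances; D_base and
    # D_novel each take K randomly sampled instances per class (K-shot).
    rng = random.Random(seed)
    d_train = {c: list(instances_by_class[c]) for c in base_classes}
    d_base  = {c: rng.sample(instances_by_class[c], k) for c in base_classes}
    d_novel = {c: rng.sample(instances_by_class[c], k) for c in novel_classes}
    return d_train, d_base, d_novel

# e.g. K-shot splits for K in {1, 2, 3, 5, 10}, as in the claim:
# d_train, d_base, d_novel = make_kshot_split(inst, base15, novel5, k=5)
```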
CN202210421310.5A 2022-04-21 2022-04-21 A small sample target detection method based on self-supervised contrast constraints Active CN114841257B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210421310.5A CN114841257B (en) 2022-04-21 2022-04-21 A small sample target detection method based on self-supervised contrast constraints

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210421310.5A CN114841257B (en) 2022-04-21 2022-04-21 A small sample target detection method based on self-supervised contrast constraints

Publications (2)

Publication Number Publication Date
CN114841257A (en) 2022-08-02
CN114841257B (en) 2023-09-22

Family

ID=82566522

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210421310.5A Active CN114841257B (en) 2022-04-21 2022-04-21 A small sample target detection method based on self-supervised contrast constraints

Country Status (1)

Country Link
CN (1) CN114841257B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115393634B (en) * 2022-08-11 2023-12-26 重庆邮电大学 Small sample target real-time detection method based on migration learning strategy
CN116310894B (en) * 2023-02-22 2024-04-16 中交第二公路勘察设计研究院有限公司 Unmanned aerial vehicle remote sensing-based intelligent recognition method for small-sample and small-target Tibetan antelope
CN116228715B (en) * 2023-02-28 2023-09-22 抖音视界有限公司 Training method of polyp detection model, polyp detection method and related device
CN116452858B (en) * 2023-03-24 2023-12-15 哈尔滨市科佳通用机电股份有限公司 Rail wagon connecting pull rod round pin breaking fault identification method and system
CN117409250B (en) * 2023-10-27 2024-04-30 北京信息科技大学 Small sample target detection method, device and medium
CN117292213B (en) * 2023-11-27 2024-01-30 江西啄木蜂科技有限公司 Pine color-changing different wood identification method for unbalanced samples under multiple types of cameras
CN117690011B (en) * 2024-02-04 2024-04-19 中国海洋大学 Object detection method and model building method suitable for noisy underwater scenes
CN118657786B (en) * 2024-08-22 2024-11-08 江西锦路科技开发有限公司 Equipment defect detection method under condition of few samples
CN118823487B (en) * 2024-09-19 2024-12-10 南昌工程学院 Voltage transformer target detection method and system based on dynamic countermeasures and supervision strategies

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9324022B2 (en) * 2014-03-04 2016-04-26 Signal/Sense, Inc. Classifying data with deep learning neural records incrementally refined through expert input

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108985334A (en) * 2018-06-15 2018-12-11 广州深域信息科技有限公司 The generic object detection system and method for Active Learning are improved based on self-supervisory process
WO2020144508A1 (en) * 2019-01-07 2020-07-16 International Business Machines Corporation Representative-based metric learning for classification and few-shot object detection
CN112069921A (en) * 2020-08-18 2020-12-11 浙江大学 A Small-Sample Visual Object Recognition Method Based on Self-Supervised Knowledge Transfer
WO2022037233A1 (en) * 2020-08-18 2022-02-24 浙江大学 Small sample visual target identification method based on self-supervised knowledge transfer
CN112464879A (en) * 2020-12-10 2021-03-09 山东易视智能科技有限公司 Ocean target detection method and system based on self-supervision characterization learning
CN112712049A (en) * 2021-01-11 2021-04-27 中国电子科技集团公司第十五研究所 Satellite image ship model identification method under small sample condition
CN112820322A (en) * 2021-03-18 2021-05-18 中国科学院声学研究所 Semi-supervised audio event labeling method based on self-supervised contrast learning
CN113379718A (en) * 2021-06-28 2021-09-10 北京百度网讯科技有限公司 Target detection method and device, electronic equipment and readable storage medium
CN113392855A (en) * 2021-07-12 2021-09-14 昆明理工大学 Small sample target detection method based on attention and comparative learning
CN113642574A (en) * 2021-07-30 2021-11-12 中国人民解放军军事科学院国防科技创新研究院 Small sample target detection method based on feature weighting and network fine tuning
CN114202074A (en) * 2021-11-09 2022-03-18 北京百度网讯科技有限公司 Pre-training model generation method, device and equipment for target detection task
CN114119966A (en) * 2021-12-01 2022-03-01 中山大学 Small sample target detection method based on multi-view learning and meta-learning

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty; Dan Hendrycks et al.; arXiv; 1-15 *
Few-Shot Object Detection and Viewpoint Estimation for Objects in the Wild; Yang Xiao et al.; arXiv; 1-18 *
FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding; Bo Sun et al.; arXiv; 1-11 *
Interpretable Few-Shot Learning Based on Contrastive Constraints; Zhang Lingling et al.; Journal of Computer Research and Development; Vol. 58, No. 12; 2573-2584 *
A Survey of Few-Shot Image Object Detection; Zhang Zhenwei et al.; Computer Engineering and Applications; Vol. 58, No. 5; 1-11 *

Also Published As

Publication number Publication date
CN114841257A (en) 2022-08-02

Similar Documents

Publication Publication Date Title
CN114841257B (en) A small sample target detection method based on self-supervised contrast constraints
CN113705597B (en) Image processing method, device, computer equipment and readable storage medium
Yuan et al. SPEDCCNN: spatial pyramid-oriented encoder-decoder cascade convolution neural network for crop disease leaf segmentation
CN114170333B (en) Image hash coding method based on transductive semi-supervised deep learning
US20190286978A1 (en) Using natural language processing and deep learning for mapping any schema data to a hierarchical standard data model (xdm)
CN115757804A (en) A knowledge map extrapolation method and system based on multi-layer path perception
CN106789149B (en) Intrusion detection method adopting improved self-organizing characteristic neural network clustering algorithm
CN111783688B (en) A classification method of remote sensing image scene based on convolutional neural network
CN112489689B (en) Cross-database voice emotion recognition method and device based on multi-scale difference countermeasure
CN108009571A (en) A kind of semi-supervised data classification method of new direct-push and system
CN117784615B (en) Fire control system fault prediction method based on IMPA-RF
CN110674326A (en) Neural network structure retrieval method based on polynomial distribution learning
CN116208399A (en) Network malicious behavior detection method and device based on metagraph
CN116415177A (en) A Classifier Parameter Identification Method Based on Extreme Learning Machine
CN117195945A (en) Flying risk prediction method integrating pelican algorithm, electronic equipment and storage medium
CN116681159A (en) Short-term power load prediction method based on whale optimization algorithm and DRESN
CN108875960A (en) A kind of learning method and system of the timing ambiguity Cognitive Map based on gradient decline
JPH0934863A (en) Information integral processing method by neural network
CN112885378B (en) Speech emotion recognition method and device and storage medium
CN114329124A (en) A semi-supervised few-shot classification method based on gradient re-optimization
CN112651499A (en) Structural model pruning method based on ant colony optimization algorithm and interlayer information
CN110289098B (en) Risk prediction method based on clinical examination and medication intervention data
CN115719040B (en) Soft-sensing method and system for key variables in penicillin fermentation process
CN115599918A (en) A method and system for mutual learning text classification based on graph enhancement
CN111563413B (en) Age prediction method based on mixed double models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
OL01 Intention to license declared