
CN109858565B - Home Indoor Scene Recognition Method Based on Deep Learning Fusion of Global Features and Local Item Information - Google Patents


Info

Publication number
CN109858565B
Authority
CN
China
Prior art keywords: scene, num, picture, max, type
Prior art date: 2019-02-28
Legal status: Active
Application number
CN201910151241.9A
Other languages
Chinese (zh)
Other versions
CN109858565A (en)
Inventor
蒋倩
朱博
王彬
高翔
郑有祺
王翼
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date: 2019-02-28
Filing date: 2019-02-28
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN201910151241.9A
Publication of CN109858565A: 2019-06-07
Application granted
Publication of CN109858565B: 2022-08-12


Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a home indoor scene recognition method based on deep learning that fuses global features and local item information. The method comprises the following steps: building a training set and a test set of home indoor scene pictures, and feeding the training set into three convolutional neural networks, Alexnet, Googlnet, and VGG, which are trained and tested separately to obtain scene features; assigning corresponding weights to the three types of scene features and taking the weighted average as the global feature; training an SSD convolutional neural network to obtain local features of common items in home indoor scenes; fusing the global features and local item features by matrix concatenation; processing the fusion result with a clustering algorithm to generate scene classification center vectors; and judging and outputting the scene category of a picture to be detected with the scene classification center vectors as the classification standard. With this method, a home service robot can automatically recognize the scene semantics of its environment, improving its level of intelligence.

Description

A home indoor scene recognition method based on deep learning that fuses global features and local item information

Technical Field

The invention relates to the field of scene recognition, and in particular to a home indoor scene recognition method based on deep learning that fuses global features and local item information.

Background Art

In the field of robotics, how a robot recognizes its current environment is an extremely important problem in computer vision. Research on scene recognition for home service robots helps obtain real-time pose information about the home scene in which the robot is located, and is key to the robot building a map of its current environment and completing subsequent tasks. However, current home service robots have a limited level of intelligence and cannot accurately and quickly judge their working environment.

Applying deep-learning convolutional neural network models to work-scene recognition for home service robots makes it possible to automatically learn the features hidden in large amounts of image data and map those features to labels one by one, achieving effective extraction of image features. At the same time, using the items in a scene as basic recognition features matches the way humans reason about their environment. By combining the global features of a scene with local item information and feeding the result into a convolutional neural network, the robot acquires judgment experience through learning and can then automatically judge its current working environment.

SUMMARY OF THE INVENTION

The technical problem to be solved by the present invention is to provide a home indoor scene recognition method, based on deep learning, that fuses global features and local item information, addressing the problems that current domestic service robots have a low level of intelligence, cannot respond correctly to their working environment in a timely manner, and have poor scene recognition ability, so that service robots can automatically classify and recognize home scenes.

A home indoor scene recognition method based on deep learning that fuses global features and local item information comprises the following steps:

Step 1: build a training set and a test set of home indoor scene pictures; feed the training set into the Alexnet, Googlnet, and VGG convolutional neural networks and train each to generate a corresponding network model; call each model to recognize the training set and output the confidence that each picture belongs to each scene, these confidences serving as the three types of scene features of the training set;

Step 2: assign specific weights to the three types of scene features obtained in Step 1 and take the weighted average; the result serves as the global feature matrix of the training data set;

Step 3: use an image annotation tool to draw boxes around the items in the training-set pictures and label them; feed the generated annotation files into an SSD convolutional neural network to train an item detection model; call the model to recognize the training set and output the labels and confidences of the items of each category appearing in every picture, which form the local item feature matrix of the training set;

Step 4: fuse the global and local item features by horizontally concatenating the global feature matrix obtained in Step 2 and the local item feature matrix obtained in Step 3 into a comprehensive feature matrix; each row vector of the matrix corresponds to the comprehensive feature of one picture in the training set and is divided into two parts according to the number of scene categories and the number of items, the first half corresponding to the picture's global features and the second half to its local item features;

Step 5: use a clustering algorithm to classify the training set by scene type; randomly take the comprehensive feature of one picture under each scene type as the initial center vector; compute the vector similarity between each comprehensive feature and each vector in the center vector group; update the center vectors according to fixed rules based on the results; and iterate for a preset number of rounds to obtain the center vector representing the comprehensive features of each scene, the center vectors forming the scene classification center vector group;

Step 6: after processing the picture to be detected in the same way, obtain its comprehensive vector, compute the Euclidean distance between it and each vector in the scene classification center vector group, and output the scene category label corresponding to the closest scene classification center vector as the recognition result.
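The overall flow of Steps 1 to 6 can be illustrated with the following Python/NumPy sketch. It is only an outline under assumed toy sizes: the CNN and SSD outputs are replaced by random numbers, the weights are arbitrary placeholders, and all variable names are illustrative rather than part of the invention.

    # Illustrative outline of Steps 1-6 (names are hypothetical; CNN/SSD outputs
    # are faked with random numbers purely so the sketch runs end to end).
    import numpy as np

    rng = np.random.default_rng(0)
    Num_scene, Num_train, item_dim = 5, 12, 20          # assumed toy sizes

    # Steps 1-2: per-network scene confidences -> weighted global features
    P = [rng.random((Num_train, Num_scene)) for _ in range(3)]   # Alexnet / Googlnet / VGG
    w = np.array([0.4, 0.3, 0.3])                                # assumed accuracy-based weights
    Matrix_Global = sum(wi * Pi for wi, Pi in zip(w, P))

    # Step 3: SSD item confidences (local features)
    Matrix_object = rng.random((Num_train, item_dim))

    # Step 4: horizontal concatenation of global and local features
    Matrix_combination = np.hstack([Matrix_Global, Matrix_object])

    # Step 5: one (here simply the first) centre vector per scene
    labels = np.arange(Num_train) % Num_scene
    Matrix_center = np.stack([Matrix_combination[labels == j][0] for j in range(Num_scene)])

    # Step 6: classify a picture by the nearest centre (Euclidean distance)
    x = Matrix_combination[0]
    pred = np.argmin(np.linalg.norm(Matrix_center - x, axis=1))
    print("predicted scene index:", pred)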

Further, in Step 1, the three types of scene features are obtained through the following specific steps:

Step 1-1: divide home indoor scenes into Num_scene categories in total, such as bathroom, bedroom, and dining room; to simplify subsequent calculation, name the j-th scene category type_j. Retrieve from the web a number of color pictures of various viewing angles under each scene category and split them in a fixed ratio into a training data set of Num_train pictures in total and a test data set of Num_test pictures in total, where Num_scene ∈ N*, Num_test ∈ N*, Num_train ∈ N*, i = {i ∈ [1, Num_train] ∧ i ∈ N*}, j = {j ∈ [1, Num_scene] ∧ j ∈ N*};

Step 1-2: add scene category label sets for the training set and the test set:

List_train = {list_train_1, list_train_2, ..., list_train_Num_train}

List_test = {list_test_1, list_test_2, ..., list_test_Num_test}

Num_train ∈ N*, Num_test ∈ N*

Step 1-3: after converting the training set into the data formats required by the Alexnet, Googlnet, and VGG convolutional neural networks, feed the training set into all three networks and train them separately to generate the network models Model_Alexnet, Model_Googlnet, and Model_VGG;

Step 1-4: call the generated network models to recognize the training set; the confidences that the i-th picture is judged to belong to the j-th scene category are P_Alexnet_i_j, P_Googlnet_i_j, and P_VGG_i_j respectively, and these confidences form the scene confidence vectors of the picture:

P_Alexnet_i = (P_Alexnet_i_1, P_Alexnet_i_2, ..., P_Alexnet_i_Num_scene)

P_Googlnet_i = (P_Googlnet_i_1, P_Googlnet_i_2, ..., P_Googlnet_i_Num_scene)

P_VGG_i = (P_VGG_i_1, P_VGG_i_2, ..., P_VGG_i_Num_scene)

The matrices formed by the three types of confidence vectors serve as the scene feature matrices:

Matrix_Alexnet = [P_Alexnet_i_j], a Num_train × Num_scene matrix

Matrix_Googlnet = [P_Googlnet_i_j], a Num_train × Num_scene matrix

Matrix_VGG = [P_VGG_i_j], a Num_train × Num_scene matrix

Num_train ∈ N*, Num_scene ∈ N*, i = {i ∈ [1, Num_train] ∧ i ∈ N*}, j = {j ∈ [1, Num_scene] ∧ j ∈ N*}.

Further, in Step 2, the weighted averaging of the three types of scene features specifically includes the following steps:

Step 2-1: call the network models obtained in Step 1-3 to detect the test-set pictures and obtain, for each picture, the confidences of the Num_scene scenes to which it may belong; take the scene category with the largest confidence as the judgment result and compare it with the picture's real label; if they are the same, the recognition is correct. The accumulated numbers of correct recognitions of the Alexnet, Googlnet, and VGG convolutional neural networks are recorded as Num_Alexnet, Num_Googlnet, and Num_VGG, with Num_Alexnet ∈ N*, Num_Googlnet ∈ N*, Num_VGG ∈ N*;

Step 2-2: assign the weights Weight_Alexnet, Weight_Googlnet, and Weight_VGG to the scene features Matrix_Alexnet, Matrix_Googlnet, and Matrix_VGG respectively, where

Weight_Alexnet = Num_Alexnet / (Num_Alexnet + Num_Googlnet + Num_VGG)

Weight_Googlnet = Num_Googlnet / (Num_Alexnet + Num_Googlnet + Num_VGG)

Weight_VGG = Num_VGG / (Num_Alexnet + Num_Googlnet + Num_VGG)

After the weighted average, the confidence that the i-th picture in the training set is judged as the j-th scene category can be expressed as:

P_Global_i_j = Weight_Alexnet × P_Alexnet_i_j + Weight_Googlnet × P_Googlnet_i_j + Weight_VGG × P_VGG_i_j

The new confidences give the global feature matrix Matrix_Global:

Matrix_Global = [P_Global_i_j], a Num_train × Num_scene matrix

Further, in Step 3, the local features of common items in home indoor scenes are obtained through the following specific steps:

Step 3-1: select items commonly found in home scenes and set the maximum number of item categories and the maximum number of items, Max_category and Max_num respectively; use an image annotation tool to draw boxes around the items appearing in the training-set pictures, record the labels, numbers, and positions of the objects in the pictures, and obtain the item annotations; Max_category ∈ N*, k = {k ∈ [1, Max_category] ∧ k ∈ N*}, Max_num ∈ N*;

Step 3-2: set the maximum number Max_num_k of each item category, Max_num_k ∈ N*, and train on the item annotations with an SSD convolutional neural network to generate the item detection model Model_SSD; k = {k ∈ [1, Max_category] ∧ k ∈ N*}, r_k = {r_k ∈ [1, Max_num_k] ∧ r_k ∈ N*};

Step 3-3: call the model generated in Step 3-2 on the training set. The confidence that the r_k-th item of class k is detected in the i-th picture can be written as P_object_i_k_r_k, so the confidence vector of class-k items detected in the i-th picture can be expressed as:

P_object_i_k = (P_object_i_k_1, P_object_i_k_2, ..., P_object_i_k_Max_num_k)

The confidences in each vector are arranged in descending order; if the number of class-k items in the picture does not reach the maximum Max_num_k, the remaining positions are set to zero, and if the number of items exceeds the limit, only the Max_num_k items with the highest confidence are kept;

The item confidence vectors form the local item feature matrix Matrix_object:

Matrix_object: row i is (P_object_i_1, P_object_i_2, ..., P_object_i_Max_category), the concatenation of the item confidence vectors of all item classes for the i-th picture, i = 1, ..., Num_train

Max_category ∈ N*, k = {k ∈ [1, Max_category] ∧ k ∈ N*}, r_k = {r_k ∈ [1, Max_num_k] ∧ r_k ∈ N*}

Further, Step 4 fuses the global and local item features, specifically:

The global feature matrix Matrix_Global obtained in Step 2 and the local item feature matrix Matrix_object obtained in Step 3 are horizontally concatenated to generate the comprehensive feature matrix Matrix_combination:

Matrix_combination = [Matrix_Global  Matrix_object]

The i-th row vector of the comprehensive feature matrix represents the comprehensive feature of the i-th picture in the training set; the first half of the row vector corresponds to the picture's global features and the second half to its local item features:

row i of Matrix_combination = (P_Global_i_1, ..., P_Global_i_Num_scene, P_object_i_1, ..., P_object_i_Max_category)

Further, in Step 5, the custom scene classification standard is built through the following specific steps:

Step 5-1: divide the training data set into Num_scene parts by scene, the number of pictures of each scene type being Num_j, and divide the comprehensive feature matrix by scene type in the same way to obtain the comprehensive features of each scene type. In the comprehensive features of the sub-dataset whose scene category is type_j, the confidence that the i_type_j-th picture is recognized as the j-th scene type and the confidence that the r_k-th item of class k is detected in that picture are written as P_Global_i_type_j_j and P_object_i_type_j_k_r_k respectively, so the confidence vector of class-k items detected in the picture can be expressed as:

P_object_i_type_j_k = (P_object_i_type_j_k_1, ..., P_object_i_type_j_k_Max_num_k)

The comprehensive feature matrix Matrix_combination_type_j corresponding to the sub-dataset whose scene category is type_j can then be expressed as:

Matrix_combination_type_j consists of the Num_j rows of Matrix_combination corresponding to the pictures whose scene category is type_j, its i_type_j-th row being (P_Global_i_type_j_1, ..., P_Global_i_type_j_Num_scene, P_object_i_type_j_1, ..., P_object_i_type_j_Max_category)

i_type_j = {i_type_j ∈ [1, Num_j] ∧ i_type_j ∈ N*}, Max_category ∈ N*;

Step 5-2: from each of the Num_scene sub-matrices Matrix_combination_type_j obtained in Step 5-1, randomly take one row vector; in the present invention the first comprehensive feature under each scene type is taken as the initial center vector by way of example, so the center vector representing scene type_j can be expressed as:

Vector_center_j = the first row vector of Matrix_combination_type_j

The center vectors corresponding to the individual scenes form the center vector group Matrix_center:

Matrix_center = [Vector_center_1; Vector_center_2; ...; Vector_center_Num_scene]

Num_scene ∈ N*, i_type_j = {i_type_j ∈ [1, Num_j] ∧ i_type_j ∈ N*}, Max_category ∈ N*;

Step 5-3: compute the Euclidean distance between each comprehensive feature and each vector in the center vector group, giving Num_scene distances; take the scene category corresponding to the smallest distance as the label assigned by scene classification, forming the set List_detect:

List_detect = {list_detect_1, list_detect_2, ..., list_detect_Num_train};

Compare List_detect with the pictures' original scene labels List_train and update the center vectors according to the following rules:

If list_detect_i = list_train_i = j, then:

(center-vector update formula for this case; rendered only as an image in the original publication)

If list_detect_i ≠ list_train_i = j, then:

(center-vector update formula for this case; rendered only as an image in the original publication)

where i = {i ∈ [1, Num_train] ∧ i ∈ N*}, j = {j ∈ [1, Num_scene] ∧ j ∈ N*}, and γ is the update coefficient;

Step 5-4: set the number of update iterations to Max_iteration and repeat Steps 5-2 and 5-3; after Max_iteration iterations the final result Matrix_center_Max_iteration is obtained as the scene classification standard.

Max_iteration ∈ N*

i_iteration = {i_iteration ∈ [1, Max_iteration] ∧ i_iteration ∈ N*}

Further, in Step 6, the scene classification result is obtained through the following specific steps:

Step 6-1: process the picture to be recognized through Steps 1 to 4 in turn to obtain its comprehensive feature;

Step 6-2: compute the set of Euclidean distances between the comprehensive feature obtained in Step 6-1 and each vector in the Matrix_center_Max_iteration obtained in Step 5-4:

Distance = {d_1, d_2, ..., d_j, ..., d_Num_scene}, Num_scene ∈ N*

The scene category label corresponding to the smallest of the Num_scene distances is output as the recognition result.

The present invention takes into account the influence of objects in a scene on the scene type and introduces deep-learning convolutional neural networks so that a home service robot can learn home scenes autonomously, acquire recognition experience, and judge scenes automatically. A home service robot with environmental awareness can switch working modes and select tasks according to its working environment and complete work such as human-robot interaction, meeting the need for human-robot coexistence in home settings. This solves the problems that current domestic service robots have a low level of intelligence, cannot respond correctly to their working environment in a timely manner, and have poor scene recognition ability, enabling service robots to automatically classify and recognize home scenes.

Description of the Drawings

FIG. 1 is a system framework diagram of the present invention.

Detailed Description

The technical solution of the present invention is described in further detail below with reference to the accompanying drawing.

A home indoor scene recognition method based on deep learning that fuses global features and local item information comprises the following steps:

Step 1: build a training set and a test set of home indoor scene pictures; feed the training set into the Alexnet, Googlnet, and VGG convolutional neural networks and train each to generate a corresponding network model; call each model to recognize the training set and output the confidence that each picture belongs to each scene, these confidences serving as the three types of scene features of the training set.

In Step 1, the three types of scene features are obtained through the following specific steps:

Step 1-1: divide home indoor scenes into Num_scene categories in total, such as bathroom, bedroom, and dining room; to simplify subsequent calculation, name the j-th scene category type_j. Retrieve from the web a number of color pictures of various viewing angles under each scene category and split them in a fixed ratio into a training data set of Num_train pictures in total and a test data set of Num_test pictures in total, where Num_scene ∈ N*, Num_test ∈ N*, Num_train ∈ N*, i = {i ∈ [1, Num_train] ∧ i ∈ N*}, j = {j ∈ [1, Num_scene] ∧ j ∈ N*}.

Step 1-2: add scene category label sets for the training set and the test set:

List_train = {list_train_1, list_train_2, ..., list_train_Num_train}

List_test = {list_test_1, list_test_2, ..., list_test_Num_test}

Num_train ∈ N*, Num_test ∈ N*

Step 1-3: after converting the training set into the data formats required by the Alexnet, Googlnet, and VGG convolutional neural networks, feed the training set into all three networks and train them separately to generate the network models Model_Alexnet, Model_Googlnet, and Model_VGG.

Step 1-4: call the generated network models to recognize the training set; the confidences that the i-th picture is judged to belong to the j-th scene category are P_Alexnet_i_j, P_Googlnet_i_j, and P_VGG_i_j respectively, and these confidences form the scene confidence vectors of the picture:

P_Alexnet_i = (P_Alexnet_i_1, P_Alexnet_i_2, ..., P_Alexnet_i_Num_scene)

P_Googlnet_i = (P_Googlnet_i_1, P_Googlnet_i_2, ..., P_Googlnet_i_Num_scene)

P_VGG_i = (P_VGG_i_1, P_VGG_i_2, ..., P_VGG_i_Num_scene)

The matrices formed by the three types of confidence vectors serve as the scene feature matrices:

Matrix_Alexnet = [P_Alexnet_i_j], a Num_train × Num_scene matrix

Matrix_Googlnet = [P_Googlnet_i_j], a Num_train × Num_scene matrix

Matrix_VGG = [P_VGG_i_j], a Num_train × Num_scene matrix

Num_train ∈ N*, Num_scene ∈ N*, i = {i ∈ [1, Num_train] ∧ i ∈ N*}, j = {j ∈ [1, Num_scene] ∧ j ∈ N*}.
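A minimal Python/NumPy sketch of Step 1-4 is given below. The function predict_scene_confidences is a hypothetical stand-in for running one of the trained models (Model_Alexnet, Model_Googlnet, Model_VGG) on a picture and returning its Num_scene softmax confidences; the file names and sizes are assumed toy values.

    # Sketch of Step 1-4: stacking per-picture scene confidences from the three
    # trained models into Matrix_Alexnet / Matrix_Googlnet / Matrix_VGG.
    import numpy as np

    def predict_scene_confidences(model, picture, num_scene):
        # Placeholder inference: a real implementation would forward the picture
        # through `model` and return its softmax output over the scene classes.
        rng = np.random.default_rng(abs(hash((model, picture))) % (2**32))
        p = rng.random(num_scene)
        return p / p.sum()

    def build_scene_feature_matrix(model, pictures, num_scene):
        # Row i holds P_model_i_1 ... P_model_i_Num_scene for the i-th training picture.
        return np.vstack([predict_scene_confidences(model, pic, num_scene) for pic in pictures])

    pictures = [f"train_{i:04d}.jpg" for i in range(8)]   # assumed file names
    Num_scene = 5
    Matrix_Alexnet  = build_scene_feature_matrix("Model_Alexnet",  pictures, Num_scene)
    Matrix_Googlnet = build_scene_feature_matrix("Model_Googlnet", pictures, Num_scene)
    Matrix_VGG      = build_scene_feature_matrix("Model_VGG",      pictures, Num_scene)
    print(Matrix_Alexnet.shape)   # (Num_train, Num_scene)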

Step 2: assign specific weights to the three types of scene features obtained in Step 1 and take the weighted average; the result serves as the global feature matrix of the training data set.

In Step 2, the weighted averaging of the three types of scene features specifically includes the following steps:

Step 2-1: call the network models obtained in Step 1-3 to detect the test-set pictures and obtain, for each picture, the confidences of the Num_scene scenes to which it may belong; take the scene category with the largest confidence as the judgment result and compare it with the picture's real label; if they are the same, the recognition is correct. The accumulated numbers of correct recognitions of the Alexnet, Googlnet, and VGG convolutional neural networks are recorded as Num_Alexnet, Num_Googlnet, and Num_VGG, with Num_Alexnet ∈ N*, Num_Googlnet ∈ N*, Num_VGG ∈ N*.

Step 2-2: assign the weights Weight_Alexnet, Weight_Googlnet, and Weight_VGG to the scene features Matrix_Alexnet, Matrix_Googlnet, and Matrix_VGG respectively, where

Weight_Alexnet = Num_Alexnet / (Num_Alexnet + Num_Googlnet + Num_VGG)

Weight_Googlnet = Num_Googlnet / (Num_Alexnet + Num_Googlnet + Num_VGG)

Weight_VGG = Num_VGG / (Num_Alexnet + Num_Googlnet + Num_VGG)

After the weighted average, the confidence that the i-th picture in the training set is judged as the j-th scene category can be expressed as:

P_Global_i_j = Weight_Alexnet × P_Alexnet_i_j + Weight_Googlnet × P_Googlnet_i_j + Weight_VGG × P_VGG_i_j

The new confidences give the global feature matrix Matrix_Global:

Matrix_Global = [P_Global_i_j], a Num_train × Num_scene matrix
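Under the assumption that each network's weight is its share of correct recognitions (the original publication gives the weight formulas only as images), Step 2 can be sketched as follows; the feature matrices and correct-recognition counts below are toy values.

    # Sketch of Step 2: accuracy-weighted average of the three scene feature matrices.
    import numpy as np

    def global_feature_matrix(matrices, correct_counts):
        counts = np.asarray(correct_counts, dtype=float)
        weights = counts / counts.sum()                     # assumed normalisation
        return sum(w * m for w, m in zip(weights, matrices))

    Num_train, Num_scene = 8, 5
    rng = np.random.default_rng(1)
    Matrix_Alexnet, Matrix_Googlnet, Matrix_VGG = (rng.random((Num_train, Num_scene)) for _ in range(3))

    # Num_Alexnet, Num_Googlnet, Num_VGG: correct recognitions on the test set (toy values)
    Matrix_Global = global_feature_matrix(
        [Matrix_Alexnet, Matrix_Googlnet, Matrix_VGG],
        correct_counts=[712, 745, 760],
    )
    print(Matrix_Global.shape)    # (Num_train, Num_scene)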

Step 3: use an image annotation tool to draw boxes around the items in the training-set pictures and label them; feed the generated annotation files into an SSD convolutional neural network to train an item detection model; call the model to recognize the training set and output the labels and confidences of the items of each category appearing in every picture, which form the local item feature matrix of the training set.

In Step 3, the local features of common items in home indoor scenes are obtained through the following specific steps:

Step 3-1: select items commonly found in home scenes and set the maximum number of item categories and the maximum number of items, Max_category and Max_num respectively; use an image annotation tool to draw boxes around the items appearing in the training-set pictures, record the labels, numbers, and positions of the objects in the pictures, and obtain the item annotations; Max_category ∈ N*, k = {k ∈ [1, Max_category] ∧ k ∈ N*}, Max_num ∈ N*;

Step 3-2: set the maximum number Max_num_k of each item category, Max_num_k ∈ N*, and train on the item annotations with an SSD convolutional neural network to generate the item detection model Model_SSD; k = {k ∈ [1, Max_category] ∧ k ∈ N*}, r_k = {r_k ∈ [1, Max_num_k] ∧ r_k ∈ N*}.

Step 3-3: call the model generated in Step 3-2 on the training set. The confidence that the r_k-th item of class k is detected in the i-th picture can be written as P_object_i_k_r_k, so the confidence vector of class-k items detected in the i-th picture can be expressed as:

P_object_i_k = (P_object_i_k_1, P_object_i_k_2, ..., P_object_i_k_Max_num_k)

The confidences in each vector are arranged in descending order; if the number of class-k items in the picture does not reach the maximum Max_num_k, the remaining positions are set to zero, and if the number of items exceeds the limit, only the Max_num_k items with the highest confidence are kept.

The item confidence vectors form the local item feature matrix Matrix_object:

Matrix_object: row i is (P_object_i_1, P_object_i_2, ..., P_object_i_Max_category), the concatenation of the item confidence vectors of all item classes for the i-th picture, i = 1, ..., Num_train

Max_category ∈ N*, k = {k ∈ [1, Max_category] ∧ k ∈ N*}, r_k = {r_k ∈ [1, Max_num_k] ∧ r_k ∈ N*}

Step 4: fuse the global and local item features by horizontally concatenating the global feature matrix obtained in Step 2 and the local item feature matrix obtained in Step 3 into a comprehensive feature matrix; each row vector of the matrix corresponds to the comprehensive feature of one picture in the training set and is divided into two parts according to the number of scene categories and the number of items, the first half corresponding to the picture's global features and the second half to its local item features.

In Step 4, the global and local item features are fused, specifically:

The global feature matrix Matrix_Global obtained in Step 2 and the local item feature matrix Matrix_object obtained in Step 3 are horizontally concatenated to generate the comprehensive feature matrix Matrix_combination:

Matrix_combination = [Matrix_Global  Matrix_object]

The i-th row vector of the comprehensive feature matrix represents the comprehensive feature of the i-th picture in the training set; the first half of the row vector corresponds to the picture's global features and the second half to its local item features:

row i of Matrix_combination = (P_Global_i_1, ..., P_Global_i_Num_scene, P_object_i_1, ..., P_object_i_Max_category)

i = {i ∈ [1, Num_train] ∧ i ∈ N*}, Max_category ∈ N*
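Step 4 is a plain horizontal concatenation, for example with numpy.hstack; the sizes in the sketch below are toy values.

    # Sketch of Step 4: horizontal concatenation of global and local features.
    import numpy as np

    rng = np.random.default_rng(2)
    Num_train, Num_scene, item_dim = 8, 5, 7          # assumed toy sizes
    Matrix_Global = rng.random((Num_train, Num_scene))
    Matrix_object = rng.random((Num_train, item_dim))

    # Row i: Num_scene global confidences of picture i followed by its item confidences.
    Matrix_combination = np.hstack([Matrix_Global, Matrix_object])
    print(Matrix_combination.shape)                   # (8, 12)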

Step 5: use a clustering algorithm to classify the training set by scene type; randomly take the comprehensive feature of one picture under each scene type as the initial center vector; compute the vector similarity between each comprehensive feature and each vector in the center vector group; update the center vectors according to fixed rules based on the results; and iterate for a preset number of rounds to obtain the center vector representing the comprehensive features of each scene, the center vectors forming the scene classification center vector group.

In Step 5, the custom scene classification standard is built through the following specific steps:

Step 5-1: divide the training data set into Num_scene parts by scene, the number of pictures of each scene type being Num_j, and divide the comprehensive feature matrix by scene type in the same way to obtain the comprehensive features of each scene type. In the comprehensive features of the sub-dataset whose scene category is type_j, the confidence that the i_type_j-th picture is recognized as the j-th scene type and the confidence that the r_k-th item of class k is detected in that picture are written as P_Global_i_type_j_j and P_object_i_type_j_k_r_k respectively, so the confidence vector of class-k items detected in the picture can be expressed as:

P_object_i_type_j_k = (P_object_i_type_j_k_1, ..., P_object_i_type_j_k_Max_num_k)

The comprehensive feature matrix Matrix_combination_type_j corresponding to the sub-dataset whose scene category is type_j can then be expressed as:

Matrix_combination_type_j consists of the Num_j rows of Matrix_combination corresponding to the pictures whose scene category is type_j, its i_type_j-th row being (P_Global_i_type_j_1, ..., P_Global_i_type_j_Num_scene, P_object_i_type_j_1, ..., P_object_i_type_j_Max_category)

i_type_j = {i_type_j ∈ [1, Num_j] ∧ i_type_j ∈ N*}, Max_category ∈ N*

Step 5-2: from each of the Num_scene sub-matrices Matrix_combination_type_j obtained in Step 5-1, randomly take one row vector; in the present invention the first comprehensive feature under each scene type is taken as the initial center vector by way of example, so the center vector representing scene type_j can be expressed as:

Vector_center_j = the first row vector of Matrix_combination_type_j

The center vectors corresponding to the individual scenes form the center vector group Matrix_center:

Matrix_center = [Vector_center_1; Vector_center_2; ...; Vector_center_Num_scene]

Num_scene ∈ N*, i_type_j = {i_type_j ∈ [1, Num_j] ∧ i_type_j ∈ N*}, Max_category ∈ N*

Step 5-3: compute the Euclidean distance between each comprehensive feature and each vector in the center vector group, giving Num_scene distances; take the scene category corresponding to the smallest distance as the label assigned by scene classification, forming the set List_detect:

List_detect = {list_detect_1, list_detect_2, ..., list_detect_Num_train}

Compare List_detect with the pictures' original scene labels List_train and update the center vectors according to the following rules:

If list_detect_i = list_train_i = j, then:

(center-vector update formula for this case; rendered only as an image in the original publication)

If list_detect_i ≠ list_train_i = j, then:

(center-vector update formula for this case; rendered only as an image in the original publication)

where i = {i ∈ [1, Num_train] ∧ i ∈ N*}, j = {j ∈ [1, Num_scene] ∧ j ∈ N*}, and γ is the update coefficient.

Step 5-4: set the number of update iterations to Max_iteration and repeat Steps 5-2 and 5-3; after Max_iteration iterations the final result Matrix_center_Max_iteration is obtained as the scene classification standard.

Max_iteration ∈ N*

i_iteration = {i_iteration ∈ [1, Max_iteration] ∧ i_iteration ∈ N*}
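The following sketch illustrates Step 5. Because the center-update formulas of Step 5-3 appear only as images in the original publication, the rule used here (move the true class's center toward a correctly classified sample and away from a misclassified one, with coefficient γ) is an assumption made purely to keep the sketch runnable, and the data are synthetic.

    # Sketch of Step 5: building the scene classification centre vectors.
    import numpy as np

    def train_centers(features, labels, num_scene, gamma=0.05, max_iteration=10):
        # Step 5-2: one initial centre per scene, taken from that scene's samples
        centers = np.stack([features[labels == j][0] for j in range(num_scene)])
        for _ in range(max_iteration):
            for x, true_j in zip(features, labels):
                detected = int(np.argmin(np.linalg.norm(centers - x, axis=1)))
                if detected == true_j:
                    centers[true_j] += gamma * (x - centers[true_j])      # assumed rule
                else:
                    centers[true_j] -= gamma * (x - centers[true_j])      # assumed rule
        return centers

    rng = np.random.default_rng(3)
    Num_scene, Num_train, dim = 3, 30, 12
    labels = np.arange(Num_train) % Num_scene
    features = rng.random((Num_train, dim)) + labels[:, None]   # crude per-scene offset
    Matrix_center = train_centers(features, labels, Num_scene)
    print(Matrix_center.shape)                                   # (3, 12)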

Step 6: after processing the picture to be detected in the same way, obtain its comprehensive vector, compute the Euclidean distance between it and each vector in the scene classification center vector group, and output the scene category label corresponding to the closest scene classification center vector as the recognition result.

In Step 6, the scene classification result is obtained through the following specific steps:

Step 6-1: process the picture to be recognized through Steps 1 to 4 in turn to obtain its comprehensive feature.

Step 6-2: compute the set of Euclidean distances between the comprehensive feature obtained in Step 6-1 and each vector in the Matrix_center_Max_iteration obtained in Step 5-4:

Distance = {d_1, d_2, ..., d_j, ..., d_Num_scene}, Num_scene ∈ N*

The scene category label corresponding to the smallest of the Num_scene distances is output as the recognition result.
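Step 6 then reduces to a nearest-center lookup under the Euclidean distance, as in the following sketch; the center matrix, feature dimension, and scene label list are illustrative (only bathroom, bedroom, and dining room are named in the description, the rest are assumed).

    # Sketch of Step 6: classify a picture by its nearest scene centre vector.
    import numpy as np

    def classify(vector, matrix_center, scene_labels):
        distances = np.linalg.norm(matrix_center - vector, axis=1)   # d_1 ... d_Num_scene
        return scene_labels[int(np.argmin(distances))]

    rng = np.random.default_rng(4)
    Matrix_center = rng.random((5, 12))                # Num_scene x feature dimension (toy)
    scene_labels = ["bathroom", "bedroom", "dining room", "kitchen", "living room"]
    query = rng.random(12)                             # comprehensive vector of the test picture
    print(classify(query, Matrix_center, scene_labels))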

The above is only a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited to the above embodiment; any equivalent modification or change made by a person of ordinary skill in the art based on the disclosure of the present invention shall fall within the scope of protection recited in the claims.

Claims (4)

1. A home indoor scene recognition method based on deep learning that fuses global features and local item information, characterized in that the method comprises the following steps:
step 1, constructing a training set and a test set of home indoor scene pictures, simultaneously feeding the training set into three convolutional neural networks, Alexnet, Googlnet, and VGG, for respective training to generate corresponding network models, calling the models to recognize the training set, and outputting the confidence that each picture belongs to each scene as the three types of scene features of the training set;
step 2, weighting and averaging the three types of scene features obtained in the step 1, and taking the result as a global feature matrix of a training data set;
step 3, framing the articles in the pictures of the training set by using a picture labeling tool, labeling article labels, sending the generated labeling file into an SSD convolutional neural network for training to generate an article detection model, calling the model to identify the training set, and outputting the article labels of various types appearing in each picture and the confidence thereof as an article local feature matrix of the training set;
step 4, fusing global and local article characteristics, horizontally splicing the global characteristic matrix obtained in the step two and the article local characteristic matrix obtained in the step three to generate a comprehensive characteristic matrix, wherein one row vector of the matrix corresponds to the comprehensive characteristic of one picture in the training set, the row vector is divided into two parts according to the scene category number and the article number, the first half corresponds to the global characteristic of the picture, and the second half corresponds to the local object characteristic of the picture;
In step 4, the global and local item features are fused, specifically:
horizontally concatenating the global feature matrix Matrix_Global obtained in step 2 and the local item feature matrix Matrix_object obtained in step 3 to generate a comprehensive feature matrix Matrix_combination:
Matrix_combination = [Matrix_Global  Matrix_object]
the i-th row vector of the comprehensive feature matrix represents the comprehensive feature of the i-th picture in the training set, the first half of the row vector corresponding to the global features of the picture and the second half to the local item features of the picture:

row i of Matrix_combination = (P_Global_i_1, ..., P_Global_i_Num_scene, P_object_i_1, ..., P_object_i_Max_category)
i = {i ∈ [1, Num_train] ∧ i ∈ N*}, Max_category ∈ N*;
step 5, classifying the training set according to scene types by using a clustering algorithm, randomly taking the comprehensive features of a certain picture under each type of scene as initial vectors, respectively calculating the vector similarity between each feature in the comprehensive features and each feature in the central vector group, updating the initial vectors according to the calculation results, iterating to preset turns to obtain central vectors representing the comprehensive features of each scene, and forming a scene classification central vector group;
in the step 5, the user-defined scene classification standard specifically includes the following steps:
step 5-1, dividing the training data set into Num_scene parts by scene, the number of pictures of each scene type being Num_j, and dividing the comprehensive feature matrix by scene type in the same way to obtain the comprehensive features of each scene type; in the comprehensive features corresponding to the sub-dataset whose scene category is type_j, the confidence that the i_type_j-th picture is recognized as the j-th scene type and the confidence that the r_k-th item of class k is detected in that picture are written as P_Global_i_type_j_j and P_object_i_type_j_k_r_k respectively, so that the confidence vector of class-k items detected in the picture can be expressed as:

P_object_i_type_j_k = (P_object_i_type_j_k_1, ..., P_object_i_type_j_k_Max_num_k)
the comprehensive feature matrix Matrix_combination_type_j corresponding to the sub-dataset whose scene category is type_j consists of the Num_j rows of Matrix_combination corresponding to the pictures of scene type_j, its i_type_j-th row being (P_Global_i_type_j_1, ..., P_Global_i_type_j_Num_scene, P_object_i_type_j_1, ..., P_object_i_type_j_Max_category);

i_type_j = {i_type_j ∈ [1, Num_j] ∧ i_type_j ∈ N*}, Max_category ∈ N*;
step 5-2, randomly selecting one row vector from each of the Num_scene sub-matrices Matrix_combination_type_j obtained in step 5-1, the first comprehensive feature under each scene type being selected as the initial center vector, so that the center vector representing the type_j scene can be expressed as:

Vector_center_j = the first row vector of Matrix_combination_type_j;
forming a center vector group Matrix_center from the center vectors corresponding to the scenes:

Matrix_center = [Vector_center_1; Vector_center_2; ...; Vector_center_Num_scene]

Num_scene ∈ N*, i_type_j = {i_type_j ∈ [1, Num_j] ∧ i_type_j ∈ N*}, Max_category ∈ N*;
step 5-3, calculating the Euclidean distance between each comprehensive feature and each vector in the center vector group to obtain Num_scene distances, and taking the scene category corresponding to the minimum value as the label assigned by scene classification, forming the set List_detect:

List_detect = {list_detect_1, list_detect_2, ..., list_detect_Num_train};
comparing List_detect with the pictures' original scene labels List_train, and updating the center vectors according to the following rules:
if list_detect_i = list_train_i = j, then:

(center-vector update formula for this case; rendered only as an image in the original publication)

if list_detect_i ≠ list_train_i = j, then:

(center-vector update formula for this case; rendered only as an image in the original publication)
wherein i = {i ∈ [1, Num_train] ∧ i ∈ N*}, j = {j ∈ [1, Num_scene] ∧ j ∈ N*}, and γ is the update coefficient;
step 5-4, setting the number of update iterations as Max_iteration, repeating steps 5-2 and 5-3, and iterating Max_iteration times to obtain the final result Matrix_center_Max_iteration as the scene classification standard;
Max_iteration ∈ N*
i_iteration = {i_iteration ∈ [1, Max_iteration] ∧ i_iteration ∈ N*};

step 6, after the picture to be detected is processed accordingly, obtaining its comprehensive vector, calculating the Euclidean distance between the comprehensive vector and each vector in the scene classification center vector group, and outputting the scene category label corresponding to the scene classification center vector with the minimum distance as the recognition result;
in step 6, the scene classification result is obtained, specifically comprising the following steps:
step 6-1, processing the picture to be identified in the steps 1-4 in sequence to obtain comprehensive characteristics;
step 6-2, calculating the set of Euclidean distances between the comprehensive feature obtained in step 6-1 and each vector in the Matrix_center_Max_iteration obtained in step 5-4:
Distance = {d_1, d_2, ..., d_j, ..., d_Num_scene}, Num_scene ∈ N*
and outputting the scene category label corresponding to the minimum of the Num_scene distances as the recognition result.
2. The deep learning based home indoor scene recognition method fusing global features and local item information according to claim 1, characterized in that: in the step 1, three types of scene characteristics are obtained, and the specific steps are as follows:
Step 1-1, dividing home indoor scenes into Num_scene categories in total, naming the j-th scene category type_j to facilitate subsequent calculation, and dividing a number of color pictures of various viewing angles under each scene category, retrieved from the web, in a fixed ratio into a training data set of Num_train pictures in total and a test data set of Num_test pictures in total, wherein Num_scene ∈ N*, Num_test ∈ N*, Num_train ∈ N*, i = {i ∈ [1, Num_train] ∧ i ∈ N*}, j = {j ∈ [1, Num_scene] ∧ j ∈ N*};
Step 1-2, adding a scene category label set for the training set and the test set:
List_train = {list_train_1, list_train_2, ..., list_train_Num_train}
List_test = {list_test_1, list_test_2, ..., list_test_Num_test}
Num_train ∈ N*, Num_test ∈ N*
step 1-3, processing the training set into the corresponding data formats according to the requirements of the Alexnet, Googlnet, and VGG convolutional neural networks, simultaneously feeding the training set into the Alexnet, Googlnet, and VGG convolutional neural networks, and respectively training and generating the network models Model_Alexnet, Model_Googlnet, and Model_VGG;
step 1-4, calling the generated network models to recognize the training set, wherein the confidences that the i-th picture is judged as the j-th scene category are P_Alexnet_i_j, P_Googlnet_i_j, and P_VGG_i_j, the confidences forming the scene confidence vectors of the picture:

P_Alexnet_i = (P_Alexnet_i_1, P_Alexnet_i_2, ..., P_Alexnet_i_Num_scene)
P_Googlnet_i = (P_Googlnet_i_1, P_Googlnet_i_2, ..., P_Googlnet_i_Num_scene)
P_VGG_i = (P_VGG_i_1, P_VGG_i_2, ..., P_VGG_i_Num_scene)
the matrices formed by the three types of confidence vectors are respectively used as scene feature matrices:

Matrix_Alexnet = [P_Alexnet_i_j], a Num_train × Num_scene matrix
Matrix_Googlnet = [P_Googlnet_i_j], a Num_train × Num_scene matrix
Matrix_VGG = [P_VGG_i_j], a Num_train × Num_scene matrix

Num_train ∈ N*, Num_scene ∈ N*, i = {i ∈ [1, Num_train] ∧ i ∈ N*}, j = {j ∈ [1, Num_scene] ∧ j ∈ N*}.
3. The deep learning based home indoor scene recognition method fusing global features and local item information according to claim 1, characterized in that: in step 2, the weighted averaging of the three types of scene features specifically includes the following steps:
Step 2-1, calling the network models obtained in step 1-3 to detect the test-set pictures and obtain, for each picture, the confidences of the Num_scene scenes to which it may belong, taking the scene category corresponding to the largest confidence as the judgment result, and comparing it with the picture's real label; if they are the same, the recognition is correct; the accumulated numbers of correct recognitions of the Alexnet, Googlnet, and VGG convolutional neural networks are recorded as Num_Alexnet, Num_Googlnet, Num_VGG, with Num_Alexnet ∈ N*, Num_Googlnet ∈ N*, Num_VGG ∈ N*;
Step 2-2, respectively assigning weights Weight_Alexnet, Weight_Googlnet, and Weight_VGG to the scene features Matrix_Alexnet, Matrix_Googlnet, and Matrix_VGG, wherein
Weight_Alexnet = Num_Alexnet / (Num_Alexnet + Num_Googlnet + Num_VGG)
Weight_Googlnet = Num_Googlnet / (Num_Alexnet + Num_Googlnet + Num_VGG)
Weight_VGG = Num_VGG / (Num_Alexnet + Num_Googlnet + Num_VGG)
the confidence level that the ith picture in the training set is judged as the jth scene class after the weighted average can be expressed as:
P_Global_i_j = Weight_Alexnet × P_Alexnet_i_j + Weight_Googlnet × P_Googlnet_i_j + Weight_VGG × P_VGG_i_j
and obtaining the global feature matrix Matrix_Global by using the new confidences:
Matrix_Global = [P_Global_i_j], a Num_train × Num_scene matrix

i = {i ∈ [1, Num_train] ∧ i ∈ N*}, j = {j ∈ [1, Num_scene] ∧ j ∈ N*}.
4. The deep learning based home indoor scene recognition method fusing global features and local item information according to claim 1, characterized in that: in step 3, the local features of common items in a home indoor scene are obtained, specifically comprising the following steps:
step 3-1, selecting common items in home scenes, and setting the maximum number of item categories and the maximum number of items, Max_category and Max_num respectively; using an image annotation tool to draw boxes around the items appearing in the training-set pictures, recording the labels, numbers, and positions of the objects in the pictures, and obtaining the item annotations; Max_category ∈ N*, k = {k ∈ [1, Max_category] ∧ k ∈ N*}, Max_num ∈ N*;
Step 3-2, setting the maximum number Max_num_k of each item category, Max_num_k ∈ N*, and training on the item annotations with an SSD convolutional neural network to generate the item detection model Model_SSD; k = {k ∈ [1, Max_category] ∧ k ∈ N*}, r_k = {r_k ∈ [1, Max_num_k] ∧ r_k ∈ N*};
Step 3-3, calling the model generated in step 3-2 to recognize the training set, wherein the confidence that the r_k-th item of class k is detected in the i-th picture can be expressed as P_object_i_k_r_k, and the confidence vector of class-k items detected in the i-th picture can be expressed as:

P_object_i_k = (P_object_i_k_1, P_object_i_k_2, ..., P_object_i_k_Max_num_k)
wherein the confidences are arranged in descending order; if the items in the picture do not reach the maximum number Max_num_k, the corresponding positions are set to zero, and if the number of items exceeds the limit, only the first Max_num_k items with the highest confidence are retained;
the local item feature matrix Matrix_object composed of the item confidence vectors:
Matrix_object: row i is (P_object_i_1, P_object_i_2, ..., P_object_i_Max_category), the concatenation of the item confidence vectors of the i-th picture

i = {i ∈ [1, Num_train] ∧ i ∈ N*}, Max_category ∈ N*, k = {k ∈ [1, Max_category] ∧ k ∈ N*}, r_k = {r_k ∈ [1, Max_num_k] ∧ r_k ∈ N*}.
CN201910151241.9A 2019-02-28 2019-02-28 Home Indoor Scene Recognition Method Based on Deep Learning Fusion of Global Features and Local Item Information Active CN109858565B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910151241.9A CN109858565B (en) 2019-02-28 2019-02-28 Home Indoor Scene Recognition Method Based on Deep Learning Fusion of Global Features and Local Item Information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910151241.9A CN109858565B (en) 2019-02-28 2019-02-28 Home Indoor Scene Recognition Method Based on Deep Learning Fusion of Global Features and Local Item Information

Publications (2)

Publication Number Publication Date
CN109858565A CN109858565A (en) 2019-06-07
CN109858565B true CN109858565B (en) 2022-08-12

Family

ID=66899355

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910151241.9A Active CN109858565B (en) 2019-02-28 2019-02-28 Home Indoor Scene Recognition Method Based on Deep Learning Fusion of Global Features and Local Item Information

Country Status (1)

Country Link
CN (1) CN109858565B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110751218B (en) * 2019-10-22 2023-01-06 Oppo广东移动通信有限公司 Image classification method, image classification device and terminal equipment
CN113947707B (en) * 2020-07-16 2025-02-07 宁波方太厨具有限公司 A scene recognition method for a cleaning robot and a cleaning robot
CN112633064B (en) * 2020-11-19 2023-12-15 深圳银星智能集团股份有限公司 Scene recognition method and electronic equipment
CN112632378B (en) * 2020-12-21 2021-08-24 广东省信息网络有限公司 Information processing method based on big data and artificial intelligence and data server
CN113177133B (en) * 2021-04-23 2024-03-29 深圳依时货拉拉科技有限公司 Image retrieval method, device, equipment and storage medium
CN116797781B (en) * 2023-07-12 2024-08-20 北京斯年智驾科技有限公司 Target detection method, device, electronic device and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165682A (en) * 2018-08-10 2019-01-08 中国地质大学(武汉) A kind of remote sensing images scene classification method merging depth characteristic and significant characteristics
CN109255364A (en) * 2018-07-12 2019-01-22 杭州电子科技大学 A kind of scene recognition method generating confrontation network based on depth convolution

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103679189B (en) * 2012-09-14 2017-02-01 华为技术有限公司 Method and device for recognizing scene

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109255364A (en) * 2018-07-12 2019-01-22 杭州电子科技大学 A kind of scene recognition method generating confrontation network based on depth convolution
CN109165682A (en) * 2018-08-10 2019-01-08 中国地质大学(武汉) A kind of remote sensing images scene classification method merging depth characteristic and significant characteristics

Also Published As

Publication number Publication date
CN109858565A (en) 2019-06-07

Similar Documents

Publication Publication Date Title
CN109858565B (en) Home Indoor Scene Recognition Method Based on Deep Learning Fusion of Global Features and Local Item Information
US11763550B2 (en) Forming a dataset for fully-supervised learning
CN114241260B (en) Open set target detection and identification method based on deep neural network
Naseer et al. Multimodal objects categorization by fusing GMM and multi-layer perceptron
CN106897738B (en) A pedestrian detection method based on semi-supervised learning
CN106203318B (en) Pedestrian recognition method based on multi-level deep feature fusion in camera network
CN104992142B (en) A kind of pedestrian recognition method being combined based on deep learning and attribute study
CN105590102A (en) Front car face identification method based on deep learning
CN105574550A (en) Vehicle identification method and device
CN105426908B (en) A kind of substation's attributive classification method based on convolutional neural networks
CN106709449A (en) Pedestrian re-recognition method and system based on deep learning and reinforcement learning
CN108921107A (en) Pedestrian's recognition methods again based on sequence loss and Siamese network
CN107818343A (en) Method of counting and device
CN110163117B (en) Pedestrian re-identification method based on self-excitation discriminant feature learning
CN103810500A (en) Place image recognition method based on supervised learning probability topic model
CN107341440A (en) Indoor RGB D scene image recognition methods based on multitask measurement Multiple Kernel Learning
CN104966052A (en) Attributive characteristic representation-based group behavior identification method
CN106845513A (en) Staff detector and method based on condition random forest
CN105160285A (en) Method and system for recognizing human body tumble automatically based on stereoscopic vision
CN107085729A (en) A Correction Method of Person Detection Result Based on Bayesian Inference
CN115082963B (en) Human attribute recognition model training and human attribute recognition method and related device
CN110826392B (en) Cross-modal pedestrian detection method combined with context information
CN110659585A (en) Pedestrian detection method based on interactive attribute supervision
CN115050044B (en) Cross-modal pedestrian re-identification method based on MLP-Mixer
Hashemi A survey of visual attention models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant