
CN107330437B - Feature extraction method based on convolutional neural network target real-time detection model - Google Patents

Feature extraction method based on convolutional neural network target real-time detection model Download PDF

Info

Publication number
CN107330437B
CN107330437B
Authority
CN
China
Prior art keywords
sliding window
neural network
real
model
convolutional neural
Prior art date
Legal status
Active
Application number
CN201710532424.6A
Other languages
Chinese (zh)
Other versions
CN107330437A (en)
Inventor
杨观赐
杨静
盛卫华
陈占杰
Current Assignee
Guizhou University
Original Assignee
Guizhou University
Priority date
Filing date
Publication date
Application filed by Guizhou University filed Critical Guizhou University
Priority to CN201710532424.6A
Publication of CN107330437A
Application granted
Publication of CN107330437B


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract



The invention discloses a feature extraction method based on a convolutional neural network target real-time detection model, comprising the following steps: preprocessing the picture data; constructing and loading an improved convolutional neural network target real-time detection model; generating region matrix vectors and performing a pooling operation; then scanning the grid with a sliding window and performing convolution and pooling operations to calculate the feature vectors of the cells inside the sliding window; performing a convolution operation on the feature vectors; and finally feeding the result into the classification function Softmax, calculating the prediction probability estimate of the picture data, and using a sliding window merging method to obtain the features of the target region corresponding to the maximum overlap area between the sliding window and the real detection object region; the feature model is then output. The invention improves the recognition of smaller targets and makes information loss during feature extraction unlikely.


Description

Feature extraction method based on convolutional neural network target real-time detection model
Technical Field
The invention belongs to the field of image recognition, and particularly relates to a feature extraction method based on a convolutional neural network target real-time detection model.
Background
Feature extraction methods are a research hotspot in the field of image recognition. YOLO (Real-Time Object Detection) is a real-time target detection model based on a convolutional neural network; it can learn from massive data, provides point-to-point feature extraction, and achieves good real-time recognition results, so it has attracted wide attention. In the prior art, a pedestrian detection algorithm based on a Gaussian mixture model and YOLO uses the Gaussian mixture model to simulate background characteristics and achieves good results when detecting pedestrians in substation surveillance video. Another approach extracts context information features from grayscale images with the alternating direction method of multipliers, combines this information into a 2D input channel used as the input of a YOLO neural network model, and thus forms a YOLO-based real-time target detection algorithm. A text detection method for natural images has been designed with a mechanism for extracting text characters from images, using YOLO for text detection and bounding-box regression. These studies have done much work on improving the performance of YOLO and extending its applications, but when the YOLO neural network is used to solve the image feature extraction problem, the following defects remain:
1) In the recognition process, YOLO divides the image to be recognized into 7 × 7 grid cells, and the neurons of a cell that predicts a target can only belong to several sliding windows of the same category, which imposes a strong spatial constraint on the model. If a sliding window covers multiple objects of different classes, the system cannot detect all of the target objects simultaneously.
2) During training, when the data set features are extracted, each cell in the network is responsible for predicting at most one real target, so YOLO performs poorly when detecting targets that are close together and small.
3) In the image preprocessing stage, YOLO converts the high-resolution images of the training data set into low-resolution data used for the final classification feature extraction. After many convolutions, the features of small targets in their distribution regions of the original picture are difficult to preserve.
Disclosure of Invention
The invention aims to overcome the above defects and provide a feature extraction method based on a convolutional neural network target real-time detection model that improves the recognition of smaller targets and does not easily lose information during feature extraction.
The invention discloses a feature extraction method based on a convolutional neural network target real-time detection model, which comprises the following steps of:
(1) preprocessing the picture data: acquiring the rectangular-region coordinates of the real target for each picture, and generating a coordinate information file of the real target in each picture;
(2) constructing and loading an improved convolutional neural network target real-time detection model (YOLO): the model comprises 18 convolutional layers for extracting image features, 6 pooling layers for reducing picture pixels, 1 Softmax output layer and 1 fully connected layer, and a max-pooling layer is added after the image is input;
(3) generating region matrix vectors: generating a plurality of target candidate region matrix vectors for each picture according to the coordinate information file;
(4) taking the candidate region matrix vectors as the input of the first layer, and taking the result as the input of the second layer;
(5) performing pooling operations;
(6) taking the result of step (5) as input, scanning the grid with a sliding window, and performing convolution and pooling operations to calculate the feature vectors of the cells inside the sliding window;
(7) taking the feature vectors obtained in step (6) as the input of the 18th convolutional layer, and performing a convolution operation with a 2 × 2 stride;
(8) taking the output of step (7) as the input of the fully connected layer, and performing a convolution operation with a 1 × 1 stride;
(9) taking the output of step (8) as the input of the classification function Softmax, calculating the prediction probability estimate of the picture data, and obtaining the features of the target region corresponding to the maximum overlap area between the sliding window and the real detection object region using a sliding window merging method;
(10) storing the features of the corresponding target region at the position corresponding to each category in the feature model;
(11) outputting the feature model.
In the feature extraction method based on the convolutional neural network target real-time detection model, in step (7) a 2 × 2 max-pooling layer is applied to reduce the picture size, and a 14 × 14 network feature map is output.
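For orientation only, the following sketch shows one way a network of this shape could be laid out in Python with Keras: a 2 × 2 max-pooling layer directly after the input, a stack of convolutional layers for feature extraction interleaved with pixel-reducing pooling layers, an 18th convolutional layer applied with a 2 × 2 stride followed by a 2 × 2 max pooling that yields a 14 × 14 feature map, a fully connected layer realized as a 1 × 1-stride convolution, and a Softmax output. The input resolution (448 × 448), the filter widths and the number and placement of the pooling layers are assumptions chosen so that the arithmetic lands on a 14 × 14 map; the patent does not publish these hyperparameters, so this is a sketch of the structure rather than the patented implementation.

# Illustrative sketch only: filter widths, the 448x448 input resolution and the
# number/placement of pooling layers are assumptions, not values from the patent.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_improved_yolo_like(num_classes=6, input_size=448):
    inputs = layers.Input(shape=(input_size, input_size, 3))

    # Modification from step (2): a 2x2 max-pooling layer directly after the input.
    x = layers.MaxPooling2D(2, 2, padding="same")(inputs)          # 448 -> 224

    # Convolutional layers 1-17 for feature extraction, with two of the
    # pixel-reducing pooling layers interleaved (assumed placement).
    filters_plan = [64, 128] + [256] * 4 + [512] * 6 + [1024] * 5  # 17 layers
    for i, f in enumerate(filters_plan):
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
        if i in (0, 1):
            x = layers.MaxPooling2D(2, 2, padding="same")(x)       # 224 -> 112 -> 56

    # Step (7): the 18th convolutional layer applied with a 2x2 stride, followed
    # by a 2x2 max pooling that yields the 14x14 network feature map.
    x = layers.Conv2D(1024, 3, strides=2, padding="same", activation="relu")(x)  # 56 -> 28
    x = layers.MaxPooling2D(2, 2, padding="same")(x)                             # 28 -> 14

    # Step (8): fully connected layer realized as a 1x1-stride convolution,
    # preserving the 14x14 spatial grid of cells.
    x = layers.Conv2D(1024, 1, strides=1, padding="same", activation="relu")(x)

    # Step (9): Softmax output layer giving per-cell prediction probability estimates.
    outputs = layers.Conv2D(num_classes, 1, activation="softmax")(x)
    return models.Model(inputs, outputs)

model = build_improved_yolo_like()
model.summary()   # spatial output is 14 x 14 under the assumptions above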
In the feature extraction method based on the convolutional neural network target real-time detection model, the sliding window merging method described in step (9) is based on the nearest-neighbor target detection method (RPN) and comprises the following steps:
1) dividing the picture data into n cells using grid division, generating the set R = {S_1, S_2, ..., S_n};
2) initializing the similar set m_i = ∅ of each cell S_i, and initializing a sliding window of 14 × 14;
3) for each pair of adjacent regions (S_i, S_j) in the sliding window do
a) calculating the feature similarity F(S_i, S_j) between S_i and every adjacent cell S_j in the sliding window using the nearest-neighbor target detection method (RPN);
b) finding the maximum similarity value F_max(S_i, S_j);
c) updating the similar set of cell S_i: m_i = m_i ∪ {F_max(S_i, S_j)};
e) while (the similar set m_i ≠ ∅ for every cell S_i)
f) finding all cells corresponding to the elements of set m_i, and removing the cells that do not contain the detection object;
g) merging the obtained cells with cell S_i to form a new S_i, and taking it as an element of the set L;
h) outputting the target position detection sliding window set;
i) end while;
j) end for.
Compared with the prior art, the method has obvious beneficial effects. An improved convolutional neural network target real-time detection model (YOLO) is constructed and loaded: the model comprises 18 convolutional layers for extracting image features, 6 pooling layers for reducing picture pixels, 1 Softmax output layer and 1 fully connected layer, and a max-pooling layer is added after the image is input. In this structure, the fully connected layer reduces the loss of feature information; the 2 × 2 max-pooling layer added after the image is input reduces the image size while preserving as much information of the original image as possible, and the grid output after the stacked convolution and pooling layers is enlarged to 14 × 14 to increase the size of the network feature map, thereby improving the recognition accuracy of the system. The sliding window merging method based on the nearest-neighbor target detection method (RPN) determines the frame of the sliding window after convolution and pooling, and merging similar regions reduces redundancy and time overhead. In short, the method improves the recognition of smaller targets and makes information loss during feature extraction unlikely.
The advantageous effects of the present invention will be further described below by way of specific embodiments.
Drawings
FIG. 1 is an improved YOLO neural network structure model of the present invention;
FIG. 2 is a diagram of a service robot platform in an embodiment;
FIG. 3 is an overall workflow diagram of the context detection system in an embodiment;
FIG. 4 is a sample exemplary diagram of a data set in an embodiment;
FIG. 5 is a graph of the trend of model performance at different steps in the example;
FIG. 6 is a box plot of the model's prediction probability estimates at different training steps in the embodiment;
FIG. 7 is a graph of the performance trend of the model at different learning rates in the example;
FIG. 8 is a box plot of prediction estimates at different learning rates in the embodiment;
FIG. 9 is a box plot of prediction probability estimates in the embodiment;
FIG. 10 shows samples of context pictures that were recognized incorrectly in the embodiment.
Detailed Description
The following detailed description will be made of specific embodiments, features and effects of the feature extraction method based on the convolutional neural network target real-time detection model according to the present invention with reference to the accompanying drawings and preferred embodiments.
The invention discloses a feature extraction method based on a convolutional neural network target real-time detection model, which comprises the following steps of:
(1) Preprocess the picture data. Obtain the rectangular-region coordinates of the real target for each picture of the picture data set X, and generate the coordinate information file F_c of the real target in each picture.
(2) Load the YOLO picture classification training model, initialize the feature model M_weights of the picture data X, and initialize the predicted rectangular-region coordinates of each picture as null. The model comprises 18 convolutional layers for extracting image features, 6 pooling layers for reducing picture pixels, 1 Softmax output layer and 1 fully connected layer, and a max-pooling layer is added after the image is input (see fig. 1).
(3) According to the coordinate information file F_c, generate a plurality of target candidate region matrix vectors for each picture based on the nearest-neighbor target detection method (RPN);
(4) taking the candidate region matrix vector as the input of a first layer, and taking the result as the input of a second layer;
(5) a pooling operation is performed.
(6) Take the result of step (5) as input, scan the grid with a sliding window, and perform convolution and pooling operations to calculate the feature vectors of the cells inside the sliding window.
(7) Take the feature vectors obtained in step (6) as the input of the 18th convolutional layer, and perform a convolution operation with a 2 × 2 stride;
(8) take the output of step (7) as the input of the fully connected layer, and perform a convolution operation with a 1 × 1 stride;
(9) Take the output of step (8) as the input of the classification function Softmax, calculate the prediction probability estimate of the picture data X_pic, and store the features of the target region corresponding to the P_IOU with the largest overlap obtained by applying the RPN-based sliding window merging algorithm, where P_IOU denotes the overlapping area (in pixels) between the sliding window and the real detection object region;
(10) save the features of the corresponding target region to the position corresponding to each category in the feature model M_weights;
(11) output the feature model M_weights.
The LabelImg tool is used to obtain the coordinate information of the selected regions in step (1) above. In step (7), a 2 × 2 max-pooling layer is applied to reduce the picture size while preserving as much information of the original picture as possible, and a 14 × 14 network feature map is output. In step (8), the sliding window operates over the 17 convolutional layers used for extracting image features and the 6 pooling layers used for reducing the image size. In this process, each time the sliding window performs a convolution operation, the P_IOU with the largest overlap computed by the RPN-based sliding window merging algorithm is substituted into the YOLO loss function to calculate the minimum value of the loss. In an application system, decisions can be made according to the feature model M_weights output in step (11).
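As a point of reference, the quantity P_IOU used above (the overlapping area, in pixels, between a sliding window and the real detection object region) can be computed for axis-aligned rectangles as in the Python sketch below, and the window with the largest overlap is the one whose features are kept. The (x_min, y_min, x_max, y_max) box representation and the helper names are illustrative assumptions, not part of the patent.

from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels -- assumed layout

def overlap_area(window: Box, truth: Box) -> int:
    # P_IOU as used in step (9): overlapping area, in pixels, between a sliding
    # window and the real detection object region.
    x1 = max(window[0], truth[0])
    y1 = max(window[1], truth[1])
    x2 = min(window[2], truth[2])
    y2 = min(window[3], truth[3])
    return max(0, x2 - x1) * max(0, y2 - y1)

def best_window(windows: List[Box], truth: Box) -> Tuple[Box, int]:
    # Return the sliding window with the maximum overlap area, i.e. the window
    # whose target-region features are kept and substituted into the loss.
    return max(((w, overlap_area(w, truth)) for w in windows), key=lambda p: p[1])

# Example with two hypothetical candidate windows and one ground-truth region.
gt = (100, 120, 260, 300)
candidates = [(90, 110, 250, 290), (200, 240, 360, 420)]
print(best_window(candidates, gt))  # the first window overlaps far more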
Regarding the RPN-based sliding window merging algorithm used in step (8): when YOLO detects a target object, one cell is related to several sliding windows, and the number of finally output windows identifying target objects is less than or equal to the number of categories in the picture data. When applying YOLO to context detection, it is not necessary to identify all targets; it is only necessary to report whether the object to be detected is present in the current view. Therefore, an RPN-based sliding window merging algorithm is designed:
Algorithm: RPN-based sliding window merging algorithm
Input: picture data X_pic
Output: target position detection sliding window set L
1) Divide X_pic into n cells using grid division, generating the set R = {S_1, S_2, ..., S_n};
2) initialize the similar set m_i = ∅ of each cell S_i, and initialize a sliding window of 14 × 14;
3) for each pair of adjacent regions (S_i, S_j) in the sliding window do
a) calculate the feature similarity F(S_i, S_j) between S_i and every adjacent cell S_j in the sliding window using the RPN method;
b) find the maximum similarity value F_max(S_i, S_j);
c) update the similar set of cell S_i: m_i = m_i ∪ {F_max(S_i, S_j)};
e) while (the similar set m_i ≠ ∅ for every cell S_i)
f) find all cells corresponding to the elements of set m_i, and remove the cells that do not contain the detection object;
g) merge the obtained cells with cell S_i to form a new S_i, and take it as an element of the set L;
h) output the target position detection sliding window set;
i) end while;
j) end for.
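To make the control flow of the algorithm above easier to follow, here is a minimal single-pass Python sketch of the same idea: divide the picture into grid cells, find for each cell its most similar adjacent cell, drop cells that do not contain the detection object, and merge the survivors into the output set L. The cell representation, the adjacency test and the callbacks standing in for the RPN feature similarity and the object check are assumptions (the patent does not publish these details), and the while-loop of the original algorithm is collapsed into a single pass.

from typing import Callable, Dict, List, Tuple

Cell = Dict[str, Tuple[int, int, int, int]]  # {"box": (x0, y0, x1, y1)} -- assumed representation

def grid_cells(width: int, height: int, n_per_side: int) -> List[Cell]:
    # Step 1): divide the picture into n x n cells with a regular grid.
    cw, ch = width // n_per_side, height // n_per_side
    return [{"box": (c * cw, r * ch, (c + 1) * cw, (r + 1) * ch)}
            for r in range(n_per_side) for c in range(n_per_side)]

def touches(a: Cell, b: Cell) -> bool:
    # Assumed adjacency test: two cells are adjacent when their boxes touch.
    ax0, ay0, ax1, ay1 = a["box"]
    bx0, by0, bx1, by1 = b["box"]
    return not (ax1 < bx0 or bx1 < ax0 or ay1 < by0 or by1 < ay0)

def merge_windows(cells: List[Cell],
                  similarity: Callable[[Cell, Cell], float],
                  contains_object: Callable[[Cell], bool]) -> List[Tuple[int, int, int, int]]:
    # Single-pass simplification of the merging loop: for each cell S_i, find the
    # most similar adjacent cell, drop cells without the detection object, merge
    # the survivors into a new S_i and add it to the output set L.
    L = []
    for i, s_i in enumerate(cells):
        neighbours = [s_j for j, s_j in enumerate(cells) if j != i and touches(s_i, s_j)]
        if not neighbours:
            continue
        best = max(neighbours, key=lambda s_j: similarity(s_i, s_j))   # steps a)-c)
        group = [c for c in (s_i, best) if contains_object(c)]         # step f)
        if not group:
            continue
        xs = [v for c in group for v in (c["box"][0], c["box"][2])]
        ys = [v for c in group for v in (c["box"][1], c["box"][3])]
        L.append((min(xs), min(ys), max(xs), max(ys)))                 # step g)
    return L

# Example with dummy callbacks standing in for the RPN similarity and object check.
cells = grid_cells(448, 448, 14)
windows = merge_windows(cells,
                        similarity=lambda a, b: 1.0,
                        contains_object=lambda c: c["box"][0] < 224)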
The examples are as follows:
The method was applied to a service robot to carry out privacy context detection tests. First, a service robot context detection platform was built, and the overall workflow of service robot context detection was defined. Six types of contexts in a home environment were designed, and a training data set of 2580 pictures, a validation data set of 360 pictures, and a test data set consisting of 4 classes with 960 samples related to privacy content were established. The tests analyze the relationships between the training step and the prediction probability estimate and between the learning rate and the recognition accuracy, in order to find empirical values of the training step and the learning rate suitable for the proposed algorithm.
1 privacy detection service robot hardware platform
Fig. 2 shows the service robot platform that was built, which includes a mobile base, a data processor, a data acquisition device, and a mechanical support; Fig. 3 shows the overall workflow of the system. The touch display screen used for inputting and displaying data is a 16-inch industrial touch screen supporting a Linux system. The vision system uses an ORBBEC 3D somatosensory camera that can collect RGB depth images. The auditory system is built by extending an iFLYTEK-based voice module and can recognize speech and locate the direction of a voice in a noisy environment. The development board is an Nvidia Jetson TX1 with a 256-core GPU, and the mobile base is an iRobot Create 2. The operating system is Ubuntu 16.04 with the Kinetic version of ROS (Robot Operating System) installed. The workstation used to reduce the computational load of the service robot is a ThinkPad T550 (with an NVIDIA GeForce 940M GPU), mainly used for data analysis. Both the service robot and the workstation are installed with OpenCV 3.1, TensorFlow 0.9, YOLO, and the ROS system. The service robot is equipped with a wireless communication module, enabling end-to-end communication between the service robot and the workstation.
In Fig. 3, after the training data set is collected, the workstation with the GPU trains on the data set using the RPN-based sliding window merging algorithm to obtain the feature model. The obtained feature model is then transmitted to the service robot; after receiving the model, the service robot starts the camera and reads pictures from it at a given frequency (every 10 seconds) to perform context detection. Finally, the robot's action is determined according to the detection result. If a privacy context is detected, the robot adjusts the camera angle, stores a summary of the recognized privacy content in a text file, and every 30 seconds asks by voice whether the camera may again be used to observe the person's behavior. If the reply is negative, the camera remains inactive, thereby protecting the private information. For example, when the system detects that the user is bathing, the camera is rotated by 90 degrees and the text "8:00 on March 29, 2017: the user is bathing" is stored. At the same time the system starts timing, and after 30 seconds it asks whether the bath is finished. If the person's answer is affirmative, the camera returns to the previous observation angle and continues collecting data, and the subsequent actions of the service robot are determined according to the recognized data.
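The capture, detect and consult behavior described above can be summarized as a simple control loop. The Python sketch below is only a schematic of that flow under assumed interfaces: camera, recognizer and robot are placeholder objects rather than the robot's actual APIs, and the privacy classes follow the C1-C6 scheme defined in the next section.

import time
from datetime import datetime

CAPTURE_PERIOD_S = 10   # the robot reads a picture from the camera every 10 seconds
RECHECK_PERIOD_S = 30   # and asks again by voice every 30 seconds while privacy persists
PRIVACY_CLASSES = {"C1", "C2", "C3", "C4"}   # bathing, naked sleep, toilet, changing clothes

def context_loop(camera, recognizer, robot, log_path="privacy_log.txt"):
    # Schematic of the workflow of Fig. 3 with placeholder interfaces.
    while True:
        frame = camera.read()
        context = recognizer.predict(frame)       # feature model trained on the workstation
        if context in PRIVACY_CLASSES:
            robot.rotate_camera(90)               # turn the camera away from the person
            with open(log_path, "a") as log:      # store an abstract of the recognized event
                log.write(f"{datetime.now():%Y-%m-%d %H:%M} detected {context}\n")
            # Ask every 30 s whether observation may resume; stay idle until the answer is yes.
            while not robot.ask_by_voice("May the camera observe again?"):
                time.sleep(RECHECK_PERIOD_S)
            robot.rotate_camera(-90)              # restore the previous observation angle
        time.sleep(CAPTURE_PERIOD_S)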
2 data set and experimental design
2.1 training dataset and validation dataset
The training data set consists of picture data from different contexts; the feature model obtained by applying the proposed algorithm to it is used in the application system. The validation data set is used to test the recognition performance of the feature model under different parameters during feature model extraction, in order to refine the feature model.
The household contexts considered include 6 categories: C1: bathing; C2: sleeping naked or semi-naked; C3: using the toilet; C4: changing clothes and thus being naked; C5: someone present but not involved in the above private content; C6: no one present in the home environment. The data come from 2 sources: 1) pictures automatically acquired by the ORBBEC 3D somatosensory camera on the constructed service robot platform in the constructed home environment, which account for about 81% of the whole data set; 2) pictures of home environments collected from the network, screened and appropriately processed, with different scenes, objects, brightness, angles and resolutions, to enrich the data set.
The 6 classes of contexts of the training dataset comprise 2580 samples in total, each class comprising 430 samples.
The class 6 context of the validation dataset consists of 360 samples, each class comprising 60 samples.
Fig. 4 is a sample example of a data set.
2.2 System Performance test design and test data set
To test the performance of the system, 3 experiments were designed:
Experiment 1: privacy context detection in a home environment that is included in the training data set. Test data a and b are obtained as follows: pictures taken in that home environment by subjects who are in the training set and by subjects who are not in the training set, respectively. During testing, the system collects data in real time at different angles through its own camera. This experiment tests the robustness of the system's detection across different detection objects.
Experiment 2: privacy context detection when the detection object (person) is the same but the detection environment is not included in the training data set. This experiment examines the accuracy of the system's privacy detection after the context seen in the training set has changed. Test data c consists of pictures of subjects from the training set taken in other home environments. During testing, the system collects data in real time at different angles through its own camera. This experiment examines the detection performance of the system in different detection environments.
Experiment 3: privacy context detection when neither the detection object nor the home environment is included in the training data set. To reflect the objectivity and diversity of the data, test data d was collected and curated from the network. During testing, data are provided to the detection system in a real-time acquisition mode through a simulated system camera. This experiment examines the performance of the detection system when both the detection object and the environment are completely different from the training data.
The system performance test data set is used to test the performance of the proposed algorithm and the constructed platform in practical applications. For each of the four types of test data a, b, c and d, 40 pictures are tested per context, so each type contains 240 mutually different pictures across the 6 contexts. In total, the tests involve 960 pictures. The test data set and the training set share no data.
3 training model parameter optimization results and analysis
Training the model takes a lot of time, and different training step scales affect the model's performance. To give the proposed training model better performance, the influence of the training step on the prediction probability estimate is studied in order to find a good (or feasible) training step scale. On the other hand, different learning rates also influence the recognition accuracy of the model, so the recognition accuracy at different learning rates is investigated experimentally.
3.1 analysis of the relationship between the training step size and the predicted probability estimate
Ten different training step scales were designed (see Table 1). For the 360 samples of the given validation data set, with the learning rate of the model set to 0.001 in YOLO, the statistics of the model's prediction probability estimate, recognition accuracy and average single-picture recognition time are shown in Table 1; the trends are shown in Fig. 5, and box plots of the model's category estimates at different training steps are shown in Fig. 6.
As can be seen from Fig. 5 and Table 1, when the training step is 1000, the average prediction probability estimate is 0.588 and the recognition accuracy is 0.733. As the training step increases, the prediction probability estimate and the privacy context recognition accuracy of the model rise; when the training step scale is 9000, the average prediction probability estimate of the model reaches its maximum of 0.830, and the average recognition accuracy also reaches its maximum of 0.967. When the training step increases to 20000, the average prediction probability estimate of the model falls to 0.568 and the average prediction accuracy is 0.417. Meanwhile, as can be seen from Fig. 6, for training steps of 1000 to 7000, although there are few outliers outside the boxes, the boxes of the box plots are long and the median lines are low. For training steps of 8000 and 10000, although the median lines are high, there are many outliers outside the boxes, including prediction estimate outliers close to 0. When the training step is 9000, the box has the narrowest area and the highest median line of all cases; although there are outliers outside the box, the lowest of them is higher than the bottoms of the boxes corresponding to training steps of 2000, 3000 and 4000, and further examination of the corresponding data shows that there are only 2 outlier points, both greater than 0.45.
TABLE 1 Model performance at different training steps
(The contents of Table 1 are available only as an image in the original publication.)
As can be seen from the time overhead statistics in Table 1, the average recognition time of the system is between 2.1 ms and 2.6 ms; the model has a short recognition time and meets the needs of real-time detection applications with modest real-time requirements.
In summary, the analysis leads to the conclusion that the proposed model achieves the best prediction estimates and recognition accuracy when the training step is set to 9000.
3.2 identification Performance test results and analysis at different learning rates
To find the learning rate setting under which the model performs best, and in combination with the conclusion of the previous section, the model performance was examined at learning rates of 1, 10^-1, 10^-2, 10^-3, 10^-4, 10^-5, 10^-6, 10^-7, 10^-8, 10^-9 and 10^-10 with the training step set to 9000. For the 360 samples of the designed validation data set, the statistics of the model's prediction probability estimate and average recognition accuracy are shown in Table 2, Fig. 7 and Fig. 8.
The data in Table 2 and Fig. 7 show that when the learning rate is greater than 0.1, both the average prediction probability estimate and the recognition accuracy of the model tend to increase as the learning rate decreases. At a learning rate of 10^-1, the prediction probability estimate reaches its maximum of 0.911 and the average recognition accuracy reaches 1. As the learning rate decreases from 10^-1 to 10^-4, the prediction probability estimate stays above 0.8 and the average recognition accuracy is about 0.94, so changes in the learning rate have little influence on these two performance indicators. As the learning rate decreases from 10^-4 to 10^-10, the mean prediction probability estimate and the mean recognition accuracy drop significantly, with lowest mean values of 0.315 and 0.417 respectively.
Observing Fig. 8, it can further be seen that when the learning rate is 1, the corresponding box has the largest area; although the corresponding mean in Table 2 is only 0.67, the box extends to 0.9 on the vertical axis, which indicates that a certain number of prediction estimates are greater than 0.9. When the learning rate is 0.1, although there are some outliers, the box area is small, which indicates that the system outputs large prediction estimate class values in most cases. At learning rates in the range 10^-10 to 10^-1, the corresponding box plots have more outliers and the output contains a larger number of small prediction probability estimates.
TABLE 2 Statistical results of model performance at different learning rates
(The contents of Table 2 are available only as an image in the original publication.)
In summary, it can be concluded that the proposed model performs best when the learning rate is set to 0.1, and this value can be used in applications.
4 application system performance testing
4.1 System Performance test results and analysis
The designed algorithm was deployed on the constructed service robot platform, with the learning rate and the training step set to 0.01 and 9000 respectively. The four types of data in the test data set were tested; the system's context recognition accuracy, category estimates and time overhead statistics are shown in Tables 3 and 4 respectively, and a box plot of the prediction probability estimates is shown in Fig. 9. From these data it can be seen that:
at the same time, the data in Table 4 shows that for the class a test data, the average values for the C1-C6 context category estimates are: 0.82, 0.968, 0.971, 0.972, 0.920 and 0.972, the standard deviations corresponding to which are: 0.275, 0.006, 0.168, 0.038, 0.141, 0.152, their high class estimates and small variance indicate that the system can be classified into the corresponding class with very high probability for the tested data, and for data where both the object and the background are included in the training set, the system has strong recognition capability for new contexts composed of the object and the background at different perspectives. The results corresponding to the class b test data are slightly worse than the results of the class a numbers as a whole, and the class estimation values in each case are 0.789, 0.849, 0.922, 0.977, 0.918, and 0.869, respectively, and the recognition accuracy is reduced by 0.05, 0.025, 0.05, and 0.025 for C1, C2, C4, and C6, respectively. This indicates that changes in the object have some effect on the recognition performance of the system.
2) The result of experiment 2 shows that the system has very good performance on the situations of C4 and C5, and the identification accuracy can reach 1; the recognition accuracy rates for the C1-C3, C6 scenarios are 0.850, 0.950 and 0.925. The corresponding predicted probability estimates, compared to the results for class a and b test data, are reduced by 0.069, 0.194, 0.034, 0.066, and 0.108, respectively, for the mean of the C1-C3, C5, and C6 scenarios.
TABLE 3 Privacy recognition accuracy of the system for the different test data sets
(The contents of Table 3 are available only as an image in the original publication.)
TABLE 4 Statistics of the system's privacy category estimates for the different test data
(The contents of Table 4 are available only as an image in the original publication.)
This indicates that, with the features obtained from a limited training set, new contexts formed by objects that are in the training set and home environments that are not can be predicted with high recognition accuracy, but changes in the home environment reduce the context recognition performance of the system.
3) From the data of Experiment 3, although the recognition accuracy of the system is at most 0.975 and at least 0.85, the means of the prediction estimates are distributed in a relatively low interval [0.713, 0.89]. This indicates that when both the home environment and the object change, the recognition accuracy and the category estimates of the system decrease. It is worth noting, however, that the class d data come from the network, and their background themes, objects and acquisition angles differ considerably from the training data, yet the system still achieves a recognition accuracy above 0.85, which indicates that the system is quite robust when recognizing new samples with large differences.
4) As can be seen from the box plot in Fig. 9, although the system as a whole reaches a recognition accuracy of 94.48%, there are outliers outside the boxes, especially points where the prediction estimate is very small. This indicates that for certain contexts the system makes its decision with a very low prediction probability estimate, and the recognition robustness of the system for this kind of data needs to be improved.
4.2 analysis of data recognized incorrectly by the system
From the above analysis, the constructed system has a context recognition error rate of 5.52%; among the 960 test pictures, 53 were recognized incorrectly, and Fig. 10 shows samples of such data. Analysis of these pictures reveals that:
1) The data collected by the system's camera are characterized by dark lighting combined with overexposed bright areas. Examination of the training data shows that no such training samples exist.
2) Pictures from the network are characterized by low resolution or single color, which introduces strong noise.
Therefore, to improve the recognition performance of the system, the sample diversity of the training set should be expanded, and the incorrectly recognized samples should be added to the corresponding training data set to obtain a more general feature model.
In short, the structure and feature extraction process of the YOLO neural network are improved and the image grid division size is enlarged, while an RPN-based sliding window merging algorithm is designed, forming the feature extraction method based on improved YOLO disclosed by the invention. Experiments on the privacy context data set and the constructed service robot platform show that the proposed feature extraction algorithm enables the service robot system to recognize privacy-related contexts in a smart home environment well, with an average recognition accuracy of 94.48% and a recognition time between 1.62 ms and 3.32 ms; the algorithm is robust and can detect privacy contexts in the home environment in real time.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention in any way. Any simple modification, equivalent change or refinement made to the above embodiment according to the technical essence of the present invention remains within the scope of the present invention.

Claims (6)

1. A feature extraction method based on a convolutional neural network target real-time detection model comprises the following steps:
(1) preprocessing picture data: acquiring the coordinates of the real target area for each picture, and generating a coordinate information file of the real target in each picture;
(2) constructing and loading an improved convolutional neural network target real-time detection model: the model adds a maximum pooling layer after the image is input;
(3) generating an area matrix vector: generating a plurality of target candidate area matrix vectors of each picture according to the coordinate information file;
(4) taking the candidate region matrix vector as the input of a first layer of an improved convolutional neural network target real-time detection model, and taking the output result of the first layer as the input of a second layer of the model;
(5) performing pooling operations;
(6) taking the result of step (5) as input, scanning the grid with a sliding window, and performing convolution and pooling operations to calculate the feature vectors of the cells inside the sliding window;
(7) taking the feature vectors obtained in step (6) as the input of a convolutional layer, and performing a convolution operation;
(8) taking the output of step (7) as the input of a fully connected layer, and performing a convolution operation;
(9) taking the output of step (8) as the input of the classification function Softmax, calculating the prediction probability estimate of the picture data, and obtaining the features of the target region corresponding to the maximum overlap area between the sliding window and the real detection object region using a sliding window merging method;
(10) storing the features of the corresponding target region at the position corresponding to each category in the feature model;
(11) outputting the feature model.
2. The feature extraction method based on the convolutional neural network target real-time detection model as claimed in claim 1, characterized in that: the improved convolutional neural network target real-time detection model constructed in the step (2): the model includes 18 convolutional layers for extracting image features, 6 pooling layers for reducing picture pixels, 1 Softmax output layer, and 1 fully-connected layer.
3. The feature extraction method based on the convolutional neural network target real-time detection model as claimed in claim 2, characterized in that: in step (7), the feature vector obtained in step (6) is used as the input of the 18th convolutional layer and a convolution operation with a 2 × 2 stride is performed; a 2 × 2 max-pooling layer is applied to reduce the picture size, and a 14 × 14 network feature map is output.
4. The feature extraction method based on the convolutional neural network target real-time detection model as claimed in claim 1 or 2, characterized in that: the real target region coordinates obtained for each picture in step (1) describe a rectangular region.
5. The feature extraction method based on the convolutional neural network target real-time detection model as claimed in claim 1 or 2, characterized in that: in step (8), the convolution operation is performed with a 1 × 1 stride.
6. The feature extraction method based on the convolutional neural network target real-time detection model as claimed in claim 1 or 2, characterized in that: the sliding window merging method in step (9) is based on the nearest-neighbor target detection method and comprises the following steps:
1) dividing the picture data into n cells using grid division, generating the set R = {S_1, S_2, ..., S_n};
2) initializing the similar set m_i = ∅ of each cell S_i, and initializing a sliding window of 14 × 14;
3) for each pair of adjacent regions (S_i, S_j) in the sliding window do
a) calculating the feature similarity F(S_i, S_j) between S_i and every adjacent cell S_j in the sliding window using the nearest-neighbor target detection method;
b) finding the maximum similarity value F_max(S_i, S_j);
c) updating the similar set of cell S_i: m_i = m_i ∪ {F_max(S_i, S_j)};
e) while (the similar set m_i ≠ ∅ for every cell S_i)
f) finding all cells corresponding to the elements of set m_i, and removing the cells that do not contain the detection object;
g) merging the obtained cells with cell S_i to form a new cell S_i, and taking it as an element of the sliding window set L;
h) end while;
i) end for;
j) outputting the target position detection sliding window set.
CN201710532424.6A 2017-07-03 2017-07-03 Feature extraction method based on convolutional neural network target real-time detection model Active CN107330437B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710532424.6A CN107330437B (en) 2017-07-03 2017-07-03 Feature extraction method based on convolutional neural network target real-time detection model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710532424.6A CN107330437B (en) 2017-07-03 2017-07-03 Feature extraction method based on convolutional neural network target real-time detection model

Publications (2)

Publication Number Publication Date
CN107330437A CN107330437A (en) 2017-11-07
CN107330437B true CN107330437B (en) 2021-01-08

Family

ID=60199759

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710532424.6A Active CN107330437B (en) 2017-07-03 2017-07-03 Feature extraction method based on convolutional neural network target real-time detection model

Country Status (1)

Country Link
CN (1) CN107330437B (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107886128A (en) * 2017-11-10 2018-04-06 广东工业大学 A kind of shuttlecock recognition methods, system, medium and equipment
CN108198168A (en) * 2017-12-26 2018-06-22 合肥泰禾光电科技股份有限公司 material analyzing method and device
CN108416797A (en) * 2018-02-27 2018-08-17 鲁东大学 A method, device and storage medium for detecting behavior changes
CN108388927B (en) * 2018-03-26 2021-10-29 西安电子科技大学 Small sample polarimetric SAR ground object classification method based on deep convolutional Siamese network
CN108734117A (en) * 2018-05-09 2018-11-02 国网浙江省电力有限公司电力科学研究院 Cable machinery external corrosion failure evaluation method based on YOLO
CN108985314A (en) * 2018-05-24 2018-12-11 北京飞搜科技有限公司 Object detection method and equipment
CN108874137B (en) * 2018-06-15 2021-01-12 北京理工大学 General model for gesture action intention detection based on electroencephalogram signals
CN109063559B (en) * 2018-06-28 2021-05-11 东南大学 Pedestrian detection method based on improved region regression
CN109409278A (en) * 2018-10-19 2019-03-01 桂林电子科技大学 Image target positioning method based on estimation network
CN109753885B (en) * 2018-12-14 2020-10-16 中国科学院深圳先进技术研究院 Target detection method and device and pedestrian detection method and system
CN109948501A (en) * 2019-03-13 2019-06-28 东华大学 A detection method for personnel and safety helmets in surveillance video
CN110210400B (en) * 2019-06-03 2020-11-17 上海眼控科技股份有限公司 Table file detection method and equipment
CN119904867A (en) * 2019-09-17 2025-04-29 同方威视技术股份有限公司 Semantic-based image recognition system and method
CN110705440B (en) * 2019-09-27 2022-11-01 贵州大学 Capsule endoscopy image recognition model based on neural network feature fusion
CN111275082A (en) * 2020-01-14 2020-06-12 中国地质大学(武汉) Indoor object target detection method based on improved end-to-end neural network
WO2021217340A1 (en) * 2020-04-27 2021-11-04 Li Jianjun Ai-based automatic design method and apparatus for universal smart home scheme
CN112270624A (en) * 2020-10-26 2021-01-26 链盟智能科技(广州)有限公司 Artificial intelligence bill discernment generation system based on fast food trade
CN114049898A (en) * 2021-11-10 2022-02-15 北京声智科技有限公司 Audio extraction method, device, equipment and storage medium
CN114445954B (en) * 2022-04-08 2022-06-21 深圳市润璟元信息科技有限公司 Entrance guard's device with sound and facial dual discernment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104036323A (en) * 2014-06-26 2014-09-10 叶茂 Vehicle detection method based on convolutional neural network
CN106408015A (en) * 2016-09-13 2017-02-15 电子科技大学成都研究院 Road fork identification and depth estimation method based on convolutional neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9275308B2 (en) * 2013-05-31 2016-03-01 Google Inc. Object detection using deep neural networks

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104036323A (en) * 2014-06-26 2014-09-10 叶茂 Vehicle detection method based on convolutional neural network
CN106408015A (en) * 2016-09-13 2017-02-15 电子科技大学成都研究院 Road fork identification and depth estimation method based on convolutional neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
* "Vision based real-time fish detection using convolutional neural network"; Minsung Sung et al.; OCEANS 2017; 2017-06-22; full text *
* "图像理解中的卷积神经网络" (Convolutional Neural Networks in Image Understanding); 常亮 et al.; 自动化学报 (Acta Automatica Sinica); September 2016; Vol. 42, No. 9; full text *

Also Published As

Publication number Publication date
CN107330437A (en) 2017-11-07

Similar Documents

Publication Publication Date Title
CN107330437B (en) Feature extraction method based on convolutional neural network target real-time detection model
CN109902715B (en) Infrared dim target detection method based on context aggregation network
CN106874894B (en) A Human Object Detection Method Based on Regional Fully Convolutional Neural Networks
CN112200045B (en) Remote sensing image target detection model establishment method based on context enhancement and application
CN106846362A (en) A kind of target detection tracking method and device
CN109410192B (en) A fabric defect detection method and device for multi-texture grading fusion
CN111079518B (en) Ground-falling abnormal behavior identification method based on law enforcement and case handling area scene
CN114066857A (en) Infrared image quality evaluation method and device, electronic equipment and readable storage medium
CN113361352A (en) Student classroom behavior analysis monitoring method and system based on behavior recognition
CN110458165A (en) A Natural Scene Text Detection Method Introducing Attention Mechanism
CN108960404B (en) Image-based crowd counting method and device
CN107784288A (en) A kind of iteration positioning formula method for detecting human face based on deep neural network
CN102692600A (en) Method and device for rapidly evaluating electrical durability of relay contact based on machine vision
CN101996308A (en) Human face identification method and system and human face model training method and system
CN103745453B (en) Urban residential areas method based on Google Earth remote sensing image
CN108876776B (en) Classification model generation method, fundus image classification method and device
CN104077609A (en) Saliency detection method based on conditional random field
CN112287802A (en) Face image detection method, system, storage medium and equipment
CN113205136A (en) Real-time high-precision detection method for appearance defects of power adapter
CN117392111A (en) Network and method for detecting surface defects of strip steel camouflage
CN109583500A (en) A kind of aesthetic images quality prediction system and method based on depth drift-diffusion method
CN116935494B (en) Multi-person sitting posture identification method based on lightweight network model
CN116543001B (en) Color image edge detection method and device, equipment and storage medium
CN106228577A (en) A kind of dynamic background modeling method and device, foreground detection method and device
Liu et al. A novel image segmentation algorithm based on visual saliency detection and integrated feature extraction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant