CN111626128B - Pedestrian detection method based on improved YOLOv3 in orchard environment
- Publication number
- CN111626128B (application number CN202010341941.7A)
- Authority
- CN
- China
- Prior art keywords
- network
- box
- pedestrian
- predicted
- prediction
- Prior art date
- Legal status: Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention discloses a pedestrian detection method in an orchard environment based on improved YOLOv3. The method comprises the following steps: S1, acquiring images in an orchard environment and preprocessing them to produce an orchard pedestrian sample set; S2, determining the number and sizes of anchor boxes with a K-means clustering algorithm to generate pedestrian candidate boxes; S3, adding a finer feature extraction layer to the YOLOv3 network and increasing the detection output of the network on the large-scale feature layer to obtain the improved network model YOLO-Z; S4, inputting the training set into the YOLO-Z network, training under multiple environments, and then saving the trained weight file; S5, introducing a Kalman filtering algorithm with corresponding improvements to improve the robustness of the model, address missed detections, and increase detection speed. The invention addresses the low real-time detection speed and low accuracy of pedestrian detection in an orchard environment, realizes multi-task training, and ensures both the speed and the accuracy of pedestrian detection in the orchard environment.
Description
Technical Field
The invention relates to a pedestrian detection method in an orchard environment based on improved YOLOv3, which aims at pedestrian detection of unmanned agricultural machinery in the orchard environment and belongs to the technical field of deep learning and pedestrian detection.
Background
With the rapid development of artificial intelligence, intelligent agricultural equipment has entered a period of rapid adoption, and unmanned agricultural machinery is a key part of it. Obstacle detection is a primary problem faced by unmanned agricultural machinery operating in the field, and pedestrian detection is the most critical case. Methods commonly used for pedestrian detection include those based on motion characteristics, shape information, pedestrian models, stereoscopic vision, neural networks, and wavelets combined with support vector machines.
Pedestrian detection in an orchard environment faces a series of problems: (1) Pedestrian multi-pose. The pedestrian target is strongly non-rigid and may take on a variety of poses, resting or walking, standing or squatting. (2) Complexity of the detection scene. Pedestrians blend into the background and are difficult to separate from it. (3) Real-time performance of the pedestrian detection and tracking system. Practical applications place firm requirements on the reaction speed of the detection and tracking system, yet pedestrian detection algorithms are often complex to build, which further degrades the real-time performance of the system. (4) Occlusion. In a real environment there is a large amount of person-to-person occlusion. The method combines computer vision with deep learning to detect pedestrians and provides a research foundation for realizing pedestrian detection.
Disclosure of Invention
To meet the above requirements of intelligent unmanned agricultural machinery for pedestrian detection in an orchard environment, the invention provides a pedestrian detection method based on improved YOLOv3 that treats detection as a regression problem: the whole image is processed directly by a convolutional network structure, which predicts the detection class and position.
The invention discloses a pedestrian detection method in an orchard environment based on improved YOLOv3, which comprises the following steps:
Step 1: collect pedestrian images in an orchard environment;
use depth cameras to collect images of pedestrians at various positions in the orchard, including images of pedestrians under different occlusion conditions, images under different weather conditions, and images of pedestrians at different distances, covering short-range, medium-range, and long-range shots;
Step 2: preprocess the images acquired in step 1 and construct a standard pedestrian detection data set;
Step 3: feed the training set processed in step 2 into a convolutional feature extractor to extract pedestrian features, determine the number of anchor boxes through a K-means clustering algorithm to generate predicted pedestrian bounding boxes, and perform multi-scale fusion prediction with an FPN-like network to improve the accuracy of bounding-box and class prediction. The specific steps are as follows:
(3.1): randomly select the width and height of one annotated box as the first cluster center;
(3.2): select each subsequent (n-th) cluster center on the principle that the larger a box's similarity distance to the current n−1 cluster centers, the higher the probability that the box is selected;
(3.3): repeat (3.2) until all initial cluster centers are determined;
(3.4): compute the IoU (Intersection over Union) between each remaining box and each cluster center to obtain the similarity distance (the IoU loss) between the two boxes, and assign each box to the class whose cluster center is at the smallest similarity distance;
(3.5): after all boxes have been traversed, compute the mean width and height of the boxes in each class and use them as the cluster centers for the next iteration;
(3.6): repeat (3.4) and (3.5) until the difference in total IoU loss between adjacent iterations is smaller than a threshold or the iteration limit is reached, then stop the clustering algorithm (a minimal sketch of the procedure follows).
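The following is an illustrative sketch (not the patent's code) of steps (3.1)-(3.6): IoU-based K-means anchor clustering with distance-weighted initialization. The synthetic box data and the convergence test on the cluster centers are assumptions made for the example.

```python
import numpy as np

def iou_wh(box, centers):
    """IoU between one (w, h) box and an array of (w, h) cluster centers,
    treating all boxes as if they shared the same top-left corner."""
    inter = np.minimum(box[0], centers[:, 0]) * np.minimum(box[1], centers[:, 1])
    union = box[0] * box[1] + centers[:, 0] * centers[:, 1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs with distance d = 1 - IoU; initial centers are
    drawn with probability proportional to their distance from existing ones."""
    rng = np.random.default_rng(seed)
    centers = [boxes[rng.integers(len(boxes))]]              # (3.1) random first center
    while len(centers) < k:                                  # (3.2)-(3.3)
        d = np.array([1.0 - iou_wh(b, np.asarray(centers)).max() for b in boxes])
        centers.append(boxes[rng.choice(len(boxes), p=d / d.sum())])
    centers = np.asarray(centers)
    for _ in range(iters):                                   # (3.4)-(3.6)
        assign = np.array([np.argmax(iou_wh(b, centers)) for b in boxes])
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else centers[i] for i in range(k)])
        if np.allclose(new, centers):                        # simplified stopping test
            break
        centers = new
    return centers[np.argsort(centers[:, 0] * centers[:, 1])]

# Illustrative run on synthetic (w, h) annotations, smallest anchors first.
boxes = np.abs(np.random.default_rng(1).normal([60.0, 150.0], [25.0, 60.0], (500, 2)))
print(kmeans_anchors(boxes, k=9).round(1))
```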
The improved K-means clustering algorithm mainly optimizes the selection of the initial cluster centers, so that the similarity distances between the initial cluster centers are as large as possible.
Step 4: add a finer feature extraction layer to the YOLOv3 network and increase the detection output of the network on the large-scale feature layer to obtain the improved network model YOLO-Z, specifically as follows:
(4.1): the training-set images obtained in step 2 are resized to 608×608, the IOU threshold is set to 0.45, and the confidence threshold is set to 0.5. Each grid cell predicts B bounding boxes, each bounding box containing 1 confidence score, 4 coordinate values and C class probabilities, where B is the number of anchor boxes of the output feature layer the cell belongs to. Then, for an output feature layer of size S×S, the final output dimension is S×S×B×(4+1+C).
The clustering uses the formula
d(box, centroid) = 1 − IOU(box, centroid)
where box is a prior box, centroid is a cluster center, and IOU(box, centroid) is the intersection-over-union of the two regions; when d(box, centroid) is less than or equal to the measurement threshold, the width and height of the anchor box are confirmed.
The predicted bounding box is computed as
b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w e^(t_w)
b_h = p_h e^(t_h)
where c_x and c_y are the horizontal and vertical offsets of the grid cell from the top-left corner of the image; p_w and p_h are the width and height of the bounding box before prediction (the anchor box); t_x and t_y are the predicted center parameters, so σ(t_x) and σ(t_y) are the horizontal and vertical distances from the center of the predicted box to the top-left corner of its cell; b_x and b_y are the abscissa and ordinate of the predicted bounding-box center; and b_w and b_h are the width and height of the predicted bounding box.
The confidence of a predicted bounding box is
C = Pr(object) × IOU(truth, pred)
where Pr(object) is 0 or 1: 0 indicates that there is no object in the image, and 1 indicates that there is an object; IOU(truth, pred) is the intersection-over-union between the predicted bounding box and the actual bounding box. The confidence score reflects whether a target is contained and, if so, the accuracy of the predicted location. With the confidence threshold set to 0.5, a predicted bounding box is deleted when its confidence is less than 0.5 and retained when its confidence is greater than 0.5.
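The decoding and confidence filtering of (4.1) can be sketched as follows; this is an illustrative NumPy version, not the patent's code, and using σ(t_o) for the confidence is a common inference-time simplification of Pr(object) × IOU(truth, pred).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_box(t, cell_xy, anchor_wh, conf_thresh=0.5):
    """Decode one raw prediction t = (t_x, t_y, t_w, t_h, t_o) into
    (b_x, b_y, b_w, b_h) in grid units; return None below the threshold."""
    tx, ty, tw, th, to = t
    cx, cy = cell_xy                 # offsets of the grid cell's top-left corner
    pw, ph = anchor_wh               # anchor (prior) box width and height
    bx = sigmoid(tx) + cx            # b_x = sigma(t_x) + c_x
    by = sigmoid(ty) + cy            # b_y = sigma(t_y) + c_y
    bw = pw * np.exp(tw)             # b_w = p_w * e^(t_w)
    bh = ph * np.exp(th)             # b_h = p_h * e^(t_h)
    conf = sigmoid(to)               # stands in for Pr(object) * IOU(truth, pred)
    return (bx, by, bw, bh) if conf >= conf_thresh else None

# Example: one prediction in grid cell (3, 5) with a 2.0 x 4.5 (grid-unit) anchor.
print(decode_box(np.array([0.2, -0.1, 0.3, 0.1, 2.0]), (3, 5), (2.0, 4.5)))
```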
(4.2): a finer feature extraction layer is added to the YOLOv3 network, and the detection output of the network on the large-scale feature layer is increased;
The YOLOv3 network applies a large number of convolutions each time it downsamples. According to the receptive-field calculation formula, as the number of network layers increases the receptive field grows and the extracted features fuse more information; that is, the deeper the network, the more it attends to global information. Pedestrians occupy a small proportion of the image, so this is small-object detection, and in a deep feature map the information of small objects has little influence on the feature map and is severely lost. Therefore, a finer feature extraction layer is added: while keeping the original YOLOv3 output layers, the output feature map is upsampled and merged with a shallow convolution layer of the same size, and after several further convolution layers a prediction output is produced, yielding the model YOLO-Z (a structural sketch follows);
(4.3): multi-scale fusion prediction is then performed on pedestrians through an FPN-like network; the YOLOv3 algorithm treats target detection as a regression problem, so a mean-squared-error loss function is adopted;
the mean square error loss function (loss function) formula used for class prediction is
Wherein: s is S 2 Representing the grid size of the final characteristic diagram of the network, B representing the number of predicted frames of each grid, x, y, w and h representing the center and width and height of the frames, C i Representing the confidence that the prediction box is located to the pedestrian,representing confidence level of true existence of pedestrian in frame, P i (c) Representing predicted pedestrian confidence,/->The confidence of pedestrians exists truly; />Refers to judging whether the jth binding box in the ith grid is responsible for the objectThe body and the IOU maximum bound box of the real existing target frame group_trunk of the object; />Representing the largest boundingbox of the IOU; lambda (lambda) coord Weight coefficients for the bounding box coordinate prediction error; lambda (lambda) noobj Weights representing classification errors classification error; />Judging whether the center of an object falls in a grid i, wherein the center of the object is contained in the grid, and predicting the class probability of the object;
step 5: inputting the training set into a YOLO-Z network to perform various environmental training, and then storing a weight file of the training set;
based on the improved YOLO-Z network, a convolution layer is added, finer feature extraction is obtained, and small targets are detected in a shallow layer, so that a pedestrian detection model under an orchard is obtained. The prior knowledge of the data set is utilized, the width and height of the candidate frames are obtained through a K-means clustering algorithm, the influence of different candidate frame numbers on the performance of the model is analyzed, the model with optimal performance is obtained under limited computing resources, and training parameters are optimized for improving the positioning accuracy of the model.
Step 6: the Kalman filtering algorithm is introduced and the corresponding improvement is carried out to improve the robustness of the model, solve the problem of missing detection and improve the detection speed, and the specific steps are as follows:
the Kalman filtering algorithm outputs an optimal recurrence algorithm, and the tracking process is mainly divided into two steps: prediction and updating. After a state space model and an observation equation are established for the system, the filter can obtain a predicted value of the state variable at the current moment according to the noise of the system and the state variable at the previous moment, and then the state variable is updated by combining with the observed value at the current moment to finally realize the state of prediction estimation.
The state-space model and the observation equation, which are the basis for iterative tracking by the Kalman filter, are formulated as follows:
X_i = A_{i|i−1} X_{i−1} + w_{i−1}
Z_i = H X_i + v_i
where X_i and X_{i−1} are the system states at time i and time i−1; A_{i|i−1} is the state transition matrix, determined by the system's state variables and the target's motion pattern; Z_i is the observed state of the system at time i; H is the observation matrix, which relates the state to the observation; w_{i−1} is the system (process) noise and v_i the measurement noise of the system, both following normal distributions with covariances Q and R respectively.
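As an illustration of the predict/update cycle (not the patent's specific improvement), below is a minimal constant-velocity Kalman filter for tracking a bounding-box center; the state layout and the Q and R values are assumptions. Coasting on prediction when a frame yields no detection (z is None) suggests how the filter can bridge missed detections.

```python
import numpy as np

# State X = [x, y, vx, vy]: box-center position and velocity (constant-velocity model).
dt = 1.0
A = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)   # state transition A_{i|i-1}
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)    # observation matrix: we measure (x, y) only
Q = np.eye(4) * 1e-2                         # process-noise covariance
R = np.eye(2) * 1.0                          # measurement-noise covariance

def kalman_step(x, P, z=None):
    """One predict/update cycle; with z=None (missed detection) only predict."""
    x = A @ x                                # predict: X_i = A X_{i-1} (+ w_{i-1})
    P = A @ P @ A.T + Q
    if z is not None:                        # update with observation Z_i = H X_i + v_i
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
        x = x + K @ (z - H @ x)
        P = (np.eye(4) - K @ H) @ P
    return x, P

x, P = np.array([100.0, 50.0, 0.0, 0.0]), np.eye(4)
for z in [np.array([102.0, 51.0]), None, np.array([106.0, 53.0])]:  # None = missed frame
    x, P = kalman_step(x, P, z)
    print(np.round(x, 2))
```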
The invention has the following advantages:
1. The improved K-means clustering algorithm optimizes the selection of the initial cluster centers so that the similarity distances between them are as large as possible, which effectively shortens clustering time and improves the clustering result;
2. A convolution layer is added at a shallow layer of the network to obtain finer feature extraction, and small targets are detected at the shallow layer, so the detection accuracy of the resulting YOLO-Z model is greatly improved and its detection speed is also markedly higher, meeting the requirement of real-time detection;
3. Combining the YOLO-Z model with the Kalman filtering algorithm reduces the missed-detection rate in heavily occluded scenes and further increases the detection speed there.
Drawings
Fig. 1 is a flowchart of an overall implementation process of a pedestrian detection method in an orchard environment based on improved YOLOv3 in an embodiment of the present invention.
FIG. 2 is a diagram of network-based coordinate prediction in multitasking training in accordance with an embodiment of the present invention.
FIG. 3 shows the convolutional feature extractor added at a shallow layer of the YOLOv3 network in an embodiment of the present invention.
FIG. 4 shows the effect of the orchard pedestrian detection method based on improved YOLOv3 in an embodiment of the present invention: (a) resting state; (b) moving state; (c) normal posture; (d) abnormal posture; (e) large target; (f) medium target; (g) small target.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, the invention provides a pedestrian detection method in an orchard environment based on improved YOLOv3, which comprises the following steps:
Step 1: collect pedestrian images in an orchard environment;
use depth cameras to collect images of pedestrians at various positions in the orchard, including images of pedestrians under different occlusion conditions, images under different weather conditions, and images of pedestrians at different distances, covering short-range, medium-range, and long-range shots;
Step 2: preprocess the images acquired in step 1 and construct a standard pedestrian detection data set;
As shown in figs. 2-3, step 3: feed the training set processed in step 2 into a convolutional feature extractor to extract pedestrian features, determine the number of anchor boxes through a K-means clustering algorithm to generate predicted pedestrian bounding boxes, and perform multi-scale fusion prediction with an FPN-like network to improve the accuracy of bounding-box and class prediction. The specific steps are as follows:
(3.1): randomly select the width and height of one annotated box as the first cluster center;
(3.2): select each subsequent (n-th) cluster center on the principle that the larger a box's similarity distance to the current n−1 cluster centers, the higher the probability that the box is selected;
(3.3): repeat (3.2) until all initial cluster centers are determined;
(3.4): compute the IoU between each remaining box and each cluster center to obtain the similarity distance (the IoU loss) between the two boxes, and assign each box to the class whose cluster center is at the smallest similarity distance;
(3.5): after all boxes have been traversed, compute the mean width and height of the boxes in each class and use them as the cluster centers for the next iteration;
(3.6): repeat (3.4) and (3.5) until the difference in total IoU loss between adjacent iterations is smaller than a threshold or the iteration limit is reached, then stop the clustering algorithm.
The improved K-means clustering algorithm mainly optimizes the selection of the initial cluster centers, so that the similarity distances between the initial cluster centers are as large as possible.
Step 4: add a finer feature extraction layer to the YOLOv3 network and increase the detection output of the network on the large-scale feature layer to obtain the improved network model YOLO-Z, specifically as follows:
(4.1): the training-set images obtained in step 2 are resized to 608×608, the IOU threshold is set to 0.45, and the confidence threshold is set to 0.5. Each grid cell predicts B bounding boxes, each bounding box containing 1 confidence score, 4 coordinate values and C class probabilities, where B is the number of anchor boxes of the output feature layer the cell belongs to. Then, for an output feature layer of size S×S, the final output dimension is S×S×B×(4+1+C).
The clustering uses the formula
d(box, centroid) = 1 − IOU(box, centroid)
where box is a prior box, centroid is a cluster center, and IOU(box, centroid) is the intersection-over-union of the two regions; when d(box, centroid) is less than or equal to the measurement threshold, the width and height of the anchor box are confirmed.
The predicted bounding box is computed as
b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w e^(t_w)
b_h = p_h e^(t_h)
where c_x and c_y are the horizontal and vertical offsets of the grid cell from the top-left corner of the image; p_w and p_h are the width and height of the bounding box before prediction (the anchor box); t_x and t_y are the predicted center parameters, so σ(t_x) and σ(t_y) are the horizontal and vertical distances from the center of the predicted box to the top-left corner of its cell; b_x and b_y are the abscissa and ordinate of the predicted bounding-box center; and b_w and b_h are the width and height of the predicted bounding box.
The confidence of a predicted bounding box is
C = Pr(object) × IOU(truth, pred)
where Pr(object) is 0 or 1: 0 indicates that there is no object in the image, and 1 indicates that there is an object; IOU(truth, pred) is the intersection-over-union between the predicted bounding box and the actual bounding box. The confidence score reflects whether a target is contained and, if so, the accuracy of the predicted location. With the confidence threshold set to 0.5, a predicted bounding box is deleted when its confidence is less than 0.5 and retained when its confidence is greater than 0.5.
(4.2): a finer feature extraction layer is added to the YOLOv3 network, and the detection output of the network on the large-scale feature layer is increased;
The YOLOv3 network applies a large number of convolutions each time it downsamples. According to the receptive-field calculation formula, as the number of network layers increases the receptive field grows and the extracted features fuse more information; that is, the deeper the network, the more it attends to global information. Pedestrians occupy a small proportion of the image, so this is small-object detection, and in a deep feature map the information of small objects has little influence on the feature map and is severely lost. Therefore, a finer feature extraction layer is added: while keeping the original YOLOv3 output layers, the output feature map is upsampled and merged with a shallow convolution layer of the same size, and after several further convolution layers a prediction output is produced, yielding the model YOLO-Z;
(4.3): multi-scale fusion prediction is then performed on pedestrians through an FPN-like network; the YOLOv3 algorithm treats target detection as a regression problem, so a mean-squared-error loss function is adopted;
the mean square error loss function (loss function) formula used for class prediction is
Wherein: s is S 2 Representing the grid size of the final characteristic diagram of the network, B representing the number of predicted frames of each grid, x, y, w and h representing the center and width and height of the frames, C i Representing the confidence that the prediction box is located to the pedestrian,representing confidence level of true existence of pedestrian in frame, P i (c) Representing predicted pedestrian confidence,/->The confidence of pedestrians exists truly; />Judging whether the jth binding box in the ith grid is responsible for the object or not, and judging the IOU maximum binding box of the group_trunk of the object;representing the largest boundingbox of the IOU; lambda (lambda) noobj A weight representing classification error;judging whether the center of an object falls in a grid i, wherein the center of the object is contained in the grid, and predicting the class probability of the object;
step 5: inputting the training set into a YOLO-Z network to perform various environmental training, and then storing a weight file of the training set;
based on the improved YOLO-Z network, a convolution layer is added, finer feature extraction is obtained, and small targets are detected in a shallow layer, so that a pedestrian detection model under an orchard is obtained. The prior knowledge of the data set is utilized, the width and height of the candidate frames are obtained through a K-means clustering algorithm, the influence of different candidate frame numbers on the performance of the model is analyzed, the model with optimal performance is obtained under limited computing resources, and training parameters are optimized for improving the positioning accuracy of the model.
Step 6: the Kalman filtering algorithm is introduced and the corresponding improvement is carried out to improve the robustness of the model, solve the problem of missing detection and improve the detection speed, and the specific steps are as follows:
the Kalman filtering algorithm outputs an optimal recurrence algorithm, and the tracking process is mainly divided into two steps: prediction and updating. After a state space model and an observation equation are established for the system, the filter can obtain a predicted value of the state variable at the current moment according to the noise of the system and the state variable at the previous moment, and then the state variable is updated by combining with the observed value at the current moment to finally realize the state of prediction estimation.
The state-space model and the observation equation, which are the basis for iterative tracking by the Kalman filter, are formulated as follows:
X_i = A_{i|i−1} X_{i−1} + w_{i−1}
Z_i = H X_i + v_i
where X_i and X_{i−1} are the system states at time i and time i−1; A_{i|i−1} is the state transition matrix, determined by the system's state variables and the target's motion pattern; Z_i is the observed state of the system at time i; H is the observation matrix, which relates the state to the observation; w_{i−1} is the system (process) noise and v_i the measurement noise of the system, both following normal distributions with covariances Q and R respectively.
As shown in fig. 4, the pedestrian detection method based on improved YOLOv3 in an orchard environment builds on YOLOv3 and targets detection difficulties in orchards such as illumination and occlusion; by improving the training samples and the network structure through the proposed YOLO-Z network, and by improving the K-means clustering algorithm and the Kalman filtering algorithm, it raises the precision and recall of pedestrian detection, meets the requirement of real-time detection, reduces the network model's demand on hardware, and benefits pedestrian detection by intelligent agricultural machinery in orchards.
In summary, the invention provides a pedestrian detection method in an orchard environment based on improved YOLOv3. The method comprises the following steps: S1, acquiring images in an orchard environment and preprocessing them to produce an orchard pedestrian sample set; S2, determining the number and sizes of anchor boxes with a K-means clustering algorithm to generate pedestrian candidate boxes; S3, adding a finer feature extraction layer to the YOLOv3 network and increasing the detection output of the network on the large-scale feature layer to obtain the improved network model YOLO-Z; S4, inputting the training set into the YOLO-Z network, training under multiple environments, and then saving the trained weight file; S5, introducing a Kalman filtering algorithm with corresponding improvements to improve the robustness of the model, address missed detections, and increase detection speed. The invention addresses the low real-time detection speed and low accuracy of pedestrian detection in an orchard environment, realizes multi-task training, and ensures both the speed and the accuracy of pedestrian detection in the orchard environment.
In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "illustrative embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.
Claims (3)
1. A pedestrian detection method based on improved YOLOv3 in an orchard environment, characterized by comprising the following steps:
step 1: collecting pedestrian images in an orchard environment;
collecting, with depth cameras, images of pedestrians at various positions in the orchard, including images of pedestrians under different occlusion conditions, images under different weather conditions, and images of pedestrians at different distances, covering short-range, medium-range, and long-range shots;
step 2: preprocessing the images acquired in step 1 and constructing a standard pedestrian detection data set;
step 3: processing the pedestrian detection data set of step 2 to produce a training set, feeding the training set into a convolutional feature extractor to extract pedestrian features, determining the number of anchor boxes through a K-means clustering algorithm to generate augmented data of predicted pedestrian bounding boxes, and performing multi-scale fusion prediction with an FPN-like network to improve the accuracy of bounding-box and class prediction;
step 4: adding a finer feature extraction layer to the YOLOv3 network and increasing the detection output of the network on the large-scale feature layer to obtain the improved network model YOLO-Z;
step 4 is specifically as follows:
step 4.1: first adjust the size of the training-set images obtained in step 2 to 608×608, set the IoU (Intersection over Union) threshold to 0.45 and the confidence threshold to 0.5; predict B bounding boxes for each grid cell, each bounding box comprising 1 confidence score, 4 coordinate values and C class probabilities, where B is the number of anchor boxes of the output feature layer the grid cell belongs to; then, for an output feature layer of size S×S, the final output dimension is S×S×B×(4+1+C);
the formula used for clustering is
d(box, centroid) = 1 − IOU(box, centroid)
where box is a prior box, centroid is a cluster center, and IOU(box, centroid) is the intersection-over-union of the two regions; when d(box, centroid) is less than or equal to the measurement threshold, the width and height of the anchor box are confirmed;
the formula of the prediction boundary box is
b x =σ(t x )+c x
b y =σ(t y )+c y
Wherein c x And c y For the distance of the divided cells from the abscissa of the upper left corner of the image, p w 、p h The width and height of the bounding box before prediction, t x And t y To predict the center relative parameter, σ (t x ) Sum sigma (t) y ) The distances from the center of the prediction frame to the horizontal direction and the vertical direction of the upper left corner of the cell where the prediction frame is positioned are respectively b x And b y Respectively the abscissa, the ordinate, b of the predicted bounding box center w And b h The width and height of the predicted bounding box, respectively;
confidence formula for prediction bounding box is
Wherein Pr (object) is 0 or 1, 0 indicates no object in the image, and 1 indicates an object;representing the intersection ratio between the predicted boundary frame and the actual boundary frame, wherein the confidence coefficient confidence score reflects whether the target is contained or not and the accuracy of the predicted position under the condition that the target is contained, and the confidence coefficient threshold value is set to be 0.5, and deleting the predicted boundary frame when the confidence coefficient of the predicted boundary frame is smaller than 0.5; when the confidence coefficient of the predicted boundary frame is larger than 0.5, reserving the predicted boundary frame;
step 4.2: the more detailed feature extraction layer is added in the YOLOv3 network, and the detection output of the network in the large-scale feature layer is increased;
according to a receptive field calculation formula, as the number of layers of the network increases, the receptive field increases, the extracted features are formed by more information fusion, namely, the deeper the network is, the more concerned global information is, the smaller the proportion of pedestrians in the picture is, the detection of small-size objects is realized, in a deep feature map, the influence of the information of the small-size objects on the feature map is smaller, and the information loss of the small-size objects is serious; therefore, a more detailed feature extraction layer is added, on the basis of keeping the original output layer of the YOLOv3, the output feature map is up-sampled to obtain a size feature map and is combined with a shallow size convolution layer, and then the model YOLO-Z is obtained through prediction output after a plurality of convolution layers;
step 4.3: then, carrying out multi-scale fusion prediction on pedestrians through a similar FPN network, wherein the target detection is regarded as a regression problem by a YOLOv3 algorithm, so that a mean square error loss function is adopted;
the mean square error loss function formula used for category prediction is as follows
Wherein: s is S 2 Representing the grid size of the final characteristic diagram of the network, B representing the number of predicted frames of each grid, x, y, w and h representing the center and width and height of the frames, C i Representing the confidence that the prediction box is located to the pedestrian,representing confidence level of true existence of pedestrian in frame, P i (c) Representing predicted pedestrian confidence,/->The confidence of pedestrians exists truly; />Judging whether the jth binding box in the ith grid is responsible for the object, and judging the IOU maximum binding box of the jth binding box with the truly existing target frame group_trunk of the object; />Representing the largest binding box of the IOU; lambda (lambda) coord Weight coefficients for the bounding box coordinate prediction error; lambda (lambda) noobj Weights representing classification errors classification error; />Judging whether the center of an object falls in a grid i, wherein the center of the object is contained in the grid, and predicting the class probability of the object;
step 5: inputting the training set into a YOLO-Z network to perform various environmental training, and then storing a weight file of the training set;
step 6: an improved Kalman filtering algorithm is introduced to improve the robustness of the model, solve the problem of missed detection and improve the detection speed.
2. The pedestrian detection method in an orchard environment based on improved YOLOv3 of claim 1, wherein determining the number of anchor boxes through the K-means clustering algorithm to generate the augmented data of predicted pedestrian bounding boxes comprises the following specific steps:
step 3.1: randomly selecting the width and height of one annotated box as the first cluster center;
step 3.2: selecting each subsequent (n-th) cluster center on the principle that the larger a box's similarity distance to the current n−1 cluster centers, the higher the probability that the box is selected;
step 3.3: repeating step 3.2 until all initial cluster centers are determined;
step 3.4: computing the IoU between each remaining box and each cluster center to obtain the similarity distance (the IoU loss) between the two boxes, and assigning each box to the class whose cluster center is at the smallest similarity distance;
step 3.5: after all boxes have been traversed, computing the mean width and height of the boxes in each class and using them as the cluster centers for the next iteration;
step 3.6: repeating steps 3.4 and 3.5 until the difference in total IoU loss between adjacent iterations is smaller than a threshold or the iteration limit is reached, then stopping the clustering algorithm.
3. The method for pedestrian detection in an orchard environment based on improved YOLOv3 of claim 1, wherein step 6 is specifically as follows:
the improved Kalman filtering algorithm outputs an optimal recurrence algorithm, and the tracking process is mainly divided into two steps: predicting and updating; after a state space model and an observation equation are established for the system, a filter can obtain a predicted value of a state variable at the current moment according to noise of the system and the state variable at the previous moment, and then the state variable is updated by combining with the observed value at the current moment to finally realize a predicted estimated state;
the state space model and the observation equation are formulated as follows, which are the basis for iterative tracking by a Kalman filter:
X i =A i|i-1 X i-1 +w i-1
Z i =Hx i +v i
wherein X is i And X i-1 Is the system state corresponding to the moment i and the moment i-1, A i|i-1 Is a state transition matrix, and is related to state variables of the system and a target movement mode; z is Z i Representing the observation state of the system at the moment i, wherein H is an observation matrix, and is related to the system matrix and the observation value, W i-1 Corresponding to system noise, v i Corresponding systemIs subjected to normal distribution, and covariance is Q, R respectively.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010341941.7A | 2020-04-27 | 2020-04-27 | Pedestrian detection method based on improved YOLOv3 in orchard environment |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN111626128A | 2020-09-04 |
| CN111626128B | 2023-07-21 |
Family

ID=72260566

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010341941.7A | Pedestrian detection method based on improved YOLOv3 in orchard environment | 2020-04-27 | 2020-04-27 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN111626128B (en) |
Legal Events
| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |