CN113658222A - Vehicle detection tracking method and device - Google Patents
Vehicle detection tracking method and device
- Publication number
- CN113658222A (application CN202110882079.5A)
- Authority
- CN
- China
- Prior art keywords
- vehicle
- target detector
- iou
- training
- yolo
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 47
- 238000001514 detection method Methods 0.000 title claims abstract description 39
- 238000012549 training Methods 0.000 claims abstract description 38
- 230000001360 synchronised effect Effects 0.000 claims description 8
- 230000008569 process Effects 0.000 claims description 7
- 238000012545 processing Methods 0.000 claims description 6
- 230000009466 transformation Effects 0.000 claims description 4
- 238000000844 transformation Methods 0.000 claims description 3
- 230000003190 augmentative effect Effects 0.000 claims description 2
- 238000010586 diagram Methods 0.000 description 7
- 230000033001 locomotion Effects 0.000 description 7
- 238000010606 normalization Methods 0.000 description 7
- 230000008859 change Effects 0.000 description 6
- 238000004590 computer program Methods 0.000 description 6
- 230000002159 abnormal effect Effects 0.000 description 5
- 238000004422 calculation algorithm Methods 0.000 description 5
- 238000013527 convolutional neural network Methods 0.000 description 5
- 238000012544 monitoring process Methods 0.000 description 5
- 230000006399 behavior Effects 0.000 description 4
- 230000006870 function Effects 0.000 description 4
- 230000008901 benefit Effects 0.000 description 3
- 238000013135 deep learning Methods 0.000 description 3
- 230000003287 optical effect Effects 0.000 description 3
- 238000011161 development Methods 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 239000007787 solid Substances 0.000 description 2
- 238000013459 approach Methods 0.000 description 1
- 238000013528 artificial neural network Methods 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000013434 data augmentation Methods 0.000 description 1
- 238000013500 data storage Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 230000008034 disappearance Effects 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 239000000835 fiber Substances 0.000 description 1
- 230000004927 fusion Effects 0.000 description 1
- 230000004807 localization Effects 0.000 description 1
- 239000002245 particle Substances 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 239000004065 semiconductor Substances 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 238000011426 transformation method Methods 0.000 description 1
- 238000013519 translation Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30241—Trajectory
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The application discloses a vehicle detection and tracking method and device. The method comprises the following steps: training a Yolo target detector; detecting the type and position of each vehicle in every frame of a video using the trained Yolo target detector; and matching the output of the Yolo target detector against historical vehicles by IOU using an IOU tracker, assigning each vehicle a unique ID. The device comprises: a training module configured to train a Yolo target detector; a detection module configured to detect the type and position of each vehicle in every frame of the video using the trained Yolo target detector; and a matching module configured to match the output of the Yolo target detector against historical vehicles by IOU using the IOU tracker and assign unique IDs.
Description
Technical Field
The application relates to the field of vehicle motion behavior recognition in videos, and in particular to trajectory recognition within vehicle motion behavior recognition.
Background
Trajectory recognition is the basis of vehicle motion behavior recognition. Through vehicle trajectories, road managers can obtain information about abnormal vehicle behavior, such as illegal lane changes and abnormal parking. Trajectory recognition can be described simply as a fusion of object detection and object tracking. The main tasks in vehicle trajectory recognition are: initiating and detecting the trajectory of a moving vehicle; tracking the trajectory using its coordinates and estimated motion parameters; and identifying the category of the observed vehicle. With the rapid development of convolutional neural networks and object detection, mainstream research on trajectory recognition has focused on tracking-by-detection, in which the tracker follows vehicles using the output of an object detector, achieving a balance between speed and accuracy. Object detection aims to locate instances of semantic objects of a particular class in a given image. Before the deep learning era, a typical detection pipeline consisted of three steps: proposal generation, feature vectorization, and classification. With the development of deep learning and the success of deep convolutional neural networks (DCNNs), object detection also adopted CNNs as its backbone networks.
Today, deep-learning-based object detection can generally be divided into two families: two-stage detectors and one-stage detectors. The most representative two-stage detector is Faster R-CNN; the most representative one-stage detectors are YOLO and SSD. A two-stage detector first generates proposals with an architecture called the Region Proposal Network (RPN) and extracts features from the proposal set, then applies a region classifier to predict the class of each proposal; this framework is designed for high localization and recognition accuracy. A one-stage detector skips the region-proposal step and detects directly over a dense sampling of possible locations.
There are two families of visual object tracking methods: generative and discriminative. A generative method models the target region in the current frame and searches for the most similar region in the next frame; typical algorithms include the Kalman filter, the particle filter, and mean shift. These classical methods cannot adapt to complex changes during tracking, and their robustness and accuracy have been surpassed by current algorithms. A discriminative method is also called a "tracking-by-detection" method; unlike object detection, whose output is a bounding box with its class, a tracking-by-detection method additionally associates an ID with each box.
Existing methods have the following disadvantages: prediction accuracy is low and sensitive to fluctuations in the vehicle trajectory; a single model alone cannot resolve the large differences in prediction error at different time points; the model construction process is overly complex and its time cost is relatively high; and the methods are sensitive to changes in noisy data, with the prediction error growing approximately linearly as noise increases.
Disclosure of Invention
It is an object of the present application to overcome the above problems or to at least partially solve or mitigate the above problems.
According to one aspect of the present application, there is provided a vehicle detection tracking method, including:
training a Yolo target detector;
detecting the type and the position of a vehicle in each frame of the video by using a trained Yolo target detector;
matching the output of the Yolo target detector against historical vehicles by IOU using an IOU tracker, and assigning a unique ID.
Optionally, the training of the Yolo target detector comprises:
data augmentation and focal-loss training are performed on the Yolo target detector.
Optionally, before the training of the Yolo target detector, the method further comprises:
the original training data set is augmented by random geometric transformations and random color jittering.
Optionally, the focal loss is used as the object confidence loss both in training the Yolo target detector and in detecting the type and position of the vehicle in each frame of the video using the trained Yolo target detector.
Optionally, during training of the Yolo target detector, the batch normalization of Yolo on each GPU is synchronized.
Optionally, matching the output of the Yolo target detector against historical vehicles by IOU using the IOU tracker and assigning a unique ID includes:
for each box in the current frame, respectively calculating an F-IOU between that box and each box in the previous frame, wherein the F-IOU is the intersection over union (IOU) between the two boxes, and recording the vehicles in the two boxes corresponding to the largest F-IOU as the same vehicle, thereby forming the vehicle's trajectory;
calculating the F-IOU between each vehicle in the current frame and the last few boxes of each trajectory;
when the F-IOU between a vehicle and a trajectory is the maximum and exceeds a threshold, adding the vehicle to that trajectory; otherwise, creating a new trajectory from the vehicle and adding it to the trajectory set.
According to another aspect of the present application, there is provided a vehicle detection tracking apparatus including:
a training module configured to train a Yolo target detector;
a detection module configured to detect the type and location of the vehicle in each frame of the video using a trained Yolo target detector; and
a matching module configured to match the output of the Yolo target detector against historical vehicles by IOU using an IOU tracker and assign a unique ID.
Optionally, in the training module, training the Yolo target detector includes:
data augmentation and focal-loss training are performed on the Yolo target detector.
Optionally, in the training module and the detection module, the focal loss is used as the object confidence loss.
Optionally, in the training module, the batch normalization of Yolo on each GPU is synchronized.
According to the vehicle detection and tracking method and device of the present application, the Yolo target detector and the IOU tracker are fused, which improves the speed and accuracy of detection and tracking, reduces prediction error, and enables stable detection and tracking of vehicles.
The above and other objects, advantages and features of the present application will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.
Drawings
Some specific embodiments of the present application will be described in detail hereinafter by way of illustration and not limitation with reference to the accompanying drawings. The same reference numbers in the drawings identify the same or similar elements or components. Those skilled in the art will appreciate that the drawings are not necessarily drawn to scale. In the drawings:
FIG. 1 is a schematic flow chart diagram of a vehicle detection and tracking method according to one embodiment of the present application;
FIG. 2 is a diagram of computing the F-IOU from boxes in two frames, where t denotes a time instant, according to one embodiment of the present application;
FIG. 3 is a schematic diagram of a vehicle detection and tracking device according to an embodiment of the present application;
FIG. 4 is a schematic block diagram of a computing device according to one embodiment of the present application;
FIG. 5 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
FIG. 1 is a schematic flow chart diagram of a vehicle detection and tracking method according to one embodiment of the present application. The vehicle detection tracking method may generally include the following steps S1 to S3.
Step S1: train the Yolo target detector.
The present embodiment uses Yolo for vehicle detection. Yolo applies a single neural network to the complete image, divides the image into 13 × 13 regions (for a 416 × 416 input), and predicts bounding boxes and probabilities for each region; the bounding boxes are weighted by the predicted probabilities. This architecture makes Yolo extremely fast, more than 100 times faster than two-stage detectors such as Fast R-CNN, so the present embodiment selects Yolo as the detector in the vehicle trajectory recognition algorithm. The Yolo target detector is trained on the COCO and MIO-TCD data sets to detect the type and position of the vehicle in each frame of the video, and the accuracy of Yolo is improved in three aspects: data augmentation, focal loss, and synchronized batch normalization.
(1) Data augmentation
Data augmentation can significantly improve the diversity of training data without actually collecting new data. The augmentation used in the present embodiment includes random geometric transformations and random color jittering. The former is implemented by applying random translation, random rotation, random scaling, and random reflection to the original training data; the latter by processing the original training data with random HSV saturation values and random HSV intensity values.
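As a concrete illustration, the two augmentation families can be sketched as below. This is a minimal sketch rather than the implementation of this embodiment: the parameter ranges and function names are assumptions, and in practice the bounding-box labels must be transformed together with the image.

```python
import cv2
import numpy as np

def random_geometric(img):
    """Random translation, rotation, scaling, and reflection (illustrative ranges)."""
    h, w = img.shape[:2]
    angle = np.random.uniform(-10, 10)                 # random rotation, degrees
    scale = np.random.uniform(0.8, 1.2)                # random scaling
    tx, ty = np.random.uniform(-0.1, 0.1, 2) * (w, h)  # random translation
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    M[:, 2] += (tx, ty)
    img = cv2.warpAffine(img, M, (w, h))
    if np.random.rand() < 0.5:                         # random horizontal reflection
        img = img[:, ::-1]
    return img

def random_color_jitter(img):
    """Random perturbation of HSV saturation and intensity values."""
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 1] *= np.random.uniform(0.7, 1.3)         # random HSV saturation
    hsv[..., 2] *= np.random.uniform(0.7, 1.3)         # random HSV intensity
    hsv = np.clip(hsv, 0, 255).astype(np.uint8)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```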
(2) Focal loss
The present embodiment uses the focal loss to address the problem of unbalanced data sets. If the number of samples of one category in a data set is much larger than that of the other categories, the data set is considered unbalanced, and unbalanced data leads to inefficient training. In Yolo, the imbalance occurs in the object confidence: when calculating the object confidence loss, (13 × 13 + 26 × 26 + 52 × 52) × 3 = 10647 candidate boxes are produced, and most of them match no real object. Changing the object confidence loss from cross entropy to focal loss therefore suppresses the imbalance. The conventional cross-entropy loss is CE(p_t) = -log(p_t), where p_t is the predicted probability of the true category; negative samples dominate this loss, biasing the network toward the background category. The focal loss is defined as FL(p_t) = -(1 - p_t)^γ log(p_t), where γ is the focusing parameter. When p_t approaches 1 or 0, meaning the network is very confident in its prediction, the focal loss is far lower than the cross-entropy loss; otherwise, the focal loss is only slightly lower.
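Written out in code, the loss above takes a few lines. The following PyTorch-style sketch assumes sigmoid confidence outputs and γ = 2; these choices are assumptions for illustration, not values fixed by this embodiment.

```python
import torch

def focal_loss(logits, targets, gamma=2.0):
    """FL(p_t) = -(1 - p_t)^gamma * log(p_t) for the binary object confidence.

    logits:  raw confidence predictions (any shape)
    targets: 0/1 ground truth of the same shape
    """
    p = torch.sigmoid(logits)
    # p_t is the predicted probability of the true class of each box
    p_t = torch.where(targets > 0.5, p, 1.0 - p)
    # (1 - p_t)^gamma down-weights the many easy background boxes
    return (-(1.0 - p_t) ** gamma * torch.log(p_t.clamp(min=1e-7))).mean()
```

Setting gamma = 0 recovers the ordinary cross-entropy loss CE(p_t) = -log(p_t), which is a convenient way to sanity-check the implementation.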
(3) Synchronous batch normalization
Batch normalization allows the use of higher learning rates with less concern for weight initialization. When multiple devices (typically GPUs) are used to train a model, the standard implementation of batch normalization in common frameworks is asynchronous: data is normalized individually within each GPU rather than globally. This does not plague a small network that occupies little RAM, where each GPU can carry a large batch size; but the per-GPU batch size of Yolo is small (about 16), so synchronized batch normalization becomes important. In this embodiment, normalization is a function transformation of the values: given an original value x, a normalizing function is applied to x to convert it into a normalized value.
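In a framework such as PyTorch, synchronized batch normalization can be enabled by converting the detector's BatchNorm layers before wrapping the model for distributed training. A minimal sketch, assuming torch.distributed has already been initialized (e.g., via torchrun) and build_yolo() is a hypothetical constructor for the detector:

```python
import torch

model = build_yolo()  # hypothetical constructor, stands in for the Yolo network
# Replace every BatchNorm layer with a synchronized counterpart, so the small
# per-GPU mini-batches (about 16 images each) are normalized with statistics
# aggregated across all GPUs instead of per device.
model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
model = torch.nn.parallel.DistributedDataParallel(model.cuda())
```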
Step S2: detect the type and position of the vehicle in each frame of the video using the trained Yolo target detector.
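Concretely, step S2 is a per-frame loop over the video. A minimal sketch using OpenCV's video reader, where detect() stands in for the forward pass of the trained Yolo detector and its output layout is an assumption:

```python
import cv2

cap = cv2.VideoCapture("traffic.mp4")  # hypothetical input video
detections_per_frame = []
while True:
    ok, frame = cap.read()
    if not ok:  # end of video
        break
    # detect() is assumed to return a list of
    # (class_name, x1, y1, x2, y2, confidence) tuples for one frame
    detections_per_frame.append(detect(frame))
cap.release()
```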
Step S3: match the output of the Yolo target detector against historical vehicles by IOU using the IOU tracker, and assign a unique ID.
Intersection over union (IOU) is the ratio between the intersection and the union of the prediction box and the ground-truth box: IOU = area(A ∩ B) / area(A ∪ B), where A is the prediction box and B is the ground-truth box. Here, the IOU is computed between boxes in two consecutive frames rather than between a prediction box and a ground-truth box, and is referred to as the F-IOU. For each box in the current frame, the F-IOU between that box and every box in the previous frame is computed; the two boxes with the largest F-IOU are regarded as the same object, and the same ID is assigned to the boxes in the current and previous frames. For the current frame, the IOU tracker computes the F-IOU between each vehicle (as shown in FIG. 2) and the last few boxes of each trajectory. If the F-IOU between vehicle v and trajectory t1 is the maximum and exceeds a threshold, vehicle v is added to trajectory t1; otherwise a new trajectory t2 is created from vehicle v and added to the trajectory set. In addition, the trajectory set is updated every frame: any trajectory to which no vehicle has been added for 5 consecutive frames is deleted, eliminating interference from stale trajectories.
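The IOU formula and the matching rule above can be sketched as follows. The threshold of 0.5, the track layout, and the greedy per-vehicle loop are assumptions for illustration; boxes are (x1, y1, x2, y2) tuples.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

def update_tracks(tracks, detections, next_id, threshold=0.5, max_missed=5):
    """Extend the best-matching track by F-IOU, or start a new one."""
    updated = set()
    for det in detections:
        best, best_iou = None, 0.0
        for t in tracks:
            v = iou(t["boxes"][-1], det)  # F-IOU against the track's last box
            if v > best_iou:
                best, best_iou = t, v
        if best is not None and best_iou > threshold:
            best["boxes"].append(det)
            best["missed"] = 0
            updated.add(best["id"])
        else:
            tracks.append({"id": next_id, "boxes": [det], "missed": 0})
            updated.add(next_id)
            next_id += 1
    for t in tracks:
        if t["id"] not in updated:
            t["missed"] += 1
    # A track with no vehicle added for 5 consecutive frames is deleted.
    return [t for t in tracks if t["missed"] < max_missed], next_id
```

A production tracker would enforce a one-to-one assignment between detections and tracks (e.g., Hungarian matching); the greedy loop above only mirrors the per-vehicle rule described in the text.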
In summary, the trajectory recognition process of the vehicle detection and tracking method of this embodiment uses Yolo as the detector and the IOU tracker to identify vehicles and their trajectories. After obtaining the vehicle trajectories, a road administrator can further monitor and analyze vehicle movement, including the following aspects:
(1) Lane-change recognition: the monitoring image is converted into a bird's-eye view through an affine transformation (see the sketch after this list), and changes in the transformed vehicle trajectories are then detected; recognizing lane changes from surveillance video provides a legal basis for traffic enforcement and data support for analyzing microscopic traffic flow;
(2) Abnormal-parking recognition: by analyzing the motion information of a vehicle's trajectory, it can be determined whether the vehicle's current driving state is normal; recognizing abnormal parking provides monitoring personnel with reliable early-warning information;
(3) Trajectory stitching: combined with the idea of vehicle re-identification, trajectories of the same vehicle under different cameras can be stitched together, allowing the vehicle's driving trajectory to be drawn under continuous, dense surveillance.
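For the bird's-eye-view conversion in (1), the text describes an affine transformation; the minimal OpenCV sketch below uses the closely related perspective (homography) warp. The four reference points are hypothetical and must be calibrated for each camera.

```python
import cv2
import numpy as np

# Four points on the road plane in the camera image (hypothetical calibration)
src = np.float32([[420, 330], [880, 330], [1260, 700], [60, 700]])
# Their positions in the bird's-eye view
dst = np.float32([[0, 0], [400, 0], [400, 600], [0, 600]])

H = cv2.getPerspectiveTransform(src, dst)
birds_eye = cv2.warpPerspective(frame, H, (400, 600))  # frame: a video frame
# Track points can be mapped the same way with cv2.perspectiveTransform,
# after reshaping them to shape (N, 1, 2).
```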
The detection and tracking method of this embodiment, which fuses the Yolo target detector and the IOU tracker, improves the speed and accuracy of detection and tracking, reduces prediction error, detects and tracks vehicles stably, and alleviates Yolo's vanishing-gradient problem.
Fig. 3 is a schematic structural diagram of a vehicle detection and tracking device according to an embodiment of the present application. The vehicle detection and tracking device may generally include:
a training module 1 configured to train a Yolo target detector;
a detection module 2 configured to detect the type and position of the vehicle in each frame of the video using a trained Yolo target detector; and
a matching module 3 configured to match the output of the Yolo target detector against historical vehicles by IOU using an IOU tracker and assign a unique ID.
As a preferred embodiment of the present application, in the training module 1, training the Yolo target detector includes:
data augmentation and focal-loss training are performed on the Yolo target detector.
In the training module 1 and the detection module 2, the focal loss is used as the object confidence loss.
As a preferred embodiment of the present application, in the training module 1, the batch normalization of Yolo on each GPU is synchronized.
An embodiment also provides a computing device. Referring to FIG. 4, the computing device comprises a memory 1120, a processor 1110, and a computer program stored in the memory 1120 and executable by the processor 1110; the computer program is stored in a space 1130 for program code in the memory 1120 and, when executed by the processor 1110, implements the method steps 1131 of any of the methods described herein.
An embodiment of the application also provides a computer-readable storage medium. Referring to FIG. 5, the computer-readable storage medium comprises a storage unit for program code, the storage unit being provided with a program 1131' for performing the steps of the method described herein, the program being executed by a processor.
An embodiment of the application also provides a computer program product containing instructions which, when run on a computer, cause the computer to carry out the steps of the method described herein.
In the above embodiments, the implementation may be realized wholly or partially in software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions which, when loaded and executed by a computer, carry out, in whole or in part, the procedures or functions described in the embodiments of the application. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium accessible by a computer, or a data storage device such as a server or data center that integrates one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state disk (SSD)), among others.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, the various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be realized by a program stored in a computer-readable storage medium, where the storage medium is a non-transitory medium such as random-access memory, read-only memory, flash memory, a hard disk, a solid-state disk, magnetic tape, a floppy disk, an optical disk, or any combination thereof.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. A vehicle detection tracking method, comprising:
training a Yolo target detector;
detecting the type and the position of a vehicle in each frame of the video by using a trained Yolo target detector;
matching the output of the Yolo target detector against historical vehicles by IOU using an IOU tracker, and assigning a unique ID.
2. The method of claim 1, wherein the training of a Yolo target detector comprises:
data augmentation and focal-loss training are performed on the Yolo target detector.
3. The method of claim 2, wherein prior to the training of the Yolo target detector, the method further comprises:
the original training data set is augmented by random geometric transformations and random color dithering.
4. The method according to any one of claims 1 to 3, wherein the focal loss is used as the object confidence loss both in training the Yolo target detector and in detecting the type and position of the vehicle in each frame of the video using the trained Yolo target detector.
5. The method according to any one of claims 1-4, wherein during the training of the Yolo target detector, the batch normalization of Yolo on each GPU is synchronized.
6. The method of any one of claims 1-5, wherein matching the output of the Yolo target detector against historical vehicles by IOU using the IOU tracker and assigning a unique ID comprises:
for each box in the current frame, respectively calculating an F-IOU between that box and each box in the previous frame, wherein the F-IOU is the intersection over union between the two boxes, and recording the vehicles in the two boxes corresponding to the largest F-IOU as the same vehicle, thereby forming the vehicle's trajectory;
calculating the F-IOU between each vehicle in the current frame and the last few boxes of each trajectory;
when the F-IOU between a vehicle and a trajectory is the maximum and exceeds a threshold, adding the vehicle to that trajectory; otherwise, creating a new trajectory from the vehicle and adding it to the trajectory set.
7. A vehicle detection tracking apparatus, comprising:
a training module configured to train a Yolo target detector;
a detection module configured to detect the type and location of the vehicle in each frame of the video using a trained Yolo target detector; and
a matching module configured to match the output of the Yolo target detector against historical vehicles by IOU using an IOU tracker and assign a unique ID.
8. The apparatus of claim 7, wherein the training module to train the Yolo target detector comprises:
data augmentation and focal-loss training are performed on the Yolo target detector.
9. The apparatus of claim 7 or 8, wherein the training module and the detection module use the focal loss as the object confidence loss.
10. The apparatus of any of claims 7 to 9, wherein in the training module, the batch normalization of Yolo on each GPU is synchronized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110882079.5A CN113658222A (en) | 2021-08-02 | 2021-08-02 | Vehicle detection tracking method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110882079.5A CN113658222A (en) | 2021-08-02 | 2021-08-02 | Vehicle detection tracking method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113658222A true CN113658222A (en) | 2021-11-16 |
Family
ID=78490271
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110882079.5A Pending CN113658222A (en) | 2021-08-02 | 2021-08-02 | Vehicle detection tracking method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113658222A (en) |
- 2021-08-02: CN application CN202110882079.5A filed; published as CN113658222A; status: active, Pending
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190130189A1 (en) * | 2017-10-30 | 2019-05-02 | Qualcomm Incorporated | Suppressing duplicated bounding boxes from object detection in a video analytics system |
CN108470353A (en) * | 2018-03-01 | 2018-08-31 | 腾讯科技(深圳)有限公司 | A kind of method for tracking target, device and storage medium |
CN110378210A (en) * | 2019-06-11 | 2019-10-25 | 江苏大学 | A kind of vehicle and car plate detection based on lightweight YOLOv3 and long short focus merge distance measuring method |
US20210065384A1 (en) * | 2019-08-29 | 2021-03-04 | Boe Technology Group Co., Ltd. | Target tracking method, device, system and non-transitory computer readable storage medium |
CN111445501A (en) * | 2020-03-25 | 2020-07-24 | 苏州科达科技股份有限公司 | Multi-target tracking method, device and storage medium |
CN111508002A (en) * | 2020-04-20 | 2020-08-07 | 北京理工大学 | Small-sized low-flying target visual detection tracking system and method thereof |
CN111932580A (en) * | 2020-07-03 | 2020-11-13 | 江苏大学 | Road 3D vehicle tracking method and system based on Kalman filtering and Hungary algorithm |
CN112884816A (en) * | 2021-03-23 | 2021-06-01 | 武汉理工大学 | Vehicle feature deep learning recognition track tracking method based on image system |
CN113129336A (en) * | 2021-03-31 | 2021-07-16 | 同济大学 | End-to-end multi-vehicle tracking method, system and computer readable medium |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114501061A (en) * | 2022-01-25 | 2022-05-13 | 上海影谱科技有限公司 | Video frame alignment method and system based on object detection |
CN114501061B (en) * | 2022-01-25 | 2024-03-15 | 上海影谱科技有限公司 | Video frame alignment method and system based on object detection |
CN114972410A (en) * | 2022-06-16 | 2022-08-30 | 上海影谱科技有限公司 | Multi-level matching video racing car tracking method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |