CN117636267A - Vehicle target tracking method, system, device and storage medium - Google Patents
- Publication number
- CN117636267A (application number CN202311595105.1A)
- Authority
- CN
- China
- Prior art keywords
- vehicle
- target tracking
- track
- vehicle target
- channel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/54—Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/766—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a vehicle target tracking method, a vehicle target tracking system, a vehicle target tracking device and a storage medium, and relates to the technical field of artificial intelligence. Each frame of image of a monitoring video is sequentially input into a YOLOv7 detector to perform target detection and obtain vehicle detection frame information, wherein a convolution block attention module is arranged in the ELAN structure of the backbone network layer of the YOLOv7 detector; the convolution block attention module performs global max pooling and global average pooling operations on the input feature maps to obtain an output feature map fusing channel and spatial attention features, and placing the convolution block attention module in the ELAN structure improves the feature representation capability and thereby the vehicle target detection accuracy of the detector. The vehicle detection frame information is then input into a DeepSORT tracker for track tracking to obtain a vehicle target tracking track, and the vehicle position coordinates in the vehicle target tracking track are corrected according to the camera parameters to obtain a corrected vehicle target tracking track, improving the accuracy of vehicle tracking.
Description
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a method, a system, an apparatus, and a storage medium for tracking a vehicle target.
Background
With the rapid development of intelligent transportation, detecting and tracking vehicle targets with computer vision technology is increasingly important, as it can provide information such as the position, track and speed of vehicles. In the related art, target tracking mainly adopts detection-based tracking schemes. In a detection-based tracking scheme, all targets in an image are obtained by performing target detection on each frame of a video sequence; the task is then converted into a target association problem between consecutive frames, a similarity matrix is constructed from IoU, appearance and other cues, and the association is solved with the Hungarian algorithm, a greedy algorithm or the like. However, the accuracy and efficiency of the target detection models adopted in existing vehicle target tracking processes are low, which affects track recognition, and because the target information reflected by the acquired image differs from the real target, track recognition accuracy is low.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the prior art. Therefore, the invention provides a vehicle target tracking method, a system, a device and a storage medium, which can improve the accuracy of vehicle track identification.
In one aspect, an embodiment of the present invention provides a vehicle target tracking method, including the steps of:
acquiring a monitoring video;
sequentially inputting each frame of image of the monitoring video into a YOLOv7 detector to perform target detection to obtain vehicle detection frame information, wherein a convolution block attention module is arranged in the ELAN structure of the backbone network layer of the YOLOv7 detector, and the convolution block attention module performs global max pooling and global average pooling operations on the input feature maps to obtain an output feature map fusing channel and spatial attention features;
inputting the vehicle detection frame information into a DeepSORT tracker for track tracking to obtain a vehicle target tracking track;
and correcting the vehicle position coordinates in the vehicle target tracking track according to the camera parameters to obtain a corrected vehicle target tracking track.
According to some embodiments of the present invention, the ELAN structure includes a first branch, a second branch, a third branch and a fourth branch, where the first branch and the second branch are each a 1×1 convolution layer, the third branch is sequentially a 1×1 convolution layer and a 3×3 convolution layer, and the fourth branch is sequentially a 1×1 convolution layer, a 3×3 convolution layer and a 3×3 convolution layer; the first, second, third and fourth branches are all connected to a splicing module.
According to some embodiments of the invention, the convolution block attention module is disposed between the first branch and the splicing module, and the convolution block attention module includes a channel attention mechanism unit and a spatial attention mechanism unit connected in sequence.
According to some embodiments of the invention, the channel attention mechanism unit is configured to:
respectively pooling the input features of the channel attention mechanism unit in the channel dimension to obtain a channel global maximum pooling value and a channel global average pooling value;
inputting the channel global maximum pooling value and the channel global average pooling value into a fully-connected neural network respectively and activating the fully-connected neural network to obtain a first channel characteristic and a second channel characteristic;
the first channel characteristic and the second channel characteristic are activated after the addition operation, so that the channel attention characteristic is obtained;
multiplying the channel attention characteristic and the input characteristic of the input channel attention mechanism unit to obtain the output characteristic of the channel attention mechanism unit;
the spatial attention mechanism unit is used for:
respectively pooling the input features of the spatial attention mechanism unit in the spatial dimension to obtain a spatial global maximum pooling feature and a spatial global average pooling feature;
performing dimension reduction and activation operations after splicing the space global maximum pooling feature and the space global average pooling feature to obtain a space attention feature;
and multiplying the spatial attention characteristic and the input characteristic of the input spatial attention mechanism unit to obtain the output characteristic of the spatial attention mechanism unit.
According to some embodiments of the invention, the YOLOv7 detector is trained by a deep learning algorithm, and model loss in the training process of the YOLOv7 detector is obtained by the following steps:
determining the IoU loss according to the degree of overlap between the prediction frame and the real frame output by the YOLOv7 detector;
determining a loss weight according to a distance metric between the prediction frame and the real frame output by the YOLOv7 detector;
based on the attention mechanism, determining the model loss from the loss weight and the IoU loss.
According to some embodiments of the present invention, the inputting the vehicle detection frame information into the DeepSORT tracker for track tracking to obtain a vehicle target tracking track includes the following steps:
predicting the target position of the next frame according to the target motion trail extracted from the current input video by a Kalman filtering algorithm;
extracting corresponding target appearance features according to the target positions through a feature extraction network;
determining a cost matrix of each detection frame feature and the target appearance feature according to the vehicle detection frame information, wherein the cost matrix characterizes similarity between features;
and associating the target motion trail with the detection frames according to the cost matrix by using the Hungarian algorithm so as to update the target motion trail.
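The Kalman prediction step in the first of these claims can be sketched with a constant-velocity model, which is the motion model DeepSORT commonly uses. The state layout, time step and noise magnitudes below are illustrative assumptions, not values taken from the patent:

```python
import numpy as np

class ConstantVelocityKF:
    """Minimal constant-velocity Kalman filter predict step, sketching how
    a tracker propagates a box state to the next frame.
    Assumed state: [cx, cy, w, h, vx, vy, vw, vh] (values and velocities)."""

    def __init__(self, dim=4, dt=1.0):
        # Transition matrix: each position component gains velocity * dt
        self.F = np.eye(2 * dim)
        self.F[:dim, dim:] = dt * np.eye(dim)
        self.P = np.eye(2 * dim)          # state covariance (illustrative init)
        self.Q = 0.01 * np.eye(2 * dim)   # process noise (illustrative)

    def predict(self, state):
        """Predict the next-frame state and propagate the covariance."""
        next_state = self.F @ state
        self.P = self.F @ self.P @ self.F.T + self.Q
        return next_state

kf = ConstantVelocityKF()
# Track centered at (100, 50), box 40x20, moving +5 px/frame along x
state = np.array([100., 50., 40., 20., 5., 0., 0., 0.])
predicted = kf.predict(state)
```

In the full tracker this predicted box is what gets compared against the next frame's detections; the update step (omitted here) would fuse the matched detection back into the state.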
According to some embodiments of the present invention, the correcting the vehicle position coordinates in the vehicle target tracking track according to the camera parameters to obtain a corrected vehicle target tracking track includes the following steps:
carrying out distortion transformation on each image pixel point of the monitoring video through the distortion coefficient of the camera parameter so as to project the image pixel point to a camera standardized plane;
projecting the image pixel points subjected to distortion transformation on the standardized plane to a pixel plane through an internal reference matrix of the camera parameters to obtain a pixel point conversion relation between a distorted pixel and an original image pixel on the pixel plane;
and changing the position coordinates of each vehicle in the vehicle target tracking track according to the pixel point conversion relation to obtain a corrected vehicle target tracking track.
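The projection described in these steps can be sketched with a pinhole camera model and a two-coefficient radial distortion; all parameter values below are illustrative assumptions rather than values from the patent:

```python
import numpy as np

def project_with_distortion(xy_norm, K, k1=0.0, k2=0.0):
    """Project points on the camera's normalized plane onto the pixel plane,
    applying a two-coefficient radial distortion model (a common sketch of
    the distortion transform described above; coefficients are illustrative).
    xy_norm: (N, 2) normalized coordinates; K: 3x3 intrinsic matrix."""
    x, y = xy_norm[:, 0], xy_norm[:, 1]
    r2 = x**2 + y**2
    factor = 1.0 + k1 * r2 + k2 * r2**2   # radial distortion factor
    xd, yd = x * factor, y * factor       # distorted normalized coordinates
    u = K[0, 0] * xd + K[0, 2]            # u = fx * x_d + cx
    v = K[1, 1] * yd + K[1, 2]            # v = fy * y_d + cy
    return np.stack([u, v], axis=1)

K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0., 0., 1.]])
pts = np.array([[0.1, -0.05]])
pixels = project_with_distortion(pts, K, k1=-0.2)
```

Evaluating this forward model for every image pixel yields exactly the pixel-point conversion relation between distorted and undistorted coordinates that the third step uses to remap the track coordinates.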
In another aspect, an embodiment of the present invention further provides a vehicle target tracking system, including:
the first module is used for acquiring a monitoring video;
the second module is used for sequentially inputting each frame of image of the monitoring video into the YOLOv7 detector to perform target detection to obtain vehicle detection frame information, wherein a convolution block attention module is arranged in the ELAN structure of the backbone network layer of the YOLOv7 detector, and the convolution block attention module performs global max pooling and global average pooling operations on the input feature maps to obtain an output feature map fusing channel and spatial attention features;
the third module is used for inputting each frame of image and the vehicle detection frame information into the DeepSORT tracker for track tracking to obtain a vehicle target tracking track;
and a fourth module, configured to correct the vehicle position coordinates in the vehicle target tracking track according to the camera parameters, and obtain a corrected vehicle target tracking track.
On the other hand, the embodiment of the invention also provides a vehicle target tracking device, which comprises:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the vehicle target tracking method as previously described.
In another aspect, embodiments of the present invention also provide a computer-readable storage medium storing computer-executable instructions for causing a computer to perform a vehicle target tracking method as described above.
The technical scheme of the invention has at least one of the following advantages or beneficial effects. Each frame of image of the monitoring video is sequentially input into a YOLOv7 detector for target detection to obtain vehicle detection frame information, where a convolution block attention module is arranged in the ELAN structure of the backbone network layer of the YOLOv7 detector; the convolution block attention module performs global max pooling and global average pooling operations on the input feature maps to obtain an output feature map fusing channel and spatial attention features, and placing the convolution block attention module in the ELAN structure improves the representation capability of the features, thereby improving the vehicle target detection accuracy of the YOLOv7 detector. The vehicle detection frame information is then input into a DeepSORT tracker for tracking to obtain a vehicle target tracking track, and the vehicle position coordinates in the vehicle target tracking track are corrected according to the camera parameters to obtain a corrected vehicle target tracking track, improving the accuracy of vehicle tracking.
Drawings
FIG. 1 is a flow chart of a method for tracking a vehicle target provided by an embodiment of the invention;
FIG. 2 is a schematic illustration of an ELAN structure provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of the location of a convolution block attention module in an ELAN structure according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a vehicle target tracking apparatus according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
In the description of the present invention, it should be understood that the direction or positional relationship indicated with respect to the description of the orientation, such as up, down, left, right, etc., is based on the direction or positional relationship shown in the drawings, is merely for convenience of describing the present invention and simplifying the description, and does not indicate or imply that the apparatus or element to be referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
In the description of the present invention, the description of first, second, etc. is for the purpose of distinguishing between technical features only, and should not be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated or implicitly indicating the precedence of the technical features indicated.
The embodiment of the invention provides a vehicle target tracking method, a system, a device and a storage medium; the method may be executed by a terminal, by a server, or by the two in cooperation. The terminal may be, but is not limited to, a tablet computer, a notebook computer, a desktop computer, or the like. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, big data and artificial intelligence platforms.
Referring to fig. 1, the vehicle target tracking method of the embodiment of the present invention includes, but is not limited to, step S110, step S120, step S130, and step S140.
Step S110, acquiring a monitoring video;
step S120, sequentially inputting each frame of image of the monitoring video into a YOLOv7 detector for target detection to obtain vehicle detection frame information, wherein a convolution block attention module is arranged in an ELAN structure of a main network layer of the YOLOv7 detector, and the convolution block attention module performs global maximum pooling and global average pooling operation on the input feature images to obtain an output feature image fused with space-time attention features;
step S130, inputting vehicle detection frame information into a deep start tracker to track, so as to obtain a vehicle target tracking track;
step S140, correcting the vehicle position coordinates in the vehicle target tracking track according to the camera parameters, to obtain the corrected vehicle target tracking track.
In some embodiments of step S120, YOLOv7 is a version of the YOLO model ("You Only Look Once: Unified, Real-Time Object Detection", a target detection system based on a single neural network). YOLO models are deep learning algorithms that can be used for image recognition in computer vision. The YOLO model converts the object detection problem into a regression problem, i.e., given an input image, the bounding boxes of objects and their classes are regressed directly at multiple locations on the image. YOLO models include, but are not limited to, YOLOv3, YOLOv4, YOLOv5 and YOLOv7 (all different versions of YOLO); the weights, network structures and algorithms of the different models differ, as do the region sampling methods they use.
The overall network architecture of the YOLOv7 detector consists mainly of three parts: the input layer, the backbone layer for extracting features, and the head layer for prediction. The YOLOv7 detector preprocesses the input picture and feeds it into the backbone network, which outputs three layers of feature maps of different sizes; at the head layer, according to the three layers of output from the backbone network, the three tasks of image detection (classification, foreground/background classification and bounding-box regression) are predicted through RepVGG blocks and convolutions, and the final result is output. The backbone layer of YOLOv7 consists of a number of CBS modules, ELAN modules and MP modules: the CBS module mainly performs convolution operations; the ELAN module is an efficient network structure that, by controlling the shortest and longest gradient paths, enables the network to learn more features with stronger robustness; and the MP module mainly performs downsampling.
In an embodiment of the present invention, referring to fig. 2, the modified ELAN structure includes a first branch, a second branch, a third branch and a fourth branch, where the first branch and the second branch are each a 1×1 convolution layer, the third branch is sequentially a 1×1 convolution layer and a 3×3 convolution layer, and the fourth branch is sequentially a 1×1 convolution layer, a 3×3 convolution layer and a 3×3 convolution layer. The first, second, third and fourth branches are all connected to a splicing module, in which the outputs of the branches are spliced with a Concat function. Specifically, the training configuration file yolov7.yaml of the YOLOv7 detector is modified in its backbone portion: the eight convolution layers at layers 7, 8, 20, 21, 33, 34, 46 and 47 are deleted, and the four Concat layers at layers 10, 23, 36 and 49 are modified to [[-1, -2, -3, -4], 1, Concat, [1]].
Further, referring to fig. 3, an improved convolution block attention module (CBAM) is added to the ELAN structure: the module is connected in series after the first convolution module (i.e., the first branch) of the optimized ELAN structure and performs pooling operations on the features output by that convolution module based on the attention mechanism, so that the model strengthens important feature representations and the target detection accuracy of the model is improved. The convolution block attention module comprises a channel attention mechanism unit and a spatial attention mechanism unit connected in sequence, which work as follows:
Channel attention mechanism unit: the input feature F (i.e., the output feature of the first branch) is pooled in the channel dimension by a global max pooling layer and a global average pooling layer, respectively, to obtain a channel global max pooling value and a channel global average pooling value; the two pooled values are each input into a one-layer fully connected neural network activated by the ReLU activation function, to obtain a first channel feature and a second channel feature; the first and second channel features are added element-wise and passed through a sigmoid activation to obtain the channel attention feature M_c; M_c is then multiplied element-wise with the input feature F to obtain the output feature F' of the channel attention mechanism unit.
Spatial attention mechanism unit: global max pooling and global average pooling are performed in the spatial dimension on the input feature of the spatial attention mechanism unit (i.e., the output feature F' of the channel attention mechanism unit), to obtain a spatial global max pooling feature and a spatial global average pooling feature; the two pooled features are spliced, reduced to one channel by a convolution layer, and then activated by sigmoid to obtain the spatial attention feature M_s; M_s is multiplied element-wise with the input feature F' to obtain the output feature of the spatial attention mechanism unit, i.e., the output feature map fusing channel and spatial attention features.
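The channel and spatial attention computations just described can be sketched in NumPy. The random weights, reduction ratio, and the simple two-weight combination standing in for CBAM's 7×7 convolution are illustrative assumptions, not the patent's exact parameterization:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(F, W1, W2):
    """Channel attention: pool over the spatial dims, pass both pooled vectors
    through a shared two-layer MLP (W1, W2), add, then sigmoid. F: (C, H, W)."""
    max_pool = F.max(axis=(1, 2))                # channel global max pooling, (C,)
    avg_pool = F.mean(axis=(1, 2))               # channel global average pooling, (C,)
    feat1 = W2 @ np.maximum(W1 @ max_pool, 0.0)  # MLP + ReLU on max-pooled vector
    feat2 = W2 @ np.maximum(W1 @ avg_pool, 0.0)  # MLP + ReLU on avg-pooled vector
    Mc = sigmoid(feat1 + feat2)                  # channel attention weights, (C,)
    return Mc[:, None, None] * F                 # reweight each channel of F

def spatial_attention(Fp, w):
    """Spatial attention: pool over the channel dim, combine the two maps with
    scalar weights w (standing in for the conv layer), sigmoid. Fp: (C, H, W)."""
    max_map = Fp.max(axis=0)                     # spatial global max pooling, (H, W)
    avg_map = Fp.mean(axis=0)                    # spatial global average pooling, (H, W)
    Ms = sigmoid(w[0] * max_map + w[1] * avg_map)  # spatial attention map
    return Ms[None, :, :] * Fp                   # reweight each spatial location

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
F = rng.normal(size=(C, H, W))
W1 = rng.normal(size=(C // r, C))  # reduction layer of the shared MLP
W2 = rng.normal(size=(C, C // r))  # expansion layer of the shared MLP
out = spatial_attention(channel_attention(F, W1, W2), np.array([0.5, 0.5]))
```

Because both attention maps pass through a sigmoid, every element of the output is the input scaled by factors in (0, 1): attention here suppresses less informative responses rather than amplifying anything.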
In some embodiments, the YOLOv7 detector is trained by a deep learning algorithm, and model losses need to be calculated according to a prediction frame and a real frame output by the YOLOv7 detector during the training process of the YOLOv7 detector, and the YOLOv7 detector is continuously updated according to the model losses until the output of the YOLOv7 detector reaches a preset precision. Specifically, the calculation is performed by using a Wise-IoU loss function in the training process of the YOLOv7 detector, and the calculation is specifically as follows:
determining the IoU loss according to the degree of overlap between the prediction frame and the real frame output by the YOLOv7 detector, where the IoU loss is defined as:
L_IoU = 1 - IoU;
where IoU denotes the intersection over union of the prediction frame and the real frame.
Determining a loss weight according to the distance metric between the prediction frame and the real frame output by the YOLOv7 detector, where the loss weight is defined as:
R_WIoU = exp(((x - x_gt)^2 + (y - y_gt)^2) / (W_g^2 + H_g^2)*);
where x and y are the center-point coordinates of the prediction frame, x_gt and y_gt are the center-point coordinates of the real frame, W_g and H_g denote the width and height of the smallest box enclosing the prediction frame and the real frame (the superscript * indicating that this term is detached from the gradient computation), and exp(·) denotes the exponential operation.
Based on the attention mechanism, the model loss is determined from the loss weight and the IoU loss, where the model loss is expressed as:
L_WIoU = R_WIoU · L_IoU.
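A minimal NumPy sketch of this loss, assuming the Wise-IoU v1 form with the enclosing-box denominator treated as a constant (the patent's exact formulation may differ in detail):

```python
import numpy as np

def iou(box_p, box_g):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_p[0], box_g[0]); y1 = max(box_p[1], box_g[1])
    x2 = min(box_p[2], box_g[2]); y2 = min(box_p[3], box_g[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    return inter / (area_p + area_g - inter)

def wise_iou_loss(box_p, box_g):
    """Sketch of L_WIoU = R_WIoU * L_IoU: the weight R_WIoU grows with the
    center distance between prediction and ground truth, normalized by the
    size of their smallest enclosing box."""
    l_iou = 1.0 - iou(box_p, box_g)                       # IoU loss
    x, y = (box_p[0] + box_p[2]) / 2, (box_p[1] + box_p[3]) / 2
    xg, yg = (box_g[0] + box_g[2]) / 2, (box_g[1] + box_g[3]) / 2
    # Width/height of the smallest box enclosing both boxes; treated as a
    # constant (detached from the gradient) in the original formulation
    wg = max(box_p[2], box_g[2]) - min(box_p[0], box_g[0])
    hg = max(box_p[3], box_g[3]) - min(box_p[1], box_g[1])
    r_wiou = np.exp(((x - xg) ** 2 + (y - yg) ** 2) / (wg ** 2 + hg ** 2))
    return r_wiou * l_iou
```

For perfectly overlapping boxes both factors collapse (L_IoU = 0, R_WIoU = 1), while a center offset inflates the IoU loss multiplicatively, which is the intended focusing effect of the weight.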
further, after training is finished, the super parameters are adjusted according to the target data set, specifically, the data set can be trained again for selecting the evolve option, 300 rounds of training are performed, each round of training is performed for 10 generations, and the optimal super parameter configuration is found.
In some embodiments of step S130, the DeepSORT tracker is an algorithm that combines the two tasks of target detection and target tracking. After the position and bounding box of each target object are detected in a frame, feature representations of the targets are extracted by a deep learning model (e.g., a CNN), and each target is matched against the targets tracked in the previous frame. During matching, factors such as the feature similarity and motion consistency of the target are considered to determine the identity and track of the target.
Specifically, in step S130, the step of inputting the vehicle detection frame information into the DeepSORT tracker for tracking to obtain the vehicle target tracking track includes, but is not limited to, the following steps:
step S210, predicting a target position of a next frame according to a target motion track extracted from a current input video by a Kalman filtering algorithm, wherein the target motion track comprises an uncertain state track and a certain state track;
step S220, extracting corresponding target appearance features according to the target positions through a feature extraction network;
step S230, determining a cost matrix of each detection frame feature and the target appearance feature according to the vehicle detection frame information, wherein the cost matrix characterizes the similarity between the features;
step S240, the target motion trail and the detection frame are associated according to the cost matrix through the Hungary algorithm so as to update the target motion trail.
In this embodiment, the YOLOv7 detector of the embodiment of the present invention identifies the vehicle targets in the monitoring video frame by frame to obtain the vehicle detection frame information of each vehicle target, where the vehicle detection frame information includes but is not limited to coordinate information, category, confidence and image features, and the vehicle detection frame information is input into the DeepSORT tracker. In the DeepSORT tracker, a first batch of uncertain-state tracks is created from the first-frame detection result; these tracks are then updated through the association-matching results of subsequent detection frames with the uncertain-state tracks, and the DeepSORT tracker proceeds through the following steps:
S1, carrying out IOU matching between the detection frames of the current frame and the target track frames obtained by Kalman filtering prediction from the target motion tracks of the previous frame, and calculating a cost matrix; the cost matrix is input into the Hungarian algorithm to obtain three kinds of matching results: the first is track mismatch, in which case determined state tracks and uncertain state tracks whose mismatch count reaches the preset number of predictions (e.g., 30) are deleted; the second is target mismatch, in which case a new track is created; and the third is track matching, which indicates successful tracking, in which case the track variables corresponding to the target are updated through Kalman filtering. The current steps are repeated until determined state tracks are obtained or the video frames end.
S2, predicting the determined state tracks and the uncertain state tracks through Kalman filtering, and carrying out cascade matching between the determined state tracks and the target detection frames to obtain three kinds of matching results: the first is track matching, in which case the track is updated through Kalman filtering; the second is track mismatch; and the third is target mismatch, in which case IOU matching is carried out among the previous uncertain state tracks, the mismatched tracks, and the mismatched targets, and the cost matrix of the targets is then calculated according to the matching result.
S3, inputting the cost matrix into the Hungarian algorithm to obtain three kinds of matching results: the first is track mismatch, in which case the determined state tracks and uncertain state tracks that have been mismatched 30 times are deleted; the second is target mismatch, in which case a new track is created; and the third is track matching, which indicates successful tracking, in which case the track variables corresponding to the target are updated through Kalman filtering.
S4, repeating the step S3 until the video frame is finished, and obtaining the vehicle target position and the vehicle target track.
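The IOU matching used in steps S1–S3 above can be sketched as follows; this is the generic intersection-over-union computation for boxes in (x1, y1, x2, y2) form, an assumed illustration rather than code from the patent:

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    # Intersection rectangle (empty if the boxes do not overlap).
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

A cost matrix for the Hungarian step is then typically built as `1 - iou(track_box, det_box)` over all track/detection pairs.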
According to some embodiments of the present invention, in step S140, the step of correcting the vehicle position coordinates in the vehicle target tracking track according to the camera parameters to obtain a corrected vehicle target tracking track includes, but is not limited to, the steps of:
step S310, projecting each image pixel point of the monitoring video to the camera normalized plane and performing distortion transformation on it through the distortion coefficients of the camera parameters;
step S320, projecting the distortion-transformed image pixel points on the normalized plane back to the pixel plane through the internal reference matrix of the camera parameters to obtain the pixel point conversion relation between the distorted pixels and the original image pixels on the pixel plane;
and step S330, changing the position coordinates of each vehicle in the vehicle target tracking track according to the pixel point conversion relation to obtain a corrected vehicle target tracking track.
In this embodiment, the internal reference matrix and distortion parameters of the camera and the pixel size of the monitoring video are obtained. The pixel points of the video image are then projected onto the normalized plane of the camera, the distortion transformation is applied on the normalized plane, and the distorted points are projected back onto the pixel plane, yielding the correspondence between distorted pixel points and original image pixel points. This process is repeated until the conversion relations of all pixel points are obtained, and the vehicle position coordinates in the track output by the DeepSORT tracker are adjusted according to these pixel point conversion relations, so that the corrected vehicle target tracking track is obtained.
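The per-pixel pipeline described here (back-project with the intrinsic matrix, distort on the normalized plane, re-project) can be sketched as below; a simplified radial model with only k1 and k2 is assumed, whereas real calibrations typically also carry tangential coefficients:

```python
import numpy as np

def distort_map(u, v, K, dist):
    """Map an undistorted pixel (u, v) to its distorted location: back-project
    to the normalized plane via the intrinsics K, apply radial distortion
    (k1, k2 only, for brevity), then re-project onto the pixel plane."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u - cx) / fx          # normalized-plane coordinates
    y = (v - cy) / fy
    k1, k2 = dist
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2 * r2   # radial distortion factor
    xd, yd = x * scale, y * scale
    return fx * xd + cx, fy * yd + cy      # back to the pixel plane
```

Evaluating this map for every pixel gives the conversion relation described above; inverting it (e.g., by iteration or a lookup table) yields the undistorted coordinates used to correct the track.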
According to some embodiments of the invention, the improved YOLOv7 detector adopts an improved ELAN structure in place of the ELAN structure in the original YOLOv7 network architecture, adds an improved convolution block attention module within the improved ELAN structure, performs model training with a Wise-IoU loss function, and evaluates and optimizes the hyperparameters of the trained model on the data set, thereby improving the accuracy of vehicle target tracking under the monitoring view angle. The accuracy of extracting the vehicle target motion track is further improved by the undistortion processing of the tracking track coordinates.
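For illustration, the convolution block attention module's two stages (detailed later in claim 4) can be sketched in NumPy; the MLP weights and the 1 x 1 reduction (in place of CBAM's usual 7 x 7 convolution) are simplifying assumptions, not the patent's exact design:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """x: (C, H, W). Global max- and average-pool each channel, pass both
    vectors through a shared two-layer MLP (w1, w2), add, then sigmoid."""
    max_pool = x.max(axis=(1, 2))    # (C,)
    avg_pool = x.mean(axis=(1, 2))   # (C,)
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)   # ReLU hidden layer
    attn = sigmoid(mlp(max_pool) + mlp(avg_pool))   # (C,) channel weights
    return x * attn[:, None, None]

def spatial_attention(x, conv_w):
    """x: (C, H, W). Max- and average-pool across channels, stack the two
    maps, reduce to one map with a 1x1 weighted sum, then sigmoid."""
    pooled = np.stack([x.max(axis=0), x.mean(axis=0)])    # (2, H, W)
    attn = sigmoid(np.tensordot(conv_w, pooled, axes=1))  # (H, W) weights
    return x * attn[None, :, :]
```

Applied in sequence (channel first, then spatial), these two stages reproduce the fusion of pooled attention features that the improved ELAN structure inserts before the splice module.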
The embodiment of the invention also provides a vehicle target tracking system, which comprises:
the first module is used for acquiring a monitoring video;
the second module is used for sequentially inputting each frame of image of the monitoring video into the YOLOv7 detector for target detection to obtain vehicle detection frame information, wherein a convolution block attention module is arranged in the ELAN structure of the backbone network layer of the YOLOv7 detector, and the convolution block attention module performs global maximum pooling and global average pooling operations on the input feature maps to obtain an output feature map fused with spatio-temporal attention features;
the third module is used for inputting each frame of image and the vehicle detection frame information into the DeepSORT tracker for track tracking to obtain the vehicle target tracking track;
and the fourth module is used for correcting the vehicle position coordinates in the vehicle target tracking track according to the camera parameters to obtain the corrected vehicle target tracking track.
It can be understood that the content of the above vehicle target tracking method embodiment is applicable to this system embodiment: the functions implemented by the system embodiment, and the beneficial effects achieved by it, are the same as those of the vehicle target tracking method embodiment described above.
Referring to fig. 4, fig. 4 is a schematic diagram of a vehicle target tracking apparatus according to an embodiment of the present invention. The vehicle target tracking apparatus according to an embodiment of the present invention includes one or more control processors and a memory, and one control processor and one memory are exemplified in fig. 4.
The control processor and the memory may be connected by a bus or otherwise, for example in fig. 4.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the control processor, the remote memory being connectable to the vehicle target tracking apparatus via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
It will be appreciated by those skilled in the art that the device configuration shown in fig. 4 is not limiting of the vehicle target tracking device and may include more or fewer components than shown, or certain components may be combined, or a different arrangement of components.
The non-transitory software program and instructions required to implement the vehicle target tracking method applied to the vehicle target tracking apparatus in the above-described embodiments are stored in the memory, and when executed by the control processor, the vehicle target tracking method applied to the vehicle target tracking apparatus in the above-described embodiments is executed.
Furthermore, an embodiment of the present invention provides a computer-readable storage medium storing computer-executable instructions that are executed by one or more control processors to cause the one or more control processors to perform the vehicle target tracking method in the method embodiment.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of one of ordinary skill in the art without departing from the spirit of the present invention.
Claims (10)
1. A vehicle target tracking method, comprising the steps of:
acquiring a monitoring video;
sequentially inputting each frame of image of the monitoring video into a YOLOv7 detector to perform target detection to obtain vehicle detection frame information, wherein a convolution block attention module is arranged in an ELAN structure of a backbone network layer of the YOLOv7 detector, and the convolution block attention module performs global maximum pooling and global average pooling operations on the input feature maps to obtain an output feature map fused with spatio-temporal attention features;
inputting the vehicle detection frame information into a DeepSORT tracker for track tracking to obtain a vehicle target tracking track;
and correcting the vehicle position coordinates in the vehicle target tracking track according to the camera parameters to obtain a corrected vehicle target tracking track.
2. The vehicle target tracking method of claim 1, wherein the ELAN structure includes a first branch, a second branch, a third branch, and a fourth branch, the first branch and the second branch each being a 1 x 1 convolution layer, the third branch being sequentially a 1 x 1 convolution layer and a 3 x 3 convolution layer, the fourth branch being sequentially a 1 x 1 convolution layer, a 3 x 3 convolution layer, and a 3 x 3 convolution layer; the first, second, third and fourth branches are all connected to a splice module.
3. The vehicle object tracking method of claim 2, wherein the convolution block attention module is disposed between the first branch and the splice module, the convolution block attention module including a channel attention mechanism unit and a spatial attention mechanism unit connected in sequence.
4. A vehicle object tracking method as claimed in claim 3, characterized in that the channel attention mechanism unit is adapted to:
respectively pooling the input features of the channel attention mechanism unit in the channel dimension to obtain a channel global maximum pooling value and a channel global average pooling value;
inputting the channel global maximum pooling value and the channel global average pooling value into a fully-connected neural network respectively and activating the fully-connected neural network to obtain a first channel characteristic and a second channel characteristic;
the first channel characteristic and the second channel characteristic are activated after the addition operation, so that the channel attention characteristic is obtained;
multiplying the channel attention characteristic and the input characteristic of the input channel attention mechanism unit to obtain the output characteristic of the channel attention mechanism unit;
the spatial attention mechanism unit is used for:
respectively pooling the input features of the spatial attention mechanism unit in the spatial dimension to obtain a spatial global maximum pooling feature and a spatial global average pooling feature;
performing dimension reduction and activation operations after splicing the space global maximum pooling feature and the space global average pooling feature to obtain a space attention feature;
and multiplying the spatial attention characteristic and the input characteristic of the input spatial attention mechanism unit to obtain the output characteristic of the spatial attention mechanism unit.
5. The vehicle target tracking method according to claim 1, wherein the YOLOv7 detector is trained by a deep learning algorithm, and the model loss during training of the YOLOv7 detector is obtained by:
determining an intersection-over-union (IoU) loss according to the degree of overlap between the prediction frame and the real frame output by the YOLOv7 detector;
determining a loss weight according to the distance measure between the prediction frame and the real frame output by the YOLOv7 detector;
determining the model loss from the loss weight and the IoU loss based on the attention mechanism.
6. The vehicle target tracking method according to claim 1, wherein the step of inputting the vehicle detection frame information into a DeepSORT tracker for track tracking to obtain a vehicle target tracking track comprises the steps of:
predicting the target position of the next frame according to the target motion trail extracted from the current input video by a Kalman filtering algorithm;
extracting corresponding target appearance features according to the target positions through a feature extraction network;
determining a cost matrix of each detection frame feature and the target appearance feature according to the vehicle detection frame information, wherein the cost matrix characterizes similarity between features;
and associating the target motion track with the detection frames according to the cost matrix through the Hungarian algorithm so as to update the target motion track.
7. The vehicle target tracking method according to claim 1, wherein the correcting the vehicle position coordinates in the vehicle target tracking trajectory according to the camera parameters, to obtain the corrected vehicle target tracking trajectory, comprises the steps of:
projecting each image pixel point of the monitoring video to the camera normalized plane and performing distortion transformation on it through the distortion coefficients of the camera parameters;
projecting the distortion-transformed image pixel points on the normalized plane back to the pixel plane through the internal reference matrix of the camera parameters to obtain a pixel point conversion relation between distorted pixels and original image pixels on the pixel plane;
and changing the position coordinates of each vehicle in the vehicle target tracking track according to the pixel point conversion relation to obtain a corrected vehicle target tracking track.
8. A vehicle target tracking system, comprising:
the first module is used for acquiring a monitoring video;
the second module is used for sequentially inputting each frame of image of the monitoring video into the YOLOv7 detector for target detection to obtain vehicle detection frame information, wherein a convolution block attention module is arranged in an ELAN structure of a backbone network layer of the YOLOv7 detector, and performs global maximum pooling and global average pooling operations on the input feature maps to obtain an output feature map fused with spatio-temporal attention features;
the third module is used for inputting each frame of image and the vehicle detection frame information into the DeepSORT tracker for track tracking to obtain a vehicle target tracking track;
and a fourth module, configured to correct the vehicle position coordinates in the vehicle target tracking track according to the camera parameters, and obtain a corrected vehicle target tracking track.
9. A vehicle target tracking apparatus, comprising:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the vehicle object tracking method as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium in which a processor-executable program is stored, characterized in that the processor-executable program is for realizing the vehicle object tracking method according to any one of claims 1 to 7 when executed by the processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202311595105.1A (CN117636267A) | 2023-11-24 | 2023-11-24 | Vehicle target tracking method, system, device and storage medium
Publications (1)
Publication Number | Publication Date
---|---
CN117636267A | 2024-03-01
Family
ID=90015712
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202311595105.1A (CN117636267A, pending) | Vehicle target tracking method, system, device and storage medium | 2023-11-24 | 2023-11-24
Country Status (1)
Country | Link
---|---
CN | CN117636267A
Cited By (1)
Publication Number | Priority Date | Publication Date | Assignee | Title
---|---|---|---|---
CN118506298A | 2024-07-17 | 2024-08-16 | Jiangxi Jinlu Technology Development Co., Ltd. | Cross-camera vehicle track association method
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination