
CN117636267A - Vehicle target tracking method, system, device and storage medium - Google Patents

Vehicle target tracking method, system, device and storage medium

Info

Publication number
CN117636267A
CN117636267A (Application CN202311595105.1A)
Authority
CN
China
Prior art keywords
vehicle
target tracking
track
vehicle target
channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311595105.1A
Other languages
Chinese (zh)
Inventor
于德新
袁梓珉
张泽华
初良勇
刘晓佳
吴新程
王胪陈
杨宇
彭万里
周会奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jimei University
Original Assignee
Jimei University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jimei University filed Critical Jimei University
Priority to CN202311595105.1A
Publication of CN117636267A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/54 Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/766 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a vehicle target tracking method, system, device and storage medium, relating to the technical field of artificial intelligence. Each frame of image of a monitoring video is sequentially input into a YOLOv7 detector for target detection to obtain vehicle detection frame information, wherein a convolution block attention module is arranged in the ELAN structure of the backbone network layer of the YOLOv7 detector; the module performs global maximum pooling and global average pooling on the input feature maps to obtain an output feature map fusing space-time attention features. The convolution block attention module in the ELAN structure improves the feature representation capability and thereby the vehicle target detection accuracy of the detector. The vehicle detection frame information is then input into a DeepSORT tracker for trajectory tracking to obtain a vehicle target tracking track, and the vehicle position coordinates in the track are corrected according to the camera parameters to obtain a corrected vehicle target tracking track, improving the accuracy of vehicle tracking.

Description

Vehicle target tracking method, system, device and storage medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a method, a system, an apparatus, and a storage medium for tracking a vehicle target.
Background
With the rapid development of intelligent transportation, detecting and tracking vehicle targets with computer vision technology is increasingly important, as it can provide information such as the position, track and speed of vehicles. In the related art, target tracking mainly follows a tracking-by-detection scheme: target detection is performed on each frame of a video sequence to obtain all targets in the image, the problem is then converted into one of associating targets between consecutive frames, a similarity matrix is constructed from cues such as IoU and appearance, and the matrix is solved by the Hungarian algorithm, a greedy algorithm, or the like. However, the target detection models adopted in existing vehicle target tracking pipelines have low accuracy and efficiency, which degrades track recognition; moreover, because the target information reflected in the captured image differs from the real target, track recognition accuracy is low.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the prior art. Therefore, the invention provides a vehicle target tracking method, a system, a device and a storage medium, which can improve the accuracy of vehicle track identification.
In one aspect, an embodiment of the present invention provides a vehicle target tracking method, including the steps of:
acquiring a monitoring video;
sequentially inputting each frame of image of a monitoring video into a YOLOv7 detector to perform target detection to obtain vehicle detection frame information, wherein a convolution block attention module is arranged in an ELAN structure of a backbone network layer of the YOLOv7 detector, and the convolution block attention module performs global maximum pooling and global average pooling operations on the input feature maps to obtain an output feature map fused with space-time attention features;
inputting the vehicle detection frame information into a DeepSORT tracker for trajectory tracking, so as to obtain a vehicle target tracking track;
and correcting the vehicle position coordinates in the vehicle target tracking track according to the camera parameters to obtain a corrected vehicle target tracking track.
According to some embodiments of the present invention, the ELAN structure includes a first branch, a second branch, a third branch, and a fourth branch, where the first branch and the second branch are each a 1×1 convolution layer, the third branch is sequentially a 1×1 convolution layer and a 3×3 convolution layer, and the fourth branch is sequentially a 1×1 convolution layer, a 3×3 convolution layer, and a 3×3 convolution layer; the first, second, third and fourth branches are all connected to a splice module.
According to some embodiments of the invention, the convolution block attention module is disposed between the first branch and the splicing module, and the convolution block attention module includes a channel attention mechanism unit and a spatial attention mechanism unit connected in sequence.
According to some embodiments of the invention, the channel attention mechanism unit is configured to:
respectively pooling the input features of the channel attention mechanism unit in the channel dimension to obtain a channel global maximum pooling value and a channel global average pooling value;
inputting the channel global maximum pooling value and the channel global average pooling value into a fully-connected neural network respectively and activating the fully-connected neural network to obtain a first channel characteristic and a second channel characteristic;
the first channel characteristic and the second channel characteristic are activated after the addition operation, so that the channel attention characteristic is obtained;
multiplying the channel attention characteristic and the input characteristic of the input channel attention mechanism unit to obtain the output characteristic of the channel attention mechanism unit;
the spatial attention mechanism unit is used for:
respectively pooling the input features of the spatial attention mechanism unit in the spatial dimension to obtain a spatial global maximum pooling feature and a spatial global average pooling feature;
performing dimension reduction and activation operations after splicing the space global maximum pooling feature and the space global average pooling feature to obtain a space attention feature;
and multiplying the spatial attention characteristic and the input characteristic of the input spatial attention mechanism unit to obtain the output characteristic of the spatial attention mechanism unit.
According to some embodiments of the invention, the YOLOv7 detector is trained by a deep learning algorithm, and model loss in the training process of the YOLOv7 detector is obtained by the following steps:
determining the intersection-over-union (IoU) loss according to the overlap between the prediction frame and the real frame output by the YOLOv7 detector;
determining a loss weight according to the distance measure between the prediction frame and the real frame output by the YOLOv7 detector;
based on the attention mechanism, determining the model loss from the loss weight and the IoU loss.
According to some embodiments of the present invention, the inputting the vehicle detection frame information into the DeepSORT tracker for trajectory tracking to obtain a vehicle target tracking track includes the following steps:
predicting the target position of the next frame according to the target motion trail extracted from the current input video by a Kalman filtering algorithm;
extracting corresponding target appearance features according to the target positions through a feature extraction network;
determining a cost matrix of each detection frame feature and the target appearance feature according to the vehicle detection frame information, wherein the cost matrix characterizes similarity between features;
and associating the target motion trail with the detection frames according to the cost matrix by using the Hungarian algorithm so as to update the target motion trail.
According to some embodiments of the present invention, the correcting the vehicle position coordinates in the vehicle target tracking track according to the camera parameters to obtain a corrected vehicle target tracking track includes the following steps:
carrying out distortion transformation on each image pixel point of the monitoring video through the distortion coefficient of the camera parameter so as to project the image pixel point to a camera standardized plane;
projecting the image pixel points subjected to distortion transformation on the standardized plane to a pixel plane through an internal reference matrix of the camera parameters to obtain a pixel point conversion relation between a distorted pixel and an original image pixel on the pixel plane;
and changing the position coordinates of each vehicle in the vehicle target tracking track according to the pixel point conversion relation to obtain a corrected vehicle target tracking track.
In another aspect, an embodiment of the present invention further provides a vehicle target tracking system, including:
the first module is used for acquiring a monitoring video;
the second module is used for sequentially inputting each frame of image of the monitoring video into the YOLOv7 detector to carry out target detection to obtain vehicle detection frame information, wherein a convolution block attention module is arranged in an ELAN structure of a backbone network layer of the YOLOv7 detector, and the convolution block attention module performs global maximum pooling and global average pooling operations on the input feature maps to obtain an output feature map fused with space-time attention features;
the third module is used for inputting each frame of image and the vehicle detection frame information into the DeepSORT tracker for trajectory tracking, so as to obtain a vehicle target tracking track;
and a fourth module, configured to correct the vehicle position coordinates in the vehicle target tracking track according to the camera parameters, and obtain a corrected vehicle target tracking track.
On the other hand, the embodiment of the invention also provides a vehicle target tracking device, which comprises:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the vehicle target tracking method as previously described.
In another aspect, embodiments of the present invention also provide a computer-readable storage medium storing computer-executable instructions for causing a computer to perform a vehicle target tracking method as described above.
The technical scheme of the invention has at least one of the following advantages or beneficial effects: each frame of image of the monitoring video is sequentially input into a YOLOv7 detector for target detection to obtain vehicle detection frame information, wherein a convolution block attention module is arranged in the ELAN structure of the backbone network layer of the YOLOv7 detector; the module performs global maximum pooling and global average pooling on the input feature maps to obtain an output feature map fusing space-time attention features, and its presence in the ELAN structure improves the representation capability of the features and thus the vehicle target detection accuracy of the YOLOv7 detector. The vehicle detection frame information is then input into a DeepSORT tracker for trajectory tracking to obtain a vehicle target tracking track, and the vehicle position coordinates in the track are corrected according to the camera parameters to obtain a corrected vehicle target tracking track, improving the accuracy of vehicle tracking.
Drawings
FIG. 1 is a flow chart of a method for tracking a vehicle target provided by an embodiment of the invention;
FIG. 2 is a schematic illustration of an ELAN structure provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of the location of a convolution block attention module in an ELAN structure according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a vehicle target tracking apparatus according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
In the description of the present invention, it should be understood that any indication of orientation or positional relationship, such as up, down, left or right, is based on the orientation or positional relationship shown in the drawings, is merely for convenience of describing the present invention and simplifying the description, and does not indicate or imply that the apparatus or element referred to must have a specific orientation or be constructed and operated in a specific orientation; it should therefore not be construed as limiting the present invention.
In the description of the present invention, terms such as first and second are used only to distinguish technical features and should not be construed as indicating or implying relative importance, implicitly indicating the number of technical features indicated, or implicitly indicating the precedence of the technical features indicated.
The embodiment of the invention provides a vehicle target tracking method, system, device and storage medium, which may be applied to a terminal, to a server, or to software running in a terminal or server. The terminal may be, but is not limited to, a tablet computer, a notebook computer, a desktop computer, etc. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, big data and artificial intelligence platforms.
Referring to fig. 1, the vehicle target tracking method of the embodiment of the present invention includes, but is not limited to, step S110, step S120, step S130, and step S140.
Step S110, acquiring a monitoring video;
step S120, sequentially inputting each frame of image of the monitoring video into a YOLOv7 detector for target detection to obtain vehicle detection frame information, wherein a convolution block attention module is arranged in an ELAN structure of a backbone network layer of the YOLOv7 detector, and the convolution block attention module performs global maximum pooling and global average pooling operations on the input feature maps to obtain an output feature map fused with space-time attention features;
step S130, inputting the vehicle detection frame information into a DeepSORT tracker for trajectory tracking, so as to obtain a vehicle target tracking track;
step S140, correcting the vehicle position coordinates in the vehicle target tracking track according to the camera parameters, to obtain the corrected vehicle target tracking track.
In some embodiments of step S120, YOLOv7 is a version of the YOLO model (You Only Look Once), a unified real-time target detection system based on a single neural network. YOLO models are deep learning algorithms that can be used for image recognition in computer vision. The YOLO model converts the object detection problem into a regression problem: given an input image, the bounding boxes of objects and their classes are regressed directly at multiple locations on the image. YOLO models include, but are not limited to, YOLOv3, YOLOv4, YOLOv5 and YOLOv7 (all different versions of YOLO); the weights, network structures and algorithms of the different models differ, as do the region sampling methods used.
The overall network architecture of the YOLOv7 detector consists mainly of three parts: the input layer, the backbone layer for extracting features, and the head layer for prediction. The YOLOv7 detector preprocesses the input picture and feeds it into the backbone network; the backbone outputs feature maps at three different scales, and the head layer takes these three backbone outputs, predicts the three tasks of image detection (classification, foreground/background classification, and bounding-box regression) through RepVGG blocks and convolution layers, and outputs the final result. The backbone layer of YOLOv7 consists of a number of CBS modules, ELAN modules and MP modules: the CBS module mainly performs convolution operations; the ELAN module is an efficient network structure that, by controlling the shortest and longest gradient paths, lets the network learn more features and gain stronger robustness; and the MP module mainly performs downsampling.
In an embodiment of the present invention, referring to fig. 2, the modified ELAN structure includes a first branch, a second branch, a third branch, and a fourth branch, where the first branch and the second branch are each a 1×1 convolution layer, the third branch is sequentially a 1×1 convolution layer and a 3×3 convolution layer, and the fourth branch is sequentially a 1×1 convolution layer, a 3×3 convolution layer, and a 3×3 convolution layer. The first, second, third and fourth branches are all connected to a splicing module, in which the outputs of the branches are spliced with a Concat operation. Specifically, the training configuration file yolov7.yaml of the YOLOv7 detector is modified in its backbone portion: the eight convolution layers at layers 7, 8, 20, 21, 33, 34, 46 and 47 are deleted, and the four Concat layers at layers 10, 23, 36 and 49 are changed to [[-1, -2, -3, -4], 1, Concat, [1]].
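As an illustration, the following is a minimal PyTorch sketch of this four-branch ELAN variant. The class names, the hidden channel width c_hidden, and the 1×1 fusion convolution after the Concat are assumptions made for illustration; the patent fixes only the branch layout shown in fig. 2.

```python
import torch
import torch.nn as nn

class Conv(nn.Module):
    """Conv -> BatchNorm -> SiLU: the CBS block used throughout YOLOv7."""
    def __init__(self, c_in, c_out, k=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, stride=1, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class ModifiedELAN(nn.Module):
    """Four-branch ELAN variant of fig. 2; all branches feed one Concat."""
    def __init__(self, c_in, c_hidden, c_out):
        super().__init__()
        self.branch1 = Conv(c_in, c_hidden, 1)                  # 1x1
        self.branch2 = Conv(c_in, c_hidden, 1)                  # 1x1
        self.branch3 = nn.Sequential(                           # 1x1 -> 3x3
            Conv(c_in, c_hidden, 1), Conv(c_hidden, c_hidden, 3))
        self.branch4 = nn.Sequential(                           # 1x1 -> 3x3 -> 3x3
            Conv(c_in, c_hidden, 1), Conv(c_hidden, c_hidden, 3),
            Conv(c_hidden, c_hidden, 3))
        self.fuse = Conv(4 * c_hidden, c_out, 1)                # after the Concat (assumed)

    def forward(self, x):
        y = torch.cat([self.branch1(x), self.branch2(x),
                       self.branch3(x), self.branch4(x)], dim=1)  # splicing module
        return self.fuse(y)

# Example: x = torch.randn(1, 128, 80, 80); y = ModifiedELAN(128, 64, 256)(x)
```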
Further, referring to fig. 3, an improved Convolution Block Attention Module (CBAM) is added to the ELAN structure: the module is connected in series after the first convolution module (i.e., the first branch) in the optimized ELAN structure and applies attention-based pooling operations to the features output by that convolution module, so that the model strengthens important feature representations and the target detection accuracy of the model improves. The convolution block attention module comprises a channel attention mechanism unit and a spatial attention mechanism unit connected in sequence; they work as follows:
Channel attention mechanism unit: the input feature F (namely the output feature of the first branch) is pooled in the channel dimension by a global maximum pooling layer and a global average pooling layer, respectively, to obtain a channel global maximum pooling value and a channel global average pooling value; the channel global maximum pooling value and the channel global average pooling value are each input into a layer of fully connected neural network and activated through a ReLU activation function, to obtain a first channel feature and a second channel feature; the first channel feature and the second channel feature are added element-wise and then passed through a sigmoid activation to obtain the channel attention feature M_c; finally, the channel attention feature M_c is multiplied element-wise with the input feature F to obtain the output feature F′ of the channel attention mechanism unit.
Spatial attention mechanism unit: the input feature of the spatial attention mechanism unit (namely the output feature F′ of the channel attention mechanism unit) is subjected to global maximum pooling and global average pooling in the spatial dimension, respectively, to obtain a spatial global maximum pooling feature and a spatial global average pooling feature; the spatial global maximum pooling feature and the spatial global average pooling feature are spliced, reduced to one channel through a convolution layer, and then activated through a sigmoid to obtain the spatial attention feature M_s; finally, the spatial attention feature M_s is multiplied element-wise with the input feature F′ to obtain the output feature of the spatial attention mechanism unit, i.e., the output feature map fusing the space-time attention features.
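A PyTorch sketch of the two units as described above follows. It mirrors the text literally (one shared fully connected layer with ReLU in the channel unit); the 7×7 kernel of the spatial convolution is an assumption borrowed from the common CBAM configuration, since the text does not fix the kernel size.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention unit: global max/avg pooling, one shared FC layer
    with ReLU, element-wise addition, then sigmoid."""
    def __init__(self, channels):
        super().__init__()
        self.fc = nn.Linear(channels, channels)  # one FC layer, per the text
        self.relu = nn.ReLU(inplace=True)

    def forward(self, f):                        # f: (B, C, H, W)
        b, c, _, _ = f.shape
        max_pool = f.amax(dim=(2, 3))            # channel global max pooling value
        avg_pool = f.mean(dim=(2, 3))            # channel global average pooling value
        first = self.relu(self.fc(max_pool))     # first channel feature
        second = self.relu(self.fc(avg_pool))    # second channel feature
        m_c = torch.sigmoid(first + second)      # channel attention feature M_c
        return f * m_c.view(b, c, 1, 1)          # F' = M_c * F (element-wise)

class SpatialAttention(nn.Module):
    """Spatial attention unit: max/avg pooling over channels, splice,
    reduce to one channel by convolution (7x7 kernel assumed), sigmoid."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, f_prime):                  # f_prime: (B, C, H, W)
        max_pool = f_prime.amax(dim=1, keepdim=True)
        avg_pool = f_prime.mean(dim=1, keepdim=True)
        m_s = torch.sigmoid(self.conv(torch.cat([max_pool, avg_pool], dim=1)))
        return f_prime * m_s                     # output feature map, M_s * F'

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, connected in series."""
    def __init__(self, channels):
        super().__init__()
        self.channel = ChannelAttention(channels)
        self.spatial = SpatialAttention()

    def forward(self, f):
        return self.spatial(self.channel(f))
```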
In some embodiments, the YOLOv7 detector is trained by a deep learning algorithm, and model losses need to be calculated according to a prediction frame and a real frame output by the YOLOv7 detector during the training process of the YOLOv7 detector, and the YOLOv7 detector is continuously updated according to the model losses until the output of the YOLOv7 detector reaches a preset precision. Specifically, the calculation is performed by using a Wise-IoU loss function in the training process of the YOLOv7 detector, and the calculation is specifically as follows:
determining the intersection-over-union (IoU) loss according to the overlap between the prediction frame and the real frame output by the YOLOv7 detector, where the IoU loss is defined as:

L_IoU = 1 − IoU,

where IoU denotes the intersection-over-union ratio of the prediction frame and the real frame.
Determining a loss weight according to the distance measure between the prediction frame and the real frame output by the YOLOv7 detector. Following the Wise-IoU formulation, the loss weight is defined as:

R_WIoU = exp( ((x − x_gt)² + (y − y_gt)²) / (W_g² + H_g²)* ),

where x and y are the coordinates of the center point of the prediction frame, x_gt and y_gt are the coordinates of the center point of the real frame, W_g and H_g are the width and height of the smallest box enclosing the prediction frame and the real frame, exp(·) denotes the exponential operation, and the superscript * indicates that the denominator is detached from the gradient computation so that it acts only as a weighting factor.
Based on the attention mechanism, the model loss is determined from the loss weight and the IoU loss, and is expressed as:

L_WIoU = R_WIoU · L_IoU.
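A minimal PyTorch sketch of this loss for axis-aligned boxes follows; the (x1, y1, x2, y2) box layout, the eps stabilizer, and the exact placement of the detach are assumptions consistent with the formulas above.

```python
import torch

def wise_iou_loss(pred, target, eps=1e-7):
    """Wise-IoU loss sketch; pred, target: (..., 4) boxes as (x1, y1, x2, y2)."""
    # Intersection and IoU
    xy1 = torch.maximum(pred[..., :2], target[..., :2])
    xy2 = torch.minimum(pred[..., 2:], target[..., 2:])
    inter = (xy2 - xy1).clamp(min=0).prod(dim=-1)
    area_p = (pred[..., 2:] - pred[..., :2]).clamp(min=0).prod(dim=-1)
    area_t = (target[..., 2:] - target[..., :2]).clamp(min=0).prod(dim=-1)
    iou = inter / (area_p + area_t - inter + eps)
    l_iou = 1.0 - iou                                  # L_IoU = 1 - IoU

    # Squared centre distance: (x - x_gt)^2 + (y - y_gt)^2
    c_p = (pred[..., :2] + pred[..., 2:]) / 2
    c_t = (target[..., :2] + target[..., 2:]) / 2
    dist2 = ((c_p - c_t) ** 2).sum(dim=-1)

    # W_g, H_g: width/height of the smallest enclosing box of both frames
    enc1 = torch.minimum(pred[..., :2], target[..., :2])
    enc2 = torch.maximum(pred[..., 2:], target[..., 2:])
    wg, hg = (enc2 - enc1).unbind(dim=-1)

    # R_WIoU = exp(dist2 / (W_g^2 + H_g^2)*); detached so it only scales the loss
    r_wiou = torch.exp(dist2 / (wg ** 2 + hg ** 2 + eps).detach())
    return r_wiou * l_iou                              # L_WIoU = R_WIoU * L_IoU

# Example: wise_iou_loss(torch.tensor([[0., 0., 2., 2.]]), torch.tensor([[1., 1., 3., 3.]]))
```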
further, after training is finished, the super parameters are adjusted according to the target data set, specifically, the data set can be trained again for selecting the evolve option, 300 rounds of training are performed, each round of training is performed for 10 generations, and the optimal super parameter configuration is found.
In some embodiments of step S130, the DeepSORT tracker is an algorithm that combines the two tasks of target detection and target tracking. After the location and bounding box of each target object are detected in a frame, feature representations of the targets are extracted by a deep learning model (e.g., a CNN), and each target is matched with the targets tracked in the previous frame. The matching process considers factors such as feature similarity and motion consistency of the targets to determine the identity and track of each target.
Specifically, in step S130, the step of inputting the vehicle detection frame information into the DeepSORT tracker for trajectory tracking to obtain the vehicle target tracking track includes, but is not limited to, the following steps:
step S210, predicting a target position of a next frame according to a target motion track extracted from a current input video by a Kalman filtering algorithm, wherein the target motion track comprises an uncertain state track and a certain state track;
step S220, extracting corresponding target appearance features according to the target positions through a feature extraction network;
step S230, determining a cost matrix of each detection frame feature and the target appearance feature according to the vehicle detection frame information, wherein the cost matrix characterizes the similarity between the features;
step S240, associating the target motion trail with the detection frames according to the cost matrix through the Hungarian algorithm so as to update the target motion trail.
In this embodiment, the YOLOv7 detector of the embodiment of the present invention identifies the vehicle targets in the monitoring video frame by frame to obtain the vehicle detection frame information of each vehicle target, including but not limited to coordinate information, category, confidence and image features, and this information is input into the DeepSORT tracker. In the DeepSORT tracker, a first batch of uncertain-state tracks is created from the detection results of the first frame; certain-state tracks are then obtained through subsequent detection-frame updates and the association-matching results against the uncertain-state tracks. The DeepSORT tracker comprises the following steps:
s1, carrying out IOU matching on a detection frame of a current frame and a target track frame obtained by carrying out Kalman filtering prediction according to a target motion track of a previous frame, calculating a cost matrix, inputting the cost matrix into a Hungary algorithm to obtain three matching results, wherein the first matching result is track mismatch, and deleting a definite state track and an indefinite state track, wherein the mismatch reaches the prediction times (such as 30 times); the second is target mismatch, then create new track; and thirdly, track matching, namely, successful tracking is illustrated, track variables corresponding to the target are updated through Kalman filtering, and the current steps are repeated until a track frame or a video frame with a determined state is finished.
S2, the certain-state tracks and uncertain-state tracks are predicted through Kalman filtering, and cascade matching is performed between the certain-state tracks and the target detection frames, yielding three kinds of matching result. The first is track matching: the track is updated through Kalman filtering. The second is track mismatch. The third is target mismatch: IoU matching is then performed among the previous uncertain-state tracks, the mismatched tracks and the mismatched targets, and the cost matrix of the targets is calculated from the matching result.
S3, the cost matrix is input into the Hungarian algorithm, yielding three kinds of matching result: the first is track mismatch, and the certain-state and uncertain-state tracks mismatched 30 times are deleted; the second is target mismatch, and a new track is created; the third is track matching, indicating successful tracking, and the track variables corresponding to the target are updated through Kalman filtering.
S4, repeating the step S3 until the video frame is finished, and obtaining the vehicle target position and the vehicle target track.
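As a sketch of the matching round used in steps S1 and S3, the snippet below builds an IoU cost matrix and solves it with the Hungarian algorithm via SciPy's linear_sum_assignment. The Kalman prediction, appearance cost and cascade logic are omitted, and the IoU threshold of 0.3 is an assumed value, not one fixed by the text.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(boxes_a, boxes_b):
    """Pairwise IoU for (x1, y1, x2, y2) boxes; returns a |A| x |B| matrix."""
    a = boxes_a[:, None, :]
    b = boxes_b[None, :, :]
    xy1 = np.maximum(a[..., :2], b[..., :2])
    xy2 = np.minimum(a[..., 2:], b[..., 2:])
    inter = np.prod(np.clip(xy2 - xy1, 0, None), axis=-1)
    area_a = np.prod(a[..., 2:] - a[..., :2], axis=-1)
    area_b = np.prod(b[..., 2:] - b[..., :2], axis=-1)
    return inter / (area_a + area_b - inter + 1e-7)

def associate(track_boxes, det_boxes, iou_threshold=0.3):
    """One matching round: Hungarian assignment on the IoU cost matrix.
    Returns matched (track, det) pairs plus unmatched tracks/detections."""
    if len(track_boxes) == 0 or len(det_boxes) == 0:
        return [], list(range(len(track_boxes))), list(range(len(det_boxes)))
    cost = 1.0 - iou(track_boxes, det_boxes)     # low cost = high overlap
    rows, cols = linear_sum_assignment(cost)
    matches, u_tracks, u_dets = [], [], []
    for t in range(len(track_boxes)):
        if t not in rows:
            u_tracks.append(t)                   # track mismatch -> age the track
    for d in range(len(det_boxes)):
        if d not in cols:
            u_dets.append(d)                     # target mismatch -> new track
    for t, d in zip(rows, cols):
        if cost[t, d] > 1.0 - iou_threshold:
            u_tracks.append(t); u_dets.append(d) # reject weak matches
        else:
            matches.append((t, d))               # track matched -> Kalman update
    return matches, u_tracks, u_dets
```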
According to some embodiments of the present invention, in step S140, the step of correcting the vehicle position coordinates in the vehicle target tracking track according to the camera parameters to obtain a corrected vehicle target tracking track includes, but is not limited to, the steps of:
step S310, each image pixel of the monitoring video is subjected to distortion transformation through the distortion coefficient of the camera parameter so as to be projected to a camera standardized plane;
step S320, projecting the image pixel points subjected to distortion transformation on the standardized plane to a pixel plane through an internal reference matrix of camera parameters to obtain a pixel point conversion relation between the distorted pixels and original image pixels on the pixel plane;
and step S330, changing the position coordinates of each vehicle in the vehicle target tracking track according to the pixel point conversion relation to obtain a corrected vehicle target tracking track.
In this embodiment, the internal reference matrix and distortion parameters of the camera and the pixel size of the monitoring video are obtained; the pixel points of the video image are projected onto the standardized plane of the camera, the distortion transformation is performed on that plane, and the transformed points are projected back onto the pixel plane to obtain the conversion relation between the distorted pixel points and the original image pixel points. This process is repeated until the conversion relations of all pixel points are obtained, and the vehicle position coordinates in the track output by the DeepSORT tracker are changed according to the pixel point conversion relations, so that the corrected vehicle target tracking track is obtained.
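As a sketch, an equivalent correction of the track coordinates can be obtained with OpenCV's cv2.undistortPoints, which projects distorted pixels onto the standardized (normalized) camera plane, removes the distortion, and, when passed P=K, reprojects them onto the pixel plane. The intrinsic matrix, distortion coefficients and trajectory values below are placeholders; the patent's variant instead precomputes this conversion for every image pixel.

```python
import cv2
import numpy as np

# Camera intrinsics K and distortion coefficients are assumed known from a
# prior calibration; the values here are placeholders, not calibrated data.
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
dist = np.array([-0.30, 0.10, 0.0, 0.0, 0.0])  # k1, k2, p1, p2, k3

def correct_track(track_xy):
    """Map distorted pixel coordinates of a track to undistorted pixels."""
    pts = np.asarray(track_xy, dtype=np.float64).reshape(-1, 1, 2)
    # Project to the normalized plane, undo the distortion, reproject with P=K
    undistorted = cv2.undistortPoints(pts, K, dist, P=K)
    return undistorted.reshape(-1, 2)

# Example: correct the centre-point trajectory of one tracked vehicle
trajectory = [(850.0, 430.0), (862.5, 441.0), (875.0, 452.5)]
print(correct_track(trajectory))
```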
According to some embodiments of the invention, the improved YOLOv7 detector adopts the improved ELAN structure in place of the ELAN structure in the original YOLOv7 network architecture, adds the improved convolution block attention module inside the optimized ELAN structure, performs model training with the Wise-IoU loss function, and evaluates and optimizes the hyperparameters of the trained model on the data set, thereby improving the accuracy of vehicle target tracking under a monitoring view angle. Undistortion processing of the tracking track coordinates improves the accuracy of the extracted vehicle target motion trail.
The embodiment of the invention also provides a vehicle target tracking system, which comprises:
the first module is used for acquiring a monitoring video;
the second module is used for sequentially inputting each frame of image of the monitoring video into the YOLOv7 detector to carry out target detection to obtain vehicle detection frame information, wherein a convolution block attention module is arranged in an ELAN structure of a backbone network layer of the YOLOv7 detector, and the convolution block attention module performs global maximum pooling and global average pooling operations on the input feature maps to obtain an output feature map fused with space-time attention features;
the third module is used for inputting each frame of image and the vehicle detection frame information into the DeepSORT tracker for trajectory tracking, so as to obtain a vehicle target tracking track;
and the fourth module is used for correcting the vehicle position coordinates in the vehicle target tracking track according to the camera parameters to obtain the corrected vehicle target tracking track.
It can be understood that the content in the above-mentioned vehicle target tracking method embodiment is applicable to the system embodiment, and the functions specifically implemented by the system embodiment are the same as those in the above-mentioned vehicle target tracking method embodiment, and the beneficial effects achieved by the system embodiment are the same as those achieved by the above-mentioned vehicle target tracking method embodiment.
Referring to fig. 4, fig. 4 is a schematic diagram of a vehicle target tracking apparatus according to an embodiment of the present invention. The vehicle target tracking apparatus according to an embodiment of the present invention includes one or more control processors and a memory, and one control processor and one memory are exemplified in fig. 4.
The control processor and the memory may be connected by a bus or otherwise, for example in fig. 4.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the control processor, the remote memory being connectable to the vehicle target tracking apparatus via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
It will be appreciated by those skilled in the art that the device configuration shown in fig. 4 is not limiting of the vehicle target tracking device and may include more or fewer components than shown, or certain components may be combined, or a different arrangement of components.
The non-transitory software program and instructions required to implement the vehicle target tracking method applied to the vehicle target tracking apparatus in the above-described embodiments are stored in the memory, and when executed by the control processor, the vehicle target tracking method applied to the vehicle target tracking apparatus in the above-described embodiments is executed.
Furthermore, an embodiment of the present invention provides a computer-readable storage medium storing computer-executable instructions that are executed by one or more control processors to cause the one or more control processors to perform the vehicle target tracking method in the method embodiment.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of one of ordinary skill in the art without departing from the spirit of the present invention.

Claims (10)

1. A vehicle target tracking method, comprising the steps of:
acquiring a monitoring video;
sequentially inputting each frame of image of a monitoring video into a YOLOv7 detector to perform target detection to obtain vehicle detection frame information, wherein a convolution block attention module is arranged in an ELAN structure of a backbone network layer of the YOLOv7 detector, and the convolution block attention module performs global maximum pooling and global average pooling operations on the input feature maps to obtain an output feature map fused with space-time attention features;
inputting the vehicle detection frame information into a DeepSORT tracker for trajectory tracking, so as to obtain a vehicle target tracking track;
and correcting the vehicle position coordinates in the vehicle target tracking track according to the camera parameters to obtain a corrected vehicle target tracking track.
2. The vehicle target tracking method of claim 1, wherein the ELAN structure includes a first branch, a second branch, a third branch, and a fourth branch, the first branch and the second branch each being a 1 x 1 convolution layer, the third branch being sequentially a 1 x 1 convolution layer and a 3 x 3 convolution layer, the fourth branch being sequentially a 1 x 1 convolution layer, a 3 x 3 convolution layer, and a 3 x 3 convolution layer; the first, second, third and fourth branches are all connected to a splice module.
3. The vehicle object tracking method of claim 2, wherein the convolution block attention module is disposed between the first branch and the splice module, the convolution block attention module including a channel attention mechanism unit and a spatial attention mechanism unit connected in sequence.
4. A vehicle object tracking method as claimed in claim 3, characterized in that the channel attention mechanism unit is adapted to:
respectively pooling the input features of the channel attention mechanism unit in the channel dimension to obtain a channel global maximum pooling value and a channel global average pooling value;
inputting the channel global maximum pooling value and the channel global average pooling value into a fully-connected neural network respectively and activating the fully-connected neural network to obtain a first channel characteristic and a second channel characteristic;
the first channel characteristic and the second channel characteristic are activated after the addition operation, so that the channel attention characteristic is obtained;
multiplying the channel attention characteristic and the input characteristic of the input channel attention mechanism unit to obtain the output characteristic of the channel attention mechanism unit;
the spatial attention mechanism unit is used for:
respectively pooling the input features of the spatial attention mechanism unit in the spatial dimension to obtain a spatial global maximum pooling feature and a spatial global average pooling feature;
performing dimension reduction and activation operations after splicing the space global maximum pooling feature and the space global average pooling feature to obtain a space attention feature;
and multiplying the spatial attention characteristic and the input characteristic of the input spatial attention mechanism unit to obtain the output characteristic of the spatial attention mechanism unit.
5. The vehicle target tracking method according to claim 1, wherein the YOLOv7 detector is trained by a deep learning algorithm, and model loss during the training of the YOLOv7 detector is obtained by:
determining the intersection-over-union (IoU) loss according to the overlap between the prediction frame and the real frame output by the YOLOv7 detector;
determining a loss weight according to the distance measure between the prediction frame and the real frame output by the YOLOv7 detector;
based on the attention mechanism, determining the model loss from the loss weight and the IoU loss.
6. The vehicle target tracking method according to claim 1, wherein the step of inputting the vehicle detection frame information into a DeepSORT tracker for trajectory tracking to obtain a vehicle target tracking track comprises the following steps:
predicting the target position of the next frame according to the target motion trail extracted from the current input video by a Kalman filtering algorithm;
extracting corresponding target appearance features according to the target positions through a feature extraction network;
determining a cost matrix of each detection frame feature and the target appearance feature according to the vehicle detection frame information, wherein the cost matrix characterizes similarity between features;
and associating the target motion trail with the detection frames according to the cost matrix by using the Hungarian algorithm so as to update the target motion trail.
7. The vehicle target tracking method according to claim 1, wherein the correcting the vehicle position coordinates in the vehicle target tracking trajectory according to the camera parameters, to obtain the corrected vehicle target tracking trajectory, comprises the steps of:
carrying out distortion transformation on each image pixel point of the monitoring video through the distortion coefficient of the camera parameter so as to project the image pixel point to a camera standardized plane;
projecting the image pixel points subjected to distortion transformation on the standardized plane to a pixel plane through an internal reference matrix of the camera parameters to obtain a pixel point conversion relation between a distorted pixel and an original image pixel on the pixel plane;
and changing the position coordinates of each vehicle in the vehicle target tracking track according to the pixel point conversion relation to obtain a corrected vehicle target tracking track.
8. A vehicle target tracking system, comprising:
the first module is used for acquiring a monitoring video;
the second module is used for sequentially inputting each frame of image of the monitoring video into the YOLOv7 detector to carry out target detection to obtain vehicle detection frame information, wherein a convolution block attention module is arranged in an ELAN structure of a backbone network layer of the YOLOv7 detector, and the convolution block attention module performs global maximum pooling and global average pooling operations on the input feature maps to obtain an output feature map fused with space-time attention features;
the third module is used for inputting each frame of image and the vehicle detection frame information into the DeepSORT tracker for trajectory tracking, so as to obtain a vehicle target tracking track;
and a fourth module, configured to correct the vehicle position coordinates in the vehicle target tracking track according to the camera parameters, and obtain a corrected vehicle target tracking track.
9. A vehicle target tracking apparatus, comprising:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the vehicle object tracking method as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium in which a processor-executable program is stored, characterized in that the processor-executable program is for realizing the vehicle object tracking method according to any one of claims 1 to 7 when executed by the processor.
CN202311595105.1A 2023-11-24 2023-11-24 Vehicle target tracking method, system, device and storage medium Pending CN117636267A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311595105.1A CN117636267A (en) 2023-11-24 2023-11-24 Vehicle target tracking method, system, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311595105.1A CN117636267A (en) 2023-11-24 2023-11-24 Vehicle target tracking method, system, device and storage medium

Publications (1)

Publication Number Publication Date
CN117636267A (en) 2024-03-01

Family

ID=90015712

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311595105.1A Pending CN117636267A (en) 2023-11-24 2023-11-24 Vehicle target tracking method, system, device and storage medium

Country Status (1)

Country Link
CN (1) CN117636267A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118506298A (en) * 2024-07-17 2024-08-16 江西锦路科技开发有限公司 Cross-camera vehicle track association method



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination