CN117636267A - Vehicle target tracking method, system, device and storage medium - Google Patents
- Publication number
- CN117636267A (application number CN202311595105.1A)
- Authority
- CN
- China
- Prior art keywords
- vehicle
- target tracking
- track
- vehicle target
- channel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/54—Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/766—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a vehicle target tracking method, a vehicle target tracking system, a vehicle target tracking device and a storage medium, and relates to the technical field of artificial intelligence. Each frame of image of a monitoring video is sequentially input into a YOLOv7 detector to perform target detection and obtain vehicle detection frame information, wherein a convolution block attention module is arranged in the ELAN structure of the backbone network layer of the YOLOv7 detector; the convolution block attention module performs global max pooling and global average pooling operations on the input feature maps to obtain an output feature map fusing channel and spatial attention features, and placing the convolution block attention module in the ELAN structure improves the feature representation capability and thereby the vehicle target detection accuracy of the detector. The vehicle detection frame information is then input into a DeepSORT tracker for track tracking to obtain a vehicle target tracking track, and the vehicle position coordinates in the vehicle target tracking track are corrected according to the camera parameters to obtain a corrected vehicle target tracking track, improving the accuracy of vehicle tracking.
Description
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a method, a system, an apparatus, and a storage medium for tracking a vehicle target.
Background
With the rapid development of intelligent transportation, detecting and tracking vehicle targets with computer vision technology is increasingly important, as it can provide information such as the position, track and speed of vehicles. In the related art, target tracking mainly adopts detection-based tracking schemes. In a detection-based tracking scheme, all targets in an image are obtained by performing target detection on each frame of a video sequence; the task is then converted into a target association problem between consecutive frames, a similarity matrix is constructed from IoU, appearance and other cues, and the association is solved with the Hungarian algorithm, a greedy algorithm or the like. However, the accuracy and efficiency of the target detection models adopted in existing vehicle target tracking processes are low, which affects track recognition, and because the target information reflected by the acquired image differs from the real target, track recognition accuracy is low.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the prior art. Therefore, the invention provides a vehicle target tracking method, a system, a device and a storage medium, which can improve the accuracy of vehicle track identification.
In one aspect, an embodiment of the present invention provides a vehicle target tracking method, including the steps of:
acquiring a monitoring video;
sequentially inputting each frame of image of the monitoring video into a YOLOv7 detector to perform target detection to obtain vehicle detection frame information, wherein a convolution block attention module is arranged in the ELAN structure of the backbone network layer of the YOLOv7 detector, and the convolution block attention module performs global max pooling and global average pooling operations on the input feature maps to obtain an output feature map fusing channel and spatial attention features;
inputting the vehicle detection frame information into a DeepSORT tracker for track tracking to obtain a vehicle target tracking track;
and correcting the vehicle position coordinates in the vehicle target tracking track according to the camera parameters to obtain a corrected vehicle target tracking track.
According to some embodiments of the present invention, the ELAN structure includes a first branch, a second branch, a third branch and a fourth branch, where the first branch and the second branch are each a 1×1 convolution layer, the third branch is sequentially a 1×1 convolution layer and a 3×3 convolution layer, and the fourth branch is sequentially a 1×1 convolution layer, a 3×3 convolution layer and a 3×3 convolution layer; the first, second, third and fourth branches are all connected to a splicing module.
According to some embodiments of the invention, the convolution block attention module is disposed between the first branch and the splicing module, and the convolution block attention module includes a channel attention mechanism unit and a spatial attention mechanism unit connected in sequence.
According to some embodiments of the invention, the channel attention mechanism unit is configured to:
respectively pooling the input features of the channel attention mechanism unit in the channel dimension to obtain a channel global maximum pooling value and a channel global average pooling value;
inputting the channel global maximum pooling value and the channel global average pooling value into a fully-connected neural network respectively and activating the fully-connected neural network to obtain a first channel characteristic and a second channel characteristic;
the first channel characteristic and the second channel characteristic are activated after the addition operation, so that the channel attention characteristic is obtained;
multiplying the channel attention characteristic and the input characteristic of the input channel attention mechanism unit to obtain the output characteristic of the channel attention mechanism unit;
the spatial attention mechanism unit is used for:
respectively pooling the input features of the spatial attention mechanism unit in the spatial dimension to obtain a spatial global maximum pooling feature and a spatial global average pooling feature;
performing dimension reduction and activation operations after splicing the space global maximum pooling feature and the space global average pooling feature to obtain a space attention feature;
and multiplying the spatial attention characteristic and the input characteristic of the input spatial attention mechanism unit to obtain the output characteristic of the spatial attention mechanism unit.
According to some embodiments of the invention, the YOLOv7 detector is trained by a deep learning algorithm, and model loss in the training process of the YOLOv7 detector is obtained by the following steps:
determining the IoU loss according to the degree of overlap between the prediction frame and the real frame output by the YOLOv7 detector;
determining a loss weight according to a distance metric between the prediction frame and the real frame output by the YOLOv7 detector;
based on the attention mechanism, determining the model loss from the loss weight and the IoU loss.
According to some embodiments of the present invention, the inputting the vehicle detection frame information into the DeepSORT tracker for track tracking to obtain a vehicle target tracking track includes the following steps:
predicting the target position of the next frame according to the target motion trail extracted from the current input video by a Kalman filtering algorithm;
extracting corresponding target appearance features according to the target positions through a feature extraction network;
determining a cost matrix of each detection frame feature and the target appearance feature according to the vehicle detection frame information, wherein the cost matrix characterizes similarity between features;
and associating the target motion trail with the detection frames according to the cost matrix by using the Hungarian algorithm so as to update the target motion trail.
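The Kalman prediction step in the first of these claims can be sketched with a constant-velocity model, which is the motion model DeepSORT commonly uses. The state layout, time step and noise magnitudes below are illustrative assumptions, not values taken from the patent:

```python
import numpy as np

class ConstantVelocityKF:
    """Minimal constant-velocity Kalman filter predict step, sketching how
    a tracker propagates a box state to the next frame.
    Assumed state: [cx, cy, w, h, vx, vy, vw, vh] (values and velocities)."""

    def __init__(self, dim=4, dt=1.0):
        # Transition matrix: each position component gains velocity * dt
        self.F = np.eye(2 * dim)
        self.F[:dim, dim:] = dt * np.eye(dim)
        self.P = np.eye(2 * dim)          # state covariance (illustrative init)
        self.Q = 0.01 * np.eye(2 * dim)   # process noise (illustrative)

    def predict(self, state):
        """Predict the next-frame state and propagate the covariance."""
        next_state = self.F @ state
        self.P = self.F @ self.P @ self.F.T + self.Q
        return next_state

kf = ConstantVelocityKF()
# Track centered at (100, 50), box 40x20, moving +5 px/frame along x
state = np.array([100., 50., 40., 20., 5., 0., 0., 0.])
predicted = kf.predict(state)
```

In the full tracker this predicted box is what gets compared against the next frame's detections; the update step (omitted here) would fuse the matched detection back into the state.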
According to some embodiments of the present invention, the correcting the vehicle position coordinates in the vehicle target tracking track according to the camera parameters to obtain a corrected vehicle target tracking track includes the following steps:
carrying out distortion transformation on each image pixel point of the monitoring video through the distortion coefficient of the camera parameter so as to project the image pixel point to a camera standardized plane;
projecting the image pixel points subjected to distortion transformation on the standardized plane to a pixel plane through an internal reference matrix of the camera parameters to obtain a pixel point conversion relation between a distorted pixel and an original image pixel on the pixel plane;
and changing the position coordinates of each vehicle in the vehicle target tracking track according to the pixel point conversion relation to obtain a corrected vehicle target tracking track.
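The projection described in these steps can be sketched with a pinhole camera model and a two-coefficient radial distortion; all parameter values below are illustrative assumptions rather than values from the patent:

```python
import numpy as np

def project_with_distortion(xy_norm, K, k1=0.0, k2=0.0):
    """Project points on the camera's normalized plane onto the pixel plane,
    applying a two-coefficient radial distortion model (a common sketch of
    the distortion transform described above; coefficients are illustrative).
    xy_norm: (N, 2) normalized coordinates; K: 3x3 intrinsic matrix."""
    x, y = xy_norm[:, 0], xy_norm[:, 1]
    r2 = x**2 + y**2
    factor = 1.0 + k1 * r2 + k2 * r2**2   # radial distortion factor
    xd, yd = x * factor, y * factor       # distorted normalized coordinates
    u = K[0, 0] * xd + K[0, 2]            # u = fx * x_d + cx
    v = K[1, 1] * yd + K[1, 2]            # v = fy * y_d + cy
    return np.stack([u, v], axis=1)

K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0., 0., 1.]])
pts = np.array([[0.1, -0.05]])
pixels = project_with_distortion(pts, K, k1=-0.2)
```

Evaluating this forward model for every image pixel yields exactly the pixel-point conversion relation between distorted and undistorted coordinates that the third step uses to remap the track coordinates.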
In another aspect, an embodiment of the present invention further provides a vehicle target tracking system, including:
the first module is used for acquiring a monitoring video;
the second module is used for sequentially inputting each frame of image of the monitoring video into the YOLOv7 detector to perform target detection to obtain vehicle detection frame information, wherein a convolution block attention module is arranged in the ELAN structure of the backbone network layer of the YOLOv7 detector, and the convolution block attention module performs global max pooling and global average pooling operations on the input feature maps to obtain an output feature map fusing channel and spatial attention features;
the third module is used for inputting each frame of image and the vehicle detection frame information into the DeepSORT tracker for track tracking to obtain a vehicle target tracking track;
and a fourth module, configured to correct the vehicle position coordinates in the vehicle target tracking track according to the camera parameters, and obtain a corrected vehicle target tracking track.
On the other hand, the embodiment of the invention also provides a vehicle target tracking device, which comprises:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the vehicle target tracking method as previously described.
In another aspect, embodiments of the present invention also provide a computer-readable storage medium storing computer-executable instructions for causing a computer to perform a vehicle target tracking method as described above.
The technical scheme of the invention has at least one of the following advantages or beneficial effects. Each frame of image of the monitoring video is sequentially input into a YOLOv7 detector for target detection to obtain vehicle detection frame information, where a convolution block attention module is arranged in the ELAN structure of the backbone network layer of the YOLOv7 detector; the convolution block attention module performs global max pooling and global average pooling operations on the input feature maps to obtain an output feature map fusing channel and spatial attention features, and placing the convolution block attention module in the ELAN structure improves the representation capability of the features, thereby improving the vehicle target detection accuracy of the YOLOv7 detector. The vehicle detection frame information is then input into a DeepSORT tracker for tracking to obtain a vehicle target tracking track, and the vehicle position coordinates in the vehicle target tracking track are corrected according to the camera parameters to obtain a corrected vehicle target tracking track, improving the accuracy of vehicle tracking.
Drawings
FIG. 1 is a flow chart of a method for tracking a vehicle target provided by an embodiment of the invention;
FIG. 2 is a schematic illustration of an ELAN structure provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of the location of a convolution block attention module in an ELAN structure according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a vehicle target tracking apparatus according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
In the description of the present invention, it should be understood that the direction or positional relationship indicated with respect to the description of the orientation, such as up, down, left, right, etc., is based on the direction or positional relationship shown in the drawings, is merely for convenience of describing the present invention and simplifying the description, and does not indicate or imply that the apparatus or element to be referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
In the description of the present invention, the description of first, second, etc. is for the purpose of distinguishing between technical features only, and should not be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated or implicitly indicating the precedence of the technical features indicated.
The embodiment of the invention provides a vehicle target tracking method, a system, a device and a storage medium; the method may be executed by a terminal, by a server, or by the two in cooperation. The terminal may be, but is not limited to, a tablet computer, a notebook computer, a desktop computer, or the like. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, big data and artificial intelligence platforms.
Referring to fig. 1, the vehicle target tracking method of the embodiment of the present invention includes, but is not limited to, step S110, step S120, step S130, and step S140.
Step S110, acquiring a monitoring video;
step S120, sequentially inputting each frame of image of the monitoring video into a YOLOv7 detector for target detection to obtain vehicle detection frame information, wherein a convolution block attention module is arranged in an ELAN structure of a main network layer of the YOLOv7 detector, and the convolution block attention module performs global maximum pooling and global average pooling operation on the input feature images to obtain an output feature image fused with space-time attention features;
step S130, inputting vehicle detection frame information into a deep start tracker to track, so as to obtain a vehicle target tracking track;
step S140, correcting the vehicle position coordinates in the vehicle target tracking track according to the camera parameters, to obtain the corrected vehicle target tracking track.
In some embodiments of step S120, YOLOv7 is a version of the YOLO model ("You Only Look Once: Unified, Real-Time Object Detection", a target detection system based on a single neural network). YOLO models are deep learning algorithms that can be used for image recognition in computer vision. The YOLO model converts the object detection problem into a regression problem, i.e., given an input image, the bounding boxes of objects and their classes are regressed directly at multiple locations on the image. YOLO models include, but are not limited to, YOLOv3, YOLOv4, YOLOv5 and YOLOv7 (all different versions of YOLO); the weights, network structures and algorithms of the different models differ, as do the region sampling methods they use.
The overall network architecture of the YOLOv7 detector consists mainly of three parts: the input layer, the backbone layer for extracting features, and the head layer for prediction. The YOLOv7 detector preprocesses the input picture and feeds it into the backbone network, which outputs three layers of feature maps of different sizes; at the head layer, according to the three layers of output from the backbone network, the three tasks of image detection (classification, foreground/background classification and bounding-box regression) are predicted through RepVGG blocks and convolutions, and the final result is output. The backbone layer of YOLOv7 consists of a number of CBS modules, ELAN modules and MP modules: the CBS module mainly performs convolution operations; the ELAN module is an efficient network structure that, by controlling the shortest and longest gradient paths, enables the network to learn more features with stronger robustness; and the MP module mainly performs downsampling.
In an embodiment of the present invention, referring to fig. 2, the modified ELAN structure includes a first branch, a second branch, a third branch and a fourth branch, where the first branch and the second branch are each a 1×1 convolution layer, the third branch is sequentially a 1×1 convolution layer and a 3×3 convolution layer, and the fourth branch is sequentially a 1×1 convolution layer, a 3×3 convolution layer and a 3×3 convolution layer. The first, second, third and fourth branches are all connected to a splicing module, in which the outputs of the branches are spliced with a Concat function. Specifically, the training configuration file yolov7.yaml of the YOLOv7 detector is modified in its backbone portion: the eight convolution layers at layers 7, 8, 20, 21, 33, 34, 46 and 47 are deleted, and the four Concat layers at layers 10, 23, 36 and 49 are modified to [[-1, -2, -3, -4], 1, Concat, [1]].
Further, referring to fig. 3, an improved convolution block attention module (CBAM) is added to the ELAN structure: the module is connected in series after the first convolution module (i.e., the first branch) of the optimized ELAN structure and performs pooling operations on the features output by that convolution module based on the attention mechanism, so that the model strengthens important feature representations and the target detection accuracy of the model is improved. The convolution block attention module comprises a channel attention mechanism unit and a spatial attention mechanism unit connected in sequence, which work as follows:
Channel attention mechanism unit: the input feature F (i.e., the output feature of the first branch) is pooled in the channel dimension by a global max pooling layer and a global average pooling layer, respectively, to obtain a channel global max pooling value and a channel global average pooling value; the two pooled values are each input into a one-layer fully connected neural network activated by the ReLU activation function, to obtain a first channel feature and a second channel feature; the first and second channel features are added element-wise and passed through a sigmoid activation to obtain the channel attention feature M_c; M_c is then multiplied element-wise with the input feature F to obtain the output feature F' of the channel attention mechanism unit.
Spatial attention mechanism unit: global max pooling and global average pooling are performed in the spatial dimension on the input feature of the spatial attention mechanism unit (i.e., the output feature F' of the channel attention mechanism unit), to obtain a spatial global max pooling feature and a spatial global average pooling feature; the two pooled features are spliced, reduced to one channel by a convolution layer, and then activated by sigmoid to obtain the spatial attention feature M_s; M_s is multiplied element-wise with the input feature F' to obtain the output feature of the spatial attention mechanism unit, i.e., the output feature map fusing channel and spatial attention features.
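The channel and spatial attention computations just described can be sketched in NumPy. The random weights, reduction ratio, and the simple two-weight combination standing in for CBAM's 7×7 convolution are illustrative assumptions, not the patent's exact parameterization:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(F, W1, W2):
    """Channel attention: pool over the spatial dims, pass both pooled vectors
    through a shared two-layer MLP (W1, W2), add, then sigmoid. F: (C, H, W)."""
    max_pool = F.max(axis=(1, 2))                # channel global max pooling, (C,)
    avg_pool = F.mean(axis=(1, 2))               # channel global average pooling, (C,)
    feat1 = W2 @ np.maximum(W1 @ max_pool, 0.0)  # MLP + ReLU on max-pooled vector
    feat2 = W2 @ np.maximum(W1 @ avg_pool, 0.0)  # MLP + ReLU on avg-pooled vector
    Mc = sigmoid(feat1 + feat2)                  # channel attention weights, (C,)
    return Mc[:, None, None] * F                 # reweight each channel of F

def spatial_attention(Fp, w):
    """Spatial attention: pool over the channel dim, combine the two maps with
    scalar weights w (standing in for the conv layer), sigmoid. Fp: (C, H, W)."""
    max_map = Fp.max(axis=0)                     # spatial global max pooling, (H, W)
    avg_map = Fp.mean(axis=0)                    # spatial global average pooling, (H, W)
    Ms = sigmoid(w[0] * max_map + w[1] * avg_map)  # spatial attention map
    return Ms[None, :, :] * Fp                   # reweight each spatial location

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
F = rng.normal(size=(C, H, W))
W1 = rng.normal(size=(C // r, C))  # reduction layer of the shared MLP
W2 = rng.normal(size=(C, C // r))  # expansion layer of the shared MLP
out = spatial_attention(channel_attention(F, W1, W2), np.array([0.5, 0.5]))
```

Because both attention maps pass through a sigmoid, every element of the output is the input scaled by factors in (0, 1): attention here suppresses less informative responses rather than amplifying anything.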
In some embodiments, the YOLOv7 detector is trained by a deep learning algorithm, and model losses need to be calculated according to a prediction frame and a real frame output by the YOLOv7 detector during the training process of the YOLOv7 detector, and the YOLOv7 detector is continuously updated according to the model losses until the output of the YOLOv7 detector reaches a preset precision. Specifically, the calculation is performed by using a Wise-IoU loss function in the training process of the YOLOv7 detector, and the calculation is specifically as follows:
determining the IoU loss according to the degree of overlap between the prediction frame and the real frame output by the YOLOv7 detector, where the IoU loss is defined as:
L_IoU = 1 - IoU;
where IoU denotes the intersection over union of the prediction frame and the real frame.
Determining a loss weight according to the distance metric between the prediction frame and the real frame output by the YOLOv7 detector, where the loss weight is defined as:
R_WIoU = exp(((x - x_gt)^2 + (y - y_gt)^2) / (W_g^2 + H_g^2)*);
where x and y are the center-point coordinates of the prediction frame, x_gt and y_gt are the center-point coordinates of the real frame, W_g and H_g denote the width and height of the smallest box enclosing the prediction frame and the real frame (the superscript * indicating that this term is detached from the gradient computation), and exp(·) denotes the exponential operation.
Based on the attention mechanism, the model loss is determined from the loss weight and the IoU loss, where the model loss is expressed as:
L_WIoU = R_WIoU · L_IoU.
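A minimal NumPy sketch of this loss, assuming the Wise-IoU v1 form with the enclosing-box denominator treated as a constant (the patent's exact formulation may differ in detail):

```python
import numpy as np

def iou(box_p, box_g):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_p[0], box_g[0]); y1 = max(box_p[1], box_g[1])
    x2 = min(box_p[2], box_g[2]); y2 = min(box_p[3], box_g[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    return inter / (area_p + area_g - inter)

def wise_iou_loss(box_p, box_g):
    """Sketch of L_WIoU = R_WIoU * L_IoU: the weight R_WIoU grows with the
    center distance between prediction and ground truth, normalized by the
    size of their smallest enclosing box."""
    l_iou = 1.0 - iou(box_p, box_g)                       # IoU loss
    x, y = (box_p[0] + box_p[2]) / 2, (box_p[1] + box_p[3]) / 2
    xg, yg = (box_g[0] + box_g[2]) / 2, (box_g[1] + box_g[3]) / 2
    # Width/height of the smallest box enclosing both boxes; treated as a
    # constant (detached from the gradient) in the original formulation
    wg = max(box_p[2], box_g[2]) - min(box_p[0], box_g[0])
    hg = max(box_p[3], box_g[3]) - min(box_p[1], box_g[1])
    r_wiou = np.exp(((x - xg) ** 2 + (y - yg) ** 2) / (wg ** 2 + hg ** 2))
    return r_wiou * l_iou
```

For perfectly overlapping boxes both factors collapse (L_IoU = 0, R_WIoU = 1), while a center offset inflates the IoU loss multiplicatively, which is the intended focusing effect of the weight.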
further, after training is finished, the super parameters are adjusted according to the target data set, specifically, the data set can be trained again for selecting the evolve option, 300 rounds of training are performed, each round of training is performed for 10 generations, and the optimal super parameter configuration is found.
In some embodiments of step S130, the DeepSORT tracker is an algorithm that combines the two tasks of target detection and target tracking. After the position and bounding box of each target object are detected in a frame, feature representations of the targets are extracted by a deep learning model (e.g., a CNN), and each target is matched against the targets tracked in the previous frame. During matching, factors such as the feature similarity and motion consistency of the target are considered to determine the identity and track of the target.
Specifically, in step S130, the step of inputting the vehicle detection frame information into the DeepSORT tracker for tracking to obtain the vehicle target tracking track includes, but is not limited to, the following steps:
step S210, predicting a target position of a next frame according to a target motion track extracted from a current input video by a Kalman filtering algorithm, wherein the target motion track comprises an uncertain state track and a certain state track;
step S220, extracting corresponding target appearance features according to the target positions through a feature extraction network;
step S230, determining a cost matrix of each detection frame feature and the target appearance feature according to the vehicle detection frame information, wherein the cost matrix characterizes the similarity between the features;
step S240, the target motion trail and the detection frame are associated according to the cost matrix through the Hungary algorithm so as to update the target motion trail.
In this embodiment, the YOLOv7 detector of the embodiment of the present invention identifies the vehicle targets in the monitoring video frame by frame to obtain the vehicle detection frame information of each vehicle target, where the vehicle detection frame information includes but is not limited to coordinate information, category, confidence and image features, and the vehicle detection frame information is input into the DeepSORT tracker. In the DeepSORT tracker, a first batch of uncertain-state tracks is created from the first-frame detection result; these tracks are then updated through the association-matching results of subsequent detection frames with the uncertain-state tracks, and the DeepSORT tracker proceeds through the following steps:
S1, carrying out IOU matching between the detection frames of the current frame and the target track frames obtained by Kalman filtering prediction from the target motion tracks of the previous frame, and calculating a cost matrix; the cost matrix is input into the Hungarian algorithm to obtain three kinds of matching results: the first is track mismatch, in which case determined state tracks and uncertain state tracks whose mismatch count reaches the preset number of predictions (e.g., 30) are deleted; the second is target mismatch, in which case a new track is created; and the third is track matching, which indicates successful tracking, in which case the track variables corresponding to the target are updated through Kalman filtering. The current steps are repeated until determined state tracks are obtained or the video frames end.
S2, predicting the determined state tracks and the uncertain state tracks through Kalman filtering, and carrying out cascade matching between the determined state tracks and the target detection frames to obtain three kinds of matching results: the first is track matching, in which case the track is updated through Kalman filtering; the second is track mismatch; and the third is target mismatch, in which case IOU matching is carried out among the previous uncertain state tracks, the mismatched tracks, and the mismatched targets, and the cost matrix of the targets is then calculated according to the matching result.
S3, inputting the cost matrix into the Hungarian algorithm to obtain three kinds of matching results: the first is track mismatch, in which case the determined state tracks and uncertain state tracks that have been mismatched 30 times are deleted; the second is target mismatch, in which case a new track is created; and the third is track matching, which indicates successful tracking, in which case the track variables corresponding to the target are updated through Kalman filtering.
S4, repeating the step S3 until the video frame is finished, and obtaining the vehicle target position and the vehicle target track.
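The IOU matching used in steps S1–S3 above can be sketched as follows; this is the generic intersection-over-union computation for boxes in (x1, y1, x2, y2) form, an assumed illustration rather than code from the patent:

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    # Intersection rectangle (empty if the boxes do not overlap).
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

A cost matrix for the Hungarian step is then typically built as `1 - iou(track_box, det_box)` over all track/detection pairs.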
According to some embodiments of the present invention, in step S140, the step of correcting the vehicle position coordinates in the vehicle target tracking track according to the camera parameters to obtain a corrected vehicle target tracking track includes, but is not limited to, the steps of:
step S310, projecting each image pixel point of the monitoring video to the camera normalized plane and performing distortion transformation on it through the distortion coefficients of the camera parameters;
step S320, projecting the distortion-transformed image pixel points on the normalized plane back to the pixel plane through the internal reference matrix of the camera parameters to obtain the pixel point conversion relation between the distorted pixels and the original image pixels on the pixel plane;
and step S330, changing the position coordinates of each vehicle in the vehicle target tracking track according to the pixel point conversion relation to obtain a corrected vehicle target tracking track.
In this embodiment, the internal reference matrix and distortion parameters of the camera and the pixel size of the monitoring video are obtained. The pixel points of the video image are then projected onto the normalized plane of the camera, the distortion transformation is applied on the normalized plane, and the distorted points are projected back onto the pixel plane, yielding the correspondence between distorted pixel points and original image pixel points. This process is repeated until the conversion relations of all pixel points are obtained, and the vehicle position coordinates in the track output by the DeepSORT tracker are adjusted according to these pixel point conversion relations, so that the corrected vehicle target tracking track is obtained.
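The per-pixel pipeline described here (back-project with the intrinsic matrix, distort on the normalized plane, re-project) can be sketched as below; a simplified radial model with only k1 and k2 is assumed, whereas real calibrations typically also carry tangential coefficients:

```python
import numpy as np

def distort_map(u, v, K, dist):
    """Map an undistorted pixel (u, v) to its distorted location: back-project
    to the normalized plane via the intrinsics K, apply radial distortion
    (k1, k2 only, for brevity), then re-project onto the pixel plane."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u - cx) / fx          # normalized-plane coordinates
    y = (v - cy) / fy
    k1, k2 = dist
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2 * r2   # radial distortion factor
    xd, yd = x * scale, y * scale
    return fx * xd + cx, fy * yd + cy      # back to the pixel plane
```

Evaluating this map for every pixel gives the conversion relation described above; inverting it (e.g., by iteration or a lookup table) yields the undistorted coordinates used to correct the track.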
According to some embodiments of the invention, the improved YOLOv7 detector adopts an improved ELAN structure in place of the ELAN structure in the original YOLOv7 network architecture, adds an improved convolution block attention module within the improved ELAN structure, performs model training with a Wise-IoU loss function, and evaluates and optimizes the hyperparameters of the trained model on the data set, thereby improving the accuracy of vehicle target tracking under the monitoring view angle. The accuracy of extracting the vehicle target motion track is further improved by the undistortion processing of the tracking track coordinates.
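For illustration, the convolution block attention module's two stages (detailed later in claim 4) can be sketched in NumPy; the MLP weights and the 1 x 1 reduction (in place of CBAM's usual 7 x 7 convolution) are simplifying assumptions, not the patent's exact design:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """x: (C, H, W). Global max- and average-pool each channel, pass both
    vectors through a shared two-layer MLP (w1, w2), add, then sigmoid."""
    max_pool = x.max(axis=(1, 2))    # (C,)
    avg_pool = x.mean(axis=(1, 2))   # (C,)
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)   # ReLU hidden layer
    attn = sigmoid(mlp(max_pool) + mlp(avg_pool))   # (C,) channel weights
    return x * attn[:, None, None]

def spatial_attention(x, conv_w):
    """x: (C, H, W). Max- and average-pool across channels, stack the two
    maps, reduce to one map with a 1x1 weighted sum, then sigmoid."""
    pooled = np.stack([x.max(axis=0), x.mean(axis=0)])    # (2, H, W)
    attn = sigmoid(np.tensordot(conv_w, pooled, axes=1))  # (H, W) weights
    return x * attn[None, :, :]
```

Applied in sequence (channel first, then spatial), these two stages reproduce the fusion of pooled attention features that the improved ELAN structure inserts before the splice module.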
The embodiment of the invention also provides a vehicle target tracking system, which comprises:
the first module is used for acquiring a monitoring video;
the second module is used for sequentially inputting each frame of image of the monitoring video into the YOLOv7 detector for target detection to obtain vehicle detection frame information, wherein a convolution block attention module is arranged in the ELAN structure of the backbone network layer of the YOLOv7 detector, and the convolution block attention module performs global maximum pooling and global average pooling operations on the input feature maps to obtain an output feature map fused with spatio-temporal attention features;
the third module is used for inputting each frame of image and the vehicle detection frame information into the DeepSORT tracker for track tracking to obtain the vehicle target tracking track;
and the fourth module is used for correcting the vehicle position coordinates in the vehicle target tracking track according to the camera parameters to obtain the corrected vehicle target tracking track.
It can be understood that the content of the above vehicle target tracking method embodiment is applicable to this system embodiment: the functions implemented by the system embodiment, and the beneficial effects achieved by it, are the same as those of the vehicle target tracking method embodiment described above.
Referring to fig. 4, fig. 4 is a schematic diagram of a vehicle target tracking apparatus according to an embodiment of the present invention. The vehicle target tracking apparatus according to an embodiment of the present invention includes one or more control processors and a memory, and one control processor and one memory are exemplified in fig. 4.
The control processor and the memory may be connected by a bus or otherwise, for example in fig. 4.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the control processor, the remote memory being connectable to the vehicle target tracking apparatus via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
It will be appreciated by those skilled in the art that the device configuration shown in fig. 4 is not limiting of the vehicle target tracking device and may include more or fewer components than shown, or certain components may be combined, or a different arrangement of components.
The non-transitory software program and instructions required to implement the vehicle target tracking method applied to the vehicle target tracking apparatus in the above-described embodiments are stored in the memory, and when executed by the control processor, the vehicle target tracking method applied to the vehicle target tracking apparatus in the above-described embodiments is executed.
Furthermore, an embodiment of the present invention provides a computer-readable storage medium storing computer-executable instructions that are executed by one or more control processors to cause the one or more control processors to perform the vehicle target tracking method in the method embodiment.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of one of ordinary skill in the art without departing from the spirit of the present invention.
Claims (10)
1. A vehicle target tracking method, comprising the steps of:
acquiring a monitoring video;
sequentially inputting each frame of image of the monitoring video into a YOLOv7 detector to perform target detection to obtain vehicle detection frame information, wherein a convolution block attention module is arranged in an ELAN structure of a backbone network layer of the YOLOv7 detector, and the convolution block attention module performs global maximum pooling and global average pooling operations on the input feature maps to obtain an output feature map fused with spatio-temporal attention features;
inputting the vehicle detection frame information into a DeepSORT tracker for track tracking to obtain a vehicle target tracking track;
and correcting the vehicle position coordinates in the vehicle target tracking track according to the camera parameters to obtain a corrected vehicle target tracking track.
2. The vehicle target tracking method of claim 1, wherein the ELAN structure includes a first branch, a second branch, a third branch, and a fourth branch, the first branch and the second branch each being a 1 x 1 convolution layer, the third branch being sequentially a 1 x 1 convolution layer and a 3 x 3 convolution layer, the fourth branch being sequentially a 1 x 1 convolution layer, a 3 x 3 convolution layer, and a 3 x 3 convolution layer; the first, second, third and fourth branches are all connected to a splice module.
3. The vehicle object tracking method of claim 2, wherein the convolution block attention module is disposed between the first branch and the splice module, the convolution block attention module including a channel attention mechanism unit and a spatial attention mechanism unit connected in sequence.
4. A vehicle object tracking method as claimed in claim 3, characterized in that the channel attention mechanism unit is adapted to:
respectively pooling the input features of the channel attention mechanism unit in the channel dimension to obtain a channel global maximum pooling value and a channel global average pooling value;
inputting the channel global maximum pooling value and the channel global average pooling value into a fully-connected neural network respectively and activating the fully-connected neural network to obtain a first channel characteristic and a second channel characteristic;
the first channel characteristic and the second channel characteristic are activated after the addition operation, so that the channel attention characteristic is obtained;
multiplying the channel attention characteristic and the input characteristic of the input channel attention mechanism unit to obtain the output characteristic of the channel attention mechanism unit;
the spatial attention mechanism unit is used for:
respectively pooling the input features of the spatial attention mechanism unit in the spatial dimension to obtain a spatial global maximum pooling feature and a spatial global average pooling feature;
performing dimension reduction and activation operations after splicing the space global maximum pooling feature and the space global average pooling feature to obtain a space attention feature;
and multiplying the spatial attention characteristic and the input characteristic of the input spatial attention mechanism unit to obtain the output characteristic of the spatial attention mechanism unit.
5. The vehicle target tracking method according to claim 1, wherein the YOLOv7 detector is trained by a deep learning algorithm, and the model loss during training of the YOLOv7 detector is obtained by:
determining an intersection-over-union (IoU) loss according to the degree of overlap between the prediction frame and the real frame output by the YOLOv7 detector;
determining a loss weight according to the distance measure between the prediction frame and the real frame output by the YOLOv7 detector;
determining the model loss from the loss weight and the IoU loss based on the attention mechanism.
6. The vehicle target tracking method according to claim 1, wherein the step of inputting the vehicle detection frame information into a DeepSORT tracker for track tracking to obtain a vehicle target tracking track comprises the steps of:
predicting the target position of the next frame according to the target motion trail extracted from the current input video by a Kalman filtering algorithm;
extracting corresponding target appearance features according to the target positions through a feature extraction network;
determining a cost matrix of each detection frame feature and the target appearance feature according to the vehicle detection frame information, wherein the cost matrix characterizes similarity between features;
and associating the target motion track with the detection frames according to the cost matrix through the Hungarian algorithm so as to update the target motion track.
7. The vehicle target tracking method according to claim 1, wherein the correcting the vehicle position coordinates in the vehicle target tracking trajectory according to the camera parameters, to obtain the corrected vehicle target tracking trajectory, comprises the steps of:
projecting each image pixel point of the monitoring video to the camera normalized plane and performing distortion transformation on it through the distortion coefficients of the camera parameters;
projecting the distortion-transformed image pixel points on the normalized plane back to the pixel plane through the internal reference matrix of the camera parameters to obtain a pixel point conversion relation between distorted pixels and original image pixels on the pixel plane;
and changing the position coordinates of each vehicle in the vehicle target tracking track according to the pixel point conversion relation to obtain a corrected vehicle target tracking track.
8. A vehicle target tracking system, comprising:
the first module is used for acquiring a monitoring video;
the second module is used for sequentially inputting each frame of image of the monitoring video into the YOLOv7 detector for target detection to obtain vehicle detection frame information, wherein a convolution block attention module is arranged in an ELAN structure of a backbone network layer of the YOLOv7 detector, and performs global maximum pooling and global average pooling operations on the input feature maps to obtain an output feature map fused with spatio-temporal attention features;
the third module is used for inputting each frame of image and the vehicle detection frame information into the DeepSORT tracker for track tracking to obtain a vehicle target tracking track;
and a fourth module, configured to correct the vehicle position coordinates in the vehicle target tracking track according to the camera parameters, and obtain a corrected vehicle target tracking track.
9. A vehicle target tracking apparatus, comprising:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the vehicle object tracking method as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium in which a processor-executable program is stored, characterized in that the processor-executable program is for realizing the vehicle object tracking method according to any one of claims 1 to 7 when executed by the processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202311595105.1A (CN117636267A) | 2023-11-24 | 2023-11-24 | Vehicle target tracking method, system, device and storage medium
Publications (1)
Publication Number | Publication Date
---|---
CN117636267A | 2024-03-01
Family
ID=90015712
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202311595105.1A (CN117636267A, pending) | Vehicle target tracking method, system, device and storage medium | 2023-11-24 | 2023-11-24
Country Status (1)
Country | Link
---|---
CN | CN117636267A
Cited By (1)
Publication Number | Priority Date | Publication Date | Assignee | Title
---|---|---|---|---
CN118506298A | 2024-07-17 | 2024-08-16 | Jiangxi Jinlu Technology Development Co., Ltd. | Cross-camera vehicle track association method
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination