CN118506298A - Cross-camera vehicle track association method
- Publication number: CN118506298A (application number CN202410954197.6A)
- Authority: CN (China)
- Prior art keywords: vehicle, track, matched, information, representing
- Prior art date: 2024-07-17
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention discloses a cross-camera vehicle track association method, which comprises the following steps: establishing a world coordinate system to unify the coordinate systems of a plurality of continuous monitoring cameras in an expressway traffic scene; screening tracks to be matched for cross-camera track association from the vehicle tracks of the plurality of continuous monitoring cameras; predicting the vehicle space information of each track to be matched at the next moment by Gaussian process regression; extracting vehicle depth information after the prediction of the vehicle space information is completed, so as to increase the clues for data association; and using the vehicle space information and the vehicle depth information as clues of a multi-feature data association mechanism to complete cross-camera track association. The invention uses a multi-feature data association mechanism to realize cross-camera track association in an expressway traffic scene, obtaining long-term stable tracking tracks of a target vehicle under a plurality of monitoring cameras and improving the accuracy and continuity of vehicle tracking.
Description
Technical Field
The invention relates to the technical field of data association, and in particular to a cross-camera vehicle track association method.
Background
In current expressway traffic scenes, the inventors have found, in the course of realizing the technical method of the embodiments of the invention, that the prior art has at least the following technical problems:
The monitoring range of a fixed road-surface monitoring camera is very limited: a single monitoring camera can only cover a distance of about one hundred metres. The track information obtained by single-camera vehicle tracking is therefore very limited and only represents the vehicle track within that monitoring range; the track information of adjacent monitoring cameras is isolated, with no effective association established between it, and once a vehicle leaves the monitoring range its track can no longer be located.
In summary, the existing vehicle monitoring methods cannot meet the actual vehicle tracking requirements.
Disclosure of Invention
The embodiment of the invention provides a cross-camera vehicle track association method. A plurality of monitoring cameras are arranged to monitor the traffic conditions of different areas of an expressway traffic scene, so that richer information on road traffic operating conditions is obtained. Vehicle track association uses certain rules to associate the tracks of the same target vehicle between adjacent monitoring cameras, so as to obtain a continuous track of the same target vehicle under the plurality of monitoring cameras, thereby solving the problem that existing vehicle monitoring methods cannot meet actual vehicle tracking requirements.
The embodiment of the invention provides a cross-camera vehicle track association method, which comprises the following steps:
Unifying a coordinate system: establishing a world coordinate system, namely a coordinate system of a plurality of continuous monitoring cameras in an expressway traffic scene, wherein the origin of the world coordinate system is positioned right below a first monitoring camera and is attached to a road surface, an X axis is perpendicular to the direction of the expressway, a Y axis is along the direction of a lane, and a Z axis is perpendicular to an XY plane formed by the X axis and the Y axis;
screening tracks to be matched: screening tracks to be matched for cross-camera track association from vehicle tracks of a plurality of continuous monitoring cameras;
predicting vehicle space information: predicting vehicle space information at the next moment of the track to be matched by adopting Gaussian process regression;
Extracting vehicle depth information: extracting vehicle depth information after the prediction of the vehicle space information is completed so as to increase clues of data association;
A cross-camera track association step: and taking the vehicle space information and the vehicle depth information as clues of a multi-feature data association mechanism to complete cross-camera track association.
Optionally, the step of predicting vehicle spatial information specifically includes:
A track to be matched is represented as $T = \{x_1, x_2, \ldots, x_j\}$, the track to be matched $T$ having $j$ tracking nodes on the space-time diagram; the track to be matched $T$ is regarded as a Gaussian process $f(x)$, and each tracking node of the track to be matched is regarded as a random variable conforming to a Gaussian distribution, the Gaussian process $f(x)$ being expressed as:

$$f(x) \sim \mathcal{GP}\big(m(x), k(x, x')\big),$$

$$k(x_p, x_q) = \sigma_f^2 \exp\!\left(-\frac{\|x_p - x_q\|^2}{2l^2}\right),$$

wherein $m(x)$ is the mean function, $k(x, x')$ is the kernel function, and $\sigma_f$ and $l$ are hyperparameters of the kernel function;

the $n$-th tracking node for which prediction is required is defined as $x_*$, and the predicted vehicle space information at the next moment is $f_*$; the following formula is obtained from the Bayesian formula:

$$p(f_* \mid T, x_*) = \frac{p(T \mid f_*, x_*)\, p(f_*)}{p(T)};$$

the joint probability distribution of the track to be matched $T$ is $T \sim \mathcal{N}(0, K)$, and the prior probability distribution of the vehicle space information $f_*$ at the next moment is $f_* \sim \mathcal{N}\big(0, k(x_*, x_*)\big)$; the joint probability distribution satisfies the following equation:

$$\begin{bmatrix} T \\ f_* \end{bmatrix} \sim \mathcal{N}\!\left(0, \begin{bmatrix} K & K_* \\ K_*^{\top} & k_{**} \end{bmatrix}\right),$$

wherein $K = \big[k(x_p, x_q)\big]_{j \times j}$, $K_* = \big[k(x_1, x_*), \ldots, k(x_j, x_*)\big]^{\top}$ and $k_{**} = k(x_*, x_*)$, so that the vehicle space information $f_*$ at the next moment is:

$$\bar{f}_* = K_*^{\top} K^{-1} T.$$
Optionally, the vehicle depth information includes vehicle category information, vehicle color information, vehicle size information, track category information.
Optionally, the step of extracting the depth information of the vehicle specifically includes:
Adding a channel attention module and a space attention module into a residual error network, wherein the original features of the vehicle output by a network convolution layer are subjected to weight distribution through the channel attention module and then multiplied with the original features of the vehicle according to channels to obtain vehicle feature adjustment information;
and taking the vehicle feature adjustment information as input, performing weight distribution through the spatial attention module, and multiplying the output of the spatial attention module element by element with the vehicle feature adjustment information to obtain the finally output vehicle depth information.
Optionally, the process of obtaining the vehicle feature adjustment information by the channel attention module specifically includes:
The channel attention module performs weight distribution on each characteristic channel according to the original characteristics of the vehicle;
respectively pooling the original features of the vehicle through global maximum pooling and global average pooling;
mapping the pooled result through a shared multi-layer neural network;
The mapped feature vectors are added element-wise and then activated with a Sigmoid function to obtain the channel attention output vector $M_c(F)$ serving as the vehicle feature adjustment information, the calculation formula being:

$$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) = \sigma\big(W_1(W_0(F^c_{avg})) + W_1(W_0(F^c_{max}))\big),$$

wherein $F$ represents the original features of the vehicle, $\sigma$ represents the sigmoid function, AvgPool represents the global average pooling calculation, MLP represents processing by the multi-layer perceptron network architecture, MaxPool represents the global maximum pooling calculation, $W_0$ represents the first weight of the feature channel, $W_1$ represents the second weight of the feature channel, $F^c_{avg}$ represents the channel attention average of the original vehicle features, and $F^c_{max}$ represents the channel attention maximum of the original vehicle features.
Optionally, the process of obtaining the vehicle depth information by the spatial attention module specifically includes:
the space attention module only focuses on image areas with certain correlation to tasks, and the vehicle characteristic adjustment information is respectively pooled through global maximum pooling and global average pooling of channel dimensions;
Stacking the obtained pooled results in channel dimension, and reducing dimension of the stacked results into a channel through a convolution layer;
The dimension-reduced result is activated with a Sigmoid function to obtain the spatial attention output vector $M_s(F)$ serving as the vehicle depth information, the calculation formula being:

$$M_s(F) = \sigma\big(f^{7\times 7}\big([\mathrm{AvgPool}(F);\ \mathrm{MaxPool}(F)]\big)\big) = \sigma\big(f^{7\times 7}\big([F^s_{avg};\ F^s_{max}]\big)\big),$$

wherein $\sigma$ represents the sigmoid function, $f^{7\times 7}$ represents a 7 × 7 convolution operation, MaxPool represents the global maximum pooling calculation, AvgPool represents the global average pooling calculation, $F^s_{avg}$ represents the spatial attention average of the original vehicle features, and $F^s_{max}$ represents the spatial attention maximum of the original vehicle features.
Optionally, the step of cross-camera track association specifically includes:
constructing a track matching cost matrix based on a Kalman filtering method;
performing preliminary screening on the track matching cost matrix by utilizing track category information corresponding to the track to be matched;
calculating the Euclidean distance of the track group to be matched;
calculating the minimum cosine distance of the track group to be matched;
And obtaining a final track distance based on the Euclidean distance and the minimum cosine distance so as to update the track matching cost matrix.
Optionally, the step of calculating the euclidean distance of the track group to be matched specifically includes:
The set in the track library to be matched of the current monitoring camera is defined as $\Gamma^{n}$ and the set in the track library to be matched of the next monitoring camera is defined as $\Gamma^{n+1}$; the predicted spatial position of a track to be matched $T_a \in \Gamma^{n}$ is defined as $\hat{p}_a$ and the initial spatial position of a track to be matched $T_b \in \Gamma^{n+1}$ is defined as $p_b$; the Euclidean distance $d_1(a, b)$ of the track group to be matched $(T_a, T_b)$ is calculated as:

$$d_1(a, b) = \left\lVert \hat{p}_a - p_b \right\rVert_2.$$
Optionally, the step of calculating the minimum cosine distance of the track group to be matched specifically includes:
The target depth information of a tracking node in the track to be matched $T_a$ is defined as $A$, and the target depth information of a tracking node in the track to be matched $T_b$ is defined as $B$; the minimum cosine distance $d_2(a, b)$ of the track group to be matched $(T_a, T_b)$ is then calculated as:

$$d_2(a, b) = \min_{A \in T_a,\ B \in T_b}\left(1 - \frac{A \cdot B}{\lVert A \rVert\, \lVert B \rVert}\right).$$
optionally, the step of obtaining the final track distance based on the euclidean distance and the minimum cosine distance specifically includes:
A weight $\lambda$ is introduced into the Euclidean distance $d_1$ and the minimum cosine distance $d_2$, and the final track distance $c(a, b)$ is calculated by combining them, the calculation formula being:

$$c(a, b) = \lambda\, d_1(a, b) + (1 - \lambda)\, d_2(a, b).$$
One or more technical solutions provided in the embodiments of the present invention at least have the following technical effects or advantages:
According to the invention, a multi-feature data association mechanism is used to establish associations between the tracks of the same target vehicle under adjacent monitoring cameras, finally realizing cross-camera track association in an expressway traffic scene and obtaining long-term stable tracking tracks of target vehicles under a plurality of monitoring cameras, which improves the accuracy and continuity of vehicle tracking.
Drawings
FIG. 1 is a flow chart of a cross-camera vehicle track correlation method according to an embodiment of the invention.
Detailed Description
The embodiment of the invention provides a cross-camera vehicle track association method. A plurality of monitoring cameras are arranged to monitor the traffic conditions of different areas of an expressway traffic scene, so that richer information on road traffic operating conditions is obtained. Vehicle track association uses certain rules to associate the tracks of the same target vehicle between adjacent monitoring cameras, so as to obtain a continuous track of the same target vehicle under the plurality of monitoring cameras, thereby solving the problem that existing vehicle monitoring methods cannot meet actual vehicle tracking requirements.
For a better understanding of the above-described cross-camera vehicle track association method, reference will be made to the following detailed description and accompanying drawings. It will be apparent that the described embodiments of the invention are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, a method for associating vehicle trajectories across cameras includes:
1) Unifying a coordinate system: and establishing a world coordinate system, namely a coordinate system of a plurality of continuous monitoring cameras in an expressway traffic scene, wherein the origin of the world coordinate system is positioned right below the first monitoring camera and is attached to a road surface, an X axis is perpendicular to the direction of the expressway, a Y axis is along the direction of a lane, and a Z axis is perpendicular to an XY plane formed by the X axis and the Y axis.
The cross-camera vehicle track association method aims to use the space information of the target vehicle to determine its global position across a plurality of monitoring cameras. A key precondition for cross-camera track association is to unify the coordinate system of the expressway traffic scene under the plurality of calibrated continuous monitoring cameras.
Firstly, the plurality of continuous monitoring cameras are calibrated using a camera calibration method such as a traditional camera calibration method, an active-vision camera calibration method or a camera self-calibration method. Calibrating a monitoring camera determines the correlation between the three-dimensional geometric position of a point on the surface of a spatial object and its corresponding point in the image. Based on the calibrated monitoring camera parameters, the three-dimensional scene can be reconstructed from the acquired images.
After the monitoring cameras are calibrated, the coordinate system of each individual scene needs to be converted into the world coordinate system by means of the known camera parameters of the three monitoring cameras and the three corresponding scene parameters. The origin of the world coordinate system is located directly below monitoring camera 1 (i.e. the first monitoring camera) and attached to the road surface; the X axis is perpendicular to the road direction; the Y axis is along the lane direction; and the Z axis is perpendicular to the XY plane formed by the X and Y axes. Since the origin of the world coordinate system is located directly below monitoring camera 1, subsequent monitoring cameras such as monitoring camera n+1 and monitoring camera n+2 can be incorporated into the world coordinate system one by one, and their corresponding world coordinates obtained by calculation from their calibration results. Once the plurality of monitoring cameras share the world coordinate system, the world coordinate value of a target vehicle in the unified world coordinate system, i.e. its position coordinate in the real world, can be calculated from its image coordinates in any scene image.
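Under these conventions, mapping a vehicle's image coordinates to a world coordinate on the road surface can be sketched as follows. This is a minimal illustration assuming a pinhole model with calibrated intrinsics K and extrinsics (R, t) and assuming the projected point lies on the road plane Z = 0; it is not the patent's exact computation.

```python
import numpy as np

def image_to_world(u, v, K, R, t):
    """Back-project pixel (u, v) onto the road plane Z = 0 of the world frame.

    Assumes a calibrated pinhole camera with x_cam = R @ X_world + t.
    """
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # viewing ray in the camera frame
    origin_w = -R.T @ t                                  # camera centre in the world frame
    dir_w = R.T @ ray_cam                                # ray direction in the world frame
    s = -origin_w[2] / dir_w[2]                          # intersection with the plane Z = 0
    return origin_w + s * dir_w                          # world point (X, Y, 0) on the road

# e.g. X_w = image_to_world(640.0, 512.0, K_cam1, R_cam1, t_cam1)
```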
2) Screening tracks to be matched: tracks to be matched for cross-camera track association are screened from the vehicle tracks of the plurality of continuous monitoring cameras.
A vehicle track has a stable state and a vanishing state in the image. For cross-camera association, the tracks to be matched are screened from the vehicle tracks obtained by each single monitoring camera, and a unique vehicle ID is only assigned to a vehicle when its track switches to the stable state. Therefore, in the track screening step, the vehicle tracks of monitoring camera n that switch to the vanishing state in the current frame are stored in the track library to be matched of monitoring camera n, and the vehicle tracks of monitoring camera n+1 that switch to the stable state in the current frame are stored in the track library to be matched of monitoring camera n+1.
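A minimal sketch of this per-frame bookkeeping is given below; the state labels and the assumption that each track object carries a `state` attribute are illustrative, not the patent's data model.

```python
STABLE, VANISHED = "stable", "vanished"  # assumed track state labels

def update_to_be_matched_libraries(tracks_cam_n, tracks_cam_n1, lib_n, lib_n1):
    """Fill the to-be-matched track libraries for the current frame.

    Camera n contributes tracks that just switched to the vanishing state
    (the vehicle left its view); camera n+1 contributes tracks that just
    switched to the stable state (the vehicle entered its view and was
    assigned a unique vehicle ID).
    """
    lib_n.extend(t for t in tracks_cam_n if t.state == VANISHED)
    lib_n1.extend(t for t in tracks_cam_n1 if t.state == STABLE)
```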
3) Predicting vehicle space information: and predicting the vehicle space information at the next moment of the track to be matched by adopting Gaussian process regression.
It is assumed that there is one vehicle track (referred to as vehicle track 1) in the track library to be matched of the monitoring camera n, and the state is switched to the vanishing state at the kth frame. Meanwhile, in the track library to be matched of the monitoring camera n+1, there is a vehicle track (called a vehicle track 2), and the vehicle track appears in the field of view of the monitoring camera n+1 in the k+i frame, that is, the first tracking node of the vehicle track 2 appears in the k+i frame. In order to determine whether the vehicle track 1 and the vehicle track 2 belong to the same target vehicle, the movement of the vehicle track 1 is predicted by using a moving target track prediction method, and the position of the vehicle track 1 in the k+i frame is predicted. Specifically, gaussian process regression is selected for target track prediction, and the moving target vehicle behavior characteristics are mined by utilizing a plurality of historical data, so that the movement trend of the target vehicle is predicted, namely, the vehicle space information at the next moment of the track to be matched is predicted.
The step of predicting vehicle space information specifically comprises the following steps:
3.1 Assume that a track T to be matched exists in the track library to be matched of monitoring camera n, and represent the track to be matched as $T = \{x_1, x_2, \ldots, x_j\}$. The track T to be matched, with its $j$ tracking nodes on the space-time diagram, can be regarded as a Gaussian process, and each tracking node in the track to be matched can be regarded as a random variable conforming to a Gaussian distribution. A random process constructed over a continuous domain whose random variables obey Gaussian distributions is called a Gaussian process and can be used to represent a function; a mean function and a covariance function typically determine a unique Gaussian process. That is, for any finite set of tracking nodes $\{x_1, \ldots, x_j\}$, the values $f(x_1), \ldots, f(x_j)$ jointly obey a multivariate Gaussian distribution, and $f$ is a Gaussian process, which can be expressed as:

$$f(x) \sim \mathcal{GP}\big(m(x), k(x, x')\big);$$

wherein $m(x)$ is the mean function of the Gaussian process and $k(x, x')$ is the kernel function of the Gaussian process, used to measure the "distance" between any two tracking nodes. The mean function is set to 0, and the kernel function, as the core component, is taken in practice to be the most common Gaussian kernel function, expressed as:

$$k(x_p, x_q) = \sigma_f^2 \exp\!\left(-\frac{\|x_p - x_q\|^2}{2l^2}\right);$$

wherein $\sigma_f$ and $l$ are hyperparameters of the kernel function.

3.2 The $n$-th tracking node for which prediction is required is defined as $x_*$, and the predicted vehicle space information at the next moment is $f_*$. After the mean function and the kernel function of the Gaussian process are determined, the following formula is obtained from the Bayesian formula:

$$p(f_* \mid T, x_*) = \frac{p(T \mid f_*, x_*)\, p(f_*)}{p(T)}.$$

3.3 The joint probability distribution of the track to be matched $T$ is $T \sim \mathcal{N}(0, K)$, and the prior probability distribution of the vehicle space information $f_*$ at the next moment is $f_* \sim \mathcal{N}\big(0, k(x_*, x_*)\big)$. Since any finite-dimensional subset of a Gaussian process is also Gaussian, the joint probability distribution can be calculated and satisfies:

$$\begin{bmatrix} T \\ f_* \end{bmatrix} \sim \mathcal{N}\!\left(0, \begin{bmatrix} K & K_* \\ K_*^{\top} & k_{**} \end{bmatrix}\right);$$

wherein $K = \big[k(x_p, x_q)\big]_{j \times j}$, $K_* = \big[k(x_1, x_*), \ldots, k(x_j, x_*)\big]^{\top}$ and $k_{**} = k(x_*, x_*)$, so that the vehicle space information $f_*$ at the next moment is:

$$\bar{f}_* = K_*^{\top} K^{-1} T.$$
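The prediction above can be sketched in code as follows, using the zero-mean, Gaussian-kernel formulation and treating each coordinate axis independently; the hyperparameter defaults and the small noise term added for numerical stability are assumptions, not values given by the patent.

```python
import numpy as np

def gp_predict(times, values, t_next, sigma_f=1.0, length=1.0, noise=1e-6):
    """Gaussian process regression: posterior mean of a track coordinate at t_next.

    times/values are the frame indices and one coordinate (e.g. world X or Y)
    of the j tracking nodes of the track to be matched.
    """
    def kernel(a, b):
        d = a[:, None] - b[None, :]
        return sigma_f ** 2 * np.exp(-d ** 2 / (2.0 * length ** 2))

    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    K = kernel(times, times) + noise * np.eye(len(times))   # K
    K_star = kernel(times, np.array([float(t_next)]))        # K_*
    return float(K_star.T @ np.linalg.solve(K, values))      # f_* = K_*^T K^{-1} T

# e.g. x_pred = gp_predict(frame_ids, xs, next_frame_id)
#      y_pred = gp_predict(frame_ids, ys, next_frame_id)
```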
4) Extracting vehicle depth information: and extracting the depth information of the vehicle after the prediction of the spatial information of the vehicle is completed so as to increase clues of data association and describe the correlation among the vehicle tracks more deeply.
And extracting the depth information of the vehicle so as to further add the depth characteristics of the vehicle on the basis of the spatial information of the vehicle and increase the data relevance. The vehicle depth information refers to information characterizing appearance characteristics of the vehicle in addition to vehicle space information, and includes vehicle category information, vehicle color information, vehicle size information, and track category information.
The vehicle depth information extracting step specifically comprises the following steps:
4.1 A residual network is used as the baseline feature extraction network; it employs "short-circuit" connections. For a neural network, assume that when the network input is $x$, the features the network is expected to learn are $H(x)$. The residual network adds a "short-circuit" connection before the activation function of the second layer, so that the originally expected mapping $H(x)$ is converted into $F(x) + x$, and the residual expected to be learned is $F(x) = H(x) - x$; when $F(x) = 0$, then $H(x) = x$, which is an identity mapping. The residual network can thus be understood as copying the output of the shallow layers and fusing it with the output of the deeper layers: once the shallow layers are trained close to the optimum, continued training would otherwise increase the error, whereas in the residual network the deeper layers simply perform an identity mapping, so the performance is not worse than before. This does not significantly increase the amount of computation, yet achieves the goal of improving network performance.
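A compact residual block illustrating the identity ("short-circuit") connection might look as follows; this PyTorch-style sketch with two 3×3 convolutions is for illustration only and is not the patent's network configuration.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """The stacked layers learn the residual F(x) = H(x) - x; the shortcut adds x back."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))
        return self.relu(residual + x)  # identity mapping when the residual is zero
```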
4.2 A channel attention module and a spatial attention module are added to the baseline feature extraction network, effectively combining channel attention with spatial attention and improving, in a simple and efficient way, the attention paid to locally important regions of the target vehicle. The original vehicle features output by a convolution layer of the network are weighted by the channel attention module and then multiplied channel by channel with the original vehicle features to obtain the vehicle feature adjustment information; the vehicle feature adjustment information is then taken as input, weighted by the spatial attention module, and multiplied element by element with the output of the spatial attention module to obtain the finally output vehicle depth information. The channel attention module assigns a weight to each feature channel according to the input original vehicle features. The original vehicle features are pooled by global maximum pooling and global average pooling respectively, the pooled results are mapped through a shared multi-layer neural network, and the mapped feature vectors are added element-wise and activated with a sigmoid function to obtain the channel attention output vector $M_c(F)$ serving as the vehicle feature adjustment information, the calculation formula being:

$$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) = \sigma\big(W_1(W_0(F^c_{avg})) + W_1(W_0(F^c_{max}))\big);$$

wherein $F$ represents the original features of the vehicle, $\sigma$ represents the sigmoid function, AvgPool represents the global average pooling calculation, MLP represents processing by the multi-layer perceptron network architecture, MaxPool represents the global maximum pooling calculation, $W_0$ represents the first weight of the feature channel, $W_1$ represents the second weight of the feature channel, $F^c_{avg}$ represents the channel attention average of the original vehicle features, and $F^c_{max}$ represents the channel attention maximum of the original vehicle features.
The spatial attention module focuses only on image regions that have a certain relevance to the task. The vehicle feature adjustment information is pooled along the channel dimension by global maximum pooling and global average pooling respectively, the pooled results are stacked along the channel dimension, the stacked result is reduced to a single channel by a convolution layer, and the dimension-reduced result is activated with a sigmoid function to obtain the spatial attention output vector $M_s(F)$, the calculation formula being:

$$M_s(F) = \sigma\big(f^{7\times 7}\big([\mathrm{AvgPool}(F);\ \mathrm{MaxPool}(F)]\big)\big) = \sigma\big(f^{7\times 7}\big([F^s_{avg};\ F^s_{max}]\big)\big);$$

wherein $\sigma$ represents the sigmoid function, $f^{7\times 7}$ represents a 7 × 7 convolution operation, MaxPool represents the global maximum pooling calculation, AvgPool represents the global average pooling calculation, $F^s_{avg}$ represents the spatial attention average of the original vehicle features, and $F^s_{max}$ represents the spatial attention maximum of the original vehicle features.
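The two modules described above follow the widely used CBAM structure; a compact PyTorch-style sketch is given below. The channel-reduction ratio and module composition are assumptions for illustration rather than the patent's exact configuration.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """M_c(F): shared MLP over globally average- and max-pooled features."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):                        # x: (B, C, H, W)
        avg = self.mlp(x.mean(dim=(2, 3)))       # global average pooling branch
        mx = self.mlp(x.amax(dim=(2, 3)))        # global max pooling branch
        w = torch.sigmoid(avg + mx).unsqueeze(-1).unsqueeze(-1)
        return x * w                             # channel-wise adjusted features

class SpatialAttention(nn.Module):
    """M_s(F): 7x7 convolution over channel-wise average and max maps."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)        # channel-dimension average pooling
        mx = x.amax(dim=1, keepdim=True)         # channel-dimension max pooling
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w                             # element-wise adjusted features
```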
5) A cross-camera track association step: the vehicle space information and the vehicle depth information are used as clues of a multi-feature data association mechanism to complete cross-camera track association.
Multi-feature data association is performed: the similarity of target vehicles under adjacent monitoring cameras is quantified with a space information metric and a depth information metric respectively, the final similarity is calculated after fusion to update the track matching cost matrix, and the optimal matching of targets is completed with the Hungarian algorithm.
The cross-camera track association step specifically comprises the following steps:
5.1 A track matching cost matrix is constructed based on a Kalman filtering method. Track groups whose vehicle category information does not match are rejected and do not take part in the subsequent cross-camera matching process. The set in the track library to be matched of the current monitoring camera n is defined as $\Gamma^{n}$, and the set in the track library to be matched of monitoring camera n+1 is defined as $\Gamma^{n+1}$. A track matching cost matrix $C$ is initialized, with the track library to be matched of monitoring camera n as the rows of the track association cost matrix and the track library to be matched of monitoring camera n+1 as its columns; each element of the track association cost matrix is the total distance of the corresponding track pair after fusing the space metric and the depth metric. All elements of the track association cost matrix $C$ are set to 1 at initialization.
5.2 The track matching cost matrix is preliminarily screened using the track category information corresponding to the tracks to be matched. In short, the elements of the track matching cost matrix $C$ corresponding to track groups with inconsistent track category information are updated to 0, and no subsequent operation is performed on track groups whose cost is 0.
5.3 The Euclidean distance of each track group to be matched is calculated. The Euclidean distance is used to measure the space-information similarity between tracks to be matched: under the unified coordinate system it estimates well the degree of association between the spatially predicted position (i.e. the vehicle space information at the next moment) of each track in the track library to be matched $\Gamma^{n}$ of the current monitoring camera and the initial spatial position (i.e. the initial vehicle space information) of each vehicle track in the track library to be matched $\Gamma^{n+1}$ of the next monitoring camera, since the Euclidean distance calculates the absolute distance between tracking nodes. The predicted spatial position of a track to be matched $T_a \in \Gamma^{n}$ is defined as $\hat{p}_a$, and the initial spatial position of a track to be matched $T_b \in \Gamma^{n+1}$ is defined as $p_b$; the Euclidean distance $d_1(a, b)$ of the track group to be matched $(T_a, T_b)$ is calculated as:

$$d_1(a, b) = \left\lVert \hat{p}_a - p_b \right\rVert_2.$$
5.4 The minimum cosine distance of each track group to be matched is calculated. The minimum cosine distance is used to measure the depth-information similarity between track groups to be matched. The target depth information of a tracking node in the track to be matched $T_a$ is defined as $A$, and the target depth information of a tracking node in the track to be matched $T_b$ is defined as $B$; the minimum cosine distance $d_2(a, b)$ of the track group to be matched $(T_a, T_b)$ is then calculated as:

$$d_2(a, b) = \min_{A \in T_a,\ B \in T_b}\left(1 - \frac{A \cdot B}{\lVert A \rVert\, \lVert B \rVert}\right).$$
Two different vehicle tracks may have a similar appearance in different frames, so their feature vectors can be close to each other, while the appearance of the detection boxes of the same vehicle track can also change somewhat between moments that are far apart in time. To reduce vehicle track matching errors while balancing these factors that affect similarity, the minimum cosine distance is calculated at the first three moment nodes of each vehicle track of the track pair to be matched, and after removing the maximum and minimum values the average is taken as the clue for metric fusion.
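A sketch of this appearance clue is given below. How the first three nodes of the two tracks are paired is not fully specified above, so the per-index pairing used here is an assumption.

```python
import numpy as np

def cosine_distance(a, b):
    """Cosine distance between two appearance (depth information) feature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def appearance_clue(track_a_feats, track_b_feats):
    """Trimmed mean of cosine distances over the first three nodes of each track.

    The largest and smallest values are removed before averaging, as described above.
    """
    dists = sorted(cosine_distance(fa, fb)
                   for fa, fb in zip(track_a_feats[:3], track_b_feats[:3]))
    trimmed = dists[1:-1] if len(dists) > 2 else dists
    return float(np.mean(trimmed))
```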
And 5.5, obtaining a final track distance based on the Euclidean distance and the minimum cosine distance so as to update the track matching cost matrix.
A weight is introduced into the space metric and the depth metric of the target vehicle, i.e. a weight $\lambda$ is introduced into the Euclidean distance $d_1$ and the minimum cosine distance $d_2$, and the final track distance $c(a, b)$ is calculated by combining them, the calculation formula being:

$$c(a, b) = \lambda\, d_1(a, b) + (1 - \lambda)\, d_2(a, b);$$

wherein $\lambda \in [0, 1]$. When the visual blind area between adjacent monitoring cameras is short, the vehicle space information is relatively reliable; otherwise its reliability decreases and its error grows, so $\lambda$ is set according to the specific scene. The track matching cost matrix is then updated, and the optimal association among the track groups to be matched is obtained with the Hungarian algorithm; the larger the value of an element of the track matching cost matrix, the better, so when a matrix element is smaller than a set threshold the match is considered to have failed. When the element $c(a, b)$ is the maximum of its row and column in the track matching cost matrix, vehicle track $T_a$ and vehicle track $T_b$ are two vehicle tracks of the same target vehicle under adjacent monitoring cameras; the ID of vehicle track $T_a$ is assigned to vehicle track $T_b$ and the association succeeds. When vehicle track $T_a$ is not successfully matched, the target vehicle has not yet appeared in monitoring camera n+1; it is left unprocessed and continues, in the next frame image, to be associated with the vehicle tracks newly initialized into the track library to be matched of monitoring camera n+1 in that frame. When vehicle track $T_b$ is not successfully matched, the vehicle track is generally considered not to have passed monitoring camera n, i.e. at the start of the video the vehicle was located in the blind area between the monitoring cameras, and a new unique ID is assigned to this vehicle track. After the matching of the current frame is completed, the successfully matched vehicle tracks and all unmatched vehicle tracks of monitoring camera n+1 are deleted from the track library to be matched of the corresponding monitoring camera, and the next frame image is read in to continue vehicle target detection and tracking and cross-camera track association processing.
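The overall association step can be sketched as follows. For clarity this sketch uses the conventional distance-minimising form of the assignment (SciPy's linear_sum_assignment) with gating, rather than reproducing the similarity-style cost-matrix conventions above; the euclid and cosine callbacks, the track attributes, the weight lam and the gating threshold max_dist are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

GATE = 1e6  # cost assigned to pairs excluded by the category screening

def associate(lib_n, lib_n1, euclid, cosine, lam=0.5, max_dist=1.0):
    """Fuse spatial and appearance clues and match tracks with the Hungarian algorithm.

    lib_n / lib_n1 are the to-be-matched libraries of cameras n and n+1;
    each entry is assumed to expose a `category` attribute.
    """
    cost = np.full((len(lib_n), len(lib_n1)), GATE)
    for i, ta in enumerate(lib_n):
        for j, tb in enumerate(lib_n1):
            if ta.category != tb.category:       # preliminary screening on track category
                continue                         # leave the pair gated out
            cost[i, j] = lam * euclid(ta, tb) + (1.0 - lam) * cosine(ta, tb)
    rows, cols = linear_sum_assignment(cost)     # Hungarian algorithm
    matches = [(i, j) for i, j in zip(rows, cols) if cost[i, j] < max_dist]
    # Unmatched camera-n tracks stay in the library for the next frame;
    # unmatched camera-(n+1) tracks receive a new unique vehicle ID.
    return matches
```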
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (10)
1. A method of cross-camera vehicle track association, comprising:
Unifying a coordinate system: establishing a world coordinate system, namely a coordinate system of a plurality of continuous monitoring cameras in an expressway traffic scene, wherein the origin of the world coordinate system is positioned right below a first monitoring camera and is attached to a road surface, an X axis is perpendicular to the direction of the expressway, a Y axis is along the direction of a lane, and a Z axis is perpendicular to an XY plane formed by the X axis and the Y axis;
screening tracks to be matched: screening tracks to be matched for cross-camera track association from vehicle tracks of a plurality of continuous monitoring cameras;
predicting vehicle space information: predicting vehicle space information at the next moment of the track to be matched by adopting Gaussian process regression;
Extracting vehicle depth information: extracting vehicle depth information after the prediction of the vehicle space information is completed so as to increase clues of data association;
A cross-camera track association step: and taking the vehicle space information and the vehicle depth information as clues of a multi-feature data association mechanism to complete cross-camera track association.
2. The method of claim 1, wherein the predicting vehicle space information step specifically comprises:
A track to be matched is represented as $T = \{x_1, x_2, \ldots, x_j\}$, the track to be matched $T$ having $j$ tracking nodes on the space-time diagram; the track to be matched $T$ is regarded as a Gaussian process $f(x)$, and each tracking node of the track to be matched is regarded as a random variable conforming to a Gaussian distribution, the Gaussian process $f(x)$ being expressed as:

$$f(x) \sim \mathcal{GP}\big(m(x), k(x, x')\big),$$

$$k(x_p, x_q) = \sigma_f^2 \exp\!\left(-\frac{\|x_p - x_q\|^2}{2l^2}\right),$$

wherein $m(x)$ is the mean function, $k(x, x')$ is the kernel function, and $\sigma_f$ and $l$ are hyperparameters of the kernel function;

the $n$-th tracking node for which prediction is required is defined as $x_*$, and the predicted vehicle space information at the next moment is $f_*$; the following formula is obtained from the Bayesian formula:

$$p(f_* \mid T, x_*) = \frac{p(T \mid f_*, x_*)\, p(f_*)}{p(T)};$$

the joint probability distribution of the track to be matched $T$ is $T \sim \mathcal{N}(0, K)$, and the prior probability distribution of the vehicle space information $f_*$ at the next moment is $f_* \sim \mathcal{N}\big(0, k(x_*, x_*)\big)$; the joint probability distribution satisfies the following equation:

$$\begin{bmatrix} T \\ f_* \end{bmatrix} \sim \mathcal{N}\!\left(0, \begin{bmatrix} K & K_* \\ K_*^{\top} & k_{**} \end{bmatrix}\right),$$

wherein $K = \big[k(x_p, x_q)\big]_{j \times j}$, $K_* = \big[k(x_1, x_*), \ldots, k(x_j, x_*)\big]^{\top}$ and $k_{**} = k(x_*, x_*)$, so that the vehicle space information $f_*$ at the next moment is:

$$\bar{f}_* = K_*^{\top} K^{-1} T.$$
3. the method of claim 1, wherein the vehicle depth information includes vehicle category information, vehicle color information, vehicle size information, track category information.
4. The method of claim 1, wherein the step of extracting vehicle depth information comprises:
Adding a channel attention module and a space attention module into a residual error network, wherein the original features of the vehicle output by a network convolution layer are subjected to weight distribution through the channel attention module and then multiplied with the original features of the vehicle according to channels to obtain vehicle feature adjustment information;
and taking the vehicle feature adjustment information as input, performing weight distribution through the spatial attention module, and multiplying the output of the spatial attention module element by element with the vehicle feature adjustment information to obtain the finally output vehicle depth information.
5. The method of claim 4, wherein the process of the channel attention module obtaining the vehicle characteristic adjustment information comprises:
The channel attention module performs weight distribution on each characteristic channel according to the original characteristics of the vehicle;
respectively pooling the original features of the vehicle through global maximum pooling and global average pooling;
mapping the pooled result through a shared multi-layer neural network;
The mapped feature vectors are added element-wise and then activated with a Sigmoid function to obtain the channel attention output vector $M_c(F)$ serving as the vehicle feature adjustment information, the calculation formula being:

$$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) = \sigma\big(W_1(W_0(F^c_{avg})) + W_1(W_0(F^c_{max}))\big),$$

wherein $F$ represents the original features of the vehicle, $\sigma$ represents the sigmoid function, AvgPool represents the global average pooling calculation, MLP represents processing by the multi-layer perceptron network architecture, MaxPool represents the global maximum pooling calculation, $W_0$ represents the first weight of the feature channel, $W_1$ represents the second weight of the feature channel, $F^c_{avg}$ represents the channel attention average of the original vehicle features, and $F^c_{max}$ represents the channel attention maximum of the original vehicle features.
6. The method of claim 4, wherein the process of the spatial attention module obtaining the vehicle depth information comprises:
the space attention module only focuses on image areas with certain correlation to tasks, and the vehicle characteristic adjustment information is respectively pooled through global maximum pooling and global average pooling of channel dimensions;
Stacking the obtained pooled results in channel dimension, and reducing dimension of the stacked results into a channel through a convolution layer;
The dimension-reduced result is activated with a Sigmoid function to obtain the spatial attention output vector $M_s(F)$ serving as the vehicle depth information, the calculation formula being:

$$M_s(F) = \sigma\big(f^{7\times 7}\big([\mathrm{AvgPool}(F);\ \mathrm{MaxPool}(F)]\big)\big) = \sigma\big(f^{7\times 7}\big([F^s_{avg};\ F^s_{max}]\big)\big),$$

wherein $\sigma$ represents the sigmoid function, $f^{7\times 7}$ represents a 7 × 7 convolution operation, MaxPool represents the global maximum pooling calculation, AvgPool represents the global average pooling calculation, $F^s_{avg}$ represents the spatial attention average of the original vehicle features, and $F^s_{max}$ represents the spatial attention maximum of the original vehicle features.
7. The method of claim 1, wherein the cross-camera trajectory correlation step specifically comprises:
constructing a track matching cost matrix based on a Kalman filtering method;
performing preliminary screening on the track matching cost matrix by utilizing track category information corresponding to the track to be matched;
calculating the Euclidean distance of the track group to be matched;
calculating the minimum cosine distance of the track group to be matched;
And obtaining a final track distance based on the Euclidean distance and the minimum cosine distance so as to update the track matching cost matrix.
8. The method of claim 7, wherein the step of calculating the euclidean distance of the set of trajectories to be matched comprises:
The set in the track library to be matched of the current monitoring camera is defined as $\Gamma^{n}$ and the set in the track library to be matched of the next monitoring camera is defined as $\Gamma^{n+1}$; the predicted spatial position of a track to be matched $T_a \in \Gamma^{n}$ is defined as $\hat{p}_a$ and the initial spatial position of a track to be matched $T_b \in \Gamma^{n+1}$ is defined as $p_b$; the Euclidean distance $d_1(a, b)$ of the track group to be matched $(T_a, T_b)$ is calculated as:

$$d_1(a, b) = \left\lVert \hat{p}_a - p_b \right\rVert_2.$$
9. the method of claim 8, wherein the step of calculating the minimum cosine distance of the set of traces to be matched comprises:
The target depth information of a tracking node in the track to be matched $T_a$ is defined as $A$, and the target depth information of a tracking node in the track to be matched $T_b$ is defined as $B$; the minimum cosine distance $d_2(a, b)$ of the track group to be matched $(T_a, T_b)$ is then calculated as:

$$d_2(a, b) = \min_{A \in T_a,\ B \in T_b}\left(1 - \frac{A \cdot B}{\lVert A \rVert\, \lVert B \rVert}\right).$$
10. the method according to claim 9, wherein the step of obtaining a final track distance based on the euclidean distance and the minimum cosine distance is in particular:
A weight $\lambda$ is introduced into the Euclidean distance $d_1$ and the minimum cosine distance $d_2$, and the final track distance $c(a, b)$ is calculated by combining them, the calculation formula being:

$$c(a, b) = \lambda\, d_1(a, b) + (1 - \lambda)\, d_2(a, b).$$
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202410954197.6A (CN118506298B) | 2024-07-17 | 2024-07-17 | Cross-camera vehicle track association method
Publications (2)

Publication Number | Publication Date
---|---
CN118506298A | 2024-08-16
CN118506298B | 2024-09-20
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant