CN114692964B

CN114692964B - Space-time traffic flow prediction method based on cross attention mechanism

Info

Publication number: CN114692964B
Application number: CN202210294232.7A
Authority: CN
Inventors: 张珣; 梁春芳; 康京丽; 梁义珂; 丛扬潇
Original assignee: Beijing Technology and Business University
Current assignee: Beijing Technology and Business University
Priority date: 2022-03-24
Filing date: 2022-03-24
Publication date: 2024-05-31
Anticipated expiration: 2042-03-24
Also published as: CN114692964A

Abstract

The invention discloses a space-time traffic flow prediction method based on a cross attention mechanism, which captures the periodic characteristics of space-time traffic data through the cross attention mechanism improved on the attention mechanism, respectively captures the spatial dependency relationship and the time correlation characteristic of the space-time traffic data by using a graph convolution network and a time convolution network, and then fuses the extracted characteristics, thereby effectively realizing the space-time traffic flow prediction. The method has complete modeling of various characteristics of space-time traffic data, can effectively realize the prediction task of traffic flow, has more accurate and reliable prediction of traffic flow, has lower data requirements, can improve the accuracy and the interpretability of prediction, and is more suitable for popularization and application.

Description

Space-time traffic flow prediction method based on cross attention mechanism

Technical Field

The invention belongs to the technical field of space-time data processing, relates to a traffic prediction technology of deep learning in space-time data mining, and particularly relates to a space-time traffic flow prediction method based on a cross attention mechanism.

Background

With the development of traffic sensor technology, loop detectors, road cameras, and mobile devices are generating and capturing large amounts of space-time traffic data. The space-time traffic data refers to data acquired by entities with space coordinates at the same time frequency, and comprises three-dimensional information of time, space and traffic thematic attributes, which can be understood as a time sequence set generated on different space coordinates. The mining of the space-time traffic data is helpful for finding out the space-time pattern contained in the space-time traffic data, provides basis for the management and control of related departments and the guidance of traffic transportation, and helps decision makers to better make decisions. In conventional statistical-based data mining methods, there is a common assumption that data samples are generated independently, however, spatio-temporal data are auto-correlated and have a high degree of non-linearity and dynamic correlation, so classical data mining methods do not handle such data well. With the development of deep learning technology, a model based on deep learning has been applied to solve various spatiotemporal data prediction problems, with great capability of layered feature learning.

One classical problem in the task of spatio-temporal data prediction is traffic flow prediction, i.e. predicting the traffic flow at a future time based on historically observed traffic data. Traffic flow prediction faces some challenges that are common in that, first, traffic data can be seen as time series data that has time series properties, including proximity, periodicity, trending, etc. The spatio-temporal traffic data has heterogeneity in the spatial dimension, that is to say the traffic patterns of different regions or locations are different. From both the time and space dimensions, the traffic network has a correlation in both the time and space dimensions, while these two dimensions are mixed together to form a so-called spatiotemporal correlation. The prediction of traffic data is to consider how to capture time series, spatial heterogeneity and time-space correlation simultaneously. The prior art mostly regards traffic data simply as time series data, models historical data with a recurrent neural network (Recurrent neural network, RNN) or a time convolution network (Temporal convolutional network, TCN), but ignores spatial heterogeneity, making the predictive performance of these models poor. Part of the research introduces a graph convolution network (Graph convolutional network, GCN), designs a space-time convolution module, and jointly learns by simply stacking a plurality of modules, and does not model the periodic characteristics of traffic data well in the time dimension although the space attribute is considered, so that the model has limited representation capability and poor interpretation.

Disclosure of Invention

Aiming at the problems, the invention provides a space-time traffic flow prediction method based on a cross attention mechanism, which is named as cross attention by improving a common attention mechanism, captures periodic characteristics of space-time traffic data by using the cross attention mechanism, automatically gives relatively more attention to valuable information through a cross attention module, then respectively captures spatial dependency and time correlation characteristics of the space-time traffic data by using a graph convolution network and a time convolution network, finally fuses the extracted characteristics, and completely models various characteristics of the space traffic data.

The technical scheme of the invention is as follows:

a space-time traffic flow prediction method based on a cross attention mechanism comprises the following steps:

1) Dividing the acquired traffic data in different time scales (periods) at different positions on a time axis, and taking the data in different time scales (data in different time periods) as input data;

2) Fusing different time scale data (input data) by using an improved Cross Attention mechanism (Cross Attention), and performing residual connection on fused information and the input data; comprising the following steps:

21 The features of the target period are recombined by calculating the attention scores of the target period and other different time scales; in particular implementations, the method may include:

211 The characteristics of the adjacent time are recombined, and firstly, the adjacent time characteristic representation considering the correlation with the daily period is calculated; then calculating the attention score calculated with the cycle, multiplying the attention score by the characteristic of the cycle, and obtaining the adjacent time characteristic representation considering the correlation with the cycle;

The attention score represents the magnitude of the correlation between the elements of the two matrices, obtained by calculating the inner product of the two matrices, each element in this inner product matrix representing the correlation between different features within each node at different time periods.

Respectively recombining adjacent time features through two cross attention modules;

212 And then further fusing by utilizing a self-defined parameter matrix to be trained to obtain the final characteristic representation after the adjacent time passes through the cross attention module.

Further, to optimize training efficiency, the fused information and the original input may be connected as residuals.

22 A characteristic representation of the day period is calculated using the characteristic of the adjacent time and the characteristic of the week period, respectively;

23 A characteristic representation of the week period is computationally available using a characteristic of the adjacent time and a characteristic of the day period, respectively;

3) Constructing a graph structure of a traffic network, wherein nodes in the graph represent traffic sensors, and the characteristics of each node can be regarded as sensor signals on the traffic network graph;

Two patterning approaches can be used to construct the graph structure of the traffic network, including: one is a common adjacency matrix patterning method, and the other is a node distance patterning based method, i.e. a threshold gaussian kernel is used to calculate the weights of the edges of connected nodes.

4) Processing signals on the graph by using a graph rolling network (Graph Convolutional Network, GCN), and extracting to obtain the spatial correlation of the traffic network; extracting time correlation by using a time convolution network TCN, and connecting the input of the graph convolution network and the output of the time convolution network as residual errors;

The signals on the traffic network graph are processed by two layers of graph rolling networks (GCNs) respectively, so that the correlation characteristics of the traffic network in the space dimension are captured. And further fusing the characteristics obtained by the two patterning methods, and outputting the fused characteristics as a graph rolling network. Traffic road network data is modeled from a time dimension using a Time Convolutional Network (TCN) to update the signals of nodes by merging information on adjacent time slices. And performing residual connection between the obtained output and the input of the graph rolling network through a layer of activation function.

5) The one-dimensional convolution is utilized to ensure that the output of each time period branch (such as an adjacent time branch, a daily period branch and a Zhou Zhouqi branch) has the same dimension and shape as the traffic road network prediction target, and finally the output of the three branches is fused to be used as the final traffic road network flow prediction result.

In the implementation, after the input of the graph convolution network of each branch and the output of the time convolution network are connected in a residual way, the output of each branch and a prediction target are ensured to have the same dimension by utilizing one-dimensional convolution, and then the outputs of the three branches are fused to be used as a final prediction result.

Compared with the prior art, the invention has the advantages that:

The invention provides a space-time traffic flow prediction method based on a cross attention mechanism, which improves the existing attention mechanism, so that the existing attention mechanism can capture the correlation between multiple time scales, and simultaneously designs a graph convolution network for acquiring spatial characteristics and a time convolution network for describing time sequence dependency, wherein the graph convolution network describes the spatial relationship by two composition methods. The invention carries out complete modeling on various characteristics of space-time traffic data, considers the heterogeneity of space, can effectively realize the task of predicting traffic flow, and improves the accuracy and the interpretability of prediction.

Drawings

FIG. 1 is a diagram of an overall network model architecture for a space-time traffic flow prediction method of the present invention;

Wherein x _h is the divided adjacent time data; x _d is the divided daily cycle data; x _w is the divided weekly period data; cross Attention is an improved Attention mechanism, named Cross Attention, (d- > h), (w- > h) denotes the use of x _d and x _w to calculate the Attention score with x _h, respectively, to reorganize x _h, (h- > d), (w- > d) denotes the use of x _h and x _w to calculate the Attention score with x _d, respectively, to reorganize x _d, (h- > w), (d- > w) denotes the use of x _h and x _d to calculate the Attention score with x _w, respectively, to reorganize x _w; fusion is a Fusion process, wherein a parameter matrix is utilized to perform dimension transformation, and then a plurality of inputs are summed to obtain one output; is residual connection; GCN is a graph rolling module; TCN is a time convolution module; 1D-Conv is one-dimensional convolution for dimension transformation; /(I) An output derived for the adjacent time branch; /(I)Output obtained for the day period branch; /(I)Output obtained for cycle branching; /(I)The three branches are fused and finally output, namely the result of model prediction; y is real data; loss is a Loss function, the Loss of a predicted value and a true value is calculated, and parameters in the model are updated through inverse gradient propagation.

FIG. 2 is a schematic diagram of a specific process of computing time of day and time period cross-attention using an improved attention mechanism in the present invention;

Wherein x _h is the divided adjacent time data; x _d is the divided daily cycle data; n is the number of nodes in the traffic road network; f is the characteristic dimension of each node; th is the number of time slices of the selected adjacent time; td is the number of time slices of the selected daily period; d _k、d_v are super-parameters, where d _k also serves as a normalization factor when calculating the attention score; Representing a parameter matrix to be trained, and randomly obtaining an initial value; CA _d→h(x_h,x_d) for calculating the cross attention of the adjacent time and the day period, x _h is multiplied by a parameter matrix to obtain Q, x _d is multiplied by the parameter matrix to obtain K and V, then the attention scores of Q and K are calculated, and the feature matrix after the recombination of the adjacent time x _h is obtained by normalizing and softmax function and multiplying by V, and the dimension of the matrix is n×t _h×d_v.

Fig. 3 is a flow chart of the method of the present invention.

Detailed Description

The invention is described below with reference to the drawings and the detailed description.

The invention provides a space-time traffic flow prediction method based on a cross attention mechanism, and fig. 3 is a flow chart of the method of the invention, comprising the following steps:

(1) Dividing the acquired traffic data at different positions on a time axis to obtain characteristic information under different time scales; the specific implementation time division is divided into three time scales as original input;

(2) Improving the Attention mechanism into a Cross Attention mechanism (Cross Attention), fusing characteristic information under different time scales by using the Cross Attention mechanism, and performing residual connection between the fused information and the original input;

(3) Then under two composition modes, extracting the spatial correlation by using a graph rolling network (Graph Convolutional Network, GCN);

(4) Extracting the time correlation by using a time convolution network (Temporal Convolutional Network, TCN), and connecting the input of the graph convolution network and the output of the time convolution network as residual errors;

(5) The one-dimensional convolution is utilized to ensure that the output obtained by the three time period branches (adjacent time branches, daily period branches and Zhou Zhouqi branches) has the same dimension and shape as the target to be predicted (traffic flow information of each node in the traffic road network), and finally the output of the three branches is fused to be used as the final prediction result.

In step (1), the acquired traffic data is divided according to the time sequence in consideration of the adjacency and periodicity of the time sequence, and three time scales of the adjacency time, the daily cycle and the weekly cycle are divided as input data, for example, the adjacency time takes traffic data of the first two hours of the period to be predicted, the daily cycle takes traffic data of the same period of the day before the period to be predicted, and the weekly cycle takes traffic data of the same period of the week before the period to be predicted. The invention adopts a multi-component construction mode, as shown in figure 1, each component adopts the same model structure but different parameters to be trained. The characteristic information under different time scales is obtained through data partitioning, wherein the data partitioning is to intercept different time periods from historical data to be used as inputs of adjacent time branches, day period branches and week period branches respectively, for example, traffic flow from 7 points to 8 points early on Tuesday is to be predicted, and the week period is to intercept traffic data from 7 points to 8 points early on Tuesday (the previous week); the day period is traffic data from 7 to 8 points earlier on the current monday (the previous day); the adjacent time is to intercept traffic data from 5 to 7 points (the first two hours to be predicted) on the morning of tuesday.

Specifically, the input data obtained by dividing are three-dimensional (h, d, w dimensions) matrixes, xh, xd and Xw dimensions are n×f×t, i.e. N is the number of nodes, F is the feature number, T is the number of time slices, and since the time scales selected in each time period may be different, the sizes of the time slices T may be different, the input data are divided into Th, td and Tw, and then are uniformly aligned to T through matrix multiplication, i.e. the number of time slices (time step) to be finally predicted, for example, the custom parameter matrix size is th×t, and the size is n×f×th through the output of the left-multiplication Xh branches, so as to obtain the size n×f×t; the other two branches are aligned in the same way.

In step (2), the divided data in step (1) is input into an improved attention module, Q (Query, query matrix), K (Key, keyword matrix), V (Value, value matrix) of the traditional attention mechanism are obtained from the same matrix, the attention mechanism is initially used for machine translation task in natural language processing, and is used for finding out the correlation relation of each vocabulary in sentences, and is popularized as a method for finding out the correlation between features, namely, attention weights are continuously updated through training, so that the model can pay attention to important parts, and a better effect is achieved. The specific calculation is to calculate the inner product of Q and K to represent the correlation between the internal elements, this inner product is called attention score, in order to find the correlation between the elements in the same matrix, and considering that there may be correlation between different periods of time series, the invention improves this, Q and K are respectively from different matrices, the attention score between two matrix elements is calculated, in order to find the correlation between the elements in two matrices, i.e. a certain period calculates the attention score with another two periods respectively by means of a cross attention module, in order to re-represent the characteristics of itself, wherein V and K are obtained from the same matrix. The calculation of the cross-attention mechanism is shown in fig. 2. For example, the features of adjacent time are recombined, attention scores obtained by daily periods are calculated first, and then the features of the daily periods are multiplied, so that the attention scores are taken as an adjacent time feature representation taking the correlation relationship with the daily periods into consideration; then calculating the attention score calculated by the cycle, multiplying the attention score by the characteristic of the cycle, and taking the attention score and the characteristic of the cycle as the adjacent time characteristic representation taking the correlation relationship with the cycle into consideration; and respectively recombining the adjacent time features through the two cross attention modules, respectively re-representing the adjacent time features in the daily cycle feature space and the weekly cycle feature space, respectively multiplying the recombined adjacent time cycle features by parameter matrixes to transform the same dimension, and then carrying out matrix addition on the two matrixes to obtain the final feature representation of the adjacent time after passing through the cross attention module. The characteristic representation of the daily period is calculated by utilizing the characteristic of the adjacent time and the characteristic of the week period respectively, the characteristic representation of the week period is calculated by utilizing the characteristic of the adjacent time and the characteristic of the daily period respectively, and the process is equally available. In order to optimize training efficiency, the fused information and the original input are connected in a residual mode.

In the step (3), after the recombined multi-time scale features are obtained, the spatial features are further extracted. Traffic networks are essentially a graph structure in which nodes represent traffic sensors, and the characteristics of each node can be considered as a signal on the graph. At present, the research of popularizing the grid-based image convolution to the graph structure data is roughly divided into two directions, namely, the graph convolution based on a spectrum domain, and the graph Fourier transformation and the spectrogram theory are utilized to realize the convolution operation on the graph by migrating the basic concept of signal processing to the graph signal domain; and secondly, based on the spatial domain graph convolution, the characteristic of the adjacent matrix is utilized in space, the adjacent matrix is multiplied by the characteristic matrix, the characteristic addition of the adjacent node equivalent to a certain node is realized, and multi-layer superposition can be used for aggregating the information of the multi-hop adjacent node. In order to fully utilize the topological property of the traffic network, the invention selects the graph convolution based on the spatial domain, and compared with the graph convolution based on the spectral domain, the graph convolution based on the spatial domain has basically the same information aggregation effect and lower complexity. The invention adopts two modes of composition, one is a common adjacent matrix composition method, and the other is a node distance composition method. The signals on the graph are then processed by a two-layer graph rolling network (GCN) to capture the correlation of the traffic network in the spatial dimension. The features obtained by the two patterning methods are further fused (two adjacent matrixes obtained by the two patterning methods are respectively input into the graph convolution, and then the obtained output feature matrixes are subjected to weighted summation) and used as the output of the graph convolution module, the initial weight is reset to 0.5, and the weight value is optimized during training.

In step (4), after neighboring information of each node on the graph is captured using the graph convolutional network, modeling is further performed from the time dimension using a Time Convolutional Network (TCN). Processing timing problems cyclic neural networks (RNNs) and variants thereof are commonly used, because the cyclic autoregressive structure of the RNNs can be well represented by time series, but in recent years, specific convolutional neural networks have also been used to process timing problems, such as Time Convolutional Networks (TCNs), which can be processed in parallel and by sampling historical data at intervals, compared to RNN sequential processing time series, so as to expand the receptive field of the convolutional networks, and meanwhile, the time convolutional networks are less prone to gradient vanishing during training, and thus the training efficiency can be improved. The signals of the nodes are updated by merging the information on the adjacent time slices. And obtaining output through an activation function layer of the time convolution network, and connecting the obtained output with the input of the graph convolution network as residual errors.

In step (5), after the input of the graph convolution network in each branch (adjacent time branch, daily cycle branch and Zhou Zhouqi branch) is connected with the output of the time convolution network, one-dimensional convolution is utilized to ensure that the output of each branch has the same dimension as the prediction target, and then the output of the three branches is summed to serve as a final prediction result, a weight value is randomly allocated to each branch during summation, and the weight value is optimized during training.

The implementation steps of the specific implementation include:

(1) The traffic road network is represented as a graph structure, nodes in the graph represent sensors for collecting data, each sensor records a plurality of road characteristic values such as traffic flow, average speed, average occupancy and the like at the same time frequency, namely each node can obtain a time sequence, each time point on the time sequence comprises a plurality of characteristic values, the time sequence is represented as a single time slice on the time axis, and the time slices can be regarded as a graph structure and represent the data collected at the same time. For example, a real highway data set in california, with a collection time interval of 5 minutes, records 3 road characteristics per sensor, including vehicle flow, vehicle speed, and average occupancy. Considering the adjacency and periodicity of traffic data, the method divides the traffic data into three time scales at different positions on a time axis, wherein the traffic data are respectively adjacent time traffic data x _h, daily period traffic data x _d and weekly period traffic data x _w as input. Taking the example of calculating the cross attention between x _h and x _d, first calculating the attention scores of x _h and x _d, then multiplying the attention scores by V obtained by x _d, and obtaining a matrix which is the representation of x _h in the feature space of x _d, that is, considering the correlation between different features of each node in two periods, and the flow is shown in fig. 2.

The following describes a calculation process for feature recombination of the adjacent time x _h, and the same applies to the calculation process for feature recombination of the periodic day traffic data x _d and the periodic week traffic data x _w. First, the features at adjacent time x _h are reorganized using the period of day x _d features:

Wherein Y _dh is the output of the adjacent time traffic data x _h and the daily period traffic data x _d obtained by the cross attention module, N represents the number of nodes, F represents the characteristic dimension of each node, th represents the number of time slices of selected adjacent time, td represents the number of time slices of selected daily period, d _k represents the normalization coefficient, Representing a matrix of parameters to be trained. Here, x _h is Q and x _d is K and V, where The resulting output/>

The features at adjacent time x _h are then reorganized using a period of weeks x _w:

wherein Y _wh is the output of the adjacent time traffic data x _h and the periodic traffic data x _w through the cross attention module, Tw represents the number of time slices of the selected cycle,/>Representing a matrix of parameters to be trained. Here, x _h is Q, x _w is K and V, where/>The resulting output/>

After the features of the adjacent time x _h are recombined through the two cross attention modules, the features are multiplied by parameter matrixes respectively to be transformed into the same dimension, then the two matrixes are added in a matrix mode to obtain the final feature representation Z _h of the adjacent time x _h after passing through the cross attention modules, and the daily periods x _d and Zhou Zhouqi x _w obtain the feature representations Z _d and Z _w after passing through the cross attention mechanism, wherein the calculation formula is as follows:

wherein, Representing the parameter matrix to be trained, and performing dimension transformation on the recombined characteristics to keep the same dimension with the original input.

To optimize training efficiency, a residual connection is employed at output as the output of the cross-attention module:

(2) The cross-attention module may allow the cross-attention module to automatically pay relatively more attention to the valuable information. And carrying out residual connection on the recombined characteristic and the original input characteristic, sending the recombined characteristic and the original input characteristic into a graph convolution network, and capturing a spatial dependency relationship from a neighborhood of a node in the graph by using the graph convolution network. In the present invention, a traffic network is defined as an undirected graph g= (V, E, a), where V represents a set of N nodes, |v|=n; e is a set of edges representing connectivity between nodes; a εR ^N×N represents the adjacency matrix of FIG. G. In a traffic network, each node detects F measurements at the same sampling frequency.

The invention adopts two modes to carry out composition, one is a common adjacent matrix composition method, and an adjacent matrix A ₁ is obtained; the other is based on a node distance composition method, and the weight of the edge of the connected node is calculated by using a threshold Gaussian kernel to obtain an adjacency matrix A ₂.

The specific definition of the composition is as follows:

adjacent matrix patterning:

node distance patterning:

Where d represents the distance between two sensors in the traffic network and σ represents the standard deviation of the distance between the sensors in the traffic network. The two patterns result in two undirected graphs G ₁＝(V,E,A₁),G₂＝(V,E,A₂, which are input into the graph rolling network, respectively.

(3) The two compositions are respectively input into GCN, and in order to avoid a large amount of operations caused by decomposing eigenvectors by a Laplace matrix in spectrum domain graph convolution, the space domain graph convolution is adopted in the specific implementation of the invention. In order to extract a larger range of spatial features, multiple layers of GCNs can be overlapped, but the multiple layers of GCNs have the problem of excessive smoothness, namely after information aggregation of multiple layers of nodes, signals of each node are more and more similar, the degree of distinction of the nodes is reduced, and therefore the accuracy of a prediction result is low, so that the spatial features are extracted by adopting two layers of GCNs. After two layers of GCNs are passed, respectively distributing a weight, adding to obtain the output of graph rolling network

The calculation formula of the spatial convolution under adjacent time branches is as follows:

The calculation formula of the space convolution under the solar cycle branch is as follows:

the calculation formula of the space convolution under the cycle branch is as follows:

Where a ₁ represents an adjacency matrix and a ₂ represents an adjacency matrix based on node distances. I is an identity matrix, and adding the identity matrix ensures that when neighbor information is aggregated by using an adjacent matrix, information of a node is added at the same time; /(I)The degree matrix is used for normalizing the information propagated by each node. The weights calculated using the threshold gaussian kernel do not require normalization using the degree matrix. Obtaining the output/>, of a graph rolling network through two layers of GCNs Wherein the parameter matrix to be trained/> Σ () represents an activation function, and the parameters to be trained δ _(·) are all 0.5.

(4) After capturing the spatial correlation with graph convolution, each time slice is aligned along the time axis, modeled from the time dimension using the time convolution network TCN, and the signals of the nodes are updated by merging the information on the adjacent time slices:

Where, { Φ _h,Φ_d,Φ_w } represents the parameters to be trained of the time convolution, i.e. the convolution kernel of the time convolution network, To optimize training efficiency, a residual connection is also employed at output as the output of the time convolution network:

Wherein Z _h,Z_d,Z_w is the output of the cross-attention module, Is the output of the time-convolution network,

(5) After the input and the output of the space-time convolution module of each branch are connected in a residual way, the output of each branch and a traffic network flow prediction target are ensured to have the same dimension and shape by utilizing one-dimensional convolution:

Wherein the method comprises the steps of { Φ' _h,Φ′_d,Φ′_w } represents the parameters of one-dimensional convolution,/>The number of convolved input channels is Th, td, tw, respectively, and the number of output channels is T, which represents the time step to be predicted. The outputs of the three branches are then fused as the final output of the model:

Wherein the method comprises the steps of “/>"Means the Hadamard product, i.e. the multiplication of the corresponding positions of the matrix elements. { W _h,W_d,W_w } is a parameter to be trained, and when outputs of different branches are fused, the influence weights of the three branches on each node are different, and the three branches need to be learned from historical data.

(6) In model training and validation, random gradient descent SGD (Stochastic GRADIENT DESCENT) is used as an optimization function, mean square error MSE (Mean Square Error) of predicted and true values is used as a loss function, and minimization by back propagation. The root mean square error RMSE (Root Mean Square Error) and the average absolute error MAE (Mean Absolute Error) were used as evaluation indexes. The specific formula is as follows:

wherein Y _i represents the ith true value, The i-th predicted value is represented, and N represents the number of nodes. As can be seen from the formula, the smaller the numerical value obtained by calculation of the two evaluation indexes is, the better the model training effect is, and the more accurate the predicted result is.

It should be noted that the purpose of the disclosed embodiments is to aid further understanding of the present invention, but those skilled in the art will appreciate that: various alternatives and modifications are possible without departing from the scope of the invention and the appended claims. Therefore, the invention should not be limited to the disclosed embodiments, but rather the scope of the invention is defined by the appended claims.

Claims

1. A space-time traffic flow prediction method based on a cross attention mechanism captures periodic characteristics of space-time traffic data through the cross attention mechanism which improves the attention mechanism, a graph convolution network and a time convolution network are utilized to capture space-time dependence relation and time correlation characteristics of the space-time traffic data respectively, and then the extracted characteristics are fused, so that space-time traffic flow prediction is effectively realized; the method comprises the following steps:

1) Dividing the acquired traffic data at different positions on a time axis in different time periods to obtain data of different time periods as input data;

2) Fusing the divided input data in the step 1) through an improved cross attention mechanism, and performing residual error connection on the fused information and the input data; the method of fusion by improved cross-attention mechanisms includes:

21 The features of the target period are recombined by calculating the attention scores of the target period and other time periods;

In the improved cross attention mechanism, the query matrix and the keyword matrix come from different matrixes respectively, and the correlation between the elements in the two matrixes is found out by calculating the attention scores between the elements of the two matrixes, namely, the attention scores of the target time period and other time periods are calculated respectively by an improved cross attention module, so that the re-representation of the self characteristics is completed;

22 Utilizing the parameter matrix to further fuse, namely multiplying the recombined target period characteristics by the parameter matrix respectively, transforming the matrix into the same dimension, and then adding the matrix to obtain the characteristic representation obtained after the target time period passes through the cross attention module;

23 Performing residual connection on the characteristic representation information after fusion in the step 22) and input data;

3) Constructing a graph structure of a traffic network; nodes in the graph represent traffic sensors, and each node is characterized by signals on a traffic road network graph;

4) Processing signals on the graph by using a graph rolling network GCN, and extracting to obtain the spatial correlation of the traffic network; extracting time correlation by using a time convolution network TCN, and connecting the input of the graph convolution network and the output of the time convolution network as residual errors;

The method specifically comprises the steps of processing signals on a graph by using two layers of graph rolling networks GCN, further fusing the characteristics, and outputting the fused characteristics as the graph rolling networks, thereby capturing the correlation characteristics of the traffic network in the space dimension;

Modeling from a time dimension by using a time convolution network TCN, and updating signals of nodes by combining information on adjacent time slices; the time slices are traffic data acquired at the same time; then outputting an activation function of the time convolution network;

Residual connection is carried out on the output of the obtained time convolution network and the input of the graph convolution network;

5) The output of each time period branch and the traffic network flow prediction target have the same dimension through one-dimensional convolution, and the outputs of different time period branches are fused to be used as the final traffic network flow prediction result;

Through the steps, the space-time traffic flow prediction based on the cross attention mechanism is realized.

2. The method for predicting traffic flow in space-time based on cross-attention mechanism of claim 1, wherein in step 1), the different time periods are divided into three time scales including adjacent time, day period and week period; the corresponding traffic data are adjacent time traffic data, daily cycle traffic data and weekly cycle traffic data respectively.

3. The method for predicting traffic flow in space-time based on cross-attention mechanism of claim 2, wherein step 21) reorganizes the features of the target period, and reorganizes the features of the adjacent time, specifically: calculating to obtain a neighboring time characteristic representation considering the correlation with the period of the day; calculating the attention score calculated by the cycle time, and multiplying the attention score by the characteristic of the cycle time to obtain the adjacent time characteristic representation considering the correlation with the cycle time; for the characteristic representation of the daily period, calculating by utilizing the characteristic of the adjacent time and the characteristic of the week period respectively; the characteristic representation of the cycle is calculated by using the characteristic of the adjacent time and the characteristic of the day cycle.

4. The method for predicting traffic flow in space-time based on cross-attention mechanism as recited in claim 3, wherein the process of feature recombination for the neighboring time x _h is as follows:

first, the features of the adjacent time x _h are reorganized using the time of day x _d features, expressed as:

Wherein Y _dh is the output of the adjacent time traffic data x _h and the daily period traffic data x _d obtained by the cross attention module, N represents the number of nodes, F represents the characteristic dimension of each node, th represents the number of time slices of selected adjacent time, td represents the number of time slices of selected daily period, d _k represents the normalization coefficient, Representing a parameter matrix to be trained; here, x _h is Q and x _d is K and V, where The resulting output/>

The features at adjacent time x _h are then reorganized using a period of weeks x _w, denoted as:

wherein Y _wh is the output of the adjacent time traffic data x _h and the periodic traffic data x _w through the cross attention module, Tw represents the number of time slices of the selected cycle,/>Representing a parameter matrix to be trained; here, x _h is Q, x _w is K and V, where/>The resulting output/>

5. The cross-attention mechanism based spatio-temporal traffic flow prediction method of claim 1, wherein in step 3), the traffic network is defined as an undirected graph g= (V, E, a), where V represents a set of N nodes, |v|=n; e is a set of edges representing connectivity between nodes; a ε R ^N×N represents the adjacency matrix of graph G; in a traffic network, each node detects a measurement value at the same sampling frequency; specifically, an adjacency matrix composition method and a graph structure based on a node distance composition method are adopted to construct a traffic network, so that an adjacency matrix composition and a node distance composition are respectively obtained.

6. The cross-attention mechanism based spatio-temporal traffic flow prediction method of claim 5, further characterized by the adjacency matrix patterning method expressed as:

The node distance patterning method is expressed as:

Where d represents the distance between two sensors in the traffic network and σ represents the standard deviation of the distance between the sensors in the traffic network.

7. The space-time traffic flow prediction method based on the cross attention mechanism as claimed in claim 2, wherein in the step 4), spatial features are extracted by adopting a two-layer graph rolling network GCN, weights are respectively allocated after passing through the two layers of GCNs, and the outputs of the GCNs are obtained by adding;

the computation of the spatial convolution under adjacent time branches is expressed as:

The computation of the spatial convolution under the photoperiod branch is expressed as:

The computation of the spatial convolution under the cycle-by-cycle branch is expressed as:

Wherein a ₁ represents an adjacency matrix, and a ₂ represents an adjacency matrix based on node distances; I is an identity matrix, and adding the identity matrix ensures that when neighbor information is aggregated by using an adjacent matrix, information of a node is added at the same time; /(I) The degree matrix is used for normalizing information transmitted by each node; the weight calculated by using the threshold Gaussian kernel does not need to be normalized by using a degree matrix; obtaining the output/>, of a graph rolling network through two layers of GCNs Wherein the parameter matrix to be trained/> Σ () represents an activation function, and the parameters to be trained δ _(·) are all 0.5.

8. The method for predicting traffic flow in time and space based on the cross-attention mechanism of claim 7, wherein after capturing the spatial correlation by using graph convolution, each time slice is arranged along a time axis, modeling is performed from a time dimension by using a time convolution network TCN, and signals of nodes are updated by combining information on adjacent time slices, which is expressed as:

9. The method for predicting traffic flow in space and time based on cross-attention mechanism as claimed in claim 8, wherein step 5) after the input and output of the spatio-temporal convolution module of each branch are connected as residual errors, the output of each branch is ensured to have the same dimension and shape as traffic network traffic prediction targets by using one-dimensional convolution, expressed as:

Wherein the method comprises the steps of { Φ' _h,Φ′_d,Φ′_w } represents the parameters of one-dimensional convolution,/>The number of convolved input channels is Th, td, tw, respectively, and the number of output channels is T, which represents the time step to be predicted.

10. The cross-attention mechanism based space-time traffic flow prediction method of claim 9 wherein the outputs of the three branches are fused as a model final output expressed as:

Wherein the method comprises the steps of Representing Hadamard products, namely multiplying corresponding positions of matrix elements; { W _h,W_d,W_w } is the parameter to be trained.