CN116186547A

CN116186547A - Method for rapidly identifying abnormal data of environmental water affair monitoring and sampling

Info

Publication number: CN116186547A
Application number: CN202310467167.8A
Authority: CN
Inventors: 赵鑫; 梁彬锐; 阳秀春; 黄文稻; 邓超联; 张毅; 王晖文; 彭玉萍; 江锦燕; 龚艳光; 杨茂勇; 谢艳玲; 刘瑶瑶
Original assignee: Shenzhen Ghy Environment Water Conservancy Co ltd
Current assignee: Shenzhen Ghy Environment Water Conservancy Co ltd
Priority date: 2023-04-27
Filing date: 2023-04-27
Publication date: 2023-05-30
Anticipated expiration: 2043-04-27
Also published as: CN116186547B

Abstract

The invention provides a method for rapidly identifying abnormal data in environmental water affair monitoring and sampling, and relates to the technical field of water affair data processing. The method comprises the following steps: S1, given collected water affair data, extracting features based on the time-series distribution of the sequences; S2, aggregating the features based on their spatial graph distribution and the Kullback-Leibler divergence to obtain feature vectors that fuse the spatial distribution information of different time-series data; S3, constructing a mechanism-equation-based data correlation matrix for the feature vectors; S4, introducing the picture information in the data, extracting image features, and constructing a cross-modal vision-time series-spatial distribution-mechanism feature characterization model to obtain overall features; S5, constructing a feature decoding prediction and prediction network training module, decoding the overall features, outputting predicted values, and training the prediction network; S6, predicting new water affair data from the existing training set data, judging whether a distribution shift has occurred, and identifying data anomalies.

Description

Method for rapidly identifying abnormal data of environmental water affair monitoring and sampling

Technical Field

The invention relates to the technical field of water affair data processing, in particular to a method for rapidly identifying abnormal data of environmental water affair monitoring and sampling.

Background

In the field of environmental water engineering, a large amount of data, such as real-time flow, temperature, dissolved oxygen, algae, organic carbon, organic phosphorus, organic nitrogen, and ammonia nitrogen, is usually collected in a given water area. These data can be used to analyze the hydrodynamic characteristics of the water area as well as its pollution condition, pollution degree, and pollution mechanism. Data collection therefore plays an important role in flood control, drainage, and pollution control. However, data collected in the field often contain anomalies for various reasons, such as sensor damage, operator error, and accumulated equipment error. Automatically identifying abnormal data within a large volume of data is therefore particularly critical and is of great significance for safeguarding environmental water analysis work.

A search shows that publication No. CN109160550A discloses an urban sewage treatment information management system in which a fault-tree-based expert system identifies abnormal parts of the sewage treatment process. However, such methods adequately consider neither the spatio-temporal continuity among data of the same type nor the mechanism relationships, expressed as differential equations, among data of different types. That method considers only whether a data value is abnormally high or low and then judges the abnormality in the system by reference to the fault tree. Such expert systems are simple and efficient but cannot identify complex, latent system faults. In addition, the Chinese invention patent with publication No. CN111830871A discloses identifying abnormalities of water affair monitoring equipment by frequency-domain analysis of equipment data, but that scheme has the following drawback: data features represented in the frequency domain usually have insufficient representational capability, so inconspicuous anomalies are hard to characterize and latent data anomalies are hard to find.

The schemes disclosed above are unreliable and inaccurate in identifying anomalies in water affair data. To improve anomaly identification capability, the invention provides a new scheme for identifying complex and latent anomalies, so as to achieve more reliable and more accurate data anomaly identification in environmental water affairs and meet practical needs.

Disclosure of Invention

In order to overcome the defects of the prior art, the invention provides a method for rapidly identifying abnormal data of environmental water affair monitoring and sampling, so as to solve the technical problems.

The technical solution adopted to solve the technical problems is a method for rapidly identifying abnormal data in environmental water affair monitoring and sampling, characterized by comprising the following steps: S1, given collected water affair data, extracting features based on the time-series distribution of the sequences; S2, aggregating the features based on their spatial graph distribution and the Kullback-Leibler divergence to obtain feature vectors that fuse the spatial distribution information of different time-series data; S3, constructing a mechanism-equation-based data correlation matrix for the feature vectors; S4, introducing the picture information in the data, extracting image features, and constructing a cross-modal vision-time series-spatial distribution-mechanism feature characterization model to obtain overall features; S5, constructing a feature decoding prediction and prediction network training module, decoding the overall features, outputting predicted values, and training the prediction network; S6, predicting new water affair data from the existing training set data, judging whether a distribution shift has occurred, and identifying data anomalies.

In the above method, the step S1 includes the following steps:

S11, given acquired data indexed by (m, n, t, k), where m, n denote that the data is acquired at spatial location (m, n), t denotes that the data is acquired at time t, and k denotes the type of the data;

S12, constructing, for each data category k, a time-series feature extraction network based on a Transformer;

S13, aggregating data of the same kind at the same place into one sequence, and extracting the features of all the sequence data with the time-series feature extraction network.

In the above method, in step S13, the features of all the sequence data are extracted using a weighted time-series attention mechanism [formula image not reproduced].

In the above method, the corresponding time-series-based attention weight matrix is obtained [formula image not reproduced].

In the above method, the step S2 includes the following steps:

S21, arranging the features constructed from data of the same category according to the geographic distribution of the data to build a feature graph, in which each node is the feature vector of one time-series sequence. The weight of the edge between two nodes is calculated as follows: the node at a given geographic position is modeled as a Gaussian distribution with its own mean and covariance, and the weight of the edge between two adjacent nodes in the feature graph is calculated from the Kullback-Leibler divergence between their two Gaussian distributions [formula image not reproduced];

S22, using a graph network to perform weighted feature aggregation of the time-series feature vectors based on their spatial arrangement in the feature graph.

In the above method, the step S3 includes the following steps:

S31, constructing importance scores among different data categories according to the mechanisms of hydrodynamics, water quality, and algae growth and decline;

S32, constructing an adjacency matrix among the different data types from the scores, where each element of the adjacency matrix is the score assigned between one class of data and another, and constructing the correlation matrix between the data from the adjacency matrix [formula image not reproduced].

In the above method, the step S4 includes the following steps:

S41, introducing the picture information in the data, and extracting picture information features with a Vision Transformer;

S42, concatenating the picture information features with the feature vectors;

S43, weighting the concatenated features with the correlation matrix, thereby constructing a cross-modal vision-time series-spatial distribution-mechanism feature characterization model and obtaining the overall features.

In the above method, the step S5 includes the following steps:

S51, decoding the overall features, and outputting a predicted value of the data at an unknown point (m, n, t, k);

S52, masking random data in the original data, applying position coding, time coding, and data type coding to the masked data, predicting the masked data with the prediction network, taking the Euclidean distance between the true value and the predicted value of the masked data as the training objective to construct gradient information, and training the prediction network.

In the above method, in the step S51, the overall feature is decoded, and a Multi-layer Multi-Head attention network is used as a decoder.

In the above method, the step S6 includes the following steps:

S61, using the existing training set data, extracting q batches of data from the neighborhood of the new data; feeding the q batches in turn to the prediction network to predict the data at the unknown point (m, n, t, k), the network outputting a predicted mean and a predicted standard deviation, so that q predicted means and q predicted standard deviations are obtained;

S62, constructing q Gaussian distributions from the predicted means and predicted variances;

S63, treating the new data as a sample point of the Gaussian distributions, and calculating the probability density of the sample point under each of the q Gaussian distributions;

S64, when, under one of the Gaussian distributions, the probability density of the new data is lower than a set threshold, identifying the sample point as abnormal.

The beneficial effects of the invention are as follows. The method fuses the temporal relationships and spatial distribution relationships among the data with a mechanism-based correlation module, and extracts features from the water affair data with a cross-modal large model, so that latent features in the data are captured more effectively and the reliability and accuracy of anomaly identification in water affair data are improved. Because the feature representation fuses temporal relationships, graph-based spatial distribution relationships, and mechanism relationships, it characterizes the data more accurately and addresses the inaccurate identification caused by insufficient representational capability in conventional data anomaly identification. The data are predicted first and identified afterwards: the data to be judged are first predicted over multiple rounds using neighborhood data (different neighborhood samples), and if the variance of the results across the rounds is large, the data are identified as abnormal, which makes the anomaly identification more stable.

Drawings

FIG. 1 is a flow chart of a method for rapidly identifying abnormal data of environmental water affair monitoring and sampling in the invention.

Fig. 2 is a diagram comparing a network module used in a Transformer network with the weighted attention module of the present invention, according to an embodiment of the present invention.

Fig. 3 is a feature extraction module based on a sequence timing distribution in an embodiment of the invention.

Fig. 4 is a feature aggregation module based on feature space diagram distribution and Kullback-Leibler divergence in an embodiment of the present invention.

FIG. 5 is a cross-modal visual-time series-spatial distribution-mechanism characterization model in an embodiment of the invention.

Fig. 6 is a schematic diagram of position encoding in an embodiment of the invention.

FIG. 7 is a data prediction flow in an embodiment of the invention.

Detailed Description

The invention will be further described with reference to the drawings and examples.

The conception, specific structure, and technical effects of the present invention are described clearly and completely below with reference to the embodiments and the drawings, so that its objects, features, and effects can be fully understood. The described embodiments are only some, not all, of the embodiments of the present invention; other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of the present invention. In addition, the coupling/connection relationships referred to in this patent do not mean solely that components are directly connected; rather, a better coupling structure may be formed by adding or omitting coupling accessories depending on the specific implementation. The technical features of the invention may be combined with one another provided they do not contradict or conflict.

The invention discloses a method for rapidly identifying abnormal data of environmental water affair monitoring and sampling, which comprises the following steps:

S1, given collected water affair data, extracting features based on the time-series distribution of the sequences;

Specifically, step S1 includes the following steps:

S11, given acquired data indexed by (m, n, t, k), where m, n denote that the data is acquired at spatial location (m, n), t denotes that the data is acquired at time t, and k denotes the type of the data;

S12, constructing, for each data class k, a time-series feature extraction network based on the Transformer, a network that uses the attention mechanism to improve model training speed;

S13, aggregating data of the same kind at the same place into one sequence, and extracting the features of all the sequence data with the time-series feature extraction network of step S12.

Further, the features of all the sequence data may be extracted with a weighted time-series attention mechanism [formula image not reproduced], from which the corresponding time-series-based attention weight matrix is obtained [formula image not reproduced].

S2, for the features constructed from data of the same category in step S1, performing feature aggregation based on spatial graph distribution and the Kullback-Leibler divergence to obtain feature vectors that fuse the spatial distribution information of different time-series data;

Specifically, step S2 includes the following steps:

S21, arranging the features constructed from data of the same category in step S13 according to the geographic distribution of the data to build a feature graph, in which each node is the feature vector of one time-series sequence. The weight of the edge between two nodes is calculated as follows: the node at a given geographic position is modeled as a Gaussian distribution with its own mean and covariance, and the weight of the edge between two adjacent nodes in the feature graph is calculated from the Kullback-Leibler divergence between their two Gaussian distributions [formula image not reproduced];

S22, using a graph network to perform weighted feature aggregation of the time-series feature vectors based on their spatial arrangement in the feature graph.

S3, in order to make full use of the relationships among different kinds of data, constructing a mechanism-equation-based data correlation matrix for the feature vectors;

Specifically, step S3 includes the following steps:

S31, having industry experts construct importance scores among different data categories according to the mechanisms of hydrodynamics, water quality, and algae growth and decline;

S32, constructing an adjacency matrix among the different data types from the scores, where each element of the adjacency matrix is the score assigned by the experts between one class of data and another, and constructing the correlation matrix between the data from the adjacency matrix [formula image not reproduced].

S4, on top of the feature extraction and correlation matrix construction for time series, spatial distribution, and mechanism, additionally introducing the picture information in the data, extracting image features, and constructing a cross-modal vision-time series-spatial distribution-mechanism feature characterization model to obtain overall features;

Specifically, step S4 includes the following steps:

S41, on top of the feature extraction and correlation matrix construction for time series, spatial distribution, and mechanism, additionally introducing the picture information in the data and extracting picture information features with a Vision Transformer; for example, the water-purifying effect of alum in a water body is difficult to express numerically and can only be judged from pictures;

S42, concatenating the picture information features with the feature vectors obtained in step S2;

S43, weighting the concatenated features with the correlation matrix from step S3, thereby constructing the cross-modal vision-time series-spatial distribution-mechanism feature characterization model as a whole and obtaining the overall features.

S5, constructing a feature decoding prediction and prediction network training module, decoding the overall features, outputting predicted values, and training the prediction network;

Specifically, step S5 includes the following steps:

S51, taking the overall features of the data obtained in step S4, using a multi-layer Multi-Head attention network as the decoder, decoding the overall features, and outputting a predicted value of the data at an unknown point (m, n, t, k);

S52, masking random data in the original data, applying position coding, time coding, and data type coding to the masked data, predicting the masked data with the prediction network, taking the Euclidean distance between the true value and the predicted value of the masked data as the training objective to construct gradient information, and training the prediction network;

That is, to train the prediction network, the given input data are masked at random and the network must predict the masked data. The L2 distance between the predicted value and the true value is used as the loss from which gradient information is constructed to train the prediction network.

S6, predicting new water affair data from the existing training set data, judging whether a distribution shift has occurred, and identifying data anomalies;

Specifically, step S6 includes the following steps:

S61, using the existing training set data, extracting q batches of data from the neighborhood of the new data; feeding the q batches in turn to the prediction network to predict the data at the unknown point (m, n, t, k); since the prediction network outputs a predicted mean and a predicted standard deviation at the same time, q predicted means and q predicted standard deviations are obtained for the unknown point;

S62, constructing q Gaussian distributions from the predicted means and predicted variances;

S63, treating the new data (i.e. the data of the new unknown point (m, n, t, k)) as a sample point of the Gaussian distributions, and calculating the probability density of the sample point under each of the q Gaussian distributions;

S64, when, under one of the Gaussian distributions, the probability density of the new data is lower than a set threshold, identifying the sample point as abnormal, i.e. the water affair data of the new unknown point (m, n, t, k) is abnormal.

Take the Yantian Reservoir basin in Shenzhen as an example. The data indicators considered in the basin include flow, water temperature, dissolved oxygen, ammonia nitrogen, total nitrogen, total phosphorus, algae density, and so on. First, water-affair-related data are collected at the various positions and over the various time periods in the basin, covering indicators such as flow, water temperature, dissolved oxygen, ammonia nitrogen, total phosphorus, and algae density. Each data item is written with four subscripts: it is the value of data type k collected at location (m, n) at time t.

S1, feature extraction based on the time-series distribution of the sequences. The data are arranged in time order to obtain a number of time-series data sequences. For example, all ammonia nitrogen data collected at a given cross-section over the last half year form one time-series sequence. In this way, ammonia nitrogen time-series sequences are obtained for many different cross-sections in the basin. A Transformer is then used to extract features from the ammonia nitrogen time-series data. So that, within a sequence, data separated by shorter time intervals are more strongly correlated, the invention constructs a weighting matrix over the time-series data and, when extracting features with the Transformer, adds a weighted attention layer in an intermediate layer, which gives the resulting features stronger representational capability. Specifically, referring to Fig. 2, the encoder of a conventional Transformer is a stack of network modules like the one shown on the left of Fig. 2. To make full use of the temporal relationships among the data, the invention replaces the Multi-Head Attention layer in one of these network modules with the weighted attention layer proposed here.
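The exact weighting formula is not reproduced in this text, so the following is only a minimal sketch of the idea of a weighted attention layer, assuming an exponential decay `exp(-|i - j| / tau)` over the time gap between positions i and j. The decay form, the parameter `tau`, and the function names are illustrative assumptions, not the formula of the invention.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def temporal_weights(length, tau=2.0):
    """Assumed weighting matrix: the weight shrinks as the time gap |i - j| grows."""
    return [[math.exp(-abs(i - j) / tau) for j in range(length)]
            for i in range(length)]

def weighted_attention(queries, keys, values, tau=2.0):
    """Scaled dot-product self-attention rescaled by temporal proximity.

    Assumes queries, keys and values all have the same sequence length.
    """
    dim = len(queries[0])
    decay = temporal_weights(len(keys), tau)
    output = []
    for i, q in enumerate(queries):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        attn = softmax(scores)
        # Rescale by the temporal weights, then renormalise so each row sums to 1.
        scaled = [a * decay[i][j] for j, a in enumerate(attn)]
        norm = sum(scaled)
        attn = [s / norm for s in scaled]
        output.append([sum(attn[j] * values[j][d] for j in range(len(values)))
                       for d in range(len(values[0]))])
    return output

# Toy sequence of three time steps with two-dimensional embeddings.
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
features = weighted_attention(sequence, sequence, sequence)
```

With a very large `tau` the decay matrix approaches all ones and the layer reduces to ordinary scaled dot-product attention, which is one way to check the sketch against an unweighted baseline.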

In this way, the time-series features of the ammonia nitrogen data at different places are obtained. These features are arranged into a graph according to the geographic locations at which the sequence data were acquired. The overall flow is shown in Fig. 3. The same feature extraction scheme is also applied to the other types of data (dissolved oxygen, total phosphorus, etc.).
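The per-location, per-type sequences that this step operates on can be assembled from raw records roughly as follows. This is a hedged sketch: the record layout `(m, n, t, k, value)`, the function name `build_sequences`, and the sample values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def build_sequences(records):
    """Group (m, n, t, k, value) records into time-ordered sequences per (m, n, k)."""
    groups = defaultdict(list)
    for m, n, t, k, value in records:
        groups[(m, n, k)].append((t, value))
    # Sort each group by acquisition time and keep only the values.
    return {key: [v for _, v in sorted(obs)] for key, obs in groups.items()}

# Hypothetical readings; the values are made up for the example.
records = [
    (0, 0, 2, "ammonia nitrogen", 0.41),
    (0, 0, 1, "ammonia nitrogen", 0.38),
    (0, 0, 1, "dissolved oxygen", 7.9),
    (1, 0, 1, "ammonia nitrogen", 0.55),
]
sequences = build_sequences(records)
```

Each resulting sequence corresponds to one node of the feature graph once its features have been extracted.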

S2, feature aggregation based on spatial graph distribution and Kullback-Leibler divergence. In the previous step, the features of the organic nitrogen data were arranged into a graph according to their spatial distribution. Each node in the graph is the organic nitrogen feature vector of the corresponding cross-section. If the Kullback-Leibler divergence between two nodes is greater than a certain threshold, an edge exists between them, and the weight of that edge is the Kullback-Leibler divergence between the two nodes. Under this graph definition, a feature node is aggregated as follows. As shown in Fig. 4, node 5 is connected to node 2, node 3, and node 4, and each node carries the organic nitrogen feature vector of its site. The aggregated feature of node 5 is computed from its own feature vector and those of its neighbours, using a weight to be learned together with the Kullback-Leibler divergence on each edge, for example the divergence between node 2 and node 5 [aggregation formula image not reproduced].

One round of aggregation means aggregating every feature node in the graph once in this way. The aggregation is repeated several times, which finally yields the aggregated data features. Not only the ammonia nitrogen features but also the time-series features of the other data types are aggregated here, giving feature vectors that fuse the spatial distribution information of the different time-series data.
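As an illustration of the two ingredients of this step, the sketch below computes the standard closed-form Kullback-Leibler divergence between two 2-D Gaussian distributions and then performs one aggregation of node 5 from nodes 2, 3, and 4. The aggregation rule (own feature plus divergence-weighted neighbour features, followed by a learned matrix `W`) is an assumption made for the example, since the formula of the invention is not reproduced in this text; all numbers are made up.

```python
import math

def inv2(s):
    """Inverse and determinant of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = s
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]], det

def kl_gauss2d(mu0, s0, mu1, s1):
    """KL(N(mu0, s0) || N(mu1, s1)) for two-dimensional Gaussians."""
    inv1, det1 = inv2(s1)
    _, det0 = inv2(s0)
    # trace(inv(s1) @ s0)
    trace = sum(inv1[i][j] * s0[j][i] for i in range(2) for j in range(2))
    diff = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    # Mahalanobis term (mu1 - mu0)^T inv(s1) (mu1 - mu0)
    maha = sum(diff[i] * inv1[i][j] * diff[j] for i in range(2) for j in range(2))
    return 0.5 * (trace + maha - 2 + math.log(det1 / det0))

def aggregate(node, features, neighbors, edge_weight, W):
    """Assumed rule: W applied to (own feature + sum of weight * neighbour feature)."""
    dim = len(features[node])
    mixed = list(features[node])
    for j in neighbors[node]:
        w = edge_weight[(j, node)]
        for d in range(dim):
            mixed[d] += w * features[j][d]
    return [sum(W[r][c] * mixed[c] for c in range(dim)) for r in range(dim)]

identity = [[1.0, 0.0], [0.0, 1.0]]
# Two nodes one unit apart with identity covariance.
kl_example = kl_gauss2d((0.0, 0.0), identity, (1.0, 0.0), identity)

features = {2: [1.0, 0.0], 3: [0.0, 1.0], 4: [1.0, 1.0], 5: [0.0, 0.0]}
neighbors = {5: [2, 3, 4]}
edge_weight = {(2, 5): 0.5, (3, 5): 0.5, (4, 5): 0.5}
aggregated_5 = aggregate(5, features, neighbors, edge_weight, identity)
```

In a trained graph network `W` would be learned; the identity matrix is used here only so the result is easy to verify by hand.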

S3, constructing a data correlation matrix based on mechanism equations. The previous step yields feature vectors for data such as flow, water temperature, dissolved oxygen, ammonia nitrogen, total phosphorus, and algae density at different places (different nodes in the graph). Each vector already contains temporal information and spatial distribution information. In this step, further features are extracted from these vectors. Considering the mechanisms of hydrodynamics, water quality, and algae growth and decline in the basin, industry experts score the correlation among indicators such as water temperature, dissolved oxygen, ammonia nitrogen, total phosphorus, and algae density. Each score is a triplet expressing the correlation between two different types of data, such as (ammonia nitrogen, dissolved oxygen, 0.1). For each data type, the correlation scores with all other types sum to 1. This gives a correlation matrix P among the data types, and the n-th power of P is then calculated to obtain the final correlation matrix.
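The construction of P and its n-th power can be sketched as below. Only the triplet (ammonia nitrogen, dissolved oxygen, 0.1) comes from the text; the other scores, the zero diagonal, and the choice n = 3 are hypothetical values used to make the example run.

```python
def build_matrix(types, triplets):
    """Fill a score matrix from (source, target, score) triplets and row-normalise it."""
    index = {name: i for i, name in enumerate(types)}
    P = [[0.0] * len(types) for _ in types]
    for src, dst, score in triplets:
        P[index[src]][index[dst]] = score
    for row in P:
        total = sum(row)
        if total > 0:
            for j in range(len(row)):
                row[j] /= total  # scores of one type against the others sum to 1
    return P

def matmul(A, B):
    """Plain matrix product of two square nested-list matrices."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def matrix_power(P, n):
    """n-th power of P by repeated multiplication, starting from the identity."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, P)
    return result

types = ["water temperature", "dissolved oxygen", "ammonia nitrogen"]
triplets = [
    ("water temperature", "dissolved oxygen", 0.6),   # hypothetical score
    ("water temperature", "ammonia nitrogen", 0.2),   # hypothetical score
    ("dissolved oxygen", "water temperature", 0.5),   # hypothetical score
    ("dissolved oxygen", "ammonia nitrogen", 0.5),    # hypothetical score
    ("ammonia nitrogen", "dissolved oxygen", 0.1),    # triplet quoted in the text
    ("ammonia nitrogen", "water temperature", 0.3),   # hypothetical score
]
P = build_matrix(types, triplets)
P_final = matrix_power(P, 3)
```

Because every row of P sums to 1, every row of any power of P also sums to 1, so the powered matrix can still be read as a set of normalised correlation weights that now include indirect relationships.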

Similar to the feature extraction step based on sequence time sequence distribution, the weighted attention layer is constructed by utilizing the correlation matrix P, and the feature vectors are further fused by adopting the weighted attention network module, so that the feature representation can contain the mechanism relation among different data.

S4, cross-modal vision-time series-spatial distribution-mechanism feature characterization model. On top of the time-series feature extraction step, the feature aggregation step based on spatial graph distribution and Kullback-Leibler divergence, and the mechanism-equation-based correlation matrix construction step, an image feature extraction step based on a Vision Transformer is added, used here to evaluate the degree of eutrophication in the water body. The overall feature encoding model is shown in Fig. 5.

S5, feature decoding prediction and prediction network training. In the steps above, the acquired data are given a feature representation that takes into account the temporal relationships, the topological relationships of the spatial distribution, and the mechanism relationships among the data. These features are then decoded, so that the decoder can predict indicators such as organic nitrogen, dissolved oxygen, and algae at an unknown time in an unknown area. In the present invention, the decoder follows the Transformer decoder but additionally takes a time code and an indicator type code and, using these as conditioning information, predicts the value of an indicator of a specific type at a specific time and position.

The position code is still represented as a graph network, as shown in Fig. 6: to represent a given node, that is, a given position, the value of that node is set to 1 and the values of all other nodes are set to 0. To convert this graph representation into a dense vector, the graph shown on the right of Fig. 6 is again aggregated in the same manner and the results are concatenated.

The time code is similar to a one-hot code. For example, for a window covering the 5 previous time steps up to the present, the current time is encoded as "000001" and the previous time step as "000010".

The data type code is also a one-hot code: flow is encoded as "00000001", water temperature as "00000010", dissolved oxygen as "00000100", ammonia nitrogen as "00001000", total nitrogen as "00010000", total phosphorus as "00100000", and algae density as "01000000".
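The time and data type codes listed above can be generated with a few lines of code. The function names and the `DATA_TYPES` list order are illustrative assumptions; the bit widths (6 and 8) and the resulting strings follow the examples given in the text.

```python
DATA_TYPES = [
    "flow", "water temperature", "dissolved oxygen", "ammonia nitrogen",
    "total nitrogen", "total phosphorus", "algae density",
]

def one_hot(position, width):
    """Bit string with a single 1; position 0 is the rightmost bit."""
    if not 0 <= position < width:
        raise ValueError("position out of range")
    return format(1 << position, f"0{width}b")

def time_code(steps_back, history=5):
    """Code for a time step `steps_back` steps before the present."""
    return one_hot(steps_back, history + 1)

def type_code(name, width=8):
    """Code for a data type, using its index in DATA_TYPES."""
    return one_hot(DATA_TYPES.index(name), width)

current = time_code(0)                    # "000001"
previous = time_code(1)                   # "000010"
ammonia = type_code("ammonia nitrogen")   # "00001000"
```

An 8-bit type code leaves one unused position beyond the seven indicators listed, which matches the codes given in the text.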

With the above-described encoding scheme, a decoder can decode the feature and predict the unknown value at the encoding site. The overall flow is shown in fig. 7 below.

The prediction network can thus be trained on a large amount of data. Training samples are constructed as follows: one data item in the original data is masked, position coding, time coding, and data type coding are applied to it, and the prediction network predicts the masked data. The training objective is the Euclidean distance between the true value of the masked data and its predicted value.
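A minimal sketch of how such a masked training sample and its loss could be formed is given below. It leaves out the network itself; the function names, the use of `None` as the mask marker, and the sample readings are assumptions made for illustration.

```python
import math
import random

def make_masked_sample(values, rng):
    """Mask one randomly chosen entry; return the masked list, its index, and the target."""
    idx = rng.randrange(len(values))
    masked = list(values)
    target = masked[idx]
    masked[idx] = None  # None marks the entry the network must predict
    return masked, idx, target

def euclidean_loss(predicted, target):
    """Euclidean (L2) distance between predicted and true vectors."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)))

rng = random.Random(0)
readings = [0.38, 0.41, 0.40, 0.44]       # hypothetical ammonia nitrogen series
masked, idx, target = make_masked_sample(readings, rng)
loss = euclidean_loss([0.42], [target])   # 0.42 stands in for a network output
```

In practice the loss would be backpropagated through the prediction network; here it is only evaluated for one stand-in prediction.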

S6, identifying data anomalies. As described above, q batches of data are first extracted from the existing training set data in the neighborhood of the new data. The q batches are fed in turn to the prediction network to predict the data at (m, n, t, k). Because the prediction network outputs both a predicted mean and a predicted standard deviation, q predicted means and q predicted standard deviations are obtained for the point. From the predicted means and predicted variances, q Gaussian distributions are constructed. The new data is then treated as a sample point of these Gaussian distributions, and its probability density under each of the q distributions is calculated. If, under one of the Gaussian distributions, the probability density of the new data is lower than a manually set threshold, the sample point is considered abnormal.
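The decision rule can be sketched as follows. The sketch reads "one of the Gaussian distributions" as "any of the q distributions", which is an interpretation; the predicted means, standard deviations, and threshold are made-up values, and the function names are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Probability density of a univariate Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def is_anomalous(x, predictions, threshold):
    """Flag x if its density falls below the threshold under any predicted Gaussian."""
    densities = [gaussian_pdf(x, mu, sigma) for mu, sigma in predictions]
    return any(d < threshold for d in densities), densities

# q = 3 hypothetical (mean, standard deviation) pairs from the prediction network.
predictions = [(0.40, 0.05), (0.42, 0.05), (0.39, 0.06)]
normal_flag, _ = is_anomalous(0.41, predictions, threshold=0.5)
outlier_flag, _ = is_anomalous(0.90, predictions, threshold=0.5)
```

Replacing `any` with `all` gives the stricter reading in which the new value must be unlikely under every one of the q distributions before it is flagged.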

The invention fuses the temporal relationships and spatial distribution relationships among the data with a mechanism-based correlation module, and extracts features from the water affair data with a cross-modal large model, so that latent features in the data are captured more effectively and the reliability and accuracy of anomaly identification in water affair data are improved. Because the feature representation fuses temporal relationships, graph-based spatial distribution relationships, and mechanism relationships, it characterizes the data more accurately and addresses the inaccurate identification caused by insufficient representational capability in conventional data anomaly identification. The data are predicted first and identified afterwards: the data to be judged are first predicted over multiple rounds using neighborhood data (different neighborhood samples), and if the variance of the results across the rounds is large, the data are identified as abnormal, which makes the anomaly identification more stable.

While the preferred embodiment of the present invention has been described in detail, the present invention is not limited to the embodiments, and those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention, and these equivalent modifications or substitutions are included in the scope of the present invention as defined in the appended claims.

Claims

1. A method for rapidly identifying abnormal data in environmental water affair monitoring and sampling, characterized by comprising the following steps:

S1, given collected water affair data, extracting features based on the time-series distribution of the sequences;

S2, aggregating the features based on their spatial graph distribution and the Kullback-Leibler divergence to obtain feature vectors that fuse the spatial distribution information of different time-series data;

S3, constructing a mechanism-equation-based data correlation matrix for the feature vectors;

S4, introducing the picture information in the data, extracting image features, and constructing a cross-modal vision-time series-spatial distribution-mechanism feature characterization model to obtain overall features;

S5, constructing a feature decoding prediction and prediction network training module, decoding the overall features, outputting predicted values, and training the prediction network;

S6, predicting new water affair data from the existing training set data, judging whether a distribution shift has occurred, and identifying data anomalies.

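2. The method for rapidly identifying abnormal data in environmental water affair monitoring and sampling according to claim 1, characterized in that step S1 comprises the following steps:

S11, given acquired data indexed by (m, n, t, k), where m, n denote that the data is acquired at spatial location (m, n), t denotes that the data is acquired at time t, and k denotes the type of the data;

S12, constructing, for each data category k, a time-series feature extraction network based on a Transformer;

S13, aggregating data of the same kind at the same place into one sequence, and extracting the features of all the sequence data with the time-series feature extraction network.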

3. The method for rapidly identifying abnormal data in environmental water affair monitoring and sampling according to claim 2, characterized in that in step S13 the features of all the sequence data are extracted using a weighted time-series attention mechanism [formula image not reproduced].

4. The method for rapidly identifying abnormal data in environmental water affair monitoring and sampling according to claim 3, characterized in that the corresponding time-series-based attention weight matrix is obtained [formula image not reproduced].

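5. The method for rapidly identifying abnormal data in environmental water affair monitoring and sampling according to claim 2, characterized in that step S2 comprises the following steps:

S21, arranging the features constructed from data of the same category according to the geographic distribution of the data to build a feature graph, in which each node is the feature vector of one time-series sequence, the weight of the edge between two nodes being calculated as follows: the node at a given geographic position is modeled as a Gaussian distribution with its own mean and covariance, and the weight of the edge between two adjacent nodes in the feature graph is calculated from the Kullback-Leibler divergence between their two Gaussian distributions [formula image not reproduced];

S22, using a graph network to perform weighted feature aggregation of the time-series feature vectors based on their spatial arrangement in the feature graph.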

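6. The method for rapidly identifying abnormal data in environmental water affair monitoring and sampling according to claim 5, characterized in that step S3 comprises the following steps:

S31, constructing importance scores among different data categories according to the mechanisms of hydrodynamics, water quality, and algae growth and decline;

S32, constructing an adjacency matrix among the different data types from the scores, where each element of the adjacency matrix is the score assigned between one class of data and another, and constructing the correlation matrix between the data from the adjacency matrix [formula image not reproduced].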

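7. The method for rapidly identifying abnormal data in environmental water affair monitoring and sampling according to claim 6, characterized in that step S4 comprises the following steps:

S41, introducing the picture information in the data, and extracting picture information features with a Vision Transformer;

S42, concatenating the picture information features with the feature vectors;

S43, weighting the concatenated features with the correlation matrix, thereby constructing a cross-modal vision-time series-spatial distribution-mechanism feature characterization model and obtaining the overall features.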

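8. The method for rapidly identifying abnormal data in environmental water affair monitoring and sampling according to claim 7, characterized in that step S5 comprises the following steps:

S51, decoding the overall features, and outputting a predicted value of the data at an unknown point (m, n, t, k);

S52, masking random data in the original data, applying position coding, time coding, and data type coding to the masked data, predicting the masked data with the prediction network, taking the Euclidean distance between the true value and the predicted value of the masked data as the training objective to construct gradient information, and training the prediction network.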

9. The method for quickly identifying abnormal data of environmental water affair monitoring and sampling according to claim 8, which is characterized in that: in the step S51, the overall feature is decoded, and a Multi-layer Multi-Head attention network is used as a decoder.

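10. The method for rapidly identifying abnormal data in environmental water affair monitoring and sampling according to claim 9, characterized in that step S6 comprises the following steps:

S61, using the existing training set data, extracting q batches of data from the neighborhood of the new data; feeding the q batches in turn to the prediction network to predict the data at the unknown point (m, n, t, k), the network outputting a predicted mean and a predicted standard deviation, so that q predicted means and q predicted standard deviations are obtained;

S62, constructing q Gaussian distributions from the predicted means and predicted variances;

S63, treating the new data as a sample point of the Gaussian distributions, and calculating the probability density of the sample point under each of the q Gaussian distributions;

S64, when, under one of the Gaussian distributions, the probability density of the new data is lower than a set threshold, identifying the sample point as abnormal.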