CN114332537B - A multi-cloud data anomaly detection method and system based on deep learning

Info

Publication number: CN114332537B
Application number: CN202111654479.7A
Authority: CN
Inventors: 郭帆; 王胜辉; 熊昌伟
Original assignee: CHANJET INFORMATION TECHNOLOGY CO LTD
Current assignee: CHANJET INFORMATION TECHNOLOGY CO LTD
Priority date: 2021-12-30
Filing date: 2021-12-30
Publication date: 2024-11-26
Anticipated expiration: 2041-12-30
Also published as: CN114332537A

Abstract

The present invention proposes a multi-cloud data anomaly detection method and system based on deep learning. The method includes: inputting a time series of data into a trained deep learning network to obtain a reconstructed sequence of the data; using the time series and the reconstructed sequence to calculate a reconstruction error, and then using the reconstruction error to calculate an anomaly score threshold; and, if a value in the time series of real-time data exceeds the anomaly score threshold, treating that value as an anomaly. Because the multi-cloud data anomaly detection is implemented with a deep learning network, the proposed solution provides more accurate anomaly alarms, thereby improving the accuracy of multi-cloud component data, quickly locating data anomalies, and ensuring the stability and robustness of the multi-cloud platform.

Description

Deep learning-based multi-cloud data anomaly detection method and system

Technical Field

The invention belongs to the field of multi-cloud data anomaly detection, and particularly relates to a multi-cloud data anomaly detection method and system based on deep learning.

Background

As cloud computing services continue to be adopted, the hybrid cloud has become the mainstream IT infrastructure architecture for enterprises. A hybrid cloud involves many managed components, and to keep user access efficient, the information of the multi-cloud components must be stored persistently so that hybrid cloud resources can be managed in a unified way. When the volume of multi-cloud resource data is very large, an anomaly in the scheduled synchronization of that data inevitably affects upper-layer projects built on multi-cloud management (such as multi-cloud cost management and a multi-cloud resource application platform). Alarms based on custom rules can detect anomalies when the amount of multi-cloud resource data is small, but rule-based alarms become hard to maintain as multi-cloud resources keep growing. Unsupervised, deep-learning-based anomaly detection for multi-cloud data is intended to provide more accurate anomaly alarms, thereby improving the quality of multi-cloud data, reducing anomalies in synchronized data, and ensuring the stability and robustness of the multi-cloud platform.

There are two pain points in anomaly detection for multi-cloud data collection. First, the data collected from multiple clouds is voluminous and carries no anomaly labels, so modeling can only rely on manual labeling or unsupervised learning. Second, the data collected from multiple clouds has temporal, trend, and periodic characteristics, so most machine learning algorithms cannot be applied to it directly.

Defects existing in the prior art

1. The data collected from multiple clouds is voluminous and carries no anomaly labels, so modeling can only rely on manual labeling or unsupervised learning;

2. The data collected from multiple clouds has temporal, trend, and periodic characteristics, so most machine learning algorithms cannot be applied to it directly.

Disclosure of Invention

In order to solve the above technical problems, the invention provides a technical solution for a multi-cloud data anomaly detection method based on deep learning.

A first aspect of the invention discloses a multi-cloud data anomaly detection method based on deep learning. The method comprises the following steps:

Step S1, acquiring a time sequence of data of a plurality of cloud platforms;

dividing the time series of normal data into four subsets, wherein subset N1 is used for hyperparameter selection, subset N2 is used for training the deep learning network, subset N3 is used for learning the anomaly score threshold, and subset N4 is used for testing; and dividing the time series of abnormal data into two subsets, wherein subset A1 is used for learning the anomaly score threshold and subset A2 is used for testing;

Step S2, inputting the time series of the data of the N2 subset into a deep learning network for training, and outputting a reconstructed sequence;

Step S3, inputting the time series of the data of the A1 subset into the trained deep learning network to obtain a reconstructed sequence of the data of the A1 subset;

Step S4, calculating a reconstruction error of the A1 subset from the time series of the data of the A1 subset and the reconstructed sequence of the data of the A1 subset, and calculating an anomaly score threshold τ from the reconstruction error of the A1 subset;

Step S5, if a value in the time series of real-time data exceeds the anomaly score threshold, the value is considered to be an anomaly.
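The subset division described in step S1 can be sketched as follows. This is only an illustration under stated assumptions: the function names `split_by_fractions` and `split_subsets` are hypothetical, and the split fractions are invented for the example, since the text does not say how large each subset should be.

```python
from typing import Dict, List, Sequence


def split_by_fractions(items: Sequence, fractions: Sequence[float]) -> List[list]:
    """Cut `items` into consecutive chunks whose sizes follow `fractions`."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    chunks, start = [], 0
    for k, frac in enumerate(fractions):
        # The last chunk takes whatever remains so that no item is dropped.
        if k == len(fractions) - 1:
            end = len(items)
        else:
            end = start + round(frac * len(items))
        chunks.append(list(items[start:end]))
        start = end
    return chunks


def split_subsets(normal: Sequence, abnormal: Sequence) -> Dict[str, list]:
    """Divide normal sequences into N1..N4 and abnormal sequences into A1, A2.

    The fractions below are illustrative assumptions, not values from the text.
    """
    n1, n2, n3, n4 = split_by_fractions(normal, [0.1, 0.6, 0.15, 0.15])
    a1, a2 = split_by_fractions(abnormal, [0.5, 0.5])
    return {"N1": n1, "N2": n2, "N3": n3, "N4": n4, "A1": a1, "A2": a2}
```

Keeping the chunks consecutive preserves temporal order inside each subset, which seems the safer default for time series; a shuffled split would be a different design choice.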

According to the method of the first aspect of the present invention, in step S4, the specific method for calculating the anomaly score threshold τ includes:

calculating an anomaly estimation threshold a(A1) of the A1 subset from the reconstruction error of the A1 subset, and using this anomaly estimation threshold as the anomaly score threshold τ, wherein the specific formula is:

$$a(\cdot) = \left(e^{(i)} - \mu\right)^{T} \Sigma^{-1} \left(e^{(i)} - \mu\right)$$

wherein

$\mu$ and $\Sigma$ are the parameters of the normal distribution $\mathcal{N}(\mu, \Sigma)$;

$e^{(i)}$ is the reconstruction error of a subset;

$$e^{(i)} = X^{(i)} - X'^{(i)};$$

$X^{(i)}$ is the time series of the data of a subset;

$X'^{(i)}$ is the reconstructed sequence of the data of a subset;

$T$ denotes the transpose.
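The score defined above is a squared Mahalanobis distance of the reconstruction error. A minimal standard-library sketch follows; the names `solve` and `anomaly_score` are hypothetical, and `mu` and `sigma` are assumed to be supplied by the caller, since the text only says they are the parameters of a normal distribution and does not state how they are estimated.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def solve(sigma: Matrix, b: Vector) -> Vector:
    """Solve sigma @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's matrix is left untouched.
    aug = [row[:] + [b[i]] for i, row in enumerate(sigma)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    y = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (aug[i][n] - tail) / aug[i][i]
    return y


def anomaly_score(x: Vector, x_rec: Vector, mu: Vector, sigma: Matrix) -> float:
    """Return (e - mu)^T Sigma^{-1} (e - mu), where e = x - x_rec."""
    e = [xi - ri for xi, ri in zip(x, x_rec)]
    d = [ei - mi for ei, mi in zip(e, mu)]
    y = solve(sigma, d)  # y = Sigma^{-1} d without forming an explicit inverse
    return sum(di * yi for di, yi in zip(d, y))
```

For example, with `x = [3.0, 1.0]`, `x_rec = [1.0, 0.0]`, `mu = [0.0, 0.0]` and `sigma = [[4.0, 0.0], [0.0, 1.0]]`, the error is `[2.0, 1.0]` and the score is 4/4 + 1/1 = 2.0.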

According to the method of the first aspect of the present invention, in step S4, the specific method for calculating the anomaly score threshold τ further includes:

calculating an anomaly estimation threshold a(N3) of the N3 subset from the reconstruction error of the N3 subset;

calculating an AUC value by applying the anomaly estimation threshold a(N3) of the N3 subset, wherein the AUC value is the area under the ROC curve;

taking the AUC value as the anomaly score threshold τ;

calculating an anomaly estimation threshold a(A1) of the A1 subset from the reconstruction error of the A1 subset;

taking the larger of the anomaly estimation threshold of the A1 subset and the AUC value as the anomaly score threshold τ.
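The AUC-based selection above can be sketched as follows, with several assumptions made explicit. The text does not say which scores form the positive class for the ROC curve, so this sketch assumes that scores from the A1 subset are positives and scores from the N3 subset are negatives. It also does not say how the per-point scores of A1 are reduced to a single value a(A1), so that value is simply passed in. The function names `auc` and `choose_threshold` are hypothetical.

```python
from typing import Sequence


def auc(normal_scores: Sequence[float], abnormal_scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed by pairwise comparison.

    This equals the probability that a randomly chosen abnormal score exceeds
    a randomly chosen normal score, with ties counted as one half.
    """
    if not normal_scores or not abnormal_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for a in abnormal_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(normal_scores) * len(abnormal_scores))


def choose_threshold(a_a1: float, auc_value: float) -> float:
    """Take the larger of a(A1) and the AUC value as tau, as the text states."""
    return max(a_a1, auc_value)
```

The pairwise form is quadratic in the number of scores; for large subsets a rank-based computation would be the usual replacement.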

According to the method of the first aspect of the present invention, in step S2, the input of the deep learning network further includes:

statistical indicators of the data of the N2 subset.

According to the method of the first aspect of the present invention, in step S2, the statistical indicators include:

the maximum, minimum, mean, median, and variance of the data of the N2 subset.

According to the method of the first aspect of the present invention, in step S2, the input of the deep learning network further includes:

the reverse order of the time series of the data of the N2 subset.
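The additional inputs listed above can be computed with the Python standard library. This is a sketch under assumptions: the names `statistical_indicators` and `build_inputs` are hypothetical, the indicators are computed per window, and population variance is used because the text does not say whether population or sample variance is meant.

```python
import statistics
from typing import Dict, List, Sequence


def statistical_indicators(window: Sequence[float]) -> Dict[str, float]:
    """Maximum, minimum, mean, median, and variance of one data window."""
    return {
        "max": max(window),
        "min": min(window),
        "mean": statistics.fmean(window),
        "median": statistics.median(window),
        # Population variance is an assumption; the text does not specify.
        "variance": statistics.pvariance(window),
    }


def build_inputs(window: Sequence[float]) -> Dict[str, object]:
    """Bundle the series, its reverse order, and its statistical indicators."""
    series: List[float] = list(window)
    return {
        "series": series,
        "reversed": series[::-1],
        "stats": statistical_indicators(series),
    }
```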

A second aspect of the invention discloses a multi-cloud data anomaly detection system based on deep learning. The system comprises:

a first processing module configured to collect a time series of data of a plurality of cloud platforms;

a second processing module configured to input the time series of the data of the N2 subset into a deep learning network for training and to output a reconstructed sequence;

a third processing module configured to input the time series of the data of the A1 subset into the trained deep learning network to obtain a reconstructed sequence of the data of the A1 subset;

a fourth processing module configured to calculate a reconstruction error of the A1 subset from the time series of the data of the A1 subset and the reconstructed sequence of the data of the A1 subset, and then to calculate an anomaly score threshold τ from the reconstruction error of the A1 subset;

and a fifth processing module configured to consider a value to be an anomaly if the value in the time series of real-time data exceeds the anomaly score threshold.

A third aspect of the invention discloses an electronic device. The electronic device comprises a memory and a processor, the memory stores a computer program, and the processor, when executing the computer program, implements the steps of the deep learning-based multi-cloud data anomaly detection method of any one of the first aspect of the present disclosure.

A fourth aspect of the invention discloses a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the deep learning-based multi-cloud data anomaly detection method of any one of the first aspect of the present disclosure.

According to the solution provided by the invention, multi-cloud data anomaly detection is implemented with a deep learning network, so that the accuracy of multi-cloud component data is improved, data anomalies are located quickly, and the stability and robustness of the multi-cloud platform are ensured.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a multi-cloud data anomaly detection method based on deep learning according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the LSTM encoding and decoding process according to an embodiment of the invention;

FIG. 3 is a block diagram of a deep learning-based multi-cloud data anomaly detection system according to an embodiment of the present invention;

FIG. 4 is a block diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

A first aspect of the invention discloses a multi-cloud data anomaly detection method based on deep learning. FIG. 1 is a flowchart of a multi-cloud data anomaly detection method based on deep learning according to an embodiment of the present invention. As shown in FIG. 1, the method includes:

Step S1, acquiring a time sequence of data of a plurality of cloud platforms;

In some embodiments, in step S2, the input of the deep learning network further includes:

statistical indicators of the data of the N2 subset;

In some embodiments, in step S2, the statistical indicators include:

the maximum, minimum, mean, median, and variance of the data of the N2 subset;

In some embodiments, count_above_mean and count_below_mean are also selected as features; they give, respectively, the number of elements greater than the mean and the number of elements less than the mean. The mean is calculated first, then the difference between each time-series point and the mean, and a neural network module then counts the differences greater than zero and the differences less than zero;
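The two counting features can be illustrated in a few lines. In this sketch plain Python stands in for the neural network module mentioned in the text, and the function name `count_above_below_mean` is hypothetical.

```python
from typing import Sequence, Tuple


def count_above_below_mean(series: Sequence[float]) -> Tuple[int, int]:
    """Return (count_above_mean, count_below_mean) for one series.

    Follows the order given in the text: mean first, then the per-point
    differences from the mean, then the counts of positive and negative
    differences. Points equal to the mean are counted in neither group.
    """
    if not series:
        raise ValueError("series must be non-empty")
    mean = sum(series) / len(series)
    diffs = [x - mean for x in series]
    above = sum(1 for d in diffs if d > 0)
    below = sum(1 for d in diffs if d < 0)
    return above, below
```

For the series `[1, 2, 3, 4, 10]` the mean is 4, so one point lies above it and three lie below it.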

In some embodiments, in step S2, the input of the deep learning network further includes the reverse order of the time series of the data of the N2 subset;

In some embodiments, the deep learning network is an LSTM autoencoder (Autoencoder-LSTM) consisting of two LSTM units, one serving as the encoder and the other as the decoder. The encoder encodes the training data set into a code (the hidden layer), and the decoder then decodes that code; the encoder and the decoder are trained jointly so that the decoded information is as similar as possible to the information before encoding. The input to the encoder is a data set with dimensions <MB, T, D>, where MB (batch size) is the number of data windows contained in a batch, T (the input window length of the LSTM unit) is the number of data points in each data window, and D is the data dimension; in the initialization phase, MB and T perform parameter learning at the decoder;
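The <MB, T, D> input layout can be made concrete with a small windowing sketch. It does not implement the LSTM itself. The names `make_windows` and `make_batches` are hypothetical, and the use of non-overlapping windows by default is an assumption, since the text does not say whether windows overlap.

```python
from typing import List, Optional

Point = List[float]          # one observation with D values
Window = List[Point]         # T consecutive observations
Batch = List[Window]         # up to MB windows


def make_windows(points: List[Point], t: int, stride: Optional[int] = None) -> List[Window]:
    """Cut a D-dimensional series into windows of length t.

    By default the stride equals t, which gives non-overlapping windows;
    a trailing remainder shorter than t is dropped.
    """
    if t <= 0:
        raise ValueError("t must be positive")
    step = t if stride is None else stride
    if step <= 0:
        raise ValueError("stride must be positive")
    return [points[s:s + t] for s in range(0, len(points) - t + 1, step)]


def make_batches(windows: List[Window], mb: int) -> List[Batch]:
    """Group windows into batches of at most mb windows each."""
    if mb <= 0:
        raise ValueError("mb must be positive")
    return [windows[i:i + mb] for i in range(0, len(windows), mb)]
```

With ten two-dimensional points, `t = 3` and `mb = 2`, this yields three windows and two batches; the first batch has shape <2, 3, 2>.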

As shown in FIG. 2, because the architecture includes a plurality of data windows, in the encoder stage the encoded output of the previous data window is taken as the input of the next data window; once the last data window has been encoded, the output can be regarded as the hidden-layer data structure; this output is copied to serve as the input of the decoder, which decodes each of the preceding data windows one at a time;

In some embodiments, in step S4, the specific method for calculating the anomaly score threshold τ includes:

$$a(\cdot) = \left(e^{(i)} - \mu\right)^{T} \Sigma^{-1} \left(e^{(i)} - \mu\right)$$

wherein

$\mu$ and $\Sigma$ are the parameters of the normal distribution $\mathcal{N}(\mu, \Sigma)$;

$e^{(i)}$ is the reconstruction error of a subset;

$$e^{(i)} = X^{(i)} - X'^{(i)};$$

$X^{(i)}$ is the time series of the data of a subset;

$X'^{(i)}$ is the reconstructed sequence of the data of a subset;

$T$ denotes the transpose;

In some embodiments, in step S4, the specific method for calculating the anomaly score threshold τ further includes:

taking the AUC value as the anomaly score threshold τ;

taking the larger of the anomaly estimation threshold of the A1 subset and the AUC value as the anomaly score threshold τ;

In some embodiments, the deep learning network is updated online when it is used for streaming data. Because the initial training data set is relatively small and simple, and because anomalies change over time, the detection accuracy of an existing model may degrade. Updating the model of the deep learning network is therefore indispensable.

After the Autoencoder-LSTM model has been initially trained, the trained initial model parameters can be used for online prediction. The online anomaly detection model is managed using a multi-threaded approach.

During real-time anomaly detection, a sub-thread collects the data generated in real time while the main thread performs the real-time anomaly detection. For each window in a batch, the data is reconstructed and an anomaly score is calculated using the formula:

$$a^{(i)} = \left(e^{(i)} - \mu\right)^{T} \Sigma^{-1} \left(e^{(i)} - \mu\right)$$
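The division of work between the collecting sub-thread and the detecting main thread can be sketched with the standard `threading` and `queue` modules. Everything here is illustrative: `collector`, `detect_stream`, and the `score_fn` callback are hypothetical names, and `score_fn` stands in for reconstruction by the trained model followed by the score formula above.

```python
import queue
import threading
from typing import Callable, Iterable, List, Tuple

_STOP = object()  # sentinel marking the end of the stream


def collector(source: Iterable, out_q: "queue.Queue") -> None:
    """Sub-thread: push each incoming data window onto the queue."""
    for window in source:
        out_q.put(window)
    out_q.put(_STOP)


def detect_stream(
    source: Iterable,
    score_fn: Callable[[object], float],
    tau: float,
) -> List[Tuple[object, float, bool]]:
    """Main thread: score each collected window and flag scores above tau."""
    q: "queue.Queue" = queue.Queue()
    worker = threading.Thread(target=collector, args=(source, q), daemon=True)
    worker.start()
    results = []
    while True:
        item = q.get()  # blocks until the collector delivers something
        if item is _STOP:
            break
        score = score_fn(item)
        results.append((item, score, score > tau))
    worker.join()
    return results
```

A FIFO queue keeps the windows in arrival order, so detection results line up with the original stream.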

The system maintains two data buffers, one for normal data windows and the other for abnormal data windows. For a normal data window the reconstruction error is small, whereas for an abnormal data window the reconstruction error is large or new data features appear. The level of the reconstruction error can be measured against a predefined reconstruction error distribution.

In summary, the solution provided by the invention implements multi-cloud data anomaly detection with a deep learning network and can therefore deliver more accurate anomaly alarms, thereby improving the accuracy of multi-cloud component data, quickly locating data anomalies, and ensuring the stability and robustness of the multi-cloud platform.
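The two buffers can be sketched with bounded deques. This is a minimal illustration: the class name `WindowBuffers`, the `maxlen` bound, and the rule of routing a window by comparing its score with τ are all assumptions, since the text does not say how the buffers are sized or filled.

```python
from collections import deque
from typing import Deque


class WindowBuffers:
    """Keep recent normal and abnormal data windows in separate buffers."""

    def __init__(self, maxlen: int = 1000) -> None:
        # Bounded deques silently discard the oldest window when full.
        self.normal: Deque = deque(maxlen=maxlen)
        self.abnormal: Deque = deque(maxlen=maxlen)

    def add(self, window, score: float, tau: float) -> bool:
        """Store the window in the matching buffer; return True if abnormal."""
        is_abnormal = score > tau
        if is_abnormal:
            self.abnormal.append(window)
        else:
            self.normal.append(window)
        return is_abnormal
```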

A second aspect of the invention discloses a multi-cloud data anomaly detection system based on deep learning. FIG. 3 is a block diagram of a deep learning-based multi-cloud data anomaly detection system according to an embodiment of the present invention. As shown in FIG. 3, the system 100 includes:

a first processing module 101 configured to collect a time series of data of a plurality of cloud platforms;

a second processing module 102 configured to input the time series of the data of the N2 subset into a deep learning network for training and to output a reconstructed sequence;

a third processing module 103 configured to input the time series of the data of the A1 subset into the trained deep learning network to obtain a reconstructed sequence of the data of the A1 subset;

a fourth processing module 104 configured to calculate a reconstruction error of the A1 subset from the time series of the data of the A1 subset and the reconstructed sequence of the data of the A1 subset, and then to calculate an anomaly score threshold τ from the reconstruction error of the A1 subset;

a fifth processing module 105 configured to consider a value to be an anomaly if the value in the time series of real-time data exceeds the anomaly score threshold.

In the system according to the second aspect of the present invention, the second processing module 102 is specifically configured such that the input of the deep learning network further includes:

statistical indicators of the data of the N2 subset;

the statistical indicators include:

count_above_mean and count_below_mean, which are also selected as features and give, respectively, the number of elements greater than the mean and the number of elements less than the mean; the mean is calculated first, then the difference between each time-series point and the mean, and a neural network module then counts the differences greater than zero and the differences less than zero;

the input of the deep learning network further includes:

the reverse order of the time series of the data of the N2 subset;

The deep learning network is an LSTM autoencoder (Autoencoder-LSTM) consisting of two LSTM units, one serving as the encoder and the other as the decoder. The encoder encodes the training data set into a code (the hidden layer), and the decoder then decodes that code; the encoder and the decoder are trained jointly so that the decoded information is as similar as possible to the information before encoding. The input to the encoder is a data set with dimensions <MB, T, D>, where MB (batch size) is the number of data windows contained in a batch, T (the input window length of the LSTM unit) is the number of data points in each data window, and D is the data dimension; in the initialization phase, MB and T perform parameter learning at the decoder;

As shown in FIG. 2, because the architecture includes a plurality of data windows, in the encoder stage the encoded output of the previous data window is taken as the input of the next data window; once the last data window has been encoded, the output can be regarded as the hidden-layer data structure; this output is copied to serve as the input of the decoder, which decodes each of the preceding data windows one at a time.

According to the system of the second aspect of the present invention, the fourth processing module 104 is specifically configured such that the specific method for calculating the anomaly score threshold τ includes:

$$a(\cdot) = \left(e^{(i)} - \mu\right)^{T} \Sigma^{-1} \left(e^{(i)} - \mu\right)$$

wherein

$\mu$ and $\Sigma$ are the parameters of the normal distribution $\mathcal{N}(\mu, \Sigma)$;

$e^{(i)}$ is the reconstruction error of a subset;

$$e^{(i)} = X^{(i)} - X'^{(i)};$$

$X^{(i)}$ is the time series of the data of a subset;

$X'^{(i)}$ is the reconstructed sequence of the data of a subset;

$T$ denotes the transpose;

The specific method for calculating the anomaly score threshold τ further includes:

taking the AUC value as the anomaly score threshold τ;

A third aspect of the invention discloses an electronic device. The electronic device comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the steps in the deep learning-based multi-cloud data anomaly detection method according to any one of the first aspect of the present disclosure when executing the computer program.

FIG. 4 is a block diagram of an electronic device according to an embodiment of the present invention. As shown in FIG. 4, the electronic device includes a processor, a memory, a communication interface, a display screen, and an input device connected through a system bus. The processor of the electronic device provides computing and control capabilities. The memory of the electronic device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The communication interface of the electronic device is used for wired or wireless communication with an external terminal; the wireless communication can be achieved through Wi-Fi, an operator network, Near Field Communication (NFC), or other technologies. The display screen of the electronic device can be a liquid crystal display or an electronic ink display, and the input device of the electronic device can be a touch layer covering the display screen, a key, trackball, or touchpad arranged on the housing of the electronic device, or an external keyboard, touchpad, or mouse.

It will be appreciated by those skilled in the art that the structure shown in fig. 4 is merely a block diagram of a portion related to the technical solution of the present disclosure, and does not constitute a limitation of the electronic device to which the technical solution of the present disclosure is applied, and a specific electronic device may include more or less components than those shown in the drawings, or may combine some components, or have different component arrangements.

A fourth aspect of the invention discloses a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the deep learning-based multi-cloud data anomaly detection method of any one of the first aspect of the present disclosure.

Note that the technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be regarded as the scope of the description. The foregoing examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims

1. A multi-cloud data anomaly detection method based on deep learning, characterized by comprising the following steps:

Step S1, acquiring a time sequence of data of a plurality of cloud platforms;

Step S5, if a value in the time series of real-time data exceeds the anomaly score threshold, the value is considered to be an anomaly;

in step S4, the specific method for calculating the anomaly score threshold τ includes:

$$a(\cdot) = \left(e^{(i)} - \mu\right)^{T} \Sigma^{-1} \left(e^{(i)} - \mu\right)$$

wherein

$\mu$ and $\Sigma$ are the parameters of the normal distribution $\mathcal{N}(\mu, \Sigma)$;

$e^{(i)}$ is the reconstruction error of a subset;

$$e^{(i)} = X^{(i)} - X'^{(i)};$$

$X^{(i)}$ is the time series of the data of a subset;

$X'^{(i)}$ is the reconstructed sequence of the data of a subset;

$T$ denotes the transpose;

in step S4, the specific method for calculating the anomaly score threshold τ further includes:

taking the AUC value as the anomaly score threshold τ;

2. The deep learning-based multi-cloud data anomaly detection method according to claim 1, wherein in step S2, the input of the deep learning network further comprises:

statistical indicators of the data of the N2 subset.

3. The deep learning-based multi-cloud data anomaly detection method according to claim 2, wherein in step S2, the statistical indicators comprise:

4. The deep learning-based multi-cloud data anomaly detection method according to claim 1, wherein in step S2, the input of the deep learning network further comprises:

the reverse order of the time series of the data of the N2 subset.

5. A multi-cloud data anomaly detection system based on deep learning, the system comprising:

the calculating of the anomaly score threshold τ specifically includes:

$$a(\cdot) = \left(e^{(i)} - \mu\right)^{T} \Sigma^{-1} \left(e^{(i)} - \mu\right)$$

wherein

$\mu$ and $\Sigma$ are the parameters of the normal distribution $\mathcal{N}(\mu, \Sigma)$;

$e^{(i)}$ is the reconstruction error of a subset;

$$e^{(i)} = X^{(i)} - X'^{(i)};$$

$X^{(i)}$ is the time series of the data of a subset;

$X'^{(i)}$ is the reconstructed sequence of the data of a subset;

$T$ denotes the transpose;

the calculating of the anomaly score threshold τ specifically further includes:

taking the AUC value as the anomaly score threshold τ;

the calculating of the anomaly score threshold τ specifically further includes:

6. An electronic device comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps in a deep learning based multi-cloud data anomaly detection method of any one of claims 1 to 4 when the computer program is executed.

7. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of a deep learning based multi-cloud data anomaly detection method according to any one of claims 1 to 4.