CN118094446B - Anaerobic system running condition intelligent analysis method based on machine learning - Google Patents
Anaerobic system running condition intelligent analysis method based on machine learning
- Publication number
- CN118094446B, CN202410494747.0A
- Authority
- CN
- China
- Prior art keywords
- data
- difference
- curve
- correlation
- curves
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2433—Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2323—Non-hierarchical techniques based on graph theory, e.g. minimum spanning trees [MST] or graph cuts
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Mathematical Physics (AREA)
- Discrete Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to the technical field of data cluster analysis, and in particular to a machine-learning-based intelligent analysis method for the operating condition of an anaerobic system. The method obtains the change correlation between the data curves of operation data in different dimensions and, from it, a correlation weight. It then analyzes the data points within a neighborhood of each data point's sampling time and combines them with the correlation weights to obtain fluctuation consistency. Based on the relative fluctuation consistency between dimensions and the data differences within each data point's neighborhood, the fluctuation characteristics of the data points over the time series can be characterized accurately, which yields an accurate clustering distance; the resulting clusters are used for data prediction to judge whether the anaerobic system is abnormal. By analyzing the operation data of the anaerobic system across dimensions and over time, the invention obtains accurate clustering results through a machine learning algorithm and performs data prediction, so that the operating condition of the anaerobic system is evaluated effectively.
Description
Technical Field
The invention relates to the technical field of data cluster analysis, in particular to an intelligent analysis method for the running condition of an anaerobic system based on machine learning.
Background
The anaerobic system is a common component of the sewage treatment process. During operation, the data it generates must be monitored and analyzed in a timely manner; knowing the operating state of the anaerobic system allows corresponding measures to be taken and system parameters to be adjusted promptly, which improves operating efficiency and treatment effect.
The operation data of an anaerobic system has multiple dimensions. In the prior art, historical data can be analyzed with a machine learning algorithm: the multidimensional historical data are clustered so that data with the same distribution form one class, and a machine learning method is then applied to each class to predict data and thereby determine the operating state. However, the operation data of an anaerobic system are time series data: the sequence within one dimension fluctuates, and data changes in different dimensions are correlated and lag one another. Clustering directly on the numerical values therefore produces poorly separated clusters, which leads to large prediction errors, so the operating condition of the anaerobic system cannot be evaluated accurately.
Disclosure of Invention
To solve the technical problem that the prior art ignores the correlation and hysteresis of anaerobic system operation data over the time series, which leads to poor clustering results and prediction errors so that the operating condition of the anaerobic system cannot be evaluated accurately, the invention provides a machine-learning-based intelligent analysis method for the operating condition of an anaerobic system, which adopts the following technical scheme:
The invention provides an intelligent analysis method for the running condition of an anaerobic system based on machine learning, which comprises the following steps:
acquiring a data curve of operation data of the anaerobic system in each dimension on a time sequence; the dimension class includes at least temperature, pressure and flow;
Obtaining the change correlation of the data curves between different dimensions according to the change trend difference between the data curves; under each sampling time, obtaining the correlation weight of each data curve under each sampling time according to the change correlation between each data curve and all other data curves and the change trend difference under the corresponding sampling time; obtaining fluctuation consistency between the data curve and other data curves according to the correlation weight corresponding to the data point at each sampling time in the data curve and the variation trend difference in the neighborhood range of the data point corresponding to the data curve and other data curves;
Acquiring a first data difference between each data point and a neighborhood data point in a preset neighborhood range in each data curve, and acquiring a difference weight of each neighborhood data point of each data point according to the first data difference and fluctuation consistency of each data point in other data curves of each data curve; in each data curve, obtaining a time sequence distribution characteristic of each data point according to the difference weight and the first data difference;
Adjusting the clustering distance between the data points according to the time sequence distribution characteristics, and clustering according to the adjusted clustering distance to obtain a cluster; and respectively taking the data in each cluster as a basis to conduct data prediction, and judging whether the anaerobic system is abnormal or not according to a data prediction result.
Further, the method for acquiring the change correlation includes:
The absolute value of the second derivative is acquired at each data point position on the data curve; the difference between the second derivative absolute values of the data points at the same position on the data curves of two dimensions is calculated; and the average of these differences is negatively mapped and normalized to obtain the change correlation between the data curves.
Further, the method for acquiring the correlation weight comprises the following steps:
At the same sampling time, the first derivative difference of the data points between each data curve and each other data curve is acquired, and the change correlation between the two data curves is multiplied by the corresponding first derivative difference to obtain the weighted change rate difference between the data curve and each other data curve at that sampling time;
At the same sampling time, the accumulated sum of the weighted change rate differences between each data curve and all other data curves is negatively mapped and normalized to obtain the correlation weight of each data curve at the corresponding sampling time.
Further, the method for acquiring the fluctuation consistency comprises the following steps:
Optionally selecting one data curve as a target data curve, wherein data points on the target data curve are target data points, and data points in a preset neighborhood range at the corresponding time of the target data points on other data curves except the target data curve are comparison data points; acquiring a second derivative absolute value difference between the target data point and each comparison data point, and selecting the smallest second derivative absolute value difference as a reference change trend difference; taking the correlation weight product at the corresponding time of the reference change trend difference and the target data curve as a weighted change trend difference, accumulating the weighted change trend differences at all times on the target data curve and other data curves, and then carrying out negative correlation mapping and normalization to obtain fluctuation consistency between the target data curve and each other data curve; changing the target data curve to obtain the fluctuation consistency of each data curve relative to each other data curve.
Further, the method for obtaining the difference weight comprises the following steps:
obtaining the difference weight according to a difference weight calculation formula, wherein the difference weight calculation formula comprises:
$$Q_{a,i,j} = \exp\left(-\sum_{b=1}^{M} C_{a,b}\,\left|x_{b,i} - x_{b,i,j}\right|\right)$$
wherein $j$ is the sequence number of a neighborhood data point of the $i$-th data point on the $a$-th data curve, $Q_{a,i,j}$ is the difference weight of the $j$-th neighborhood data point of the $i$-th data point on the $a$-th data curve, $\exp$ is the exponential function based on the natural constant, $M$ is the number of other data curves besides the $a$-th data curve, $C_{a,b}$ is the fluctuation consistency between the $a$-th data curve and the $b$-th other data curve, $x_{b,i}$ is the data value of the $i$-th data point on the $b$-th other data curve, and $x_{b,i,j}$ is the data value of the $j$-th neighborhood data point on the $b$-th other data curve.
Further, the method for acquiring the time sequence distribution characteristic comprises the following steps:
in each data curve, taking the product of the difference weight and the first data difference as a weighted data difference between a corresponding data point and a neighborhood data point, and normalizing the average value of the weighted data differences of all neighborhood data points of each data point to obtain the time sequence distribution characteristic of each data point.
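The weighting and averaging just described can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `timing_features`, the `weights[i][j]` lookup layout, the neighborhood half-width of 2, and the use of min-max scaling as the normalization step are all assumptions made for the example, since the text does not fix them.

```python
def timing_features(curve, weights, half_width=2):
    """Time sequence distribution feature of every point on one data curve.

    curve      : list of data values of one dimension, ordered in time
    weights    : weights[i][j] is the difference weight of neighbour j of point i
                 (hypothetical layout chosen for this sketch)
    half_width : neighbourhood radius; 2 gives a window of length 5 (assumed)
    """
    n = len(curve)
    raw = []
    for i in range(n):
        # Neighbourhood indices around i, clipped at the curve ends, excluding i.
        neighbours = [j for j in range(max(0, i - half_width),
                                       min(n, i + half_width + 1)) if j != i]
        # Weighted data difference = difference weight * first data difference.
        weighted = [weights[i][j] * abs(curve[i] - curve[j]) for j in neighbours]
        raw.append(sum(weighted) / len(weighted))
    # Min-max scaling to [0, 1] (assumed normalization).
    lo, hi = min(raw), max(raw)
    if hi == lo:
        return [0.0] * n
    return [(r - lo) / (hi - lo) for r in raw]


curve = [1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.0]
uniform = [[1.0] * len(curve) for _ in curve]
features = timing_features(curve, uniform)
```

With uniform weights, the isolated spike at index 3 receives the largest feature value and the flat ends receive the smallest, which is the behaviour the clustering step relies on.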
Further, the adjusting the cluster distance between data points according to the timing distribution feature includes:
obtaining time sequence distribution characteristic differences between two data points, and adjusting initial clustering distances between the data points according to the time sequence distribution characteristic differences to obtain adjusted clustering distances; and the adjusted clustering distance is positively correlated with the time sequence distribution characteristic difference.
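One way to realize a clustering distance that grows with the feature difference is sketched below. The text only requires a positive correlation; the Euclidean initial distance and the multiplicative form `initial * (1 + |T_p - T_q|)` are assumptions of this example, and `adjusted_distance` is a hypothetical name.

```python
import math


def adjusted_distance(point_p, point_q, feat_p, feat_q):
    """Clustering distance between two data points after adjustment.

    point_p, point_q : coordinate tuples of the two data points
    feat_p, feat_q   : their time sequence distribution features in [0, 1]
    """
    initial = math.dist(point_p, point_q)        # assumed initial distance: Euclidean
    feature_diff = abs(feat_p - feat_q)          # time sequence feature difference
    return initial * (1.0 + feature_diff)        # grows with the feature difference


same_feature = adjusted_distance((0.0, 0.0), (3.0, 4.0), 0.2, 0.2)
diff_feature = adjusted_distance((0.0, 0.0), (3.0, 4.0), 0.0, 1.0)
```

Points with identical features keep their initial distance, while points whose fluctuation behaviour differs are pushed further apart before the connected graph is built.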
Further, constructing a connected graph according to the adjusted clustering distance, and splitting the connected graph by utilizing a connected graph dynamic splitting clustering algorithm to obtain the cluster.
Further, a prediction model is constructed by utilizing ARIMA, and the data prediction result is obtained.
Further, the determining whether the anaerobic system is abnormal according to the data prediction result includes:
A difference distance between the data prediction result and the actual data at the corresponding moment is obtained; if the difference distance is larger than a preset judgment threshold, the anaerobic system is judged to be abnormal at that moment.
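The threshold judgment can be illustrated with a short sketch. The Euclidean distance over the (temperature, pressure, flow) values and the example threshold are assumptions; the text does not specify the distance measure or the threshold value.

```python
import math


def is_abnormal(predicted, actual, threshold):
    """Return True when the prediction deviates from the actual data by more
    than the preset judgment threshold (Euclidean distance assumed)."""
    return math.dist(predicted, actual) > threshold


# Hypothetical (temperature, pressure, flow) readings at one moment.
normal_case = is_abnormal((35.0, 1.0, 10.0), (35.0, 1.0, 10.0), threshold=0.5)
abnormal_case = is_abnormal((35.0, 1.0, 10.0), (38.0, 1.0, 14.0), threshold=0.5)
```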
The invention has the following beneficial effects:
In the embodiments of the invention, the change correlation between the data curves of operation data in different dimensions is obtained first; it gives a preliminary measure of how the data values of two dimensions vary together. To analyze the correlation of the data points on a data curve with respect to all dimensions, a correlation weight is then obtained; because it is computed from the data curves of all dimensions, it accurately quantifies the correlation characteristics of the data points on one data curve. Since time series data exhibit fluctuation and hysteresis, the data points within a neighborhood of each data point's sampling time are also analyzed, and fluctuation consistency is obtained by combining them with the correlation weights. Fluctuation consistency accurately represents the change correlation of the data curve of one dimension relative to another dimension while accounting for data hysteresis; the larger it is, the more likely the two types of data belong to the same distribution. By combining the analysis of multidimensional data, the time sequence distribution feature accurately characterizes the fluctuation of each data point over the time series, based on the relative fluctuation consistency between dimensions and the data differences within the data point's neighborhood. Based on the time sequence distribution features, an accurate clustering distance is obtained, the resulting clusters are used for data prediction, and whether the anaerobic system is abnormal is judged accurately.
Drawings
To illustrate the embodiments of the invention and the technical solutions and advantages over the prior art more clearly, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart of an intelligent analysis method for an anaerobic system operation condition based on machine learning according to an embodiment of the present invention.
Detailed Description
To further describe the technical means adopted by the invention to achieve its intended purpose and their effects, the specific implementation, structure, characteristics and effects of the machine-learning-based intelligent analysis method for the operating condition of an anaerobic system are described in detail below with reference to the accompanying drawings and the preferred embodiment. In the following description, different references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following specifically describes a specific scheme of the intelligent analysis method for the running condition of the anaerobic system based on machine learning.
Referring to fig. 1, a flow chart of an intelligent analysis method for an anaerobic system operation condition based on machine learning according to an embodiment of the present invention is shown, where the method includes:
Step S1: acquiring a data curve of operation data of the anaerobic system in each dimension on a time sequence; the classes of dimensions include at least temperature, pressure, and flow.
The important operation data of the anaerobic system in the sewage treatment process are temperature, pressure and flow. Anaerobic reactions are affected by temperature and pressure, while flow characterizes the input and output of substances, so data acquisition for the anaerobic system needs to include at least these three dimensions. In the embodiment of the invention, the operation data is collected every 30 seconds, and the collected values are arranged as data points in time order to form a data curve.
It should be noted that, because the embodiment of the invention aims to analyze the operating condition of the anaerobic system in real time, all data involved in prediction are historical data preceding the real-time data. In one embodiment of the invention, the historical time period is set to one hour; that is, the data from the hour before the current moment serve as the basis for the subsequent clustering and data prediction.
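The sampling scheme above implies 120 data points per dimension in the one-hour history (3600 s / 30 s). A minimal sketch of such a rolling history is given below; the class name `HistoryWindow` and the dictionary-based sample format are hypothetical choices for illustration only.

```python
from collections import deque

SAMPLE_PERIOD_S = 30                              # one sample every 30 seconds
WINDOW_S = 3600                                   # one hour of history
WINDOW_POINTS = WINDOW_S // SAMPLE_PERIOD_S       # 120 data points per curve


class HistoryWindow:
    """Keeps the most recent hour of samples for each dimension."""

    def __init__(self, dimensions=("temperature", "pressure", "flow")):
        # One bounded buffer per dimension; old points drop off automatically.
        self.curves = {name: deque(maxlen=WINDOW_POINTS) for name in dimensions}

    def append(self, sample):
        """sample maps each dimension name to its newly measured value."""
        for name, buffer in self.curves.items():
            buffer.append(sample[name])


window = HistoryWindow()
for t in range(130):                              # 130 samples, 10 more than fit
    window.append({"temperature": float(t), "pressure": 1.0, "flow": 2.0})
```

After 130 samples the buffers hold only the latest 120, so the oldest retained temperature value is the one recorded at step 10.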
Step S2: obtaining the change correlation of the data curves among different dimensions according to the change trend difference among the data curves; under each sampling time, obtaining the correlation weight of each data curve under each sampling time according to the change correlation between each data curve and all other data curves and the change trend difference under the corresponding sampling time; and obtaining fluctuation consistency between the data curve and other data curves according to the correlation weight corresponding to the data point at each sampling time in the data curve and the variation trend difference in the neighborhood range of the data point corresponding to the data curve and other data curves.
Changes in the operation data of the anaerobic system affect the water quality index, so the change trend of the operation data is an important data characteristic for the anaerobic system. Because the invention aims to group data with the same distribution into one class and then predict and analyze them, the correlation between operation data of different dimensions must be analyzed first: the stronger the correlation, the more important the data of that dimension is for the anaerobic system, and the more the correlation between dimensions needs to be considered when analyzing data fluctuation over the time series.
First, the change correlation of the data curves between different dimensions is obtained from the change trend difference between the data curves; that is, the larger the change trend difference, the smaller the change correlation of the two data curves.
Preferably, in one embodiment of the present invention, the method for acquiring the change correlation includes:
The absolute value of the second derivative is acquired at each data point position on the data curve; it characterizes the stability of the change at that position, and the larger the absolute value, the more drastic the change in trend. The difference between the second derivative absolute values of the data points at the same position on the data curves of two dimensions is calculated, and the average of these differences is negatively mapped and normalized to obtain the change correlation between the data curves. That is, the larger the average second derivative absolute value difference, the less correlated the data change trends of the two data curves. In one embodiment of the invention, the change correlation is formulated as:
$$R_{a,b} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\Bigl|\,\left|x''_{a,i}\right| - \left|x''_{b,i}\right|\,\Bigr|\right)$$
wherein $R_{a,b}$ is the change correlation between the data curve of the $a$-th dimension and the data curve of the $b$-th dimension, $\exp$ is the exponential function based on the natural constant, $N$ is the number of data points on the data curve, $x''_{a,i}$ is the second derivative of the $i$-th data point on the data curve of the $a$-th dimension, and $x''_{b,i}$ is the second derivative of the $i$-th data point on the data curve of the $b$-th dimension.
In the change correlation formula, the data is mapped and normalized in a negative correlation manner by using an exponential function based on a natural constant, and in other embodiments of the present invention, other basic mathematical operations may be used to achieve the purpose of mapping and normalizing the negative correlation, which is a technical means well known to those skilled in the art, and will not be described herein.
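As a rough illustration of the change correlation, the sketch below applies the exponential negative mapping to the mean difference of second derivative absolute values. The central finite-difference estimate of the second derivative, the endpoint handling, and the names `second_derivative` and `change_correlation` are assumptions of this example; the sample values are invented.

```python
import math


def second_derivative(values):
    """Central finite-difference second derivative with unit time step.
    The two endpoints reuse the value of their nearest interior point (assumed)."""
    n = len(values)
    if n < 3:
        return [0.0] * n
    inner = [values[i - 1] - 2 * values[i] + values[i + 1] for i in range(1, n - 1)]
    return [inner[0]] + inner + [inner[-1]]


def change_correlation(curve_a, curve_b):
    """exp(-mean(| |a''| - |b''| |)) over points at the same position."""
    d2a = second_derivative(curve_a)
    d2b = second_derivative(curve_b)
    mean_diff = sum(abs(abs(p) - abs(q)) for p, q in zip(d2a, d2b)) / len(d2a)
    return math.exp(-mean_diff)


temperature = [35.0, 35.1, 35.3, 35.6, 36.0, 36.5]   # smooth trend
pressure = [1.00, 1.01, 1.03, 1.06, 1.10, 1.15]      # similar smooth trend
spiky = [0.0, 5.0, 0.0, 5.0, 0.0, 5.0]               # strongly oscillating

r_same = change_correlation(temperature, temperature)
r_smooth = change_correlation(temperature, pressure)
r_spiky = change_correlation(temperature, spiky)
```

Two curves with similar curvature score close to 1, while a smooth curve compared against a strongly oscillating one scores close to 0.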
The change correlation is only a preliminary measure. In practice, changes in anaerobic data lag over the time series: for example, flow data affects the water quality index immediately, whereas a change in the anaerobic reaction caused by a temperature change needs some time before it visibly alters the water quality index, so the temperature data exhibits a certain hysteresis. Therefore, for a data point in one dimension, the analysis should cover not only the change trend difference between dimensions at the same moment but also the change trend difference within a neighborhood of the corresponding moment, so that hysteresis does not distort the result. In the embodiment of the invention, before hysteresis is considered, the correlation weight at each sampling time on each data curve is obtained; the correlation weight characterizes the correlation between one dimension and the other dimensions at the same moment. The correlation weight at each sampling time is obtained from the change correlation between each data curve and all other data curves together with the change trend difference at that sampling time. The correlation weight is then used to weight the change trend difference within the neighborhood of the corresponding data points on the data curve and the other data curves, so that the final result reflects both the correlation between dimensions at the same moment and the hysteresis of the data points over the time series; this yields the fluctuation consistency.
Preferably, in one embodiment of the present invention, the method for acquiring the correlation weight includes:
At the same sampling time, the first derivative difference of the data points between each data curve and each other data curve is acquired. The first derivative represents the rate of change at the corresponding position; the larger the first derivative difference, the more the data points at that position differ in degree and direction of change. The change correlation between two data curves is multiplied by the corresponding first derivative difference to obtain the weighted change rate difference between the data curve and each other data curve at the same sampling time. The larger the change correlation, the stronger the correlation between the two dimensions and the higher the confidence of the data relationship between them; in other words, the change correlation acts as the confidence weight of the first derivative difference.
At the same sampling time, the accumulated sum of the weighted change rate differences between each data curve and all other data curves is negatively mapped and normalized to obtain the correlation weight of each data curve at the corresponding sampling time. That is, the larger the weighted change rate difference, the more the change trend of the data curve at that sampling time differs from those of the other dimensions, and the smaller the correlation weight.
In one embodiment of the invention, the exponential function based on natural constant is utilized for carrying out negative correlation mapping and normalization, and the specific correlation weight is expressed as follows by a formula:
$$W_{a,i} = \exp\left(-\sum_{b=1}^{M} R_{a,b}\,\left|x'_{a,i} - x'_{b,i}\right|\right)$$
wherein $W_{a,i}$ is the correlation weight of the $i$-th data point on the data curve of the $a$-th dimension, $\exp$ is the exponential function based on the natural constant, $M$ is the number of other data curves besides the $a$-th data curve, $R_{a,b}$ is the change correlation between the $a$-th data curve and the $b$-th other data curve, $x'_{a,i}$ is the first derivative of the $i$-th data point on the $a$-th data curve, and $x'_{b,i}$ is the first derivative of the $i$-th data point on the $b$-th other data curve.
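The correlation weight can be sketched in the same spirit: for each sampling time, the first derivative differences to all other curves are weighted by the change correlations, summed, and passed through the exponential negative mapping. The forward-difference derivative and the function names are assumptions of this example, and the change correlations are supplied as precomputed numbers.

```python
import math


def first_derivative(values):
    """Forward difference with unit time step; the last point reuses the
    previous slope (assumed boundary handling). Needs at least two points."""
    diffs = [values[i + 1] - values[i] for i in range(len(values) - 1)]
    return diffs + [diffs[-1]]


def correlation_weights(target, others, change_corrs):
    """Correlation weight of every sampling time on the target curve.

    others       : list of the other data curves (same length as target)
    change_corrs : change correlation between target and each other curve
    """
    d_target = first_derivative(target)
    d_others = [first_derivative(curve) for curve in others]
    weights = []
    for i in range(len(target)):
        # Accumulated weighted change rate difference at sampling time i.
        total = sum(r * abs(d_target[i] - d[i])
                    for r, d in zip(change_corrs, d_others))
        weights.append(math.exp(-total))
    return weights


target = [0.0, 1.0, 2.0, 3.0]
others = [[0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 4.0, 6.0]]
weights = correlation_weights(target, others, change_corrs=[1.0, 0.5])
```

In this toy case the first other curve has the same slope as the target and contributes nothing, while the second differs by a slope of 1 and is weighted by 0.5, so every weight equals exp(-0.5).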
Preferably, in one embodiment of the present invention, the method for acquiring fluctuation consistency includes:
Any one data curve is selected as the target data curve. The data points on the target data curve are target data points, and the data points on the other data curves that fall within a preset neighborhood of a target data point's sampling time are comparison data points. In the embodiment of the invention, the neighborhood is a range of length 5 centered on the corresponding time, so each neighborhood contains 5 comparison data points.
As in the change correlation, the second derivative absolute value difference between the target data point and each comparison data point is obtained. Because there are several comparison data points, there are several second derivative absolute value differences; to avoid errors caused by excessively large values, the smallest second derivative absolute value difference is selected as the reference change trend difference.
The product of the reference change trend difference and the correlation weight of the target data curve at the corresponding time is taken as the weighted change trend difference. The larger the correlation weight, the more important the data point at that position and the greater the confidence of the data it represents, so the reference change trend difference is amplified accordingly.
The weighted change trend differences between the target data curve and another data curve are accumulated over all moments and then negatively mapped and normalized to obtain the fluctuation consistency between the target data curve and that other data curve. That is, the larger the weighted change trend difference, the smaller the data change correlation between the target data curve and the other data curve, and the smaller the fluctuation consistency. The target data curve is then changed in turn to obtain the fluctuation consistency of each data curve relative to each other data curve.
In one embodiment of the invention, taking the data curve of the $a$-th dimension as the target data curve, the fluctuation consistency is expressed as:
$$C_{a,b} = \exp\left(-\sum_{i=1}^{N} W_{a,i}\,\min_{k}\Bigl(\Bigl|\,\left|x''_{a,i}\right| - \left|x''_{b,i,k}\right|\,\Bigr|\Bigr)\right)$$
wherein $C_{a,b}$ represents the fluctuation consistency between the data curve of the $a$-th dimension and the data curve of the $b$-th dimension, $\exp$ is the exponential function based on the natural constant, $N$ is the number of data points on the data curve, $\min$ is the minimum value selection function, $W_{a,i}$ is the correlation weight of the $i$-th data point on the data curve of the $a$-th dimension, $x''_{a,i}$ is the second derivative of the $i$-th data point on the data curve of the $a$-th dimension, and $x''_{b,i,k}$ is the second derivative of the $k$-th comparison data point of that data point on the data curve of the $b$-th dimension.
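The effect of the neighborhood minimum is easiest to see on a lagged signal. The sketch below takes precomputed second derivatives and correlation weights as inputs; the function name, the clipping of the window at the curve ends, and the toy data are assumptions of this example.

```python
import math


def fluctuation_consistency(d2_target, d2_other, weights, half_width=2):
    """Fluctuation consistency of a target curve relative to one other curve.

    d2_target, d2_other : second derivatives of the two curves (same length)
    weights             : correlation weights of the target curve
    half_width          : 2 gives the length-5 neighbourhood of the embodiment
    """
    n = len(d2_target)
    total = 0.0
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        # Reference change trend difference: smallest difference in the window.
        reference = min(abs(abs(d2_target[i]) - abs(d2_other[k]))
                        for k in range(lo, hi))
        total += weights[i] * reference
    return math.exp(-total)


d2_a = [0.0, 1.0, 0.0, 0.0, 0.0]      # curvature event at index 1
d2_b = [0.0, 0.0, 0.0, 1.0, 0.0]      # same event, lagging by two samples
ones = [1.0] * 5

with_lag_window = fluctuation_consistency(d2_a, d2_b, ones, half_width=2)
pointwise_only = fluctuation_consistency(d2_a, d2_b, ones, half_width=0)
```

With the length-5 window the lagged event is matched and the consistency is 1.0; comparing only points at the same moment (`half_width=0`) penalizes the lag and gives exp(-2).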
Step S3: acquiring a first data difference between each data point and a neighborhood data point in a preset neighborhood range in each data curve, and acquiring a difference weight of each neighborhood data point of each data point according to the first data difference and fluctuation consistency of each data point in other data curves of each data curve; in each data curve, a time sequence distribution characteristic of each data point is obtained according to the difference weight and the first data difference.
In step S2, the correlation of the data curves between dimensions is obtained by analyzing the change trend differences and the hysteresis of the data between different dimensions. Based on this correlation, the fluctuation of each data point over the time series can be analyzed; using that fluctuation as the basis of the subsequent cluster analysis makes the final cluster division clearer, and fluctuation consistency must be introduced when the fluctuation over the time series is analyzed. Therefore, for a given moment, the first data difference between the data point at that moment on each data curve and each neighborhood data point within a preset neighborhood is obtained. The fluctuation consistency between each data curve and the data curves of the other dimensions is then taken into account, and the difference weight of each neighborhood data point is obtained from the fluctuation consistency and the first data differences on the other data curves. In other words, the difference weight incorporates the relationship between one dimension and the other dimensions: the difference weight applied to a first data difference in the target dimension is determined by the fluctuation consistency and the first data differences in the other dimensions. The first data difference between the data point and each neighborhood data point on the data curve is weighted by the difference weight to obtain the time sequence distribution feature of each data point. The time sequence distribution feature represents how a data point fluctuates and changes over the time series, so clustering based on differences in this feature improves the clustering result.
Preferably, in one embodiment of the present invention, the difference weight is obtained according to a difference weight calculation formula including:
$$w_{i,j}^{k} = \exp\left(-\sum_{m=1}^{M} C_{i,m}\,\left|x_{m,j} - x_{m,j}^{k}\right|\right)$$

where $k$ is the sequence number of a neighborhood data point of the $j$-th data point on the $i$-th data curve, $w_{i,j}^{k}$ is the difference weight of the $k$-th neighborhood data point of the $j$-th data point on the $i$-th data curve, $\exp$ is the exponential function with the natural constant as its base, $M$ is the number of data curves other than the $i$-th data curve, $C_{i,m}$ is the fluctuation consistency between the $i$-th data curve and the $m$-th other data curve, $x_{m,j}$ is the data value of the $j$-th data point on the $m$-th other data curve, and $x_{m,j}^{k}$ is the data value of the $k$-th neighborhood data point on the $m$-th other data curve.
In the difference weight formula, the fluctuation consistency acts as a confidence weight that multiplies the first data difference in the corresponding dimension: the larger the fluctuation consistency, the more strongly the data of the two dimensions are related and the more important the data feature it represents, so the feature is amplified through multiplication. Conversely, a larger first data difference on the other data curves indicates a weaker correlation between the data point and the neighborhood data point, and therefore a smaller difference weight.
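The difference weight described above can be sketched in Python. This is a minimal illustration, not the patented implementation: it assumes the weight is the negative exponential of the consistency-weighted sum of first data differences on the other curves (a plain sum rather than a mean is an assumption), and the function name, the list-of-lists data layout, and the `radius` parameter are hypothetical.

```python
import math

def difference_weights(curves, consistency, i, j, radius=2):
    """Return {neighbor_index: weight} for data point j on curve i.

    curves: list of equal-length lists, one per dimension (assumed layout).
    consistency[i][m]: fluctuation consistency between curves i and m.
    radius: half-width of the preset neighborhood (assumed value).
    """
    n = len(curves[i])
    weights = {}
    for k in range(max(0, j - radius), min(n, j + radius + 1)):
        if k == j:
            continue  # a point is not its own neighbor
        # Consistency-weighted first data differences on every other curve.
        total = sum(
            consistency[i][m] * abs(curves[m][j] - curves[m][k])
            for m in range(len(curves))
            if m != i
        )
        # Negative exponential: larger differences give smaller weights.
        weights[k] = math.exp(-total)
    return weights
```

With this form, a neighbor whose values on the other curves match those at the data point receives weight 1, and the weight decays toward 0 as those curves diverge.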
Preferably, in one embodiment of the present invention, the method for acquiring the timing distribution feature includes:
in each data curve, taking the product of the difference weight and the first data difference as the weighted data difference between the corresponding data point and the neighborhood data point, and normalizing the average value of the weighted data differences of all the neighborhood data points of each data point to obtain the time sequence distribution characteristic of each data point. In one embodiment of the invention, the timing distribution characteristics are formulated as:
$$T_{i,j} = \mathrm{Norm}\left(\frac{1}{K}\sum_{k=1}^{K} w_{i,j}^{k}\,\left|x_{i,j} - x_{i,j}^{k}\right|\right)$$

where $T_{i,j}$ is the time sequence distribution characteristic of the $j$-th data point on the $i$-th data curve, $K$ is the number of neighborhood data points, $\mathrm{Norm}$ is the normalization function, $x_{i,j}$ is the data value of the $j$-th data point on the $i$-th data curve, $x_{i,j}^{k}$ is the data value of the $k$-th neighborhood data point of the $j$-th data point on the $i$-th data curve, and $w_{i,j}^{k}$ is the difference weight of that neighborhood data point. In this formula, the fluctuation of a data value over the time sequence is represented by the difference between the data point and its neighborhood data points. Because the difference weight reflects the data correlation at the corresponding positions in the other dimensions, it serves as a confidence weight for the first data difference: the larger the difference weight, the smaller the data difference at the corresponding position in the other dimensions, the more stable that position, and the more reliable the data feature it represents.
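As a hedged sketch of this step, the following Python function averages the weighted first data differences over each point's neighbors and then normalizes across the curve. The text does not name the normalization function, so min-max scaling is an assumption here, as are the function name and the `weights` layout (one dictionary of neighbor index to difference weight per data point).

```python
def timing_features(curve, weights):
    """Time sequence distribution characteristic of every point on one curve.

    curve: list of data values for one dimension.
    weights: weights[j] maps neighbor index k -> difference weight (assumed layout).
    """
    raw = []
    for j, x in enumerate(curve):
        nbrs = weights[j]
        # Mean of the weighted first data differences over all neighbors.
        weighted = sum(w * abs(x - curve[k]) for k, w in nbrs.items())
        raw.append(weighted / len(nbrs))
    lo, hi = min(raw), max(raw)
    if hi == lo:
        return [0.0] * len(raw)  # flat curve: no fluctuation to distinguish
    # Min-max normalization (assumed choice of normalization function).
    return [(r - lo) / (hi - lo) for r in raw]
```

Normalizing per curve keeps the characteristics of different dimensions on a common scale before they are compared in the clustering distance.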
Step S4: adjusting the clustering distance between the data points according to the time sequence distribution characteristics, and clustering according to the adjusted clustering distance to obtain a cluster; and respectively carrying out data prediction by taking the data in each cluster as a basis, and judging whether the anaerobic system is abnormal or not according to the data prediction result.
The clustering distance between data points can be adjusted based on the time sequence distribution characteristics. Note that the initial clustering distance in a conventional clustering process is usually the Euclidean distance between two data points; the adjusted clustering distance is obtained by introducing the difference in time sequence distribution characteristics into that Euclidean distance. Clustering on the adjusted clustering distance yields clearly separated clusters. Because each cluster represents one type of data, the data in each cluster can serve as the basis for data prediction, and whether the current anaerobic system is abnormal can be judged from the data prediction result.
Preferably, in one embodiment of the present invention, adjusting the cluster distance between data points according to the timing distribution feature comprises:
Obtaining time sequence distribution characteristic differences between two data points, and adjusting initial clustering distances between the data points according to the time sequence distribution characteristic differences to obtain adjusted clustering distances; the adjusted clustering distance is positively correlated with the characteristic difference of the time sequence distribution. In one embodiment of the invention, the adjusted cluster distance is formulated as:
where $D'_{a,b}$ is the adjusted clustering distance, $D_{a,b}$ is the initial clustering distance between the $a$-th and $b$-th data points, and $\Delta T_{a,b}$ is the time sequence distribution characteristic difference between the $a$-th and $b$-th data points.
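The only constraint stated for the adjustment is that the adjusted clustering distance is positively correlated with the time sequence distribution characteristic difference. One simple form that satisfies this is sketched below in Python; the multiplicative form `D * (1 + dT)` is an assumption chosen for illustration, not the formula of the embodiment, and the function name is hypothetical.

```python
import math

def adjusted_distance(p, q, t_p, t_q):
    """Adjusted clustering distance between two data points.

    p, q: coordinate sequences of the two data points.
    t_p, t_q: their time sequence distribution characteristics (in [0, 1]).
    """
    initial = math.dist(p, q)        # Euclidean distance (Python 3.8+)
    feature_gap = abs(t_p - t_q)     # characteristic difference
    # Assumed form: grows with the characteristic difference and
    # reduces to the initial distance when the characteristics match.
    return initial * (1.0 + feature_gap)
```

Under this assumed form, two points with identical characteristics keep their Euclidean distance, while points that fluctuate differently are pushed apart.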
In one embodiment of the invention, a connected graph is constructed from the adjusted clustering distances and split with a connected-graph dynamic splitting clustering algorithm to obtain the clusters. A prediction model is then built with ARIMA to obtain the data prediction result. The connected-graph dynamic splitting clustering algorithm and the ARIMA data prediction method are both technical means well known to those skilled in the art and are not described further here.
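The text does not detail the connected-graph dynamic splitting clustering algorithm. As a simplified stand-in only, the Python sketch below links any two points whose adjusted clustering distance is at most a fixed cutoff and labels the connected components of the resulting graph; the fixed `cutoff` and the function name are assumptions, and a dynamic splitting algorithm would choose where to cut the graph adaptively.

```python
def connected_components(dist, cutoff):
    """Label points by connected component of a thresholded distance graph.

    dist: symmetric matrix (list of lists) of adjusted clustering distances.
    cutoff: edges exist where dist[a][b] <= cutoff (assumed fixed value).
    """
    n = len(dist)
    labels = [-1] * n
    cluster = 0
    for start in range(n):
        if labels[start] != -1:
            continue  # already assigned to a cluster
        labels[start] = cluster
        stack = [start]
        while stack:  # depth-first traversal of one component
            a = stack.pop()
            for b in range(n):
                if labels[b] == -1 and dist[a][b] <= cutoff:
                    labels[b] = cluster
                    stack.append(b)
        cluster += 1
    return labels
```

Each resulting label set could then be passed to a separate forecasting model, in line with the per-cluster prediction described above.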
Preferably, in one embodiment of the present invention, determining whether the anaerobic system is abnormal according to the data prediction result includes:
obtaining a difference distance between the data prediction result and the actual data at the corresponding moment, where the difference distance represents the deviation of the current operation data of the anaerobic system; if the difference distance is greater than a preset judgment threshold, the anaerobic system is judged to be abnormal at that moment. In one embodiment of the present invention, the difference distance is the Euclidean distance between the data prediction result and the actual data, and, for convenience of operation, the judgment threshold is set to 0.6 after the obtained Euclidean distance is normalized.
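The judgment rule can be sketched as follows in Python. The threshold of 0.6 comes from the embodiment; how the Euclidean distance is normalized is not stated, so dividing by a reference maximum distance and clipping to 1 is an assumption, and `max_distance` and the function name are hypothetical.

```python
import math

def is_abnormal(predicted, actual, max_distance, threshold=0.6):
    """Judge whether the system is abnormal at one moment.

    predicted, actual: sequences of values across dimensions at that moment.
    max_distance: reference distance used for normalization (assumed parameter).
    threshold: judgment threshold on the normalized distance.
    """
    distance = math.dist(predicted, actual)          # Euclidean difference distance
    normalized = min(distance / max_distance, 1.0)   # assumed normalization to [0, 1]
    return normalized > threshold
```

For example, with `max_distance=10.0`, a prediction of `[0.0, 0.0]` against actual data `[6.0, 8.0]` gives a normalized distance of 1.0 and is flagged, while actual data `[3.0, 4.0]` gives 0.5 and is not.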
In summary, the embodiment of the present invention obtains the change correlation between the data curves of the operation data in different dimensions, analyzes on that basis the data correlation of the data points on the data curves across all dimensions, and thereby obtains the correlation weights. Because time-series data exhibit some fluctuation and hysteresis, the analysis also covers the data points within the neighborhood range of each data point's sampling time, and the fluctuation consistency is obtained by combining the correlation weights. By combining the analysis of multidimensional data, the time sequence distribution characteristic accurately represents how a data point fluctuates over the time sequence, based on the relative fluctuation consistency among dimensions and the data differences within the data point's neighborhood range. From the time sequence distribution characteristics, an accurate clustering distance is obtained; the clustering result feeds the data prediction, which is used to judge whether the anaerobic system is abnormal. By analyzing the operation data of the anaerobic system across dimensions and over the time sequence, the invention obtains an accurate data clustering result through a machine learning algorithm and performs data prediction, so that the operating condition of the anaerobic system is evaluated effectively.
It should be noted that the order of the embodiments of the present invention is for description only and does not indicate the relative merits of the embodiments. The processes depicted in the accompanying drawings do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments.
Claims (8)
1. An intelligent analysis method for the running condition of an anaerobic system based on machine learning is characterized by comprising the following steps:
acquiring a data curve of operation data of the anaerobic system in each dimension on a time sequence; the dimension class includes at least temperature, pressure and flow;
Obtaining the change correlation of the data curves between different dimensions according to the change trend difference between the data curves; under each sampling time, obtaining the correlation weight of each data curve under each sampling time according to the change correlation between each data curve and all other data curves and the change trend difference under the corresponding sampling time; obtaining fluctuation consistency between the data curve and other data curves according to the correlation weight corresponding to the data point at each sampling time in the data curve and the variation trend difference in the neighborhood range of the data point corresponding to the data curve and other data curves;
Acquiring a first data difference between each data point and a neighborhood data point in a preset neighborhood range in each data curve, and acquiring a difference weight of each neighborhood data point of each data point according to the first data difference and fluctuation consistency of each data point in other data curves of each data curve; in each data curve, obtaining a time sequence distribution characteristic of each data point according to the difference weight and the first data difference;
Adjusting the clustering distance between the data points according to the time sequence distribution characteristics, and clustering according to the adjusted clustering distance to obtain a cluster; respectively taking the data in each cluster as a basis to carry out data prediction, and judging whether the anaerobic system is abnormal or not according to a data prediction result;
The adjusting the clustering distance between the data points according to the time sequence distribution feature comprises:
Obtaining time sequence distribution characteristic differences between two data points, and adjusting initial clustering distances between the data points according to the time sequence distribution characteristic differences to obtain adjusted clustering distances; the adjusted clustering distance is positively correlated with the time sequence distribution characteristic difference;
Respectively taking the data in each cluster as a basis to carry out data prediction, wherein the method comprises the following steps:
And constructing a prediction model by utilizing ARIMA to obtain the data prediction result.
2. The intelligent analysis method for the running condition of the anaerobic system based on machine learning according to claim 1, wherein the obtaining method of the change correlation comprises the following steps:
And acquiring the second derivative absolute value of each data point position on the data curve, calculating the second derivative absolute value difference of the data points at the same position between the data curves of two dimensions, carrying out negative correlation mapping on the average value of the second derivative absolute value difference, and normalizing to obtain the change correlation between the data curves.
3. The intelligent analysis method for the running condition of the anaerobic system based on machine learning according to claim 1, wherein the method for acquiring the correlation weight comprises the following steps:
Under the same sampling time, acquiring the first derivative difference of data points between each data curve and each other data curve, multiplying the change correlation between the two data curves by the corresponding first derivative difference, and acquiring the weighted change rate difference between the data curve and each other data curve under the same sampling time;
And under the same sampling time, carrying out negative correlation mapping and normalization on the accumulated sum of the weighted change rate differences between each data curve and all other data curves to obtain the correlation weight of each data curve under the corresponding sampling time.
4. The intelligent analysis method for the running condition of the anaerobic system based on machine learning according to claim 1, wherein the obtaining method for the fluctuation consistency comprises the following steps:
Arbitrarily selecting one data curve as a target data curve, wherein data points on the target data curve are target data points, and data points within a preset neighborhood range of the corresponding time of a target data point on the data curves other than the target data curve are comparison data points; acquiring the second derivative absolute value difference between the target data point and each comparison data point, and selecting the smallest second derivative absolute value difference as a reference change trend difference; taking the product of the reference change trend difference and the correlation weight of the target data curve at the corresponding time as a weighted change trend difference, accumulating the weighted change trend differences at all times between the target data curve and each other data curve, and then carrying out negative correlation mapping and normalization to obtain the fluctuation consistency between the target data curve and each other data curve; changing the target data curve to obtain the fluctuation consistency of each data curve relative to each other data curve.
5. The intelligent analysis method for the running condition of the anaerobic system based on machine learning according to claim 1, wherein the method for obtaining the difference weight comprises the following steps:
obtaining the difference weight according to a difference weight calculation formula, wherein the difference weight calculation formula comprises:
$$w_{i,j}^{k} = \exp\left(-\sum_{m=1}^{M} C_{i,m}\,\left|x_{m,j} - x_{m,j}^{k}\right|\right)$$

where $k$ is the sequence number of a neighborhood data point of the $j$-th data point on the $i$-th data curve, $w_{i,j}^{k}$ is the difference weight of the $k$-th neighborhood data point of the $j$-th data point on the $i$-th data curve, $\exp$ is the exponential function with the natural constant as its base, $M$ is the number of data curves other than the $i$-th data curve, $C_{i,m}$ is the fluctuation consistency between the $i$-th data curve and the $m$-th other data curve, $x_{m,j}$ is the data value of the $j$-th data point on the $m$-th other data curve, and $x_{m,j}^{k}$ is the data value of the $k$-th neighborhood data point on the $m$-th other data curve.
6. The intelligent analysis method for the running condition of the anaerobic system based on machine learning according to claim 5, wherein the acquisition method for the time sequence distribution characteristics comprises the following steps:
in each data curve, taking the product of the difference weight and the first data difference as a weighted data difference between a corresponding data point and a neighborhood data point, and normalizing the average value of the weighted data differences of all neighborhood data points of each data point to obtain the time sequence distribution characteristic of each data point.
7. The intelligent analysis method for the running condition of the anaerobic system based on machine learning according to claim 1 is characterized in that a connected graph is constructed according to the adjusted clustering distance, and a connected graph dynamic splitting clustering algorithm is utilized to split the connected graph, so that the cluster is obtained.
8. The intelligent analysis method for the running condition of the anaerobic system based on the machine learning according to claim 1, wherein the judging whether the anaerobic system is abnormal according to the data prediction result comprises:
And obtaining a difference distance between the data prediction result and actual data at the corresponding moment, and judging that the anaerobic system is abnormal at the moment if the difference distance is larger than a preset judgment threshold value.
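The change correlation of claim 2 and the correlation weight of claim 3 can be illustrated with a short Python sketch. It is a reading of the claim language under stated assumptions, not the claimed implementation: derivatives are approximated by finite differences on uniformly sampled curves, the "negative correlation mapping and normalization" is taken to be `exp(-x)`, and all function names are hypothetical.

```python
import math

def second_diff_abs(curve):
    """Absolute discrete second derivative at each interior data point."""
    return [abs(curve[t - 1] - 2 * curve[t] + curve[t + 1])
            for t in range(1, len(curve) - 1)]

def change_correlation(a, b):
    """Claim 2: map the mean second-derivative gap of two curves into (0, 1]."""
    da, db = second_diff_abs(a), second_diff_abs(b)
    mean_gap = sum(abs(x - y) for x, y in zip(da, db)) / len(da)
    return math.exp(-mean_gap)  # assumed negative correlation mapping

def correlation_weight(curves, corr, i, t):
    """Claim 3: correlation weight of curve i at interior sampling time t.

    corr[i][m]: change correlation between curves i and m.
    """
    def slope(c):
        return (c[t + 1] - c[t - 1]) / 2.0  # central first difference
    # Accumulate the correlation-weighted change rate differences.
    total = sum(corr[i][m] * abs(slope(curves[i]) - slope(curves[m]))
                for m in range(len(curves)) if m != i)
    return math.exp(-total)  # assumed negative correlation mapping
```

Two straight-line curves have zero second derivatives everywhere, so their change correlation under this sketch is 1 regardless of slope, while the correlation weight still falls as their slopes at a given sampling time diverge.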
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410494747.0A CN118094446B (en) | 2024-04-24 | 2024-04-24 | Anaerobic system running condition intelligent analysis method based on machine learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118094446A (en) | 2024-05-28 |
CN118094446B (en) | 2024-08-02 |
Family
ID=91153436
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410494747.0A Active CN118094446B (en) | 2024-04-24 | 2024-04-24 | Anaerobic system running condition intelligent analysis method based on machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118094446B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118311362B (en) * | 2024-06-07 | 2024-08-09 | 湖州积微电子科技有限公司 | Running state monitoring method for energy-saving medium-power direct-current speed regulating device |
CN118365173B (en) * | 2024-06-17 | 2024-09-10 | 湖南防灾科技有限公司 | Intelligent climate effect assessment method for large-scale wind power plant |
CN118708772B (en) * | 2024-09-02 | 2024-10-25 | 济南科金信息技术有限公司 | Data storage method for intelligent chemical field management system |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114480103A (en) * | 2022-01-05 | 2022-05-13 | 上海市机电设计研究院有限公司 | Vertical dry-process anaerobic reactor system and control method thereof |
EP4060433A1 (en) * | 2021-03-19 | 2022-09-21 | Siemens Aktiengesellschaft | Method and system for predicting the operation of a technical system |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020064681A1 (en) * | 2018-09-24 | 2020-04-02 | Novo Nordisk A/S | System for enhancing data quality of dispense data sets |
CN116226484B (en) * | 2023-05-05 | 2023-07-04 | 北京视酷科技有限公司 | Ultrafiltration water treatment device monitoring data management system |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP4060433A1 (en) * | 2021-03-19 | 2022-09-21 | Siemens Aktiengesellschaft | Method and system for predicting the operation of a technical system |
CN114480103A (en) * | 2022-01-05 | 2022-05-13 | 上海市机电设计研究院有限公司 | Vertical dry-process anaerobic reactor system and control method thereof |
Also Published As
Publication number | Publication date |
---|---|
CN118094446A (en) | 2024-05-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN118094446B (en) | Anaerobic system running condition intelligent analysis method based on machine learning | |
CN116186634B (en) | Intelligent management system for construction data of building engineering | |
CN115933787B (en) | Indoor multi-terminal intelligent control system based on indoor environment monitoring | |
CN111199016A (en) | DTW-based improved K-means daily load curve clustering method | |
CN116680661B (en) | Multi-dimensional data-based automatic gas regulator pressure monitoring method | |
CN117786584B (en) | Big data analysis-based method and system for monitoring and early warning of water source pollution in animal husbandry | |
CN116307944B (en) | Distribution box remote monitoring system based on artificial intelligence and Internet of things | |
CN116400126B (en) | Low-voltage power box with data processing system | |
CN108460486A (en) | A kind of voltage deviation prediction technique based on improvement clustering algorithm and neural network | |
CN112818608A (en) | Medium-and-long-term runoff forecasting method based on improved particle swarm optimization algorithm and support vector machine | |
CN117493921A (en) | Artificial intelligence energy-saving management method and system based on big data | |
CN118378199A (en) | Real-time anomaly detection method in big data analysis platform | |
CN116796646A (en) | Vibration prediction method of hot continuous rolling mill | |
CN117828371B (en) | Intelligent analysis method for business information of comprehensive operation and maintenance platform | |
CN117609814B (en) | SD-WAN intelligent flow scheduling optimization method and system | |
CN117540325B (en) | Business database anomaly detection method and system based on data variation capture | |
CN116757337B (en) | House construction progress prediction system based on artificial intelligence | |
CN117973899A (en) | Land development and management information intelligent management system based on big data | |
US7155367B1 (en) | Method for evaluating relative efficiency of equipment | |
CN117951631B (en) | Intelligent constant-temperature constant-pressure cooling water circulation system | |
CN117909935B (en) | Kitchen waste liquid state high temperature fermentation stabilization treatment method | |
CN118244725B (en) | Automatic production control method and system for large-scale deflection rounding | |
CN117648657B (en) | Urban planning multi-source data optimization processing method | |
CN118605364B (en) | Calculation monitoring and control system and method for preparation process of porous silicon-carbon anode material | |
CN118395345B (en) | Multistage heat recovery and energy monitoring system of coating machine |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||