CN118094446B - Anaerobic system running condition intelligent analysis method based on machine learning - Google Patents

Info

Publication number
CN118094446B
Authority
CN
China
Prior art keywords
data
difference
curve
correlation
curves
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410494747.0A
Other languages
Chinese (zh)
Other versions
CN118094446A (en)
Inventor
王善杰
葛扬帆
葛鼎盛
李洋
苏云友
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fitec Tianjin Environmental Technology Co ltd
Original Assignee
Fitec Tianjin Environmental Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fitec Tianjin Environmental Technology Co ltd filed Critical Fitec Tianjin Environmental Technology Co ltd
Priority to CN202410494747.0A priority Critical patent/CN118094446B/en
Publication of CN118094446A publication Critical patent/CN118094446A/en
Application granted granted Critical
Publication of CN118094446B publication Critical patent/CN118094446B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2323 Non-hierarchical techniques based on graph theory, e.g. minimum spanning trees [MST] or graph cuts
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Discrete Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of data cluster analysis, in particular to an intelligent analysis method for the running condition of an anaerobic system based on machine learning. The method obtains the change correlation between the data curves of operation data in different dimensions and derives correlation weights from it. It then analyzes the data points within the neighborhood range of each data point's sampling time and combines them with the correlation weights to obtain fluctuation consistency. Based on the fluctuation consistency between dimensions and the data differences within the neighborhood range of each data point, the fluctuation characteristics of the data points over the time sequence can be represented accurately; from these an accurate clustering distance is obtained, the clustering result is used for data prediction, and whether the anaerobic system is abnormal is judged. By analyzing the operation data of the anaerobic system across dimensions and over the time sequence, the invention obtains accurate data clustering results through a machine learning algorithm and performs data prediction, so that the operating condition of the anaerobic system is evaluated effectively.

Description

Anaerobic system running condition intelligent analysis method based on machine learning
Technical Field
The invention relates to the technical field of data cluster analysis, in particular to an intelligent analysis method for the running condition of an anaerobic system based on machine learning.
Background
The anaerobic system is a common system in sewage treatment. During operation, the data it generates must be monitored and analyzed in time; knowing the operating state of the anaerobic system allows system parameters to be adjusted promptly through corresponding measures, which improves operating efficiency and treatment effect.
The operation data of an anaerobic system has multiple dimensions. In the prior art, historical data can be analyzed with a machine learning algorithm: the multi-dimensional historical data are clustered so that data with the same distribution form one class, and a machine learning method is then applied to each class to predict data and thereby determine the operating state. However, the operation data of an anaerobic system is time-series data: the sequence in a single dimension fluctuates over time, and data changes in different dimensions are correlated with one another and subject to hysteresis. Clustering directly on the numerical values therefore produces clusters that are not clearly separated, which in turn causes larger prediction errors, so the operating condition of the anaerobic system cannot be evaluated accurately.
Disclosure of Invention
In order to solve the technical problem that the prior art does not consider the correlation and hysteresis of anaerobic system operation data over the time sequence, which leads to poor clustering results and prediction errors and prevents the operating condition of the anaerobic system from being evaluated accurately, the invention aims to provide an intelligent analysis method for the operation condition of the anaerobic system based on machine learning, which adopts the following technical scheme:
The invention provides an intelligent analysis method for the running condition of an anaerobic system based on machine learning, which comprises the following steps:
acquiring a data curve of operation data of the anaerobic system in each dimension on a time sequence; the dimension class includes at least temperature, pressure and flow;
Obtaining the change correlation of the data curves between different dimensions according to the change trend difference between the data curves; under each sampling time, obtaining the correlation weight of each data curve under each sampling time according to the change correlation between each data curve and all other data curves and the change trend difference under the corresponding sampling time; obtaining fluctuation consistency between the data curve and other data curves according to the correlation weight corresponding to the data point at each sampling time in the data curve and the variation trend difference in the neighborhood range of the data point corresponding to the data curve and other data curves;
Acquiring a first data difference between each data point and a neighborhood data point in a preset neighborhood range in each data curve, and acquiring a difference weight of each neighborhood data point of each data point according to the first data difference and fluctuation consistency of each data point in other data curves of each data curve; in each data curve, obtaining a time sequence distribution characteristic of each data point according to the difference weight and the first data difference;
Adjusting the clustering distance between the data points according to the time sequence distribution characteristics, and clustering according to the adjusted clustering distance to obtain a cluster; and respectively taking the data in each cluster as a basis to conduct data prediction, and judging whether the anaerobic system is abnormal or not according to a data prediction result.
Further, the method for acquiring the change correlation includes:
Acquiring the absolute value of the second derivative at each data point on the data curve, calculating the difference of the second derivative absolute values of the data points at the same position on the data curves of two dimensions, applying a negative correlation mapping to the mean of these differences, and normalizing to obtain the change correlation between the data curves.
Further, the method for acquiring the correlation weight comprises the following steps:
Under the same sampling time, acquiring the first derivative difference of data points between each data curve and each other data curve, multiplying the change correlation between the two data curves by the corresponding first derivative difference, and acquiring the weighted change rate difference between the data curve and each other data curve under the same sampling time;
And under the same sampling time, carrying out negative correlation mapping and normalization on the accumulated sum of the weighted change rate differences between each data curve and all other data curves to obtain the correlation weight of each data curve under the corresponding sampling time.
Further, the method for acquiring the fluctuation consistency comprises the following steps:
Selecting any one data curve as a target data curve, wherein the data points on the target data curve are target data points, and the data points on each other data curve that lie within a preset neighborhood range of the time corresponding to a target data point are comparison data points; acquiring the second derivative absolute value difference between the target data point and each comparison data point, and selecting the smallest second derivative absolute value difference as a reference change trend difference; taking the product of the reference change trend difference and the correlation weight of the target data curve at the corresponding time as a weighted change trend difference, accumulating the weighted change trend differences between the target data curve and each other data curve over all times, and then carrying out negative correlation mapping and normalization to obtain the fluctuation consistency between the target data curve and each other data curve; changing the target data curve to obtain the fluctuation consistency of each data curve relative to each other data curve.
Further, the method for obtaining the difference weight comprises the following steps:
obtaining the difference weight according to a difference weight calculation formula, wherein the difference weight calculation formula comprises:
$Q_{a,i,k} = \exp\left(-\sum_{b=1}^{N} F_{a,b}\,\left|x_{b,i} - x_{b,k}\right|\right)$; wherein $k$ is the sequence number of a neighborhood data point of the $i$-th data point on the $a$-th data curve, $Q_{a,i,k}$ is the difference weight of the $k$-th neighborhood data point of the $i$-th data point on the $a$-th data curve, $\exp$ is the exponential function based on the natural constant, $N$ is the number of data curves other than the $a$-th data curve, $F_{a,b}$ is the fluctuation consistency between the $a$-th data curve and the $b$-th other data curve, $x_{b,i}$ is the data value of the $i$-th data point on the $b$-th other data curve, and $x_{b,k}$ is the data value of the $k$-th neighborhood data point on the $b$-th other data curve.
Further, the method for acquiring the time sequence distribution characteristic comprises the following steps:
in each data curve, taking the product of the difference weight and the first data difference as a weighted data difference between a corresponding data point and a neighborhood data point, and normalizing the average value of the weighted data differences of all neighborhood data points of each data point to obtain the time sequence distribution characteristic of each data point.
Further, the adjusting the cluster distance between data points according to the timing distribution feature includes:
obtaining time sequence distribution characteristic differences between two data points, and adjusting initial clustering distances between the data points according to the time sequence distribution characteristic differences to obtain adjusted clustering distances; and the adjusted clustering distance is positively correlated with the time sequence distribution characteristic difference.
Further, constructing a connected graph according to the adjusted clustering distance, and splitting the connected graph by utilizing a connected graph dynamic splitting clustering algorithm to obtain the cluster.
Further, a prediction model is constructed by utilizing ARIMA, and the data prediction result is obtained.
Further, the determining whether the anaerobic system is abnormal according to the data prediction result includes:
Obtaining a difference distance between the data prediction result and the actual data at the corresponding moment, and judging that the anaerobic system is abnormal at that moment if the difference distance is larger than a preset judgment threshold.
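The threshold judgment above can be sketched in Python as follows. This is a minimal illustration only: the function name `is_abnormal`, the use of the Euclidean distance as the "difference distance", and the example threshold are assumptions, since the text does not fix the distance measure or the threshold value.

```python
import math

def is_abnormal(predicted, actual, threshold):
    """Return True when prediction and measurement differ by more than threshold.

    predicted, actual: sequences holding one value per dimension
    (for example temperature, pressure, flow) at the same moment.
    The Euclidean distance is an assumed choice of difference distance.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    distance = math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)))
    return distance > threshold

# Hypothetical readings: only the third dimension deviates from its prediction.
print(is_abnormal([35.0, 1.0, 10.0], [35.0, 1.0, 13.0], threshold=2.0))
```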
The invention has the following beneficial effects:
According to the embodiment of the invention, the change correlation between the data curves of the operation data in different dimensions is obtained; it gives a preliminary evaluation of how the data values of two dimensions change together. To analyze the correlation of a data point with the data curves of all dimensions, a correlation weight is further obtained; because it is computed over the data curves of all dimensions, it accurately quantifies the correlation characteristics of the data points on one data curve. Considering that time-series data exhibits fluctuation and hysteresis, the data points within the neighborhood range of each data point's sampling time are also analyzed, and fluctuation consistency is obtained by combining the correlation weights. The fluctuation consistency accurately represents the change correlation of the data curve of one dimension relative to another dimension while accounting for data hysteresis; the larger the correlation, the more the two kinds of data belong to the same distribution. By combining the analysis of the multi-dimensional data, the time sequence distribution feature accurately represents the fluctuation of each data point over the time sequence, based on the fluctuation consistency between dimensions and the data differences within the neighborhood range of the data point. Based on the time sequence distribution features, an accurate clustering distance is obtained, a good clustering result is obtained for use in data prediction, and whether the anaerobic system is abnormal is judged accurately.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of an intelligent analysis method for an anaerobic system operation condition based on machine learning according to an embodiment of the present invention.
Detailed Description
To further describe the technical means adopted by the invention to achieve its intended purpose and their effects, the specific implementation, structure, features and effects of the intelligent analysis method for the anaerobic system operation condition based on machine learning are described in detail below with reference to the accompanying drawings and the preferred embodiments. In the following description, different references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following specifically describes a specific scheme of the intelligent analysis method for the running condition of the anaerobic system based on machine learning.
Referring to fig. 1, a flow chart of an intelligent analysis method for an anaerobic system operation condition based on machine learning according to an embodiment of the present invention is shown, where the method includes:
Step S1: acquiring a data curve of operation data of the anaerobic system in each dimension on a time sequence; the classes of dimensions include at least temperature, pressure, and flow.
The important operation data of the anaerobic system in the sewage treatment process are temperature, pressure and flow. Anaerobic reactions are affected by temperature and pressure, while flow can characterize the input and output of substances, so data acquisition for anaerobic systems needs to include at least three dimensions. In the embodiment of the invention, the operation data is collected every 30 seconds, and the collected data is used as data points to form a data curve according to the time sequence.
It should be noted that, because the embodiment of the present invention aims to analyze the operation condition of the anaerobic system under the real-time operation, all the data involved in the prediction are the historical data before the real-time data, and in one embodiment of the present invention, the historical time period is set to one hour, that is, the data one hour before the real-time moment is used as the data base of the subsequent clustering and the prediction data.
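With a 30-second sampling interval and a one-hour history, each dimension contributes 120 data points to the analysis. The short Python sketch below shows one way to cut that window out of the collected readings; the function name `history_window` and the dictionary layout of the curves are assumptions made for illustration.

```python
SAMPLE_PERIOD_S = 30                    # sampling interval stated in the embodiment
HISTORY_S = 3600                        # one-hour history stated in the embodiment
WINDOW = HISTORY_S // SAMPLE_PERIOD_S   # 120 data points per dimension

def history_window(curves):
    """Keep only the most recent WINDOW readings of every dimension.

    curves: dict mapping a dimension name to a list of readings, oldest first
    (an assumed layout; the embodiment only requires time-ordered data points).
    """
    return {name: values[-WINDOW:] for name, values in curves.items()}

# Hypothetical readings: 200 samples per dimension, of which the last 120 are kept.
curves = {
    "temperature": [35.0 + 0.01 * t for t in range(200)],
    "pressure": [1.0 for _ in range(200)],
    "flow": [10.0 + (t % 5) for t in range(200)],
}
recent = history_window(curves)
print({name: len(values) for name, values in recent.items()})
```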
Step S2: obtaining the change correlation of the data curves among different dimensions according to the change trend difference among the data curves; under each sampling time, obtaining the correlation weight of each data curve under each sampling time according to the change correlation between each data curve and all other data curves and the change trend difference under the corresponding sampling time; and obtaining fluctuation consistency between the data curve and other data curves according to the correlation weight corresponding to the data point at each sampling time in the data curve and the variation trend difference in the neighborhood range of the data point corresponding to the data curve and other data curves.
Changes in the operation data of the anaerobic system affect the water quality index, so the change trend of the operation data is an important data feature for the anaerobic system. Because the invention aims to group data with the same distribution into one class and then predict and analyze that data, the correlation among the operation data of different dimensions must be analyzed first. The stronger the correlation, the more important the data of that dimension is to the anaerobic system, and the more the correlation between dimensions must be taken into account when the fluctuation of the data over the time sequence is analyzed subsequently.
Firstly, the change correlation of the data curves between different dimensions can be obtained according to the change trend difference between the data curves, namely, the larger the change trend difference is, the smaller the change correlation of the two data curves is.
Preferably, in one embodiment of the present invention, the method for acquiring the change correlation includes:
The absolute value of the second derivative is acquired at each data point on the data curve; it characterizes how stable the change is at the corresponding position, and the larger the absolute value, the more drastic the change trend. The difference of the second derivative absolute values of the data points at the same position on the data curves of two dimensions is calculated, a negative correlation mapping is applied to the mean of these differences, and the result is normalized to obtain the change correlation between the data curves. That is, the smaller the mean of the second derivative absolute value differences, the more correlated the change trends of the two data curves. In one embodiment of the invention, the change correlation is formulated as:
$R_{a,b} = \exp\left(-\frac{1}{n}\sum_{i=1}^{n}\Big|\,\left|f''_a(i)\right| - \left|f''_b(i)\right|\,\Big|\right)$; wherein, $R_{a,b}$ is the change correlation between the data curve of the $a$-th dimension and the data curve of the $b$-th dimension, $\exp$ is the exponential function based on the natural constant, $n$ is the number of data points on the data curve, $f''_a(i)$ is the second derivative of the $i$-th data point on the data curve of the $a$-th dimension, and $f''_b(i)$ is the second derivative of the $i$-th data point on the data curve of the $b$-th dimension.
In the change correlation formula, the data is mapped and normalized in a negative correlation manner by using an exponential function based on a natural constant, and in other embodiments of the present invention, other basic mathematical operations may be used to achieve the purpose of mapping and normalizing the negative correlation, which is a technical means well known to those skilled in the art, and will not be described herein.
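To illustrate the change correlation, the Python sketch below computes the negative exponential of the mean difference between the second derivative absolute values of two curves, as the text describes. The function names, the central finite-difference estimate of the second derivative, and the way the two endpoints are handled are assumptions of this sketch; the embodiment does not specify how the derivatives are estimated.

```python
import math

def second_derivative(curve):
    """Central second difference; each endpoint reuses the nearest interior value."""
    n = len(curve)
    if n < 3:
        raise ValueError("need at least three data points")
    inner = [curve[i - 1] - 2 * curve[i] + curve[i + 1] for i in range(1, n - 1)]
    return [inner[0]] + inner + [inner[-1]]

def change_correlation(curve_a, curve_b):
    """exp(-mean | |f''_a(i)| - |f''_b(i)| |) over positions i of two equally long curves."""
    if len(curve_a) != len(curve_b):
        raise ValueError("curves must have the same number of data points")
    da = second_derivative(curve_a)
    db = second_derivative(curve_b)
    mean_diff = sum(abs(abs(x) - abs(y)) for x, y in zip(da, db)) / len(da)
    return math.exp(-mean_diff)

# Two straight lines have zero curvature everywhere, so their correlation is maximal.
rising = [0.0, 1.0, 2.0, 3.0, 4.0]
falling = [10.0, 8.0, 6.0, 4.0, 2.0]
print(change_correlation(rising, falling))
```

The result lies in (0, 1], with 1 reached only when the second derivative magnitudes agree at every position.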
The change correlation is the correlation obtained by the initial analysis. In practice, changes in anaerobic data exhibit hysteresis over the time sequence: for example, flow data affects the water quality index in real time, whereas a change in the anaerobic reaction caused by a temperature change needs a period of time before it visibly changes the water quality index, i.e. the temperature data has a certain hysteresis. Therefore, for a data point in one dimension, not only the change trend difference between dimensions at the same moment but also the change trend difference within the neighborhood range of the corresponding moment should be analyzed, so that hysteresis does not distort the result. In the embodiment of the invention, before hysteresis is considered, the correlation weight at each sampling time on each data curve is obtained, and the correlation weight characterizes the correlation between one dimension and the other dimensions at the same moment. The correlation weight at each sampling time is therefore obtained from the change correlation between each data curve and all other data curves and the change trend difference at that sampling time. The correlation weight is then used to weight the change trend difference within the neighborhood range of the corresponding data points on the data curve and the other data curves, so that the final result includes both the correlation characteristics of different dimensions at the same moment and the hysteresis of the data points over the time sequence; this yields the fluctuation consistency.
Preferably, in one embodiment of the present invention, the method for acquiring the correlation weight includes:
At the same sampling time, the first derivative difference of the data points between each data curve and each other data curve is acquired. The first derivative represents the rate of change at the corresponding position; the larger the first derivative difference, the more the data points at that position differ in degree and direction of change. The change correlation between two data curves is multiplied by the corresponding first derivative difference to obtain the weighted change rate difference between the data curve and each other data curve at the same sampling time. The larger the change correlation, the stronger the correlation between the corresponding two dimensions and the higher the confidence of the data reflected between them; that is, the change correlation is the confidence weight of the first derivative difference.
At the same sampling time, the accumulated sum of the weighted change rate differences between each data curve and all other data curves is negatively mapped and normalized to obtain the correlation weight of each data curve at the corresponding sampling time. That is, the larger the weighted change rate difference, the more the change trend of the data curve differs from that of the other dimensions at the corresponding sampling time, and the smaller the correlation weight.
In one embodiment of the invention, the exponential function based on natural constant is utilized for carrying out negative correlation mapping and normalization, and the specific correlation weight is expressed as follows by a formula:
$W_{a,i} = \exp\left(-\sum_{b=1}^{N} R_{a,b}\,\left|f'_a(i) - f'_b(i)\right|\right)$; wherein, $W_{a,i}$ is the correlation weight of the $i$-th data point on the data curve of the $a$-th dimension, $\exp$ is the exponential function based on the natural constant, $N$ is the number of data curves other than the $a$-th data curve, $R_{a,b}$ is the change correlation between the $a$-th data curve and the $b$-th other data curve, $f'_a(i)$ is the first derivative of the $i$-th data point on the $a$-th data curve, and $f'_b(i)$ is the first derivative of the $i$-th data point on the $b$-th other data curve.
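The correlation weight can be sketched in Python as follows: for each sampling time, the first derivative differences to every other curve are weighted by the change correlation, summed, and passed through a negative exponential. The function names, the finite-difference derivative, and the dictionary keyed by curve-name pairs are assumptions of this sketch, and the change correlation values in the example are made up for illustration.

```python
import math

def first_derivative(curve):
    """Central first difference, one-sided at the two endpoints."""
    n = len(curve)
    if n < 2:
        raise ValueError("need at least two data points")
    d = [curve[1] - curve[0]]
    d += [(curve[i + 1] - curve[i - 1]) / 2 for i in range(1, n - 1)]
    d.append(curve[-1] - curve[-2])
    return d

def correlation_weights(curves, change_corr, target):
    """Correlation weight of the target curve at every sampling time.

    curves: dict name -> list of values (equal lengths).
    change_corr: dict (target, other) -> change correlation of that pair.
    """
    d_target = first_derivative(curves[target])
    others = {name: first_derivative(c) for name, c in curves.items() if name != target}
    weights = []
    for i in range(len(d_target)):
        total = sum(change_corr[(target, name)] * abs(d_target[i] - d[i])
                    for name, d in others.items())
        weights.append(math.exp(-total))
    return weights

# Hypothetical curves and change correlations.
curves = {"temperature": [0.0, 1.0, 2.0, 3.0],
          "pressure": [0.0, 1.0, 2.0, 3.0],
          "flow": [0.0, 2.0, 4.0, 6.0]}
change_corr = {("temperature", "pressure"): 1.0, ("temperature", "flow"): 0.5}
print(correlation_weights(curves, change_corr, "temperature"))
```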
Preferably, in one embodiment of the present invention, the method for acquiring fluctuation consistency includes:
Any one data curve is selected as the target data curve. The data points on the target data curve are target data points, and the data points on each other data curve that lie within the preset neighborhood range of the time corresponding to a target data point are comparison data points. In the embodiment of the invention, the neighborhood range is a range of length 5 centered on the corresponding time, i.e. one neighborhood range contains 5 comparison data points.
As with the change correlation, in one embodiment of the invention the second derivative absolute value difference between the target data point and each comparison data point is obtained. Because there are several comparison data points, there are several second derivative absolute value differences; to avoid errors caused by excessively large values, the smallest second derivative absolute value difference is selected as the reference change trend difference.
The product of the reference change trend difference and the correlation weight of the target data curve at the corresponding time is taken as the weighted change trend difference. The larger the correlation weight, the more important the data point at the corresponding position and the higher the confidence of the data it represents, so the reference change trend difference is amplified.
The weighted change trend differences between the target data curve and each other data curve are accumulated over all times and then negatively mapped and normalized to obtain the fluctuation consistency between the target data curve and each other data curve. That is, the larger the weighted change trend difference, the smaller the correlation of data change between the target data curve and the other data curve, and the smaller the fluctuation consistency. Changing the target data curve gives the fluctuation consistency of each data curve relative to each other data curve.
In one embodiment of the invention, taking the data curve of the $a$-th dimension as the target data curve, the fluctuation consistency is expressed as:
$F_{a,b} = \exp\left(-\sum_{i=1}^{n} W_{a,i}\,\min_{j}\Big|\,\left|f''_a(i)\right| - \left|f''_b(j)\right|\,\Big|\right)$; wherein, $F_{a,b}$ represents the fluctuation consistency between the data curve of the $a$-th dimension and the data curve of the $b$-th dimension, $\exp$ is the exponential function based on the natural constant, $n$ is the number of data points on the data curve, $\min$ is the minimum value selection function, $W_{a,i}$ is the correlation weight of the $i$-th data point on the data curve of the $a$-th dimension, $f''_a(i)$ is the second derivative of the $i$-th data point on the data curve of the $a$-th dimension, and $f''_b(j)$ is the second derivative of the $j$-th comparison data point on the data curve of the $b$-th dimension, where $j$ ranges over the comparison data points of the $i$-th data point.
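A Python sketch of the fluctuation consistency follows. For every target data point it takes the smallest difference of second derivative absolute values within the neighborhood on the other curve, weights it by the correlation weight, accumulates over time, and applies a negative exponential. The function names, the finite-difference derivative, and the clipping of the neighborhood at the curve ends are assumptions; the window length of 5 is the value given in the embodiment.

```python
import math

def second_derivative(curve):
    """Central second difference; each endpoint reuses the nearest interior value."""
    n = len(curve)
    if n < 3:
        raise ValueError("need at least three data points")
    inner = [curve[i - 1] - 2 * curve[i] + curve[i + 1] for i in range(1, n - 1)]
    return [inner[0]] + inner + [inner[-1]]

def fluctuation_consistency(target, other, weights, window=5):
    """Fluctuation consistency of the target curve relative to one other curve.

    weights: correlation weight of the target curve at each sampling time.
    window: neighborhood length centered on the corresponding time.
    """
    if not (len(target) == len(other) == len(weights)):
        raise ValueError("target, other and weights must have equal length")
    d_t = [abs(v) for v in second_derivative(target)]
    d_o = [abs(v) for v in second_derivative(other)]
    half = window // 2
    total = 0.0
    for i, w in enumerate(weights):
        lo, hi = max(0, i - half), min(len(d_o), i + half + 1)  # clipped at the ends
        reference = min(abs(d_t[i] - d_o[j]) for j in range(lo, hi))
        total += w * reference
    return math.exp(-total)

# The second curve repeats the first one with a lag of one sample.
target = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
lagged = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(fluctuation_consistency(target, lagged, [1.0] * 7))            # neighborhood absorbs the lag
print(fluctuation_consistency(target, lagged, [1.0] * 7, window=1))  # same-time comparison only
```

Comparing the two calls shows why the neighborhood matters: with same-time comparison only, a purely lagged copy of the curve is penalized as if its trend were different.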
Step S3: acquiring a first data difference between each data point and a neighborhood data point in a preset neighborhood range in each data curve, and acquiring a difference weight of each neighborhood data point of each data point according to the first data difference and fluctuation consistency of each data point in other data curves of each data curve; in each data curve, a time sequence distribution characteristic of each data point is obtained according to the difference weight and the first data difference.
In step S2, the correlation of the data curves between the dimensions is obtained by analyzing the variation trend differences and the hysteresis of the data between the different dimensions. Based on the relevance, the fluctuation change condition of each data point on the time sequence can be analyzed, the final cluster division can be clearer by taking the fluctuation change condition on the time sequence as the basis of the subsequent cluster analysis, and fluctuation consistency is required to be introduced when the data fluctuation on the time sequence is analyzed. Therefore, for one moment, a first data difference between a data point corresponding to the moment on each data curve and a neighborhood data point in a preset neighborhood range is obtained, fluctuation consistency between each data curve and other data curves in different dimensions is further considered, and a difference weight of the corresponding neighborhood data point is obtained according to the fluctuation consistency and the first data difference on the other data curves. That is, the difference weights are combined with the relative relationship between one dimension and the other dimension, and the difference weights corresponding to the first data differences in the target dimension are determined through the fluctuation consistency and the first data differences in the other dimension. And weighting the first data difference between the data point and the neighborhood data point on the data curve according to the difference weight, so as to obtain the time sequence distribution characteristic of each data point. The time sequence distribution characteristics represent fluctuation and change characteristic conditions of data points on time sequence, so that clustering based on differences of the time sequence distribution characteristics can optimize a clustering result.
Preferably, in one embodiment of the present invention, the difference weight is obtained according to a difference weight calculation formula including:
$$w_{i,j,n} = \exp\left(-\sum_{k=1}^{K} F_{i,k} \times \left|x_{k,j} - x_{k,j,n}\right|\right)$$

where $n$ is the sequence number of a neighborhood data point of the $j$-th data point on the $i$-th data curve, $w_{i,j,n}$ is the difference weight of the $n$-th neighborhood data point of the $j$-th data point on the $i$-th data curve, $\exp$ is the exponential function with the natural constant as its base, $K$ is the number of data curves other than the $i$-th data curve, $F_{i,k}$ is the fluctuation consistency between the $i$-th data curve and the $k$-th other data curve, $x_{k,j}$ is the data value of the $j$-th data point on the $k$-th other data curve, and $x_{k,j,n}$ is the data value of the $n$-th neighborhood data point of that data point on the $k$-th other data curve.
In the difference weight formula, the fluctuation consistency serves as a confidence weight and is multiplied by the first data difference in the corresponding other dimension: the larger the fluctuation consistency, the more strongly the data of the two dimensions are related and the more important the data feature they represent, so the feature is amplified through multiplication. A larger first data difference on the other data curves indicates a weaker correlation between the data point and the neighborhood data point, and therefore a smaller difference weight.
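The weighting described above can be illustrated with a short Python sketch. It assumes one reading of the formula: the weight is the exponential of the negative sum, over the other data curves, of the fluctuation consistency multiplied by the absolute first data difference on that curve. The function name `difference_weight`, the list-of-lists data layout, and the sample values are assumptions made for the example and do not come from the embodiment.

```python
import math

def difference_weight(curves, consistency, i, j, n):
    """Difference weight of the neighbor at index n for data point j on curve i.

    curves: list of equal-length lists, one per dimension (assumed layout).
    consistency[i][k]: fluctuation consistency between curve i and curve k.
    Assumed form: exp(-sum_k F[i][k] * |x[k][j] - x[k][n]|) over all k != i.
    """
    total = 0.0
    for k, other in enumerate(curves):
        if k == i:
            continue  # only the other data curves contribute
        first_diff = abs(other[j] - other[n])  # first data difference on curve k
        total += consistency[i][k] * first_diff
    return math.exp(-total)  # negative-correlation mapping into (0, 1]

# Made-up values for three dimensions (e.g. temperature, pressure, flow).
curves = [
    [1.0, 1.2, 1.1, 1.5],
    [0.5, 0.5, 0.9, 0.6],
    [2.0, 2.1, 2.1, 2.4],
]
consistency = [
    [1.0, 0.8, 0.6],
    [0.8, 1.0, 0.5],
    [0.6, 0.5, 1.0],
]
w_near = difference_weight(curves, consistency, i=0, j=1, n=0)
w_far = difference_weight(curves, consistency, i=0, j=1, n=2)
print(w_near, w_far)
```

In this example the neighbor at index 2 differs more on the strongly consistent second curve, so it receives the smaller weight.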
Preferably, in one embodiment of the present invention, the method for acquiring the timing distribution feature includes:
in each data curve, taking the product of the difference weight and the first data difference as the weighted data difference between the corresponding data point and the neighborhood data point, and normalizing the average value of the weighted data differences of all the neighborhood data points of each data point to obtain the time sequence distribution characteristic of each data point. In one embodiment of the invention, the timing distribution characteristics are formulated as:
$$T_{i,j} = \mathrm{Norm}\left(\frac{1}{N}\sum_{n=1}^{N} w_{i,j,n} \times \left|x_{i,j} - x_{i,j,n}\right|\right)$$

where $T_{i,j}$ is the time sequence distribution characteristic of the $j$-th data point on the $i$-th data curve, $N$ is the number of neighborhood data points, $\mathrm{Norm}$ is the normalization function, $x_{i,j}$ is the data value of the $j$-th data point on the $i$-th data curve, $x_{i,j,n}$ is the data value of the $n$-th neighborhood data point of the $j$-th data point on the $i$-th data curve, and $w_{i,j,n}$ is the difference weight of that neighborhood data point. In this formula, the difference between a data point and its neighborhood data points represents the fluctuation of the data value over the time sequence. Because the difference weight reflects the data correlation at the corresponding position in the other dimensions, it serves as a confidence weight for the first data difference: the larger the difference weight, the smaller the data difference at the corresponding position in the other dimensions, the more stable that position, and the greater the reference value of the data feature it represents.
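A minimal Python sketch of this step follows: the weighted first data differences over the neighborhood are averaged, and the averages are then normalized. The embodiment does not name the normalization function, so min-max scaling is assumed here. The helper names, the `radius` parameter, and the constant weight function used in the example are likewise hypothetical.

```python
def weighted_mean_difference(values, weights, j, neighbors):
    """Mean of (difference weight * first data difference) over the neighbors of j."""
    diffs = [weights[n] * abs(values[j] - values[n]) for n in neighbors]
    return sum(diffs) / len(diffs)

def min_max_normalize(xs):
    """Assumed normalization: min-max scaling into [0, 1]."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]

def timing_features(values, weight_fn, radius=1):
    """Time sequence distribution characteristic of every point on one curve."""
    raw = []
    for j in range(len(values)):
        # Neighborhood: indices within `radius` of j, clipped at the curve ends.
        neighbors = [n for n in range(j - radius, j + radius + 1)
                     if n != j and 0 <= n < len(values)]
        weights = {n: weight_fn(j, n) for n in neighbors}
        raw.append(weighted_mean_difference(values, weights, j, neighbors))
    return min_max_normalize(raw)

# A flat curve with one spike; all difference weights set to 1 for simplicity.
values = [1.0, 1.0, 1.0, 3.0, 1.0, 1.0]
features = timing_features(values, weight_fn=lambda j, n: 1.0, radius=1)
print(features)
```

In practice `weight_fn` would return the difference weight computed from the other data curves rather than a constant.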
Step S4: adjusting the clustering distance between data points according to the time sequence distribution characteristics, and clustering according to the adjusted clustering distance to obtain clusters; performing data prediction on the basis of the data in each cluster separately, and judging whether the anaerobic system is abnormal according to the data prediction result.
The clustering distance between data points can be adjusted based on the time sequence distribution characteristics. Note that the initial clustering distance in a conventional clustering process is usually the Euclidean distance between two data points; the adjusted clustering distance is obtained by introducing the difference in time sequence distribution characteristics into that Euclidean distance. Clustering on the adjusted clustering distance yields clearly separated clusters. Because each cluster represents one type of data, the data in each cluster can serve as the basic data for data prediction, and whether the current anaerobic system is abnormal can be judged from the data prediction result.
Preferably, in one embodiment of the present invention, adjusting the cluster distance between data points according to the timing distribution feature comprises:
Obtaining time sequence distribution characteristic differences between two data points, and adjusting initial clustering distances between the data points according to the time sequence distribution characteristic differences to obtain adjusted clustering distances; the adjusted clustering distance is positively correlated with the characteristic difference of the time sequence distribution. In one embodiment of the invention, the adjusted cluster distance is formulated as:
[formula image not available]; where $D'_{a,b}$ is the adjusted clustering distance, $D_{a,b}$ is the initial clustering distance between the $a$-th and $b$-th data points, and $\Delta T_{a,b}$ is the time sequence distribution characteristic difference between the $a$-th and $b$-th data points.
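The text only requires that the adjusted clustering distance grow with the time sequence distribution characteristic difference. The Python sketch below uses one hypothetical combining rule, the initial Euclidean distance multiplied by one plus the characteristic difference, purely to illustrate that positive correlation. The rule itself and the function name `adjusted_distance` are assumptions and are not the formula of the embodiment.

```python
import math

def adjusted_distance(p, q, feat_p, feat_q):
    """Adjusted clustering distance between data points p and q.

    feat_p, feat_q: time sequence distribution characteristics of p and q.
    """
    initial = math.dist(p, q)              # initial Euclidean clustering distance
    feature_diff = abs(feat_p - feat_q)    # characteristic difference
    return initial * (1.0 + feature_diff)  # assumed rule, increasing in feature_diff

a, b = (0.0, 0.0), (3.0, 4.0)
same = adjusted_distance(a, b, 0.2, 0.2)   # identical characteristics
apart = adjusted_distance(a, b, 0.1, 0.6)  # different characteristics
print(same, apart)
```

With identical characteristics the distance stays at its Euclidean value, while a larger characteristic difference pushes the two points further apart for clustering purposes.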
In one embodiment of the invention, a connected graph is constructed according to the adjusted clustering distance, and a connected graph dynamic splitting clustering algorithm is used to split the connected graph into clusters. A prediction model is then constructed using ARIMA to obtain the data prediction result. Both the connected graph dynamic splitting clustering algorithm and the ARIMA data prediction method are technical means well known to those skilled in the art and are not described further here.
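The idea of clustering on a graph built from the adjusted distances can be sketched in Python as follows. This is a simplified stand-in and not the connected graph dynamic splitting clustering algorithm itself: it links two data points whenever their adjusted distance is at or below a fixed threshold and returns the connected components as clusters. The function name, the threshold value, and the sample distance matrix are hypothetical.

```python
def connected_clusters(dist, threshold):
    """Connected components of the graph joining pairs with dist[a][b] <= threshold."""
    n = len(dist)
    seen = [False] * n
    clusters = []
    for start in range(n):
        if seen[start]:
            continue
        # Depth-first traversal from an unvisited point collects one component.
        stack, component = [start], []
        seen[start] = True
        while stack:
            a = stack.pop()
            component.append(a)
            for b in range(n):
                if not seen[b] and dist[a][b] <= threshold:
                    seen[b] = True
                    stack.append(b)
        clusters.append(sorted(component))
    return clusters

# Symmetric matrix of adjusted clustering distances for four data points.
dist = [
    [0.0, 0.4, 5.0, 5.2],
    [0.4, 0.0, 4.8, 5.1],
    [5.0, 4.8, 0.0, 0.3],
    [5.2, 5.1, 0.3, 0.0],
]
clusters = connected_clusters(dist, threshold=1.0)
print(clusters)
```

A dynamic splitting algorithm would choose where to cut the graph adaptively instead of relying on a single fixed threshold.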
Preferably, determining whether the anaerobic system is abnormal according to the data prediction result in one embodiment of the present invention includes:
obtaining a difference distance between the data prediction result and the actual data at the corresponding moment, the difference distance representing how far the current operation data of the anaerobic system deviates from the prediction; if the difference distance is larger than a preset judgment threshold, the anaerobic system is judged to be abnormal at that moment. In one embodiment of the present invention, the difference distance is the Euclidean distance between the data prediction result and the actual data; for convenience of operation, the Euclidean distance is normalized and the judgment threshold is set to 0.6.
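The decision rule can be sketched in Python as follows. The threshold of 0.6 comes from the embodiment; the normalization is not specified there, so the mapping d / (1 + d) into [0, 1) is an assumption made for this example, as are the function name and the sample readings.

```python
import math

THRESHOLD = 0.6  # judgment threshold stated in the embodiment

def is_abnormal(predicted, actual, threshold=THRESHOLD):
    """Flag a moment as abnormal when the normalized difference distance exceeds the threshold."""
    distance = math.dist(predicted, actual)   # Euclidean difference distance
    normalized = distance / (1.0 + distance)  # assumed normalization into [0, 1)
    return normalized > threshold

# Made-up predicted and actual readings (temperature, pressure, flow).
small_deviation = is_abnormal([35.0, 1.2, 10.0], [35.2, 1.2, 10.1])
large_deviation = is_abnormal([35.0, 1.2, 10.0], [38.0, 1.9, 12.5])
print(small_deviation, large_deviation)
```

Under this assumed normalization a threshold of 0.6 corresponds to a raw Euclidean distance of 1.5, so the choice of normalization directly determines how sensitive the check is.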
In summary, the embodiment of the present invention obtains the change correlation between the data curves of the operation data in different dimensions, analyzes on that basis the data correlation of the data points on the data curves across all dimensions, and thereby obtains the correlation weights. Because time sequence data exhibits a certain fluctuation and hysteresis, the analysis also covers the data points within the neighborhood range of each data point's corresponding time, and the fluctuation consistency is obtained by combining the correlation weights. By combining the multidimensional analysis, the time sequence distribution characteristic, which is based on the relative fluctuation consistency among dimensions and the data differences within the neighborhood range of each data point, accurately represents the fluctuation of a data point over the time sequence. Based on the time sequence distribution characteristics, an accurate clustering distance is obtained, and the resulting clusters are used for data prediction to judge whether the anaerobic system is abnormal. By analyzing the operation data of the anaerobic system across dimensions and over the time sequence, the invention obtains accurate data clustering results through a machine learning algorithm and performs data prediction, so that the operating condition of the anaerobic system is effectively evaluated.
It should be noted that: the sequence of the embodiments of the present invention is only for description, and does not represent the advantages and disadvantages of the embodiments. The processes depicted in the accompanying drawings do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments.

Claims (8)

1. An intelligent analysis method for the running condition of an anaerobic system based on machine learning is characterized by comprising the following steps:
acquiring a data curve of operation data of the anaerobic system in each dimension on a time sequence; the dimension class includes at least temperature, pressure and flow;
Obtaining the change correlation of the data curves between different dimensions according to the change trend difference between the data curves; under each sampling time, obtaining the correlation weight of each data curve under each sampling time according to the change correlation between each data curve and all other data curves and the change trend difference under the corresponding sampling time; obtaining fluctuation consistency between the data curve and other data curves according to the correlation weight corresponding to the data point at each sampling time in the data curve and the variation trend difference in the neighborhood range of the data point corresponding to the data curve and other data curves;
Acquiring a first data difference between each data point and a neighborhood data point in a preset neighborhood range in each data curve, and acquiring a difference weight of each neighborhood data point of each data point according to the first data difference and fluctuation consistency of each data point in other data curves of each data curve; in each data curve, obtaining a time sequence distribution characteristic of each data point according to the difference weight and the first data difference;
Adjusting the clustering distance between the data points according to the time sequence distribution characteristics, and clustering according to the adjusted clustering distance to obtain a cluster; respectively taking the data in each cluster as a basis to carry out data prediction, and judging whether the anaerobic system is abnormal or not according to a data prediction result;
The adjusting the clustering distance between the data points according to the time sequence distribution feature comprises:
Obtaining time sequence distribution characteristic differences between two data points, and adjusting initial clustering distances between the data points according to the time sequence distribution characteristic differences to obtain adjusted clustering distances; the adjusted clustering distance is positively correlated with the time sequence distribution characteristic difference;
Respectively taking the data in each cluster as a basis to carry out data prediction, wherein the method comprises the following steps:
And constructing a prediction model by utilizing ARIMA to obtain the data prediction result.
2. The intelligent analysis method for the running condition of the anaerobic system based on machine learning according to claim 1, wherein the obtaining method of the change correlation comprises the following steps:
And acquiring the second derivative absolute value of each data point position on the data curve, calculating the second derivative absolute value difference of the data points at the same position between the data curves of two dimensions, carrying out negative correlation mapping on the average value of the second derivative absolute value difference, and normalizing to obtain the change correlation between the data curves.
3. The intelligent analysis method for the running condition of the anaerobic system based on machine learning according to claim 1, wherein the method for acquiring the correlation weight comprises the following steps:
Under the same sampling time, acquiring the first derivative difference of data points between each data curve and each other data curve, multiplying the change correlation between the two data curves by the corresponding first derivative difference, and acquiring the weighted change rate difference between the data curve and each other data curve under the same sampling time;
And under the same sampling time, carrying out negative correlation mapping and normalization on the accumulated sum of the weighted change rate differences between each data curve and all other data curves to obtain the correlation weight of each data curve under the corresponding sampling time.
4. The intelligent analysis method for the running condition of the anaerobic system based on machine learning according to claim 1, wherein the obtaining method for the fluctuation consistency comprises the following steps:
Arbitrarily selecting one data curve as a target data curve, wherein data points on the target data curve are target data points, and data points within a preset neighborhood range around the corresponding time of a target data point on the data curves other than the target data curve are comparison data points; acquiring the second derivative absolute value difference between the target data point and each comparison data point, and selecting the smallest second derivative absolute value difference as a reference change trend difference; taking the product of the reference change trend difference and the correlation weight of the target data curve at the corresponding time as a weighted change trend difference; accumulating the weighted change trend differences at all times between the target data curve and each other data curve, and then carrying out negative correlation mapping and normalization to obtain the fluctuation consistency between the target data curve and each other data curve; and changing the target data curve to obtain the fluctuation consistency of each data curve relative to each other data curve.
5. The intelligent analysis method for the running condition of the anaerobic system based on machine learning according to claim 1, wherein the method for obtaining the difference weight comprises the following steps:
obtaining the difference weight according to a difference weight calculation formula, wherein the difference weight calculation formula comprises:
$$w_{i,j,n} = \exp\left(-\sum_{k=1}^{K} F_{i,k} \times \left|x_{k,j} - x_{k,j,n}\right|\right)$$

where $n$ is the sequence number of a neighborhood data point of the $j$-th data point on the $i$-th data curve, $w_{i,j,n}$ is the difference weight of the $n$-th neighborhood data point of the $j$-th data point on the $i$-th data curve, $\exp$ is the exponential function with the natural constant as its base, $K$ is the number of data curves other than the $i$-th data curve, $F_{i,k}$ is the fluctuation consistency between the $i$-th data curve and the $k$-th other data curve, $x_{k,j}$ is the data value of the $j$-th data point on the $k$-th other data curve, and $x_{k,j,n}$ is the data value of the $n$-th neighborhood data point of that data point on the $k$-th other data curve.
6. The intelligent analysis method for the running condition of the anaerobic system based on machine learning according to claim 5, wherein the acquisition method for the time sequence distribution characteristics comprises the following steps:
in each data curve, taking the product of the difference weight and the first data difference as a weighted data difference between a corresponding data point and a neighborhood data point, and normalizing the average value of the weighted data differences of all neighborhood data points of each data point to obtain the time sequence distribution characteristic of each data point.
7. The intelligent analysis method for the running condition of the anaerobic system based on machine learning according to claim 1 is characterized in that a connected graph is constructed according to the adjusted clustering distance, and a connected graph dynamic splitting clustering algorithm is utilized to split the connected graph, so that the cluster is obtained.
8. The intelligent analysis method for the running condition of the anaerobic system based on the machine learning according to claim 1, wherein the judging whether the anaerobic system is abnormal according to the data prediction result comprises:
Obtaining a difference distance between the data prediction result and the actual data at the corresponding moment, and judging that the anaerobic system is abnormal at that moment if the difference distance is larger than a preset judgment threshold.
CN202410494747.0A 2024-04-24 2024-04-24 Anaerobic system running condition intelligent analysis method based on machine learning Active CN118094446B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410494747.0A CN118094446B (en) 2024-04-24 2024-04-24 Anaerobic system running condition intelligent analysis method based on machine learning

Publications (2)

Publication Number Publication Date
CN118094446A CN118094446A (en) 2024-05-28
CN118094446B true CN118094446B (en) 2024-08-02

Family

ID=91153436

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410494747.0A Active CN118094446B (en) 2024-04-24 2024-04-24 Anaerobic system running condition intelligent analysis method based on machine learning

Country Status (1)

Country Link
CN (1) CN118094446B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118311362B (en) * 2024-06-07 2024-08-09 湖州积微电子科技有限公司 Running state monitoring method for energy-saving medium-power direct-current speed regulating device
CN118365173B (en) * 2024-06-17 2024-09-10 湖南防灾科技有限公司 Intelligent climate effect assessment method for large-scale wind power plant
CN118708772B (en) * 2024-09-02 2024-10-25 济南科金信息技术有限公司 Data storage method for intelligent chemical field management system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114480103A (en) * 2022-01-05 2022-05-13 上海市机电设计研究院有限公司 Vertical dry-process anaerobic reactor system and control method thereof
EP4060433A1 (en) * 2021-03-19 2022-09-21 Siemens Aktiengesellschaft Method and system for predicting the operation of a technical system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020064681A1 (en) * 2018-09-24 2020-04-02 Novo Nordisk A/S System for enhancing data quality of dispense data sets
CN116226484B (en) * 2023-05-05 2023-07-04 北京视酷科技有限公司 Ultrafiltration water treatment device monitoring data management system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant