Disclosure of Invention
In view of the above problems, the present invention aims to provide an industrial internet data processing method and system based on big data.
The object of the invention is achieved by the following technical solution:
In a first aspect, the present invention provides an industrial internet data processing method based on big data, including:
S1, acquiring industrial Internet data;
S2, preprocessing the acquired industrial Internet data, and storing the preprocessed industrial Internet data in a data warehouse;
S3, performing data management on the industrial Internet data in the data warehouse;
S4, based on a data analysis model, extracting corresponding industrial Internet data from the data warehouse and performing data analysis to obtain a data analysis result.
In one embodiment, the industrial internet data includes operational data of the industrial equipment and status data of the industrial equipment;
wherein step S1 includes:
S11, acquiring the state data of the industrial equipment through sensors arranged on the industrial equipment, wherein the state data of the industrial equipment include temperature data, humidity data and vibration signals;
S12, acquiring the operation data of the industrial equipment through an intelligent terminal arranged on the industrial equipment, wherein the operation data of the industrial equipment include operation data and equipment state data of the industrial equipment.
In one embodiment, the industrial internet data includes warehouse material inventory data and equipment management data;
wherein, step S1 includes:
S13, acquiring material inventory data from a warehouse material management system;
S14, acquiring the entered industrial equipment basic data from the equipment management system.
In one embodiment, step S2 includes:
S21, decrypting the acquired industrial Internet data to obtain decrypted industrial Internet data;
S22, carrying out standardization processing on the decrypted industrial Internet data to obtain standardized industrial Internet data;
S23, carrying out data cleaning treatment on the standardized industrial Internet data to obtain cleaned industrial Internet data;
S24, carrying out data integration processing on the industrial Internet data after the data cleaning processing, and storing the industrial Internet data into a data warehouse.
In one embodiment, in step S23, performing data cleaning on the standardized industrial Internet data specifically includes:
S231, acquiring continuous industrial Internet data collected for the same target by different data acquisition sources, and forming one industrial Internet data sequence per data acquisition source, wherein the target includes a piece of industrial equipment or a production site containing a plurality of industrial equipment; the data acquisition sources include the sensors of the industrial equipment and the intelligent terminal on the industrial equipment; and each data acquisition source corresponds to one industrial Internet data sequence;
S232, performing joint abnormal data detection on the industrial Internet data sequences to obtain an abnormality detection result for the industrial Internet data sequences, and marking as abnormal each industrial Internet data sequence in which an abnormality is detected;
S233, performing outlier processing on the industrial Internet data sequences marked as abnormal, so as to clean them into data meeting the quality requirement.
In one embodiment, step S3 includes:
S31, quality inspection is carried out on industrial Internet data in the data warehouse to obtain quality inspection results, and the industrial Internet data with unqualified quality is isolated according to the quality inspection results;
S32, managing metadata associated with industrial Internet data in a data warehouse according to the set data specification;
S33, performing data lineage analysis on the industrial Internet data in the data warehouse to obtain a lineage analysis result and generate a data lineage graph.
In one embodiment, step S4 includes:
calling a corresponding data set from the data warehouse according to a set analysis task, and performing analysis based on a set data analysis model to obtain the data analysis result.
In one embodiment, step S4 further comprises:
visually displaying the obtained data analysis result.
In a second aspect, the present invention proposes an industrial internet data processing system based on big data, comprising:
an acquisition module, configured to acquire industrial Internet data;
a preprocessing module, configured to preprocess the acquired industrial Internet data and store the preprocessed industrial Internet data in a data warehouse;
a management module, configured to perform data management on the industrial Internet data in the data warehouse;
an analysis module, configured to extract corresponding industrial Internet data from the data warehouse based on a data analysis model and perform data analysis to obtain a data analysis result.
The beneficial effects of the invention are as follows: the invention performs unified acquisition, preprocessing, data management and analysis on the industrial Internet data generated in the daily production process of an enterprise, thereby realizing the collection and effective utilization of the industrial Internet data. This lays a foundation for further big data processing based on the acquired industrial Internet data and for building an industrial Internet information processing framework, and helps improve the adaptability of modern industrial Internet construction.
Detailed Description
The invention is further described in connection with the following application scenario.
Referring to fig. 1, an industrial internet data processing method based on big data is shown, which includes:
S1, acquiring industrial Internet data;
S2, preprocessing the acquired industrial Internet data, and storing the preprocessed industrial Internet data in a data warehouse;
S3, performing data management on the industrial Internet data in the data warehouse;
S4, based on a data analysis model, extracting corresponding industrial Internet data from the data warehouse and performing data analysis to obtain a data analysis result.
The above embodiment of the invention provides an industrial Internet data processing method based on big data, which performs unified acquisition, preprocessing, data management and analysis on the industrial Internet data generated in the daily production process of an enterprise, thereby realizing the collection and effective utilization of the industrial Internet data. This lays a foundation for further big data processing based on the acquired industrial Internet data and for building an industrial Internet information processing framework, and helps improve the adaptability of modern industrial Internet construction.
In one embodiment, the industrial internet data includes operational data of the industrial equipment and status data of the industrial equipment;
The step S1 comprises the following steps:
S11, acquiring state data of the industrial equipment through a sensor arranged on the industrial equipment, wherein the state data of the industrial equipment comprise temperature data, humidity data, vibration signals and the like;
S12, collecting operation data of the industrial equipment through an intelligent terminal arranged on the industrial equipment, wherein the operation data of the industrial equipment comprise operation data of the industrial equipment, equipment state data (such as current, voltage and part monitoring data) and the like.
In one embodiment, the industrial internet data includes warehouse material inventory data and equipment management data;
wherein, step S1 includes:
S13, acquiring material inventory data from a warehouse material management system;
S14, acquiring the entered industrial equipment basic data from the equipment management system.
In one scenario, the operation data and state data of the industrial equipment in the production scene are acquired during the daily production process of the enterprise, which helps build a corresponding industrial equipment production database for the enterprise; comprehensive collection of production data lays a foundation for further targeted analysis (such as production analysis, safety analysis and production management) based on the collected industrial equipment data.
The data related to the industrial equipment may be collected in real time through a sensor or sensor group arranged on the industrial equipment, or acquired in real time through an intelligent terminal directly connected to the industrial equipment. Alternatively, useful or required industrial Internet data may be entered into the system through database transfer or manual entry.
In one embodiment, referring to fig. 2, step S2, preprocessing the acquired industrial internet data, and storing the preprocessed industrial internet data in a data warehouse, includes:
S21, decrypting the acquired industrial Internet data to obtain decrypted industrial Internet data;
S22, carrying out standardization processing on the decrypted industrial Internet data to obtain standardized industrial Internet data;
S23, carrying out data cleaning treatment on the standardized industrial Internet data to obtain cleaned industrial Internet data;
S24, carrying out data integration processing on the industrial Internet data after the data cleaning processing, and storing the industrial Internet data into a data warehouse.
In the above embodiment, after the industrial Internet data is acquired, it first undergoes preliminary preprocessing such as decryption and standardization, and is then further subjected to data cleaning. In this way, when massive industrial Internet data is acquired, the obtained data resources can be preliminarily organized and the quality of the industrial Internet data improved. Meanwhile, the industrial Internet data is uniformly stored in a data warehouse built with cloud technology, distributed technology and the like, which realizes data collection and provides reliable data support for subsequent big data analysis and processing.
In one scenario, a targeted data cleaning method is further provided for time-series industrial Internet data acquired in the daily production process of an enterprise, such as the state data and operation data of industrial equipment. In step S23, performing data cleaning on the standardized industrial Internet data specifically includes:
S231, acquiring continuous industrial Internet data collected for the same target by different data acquisition sources, and forming one industrial Internet data sequence per data acquisition source, wherein the target includes a piece of industrial equipment or a production site containing a plurality of industrial equipment; the data acquisition sources include the sensors of the industrial equipment and the intelligent terminal on the industrial equipment; and each data acquisition source corresponds to one industrial Internet data sequence;
In one scenario, the acquired industrial Internet data may be industrial Internet data collected in real time by the acquisition module from different data acquisition sources.
S232, performing joint abnormal data detection on the industrial Internet data sequences to obtain an abnormality detection result for the industrial Internet data sequences, and marking as abnormal each industrial Internet data sequence in which an abnormality is detected;
S233, performing outlier processing on the industrial Internet data sequences marked as abnormal, so as to clean them into data meeting the quality requirement.
In the prior art, data cleaning mostly uses a single-source approach, that is, data obtained from a single data source is cleaned independently. However, whether rule-based or deep-learning-based, a cleaning technique that considers only a single data source cannot reliably detect and identify dirty data in industrial Internet data, so the performance and effect of data cleaning are not ideal.
Industrial Internet data is tied to industrial production scenes, so different industrial Internet data may be highly associated (for example, operation data and state data collected for the same production equipment have a certain logical relationship or mutual influence). Therefore, for real-time data collected in a production scene, this embodiment further provides a targeted data cleaning method based on the time-series nature and relevance of real-time production data. The method performs joint abnormal data detection on data acquired for the same target by different data acquisition sources, which effectively improves the accuracy of detecting and identifying dirty data during data cleaning. After abnormal data is detected, outlier processing is performed on it, and the abnormal data is cleaned by deletion, correction, replacement or similar means, thereby improving the quality of the industrial Internet data.
In one embodiment, in step S232, performing joint abnormal data detection on the industrial Internet data sequences includes:
A) performing cleaning preprocessing on the acquired industrial Internet data sequences, including:
performing timestamp alignment on the acquired industrial Internet data sequences, and filling in missing data, to obtain preliminarily organized industrial Internet data sequences;
applying a stepwise sequence reduction method to each preliminarily organized industrial Internet data sequence, to obtain the industrial Internet data sequences after cleaning preprocessing;
In one scenario, data acquired by different data acquisition sources arranged on a large intelligent production device are aligned according to their time identification information, and the data within a set time length are taken to form the industrial Internet data sequences. For example, the industrial Internet data sequence for data acquisition source A is a sequence of 2000 current data acquired in time order in the period from t1 to t2, and the industrial Internet data sequence for data acquisition source B is a sequence of 2000 industrial equipment engine temperature data acquired in time order in the same period from t1 to t2.
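The timestamp alignment and missing-data filling of step A) can be illustrated with a minimal Python sketch. The function name `align_and_fill`, the fixed sampling step, and the use of forward filling for gaps are assumptions made for illustration only; the embodiment does not specify which filling rule is used.

```python
from typing import Dict, List, Optional


def align_and_fill(
    sources: Dict[str, Dict[int, float]],
    t_start: int,
    t_end: int,
    step: int = 1,
) -> Dict[str, List[float]]:
    """Align several timestamped series onto one shared time grid.

    sources maps a source name to {timestamp: value}. Missing samples are
    forward-filled; gaps before the first sample take the earliest known
    value. (Forward filling is an illustrative assumption.)
    """
    grid = range(t_start, t_end + 1, step)
    aligned: Dict[str, List[float]] = {}
    for name, samples in sources.items():
        if not samples:
            raise ValueError(f"source {name!r} has no data")
        first_value = samples[min(samples)]
        last: Optional[float] = None
        series: List[float] = []
        for t in grid:
            if t in samples:
                last = samples[t]
            series.append(last if last is not None else first_value)
        aligned[name] = series
    return aligned


# Hypothetical sources A (current) and B (engine temperature).
raw = {
    "A_current": {0: 1.0, 1: 1.1, 3: 1.3},
    "B_engine_temp": {1: 70.0, 2: 71.0, 3: 72.0},
}
print(align_and_fill(raw, t_start=0, t_end=3))
```

After this step every source yields a sequence of the same length on the same time grid, which is what the pairwise association calculations in the later steps require.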
B) dividing the industrial Internet data sequences after cleaning preprocessing into different data groups based on a trained data group distribution model, wherein:
each data group includes at least one industrial Internet data sequence; the data group distribution model contains the group membership information corresponding to different data attributes, and each industrial Internet data sequence is assigned to the corresponding data group according to its data attribute; the data attribute includes the data acquisition source information of the industrial Internet data;
In one scenario, the data group distribution model contains data group information for different data acquisition sources, and each obtained industrial Internet data sequence is assigned to the corresponding data group according to its data acquisition source.
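As a rough illustration of step B), the trained data group distribution model can be viewed as a lookup from data acquisition source to data group. The sketch below assumes exactly that; the source names, group labels and the function `assign_groups` are hypothetical and do not come from the embodiment.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical distribution model: acquisition source -> data group label.
GROUP_MODEL: Dict[str, str] = {
    "motor_current": "electrical",
    "motor_voltage": "electrical",
    "engine_temp": "thermal",
    "bearing_temp": "thermal",
}


def assign_groups(
    sequences: Dict[str, List[float]],
    model: Dict[str, str],
) -> Dict[str, Dict[str, List[float]]]:
    """Place each sequence in the group its acquisition source maps to."""
    groups: Dict[str, Dict[str, List[float]]] = defaultdict(dict)
    for source, seq in sequences.items():
        if source not in model:
            raise KeyError(f"no group defined for source {source!r}")
        groups[model[source]][source] = seq
    return dict(groups)
```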
C) for each data group, performing intra-group abnormal data detection according to the industrial Internet data sequences in the data group, to obtain an intra-group abnormal data detection result, including:
normalizing each industrial Internet data sequence into a normalized sequence with a mean of 0 and a standard deviation of 1;
for the normalized sequences corresponding to the industrial Internet data sequences in the same data group, calculating the association parameter between each pair of industrial Internet data sequences, and constructing the first association feature matrix of the group from these association parameters;
wherein Z_c denotes the first association feature matrix of the c-th data group, whose element in row i and column j is the association parameter between the i-th industrial Internet data sequence and the j-th industrial Internet data sequence in the c-th data group, with i = 1, 2, …, D, j = 1, 2, …, D, and D denoting the total number of industrial Internet data sequences in the data group;
wherein the association parameter between the i-th and the j-th industrial Internet data sequences is calculated from the following quantities: the m-th data in the normalized sequence corresponding to the i-th industrial Internet data sequence in the c-th data group, and the mean of the data in that normalized sequence; the m-th data in the normalized sequence corresponding to the j-th industrial Internet data sequence in the c-th data group, and the mean of the data in that normalized sequence; the m-th data in the low-frequency intrinsic mode function (IMF) component obtained from the i-th industrial Internet data sequence in the c-th data group, and the mean of the data in that low-frequency IMF component; the m-th data in the low-frequency IMF component obtained from the j-th industrial Internet data sequence in the c-th data group, and the mean of the data in that low-frequency IMF component; n, the total number of data in the industrial Internet data sequence; and ω1 and ω2, the set association adjustment factors, where ω1 + ω2 ∈ [1, 1.1] and ω1 ≥ ω2;
comparing each association parameter of the first association feature matrix with a set standard threshold range; when an association parameter falls outside the set standard threshold range, marking that association parameter as abnormal and marking the data group corresponding to the first association feature matrix as containing abnormal data;
for a data group containing abnormal data, counting, for each industrial Internet data sequence, the number of its association parameters marked as abnormal, marking the industrial Internet data sequence with the largest number of abnormal association parameters as abnormal data, and adding it to the abnormal data group;
removing the industrial Internet data sequence marked as abnormal data from the data group, and repeating the intra-group abnormal data detection on the remaining industrial Internet data sequences in the data group until the intra-group detection result is normal; then continuing with the intra-group abnormal data detection of the next data group until all data groups have been processed;
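The intra-group procedure of step C) can be sketched in Python as follows. The exact expression of the association parameter is not reproduced in this text, so the sketch ASSUMES a weighted sum of two Pearson correlation coefficients (one on the normalized sequences, one on the low-frequency IMF components) with weights `w1` and `w2`; this assumed form, the function names, and the default threshold range are illustrative and should not be read as the claimed formula.

```python
import math
from typing import Dict, List, Tuple


def zscore(seq: List[float]) -> List[float]:
    """Normalize to mean 0 and (population) standard deviation 1."""
    mean = sum(seq) / len(seq)
    std = math.sqrt(sum((v - mean) ** 2 for v in seq) / len(seq))
    if std == 0:
        raise ValueError("constant sequence cannot be normalized")
    return [(v - mean) / std for v in seq]


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation, clamped to [-1, 1] against rounding error."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    if den == 0:
        return 0.0
    return max(-1.0, min(1.0, num / den))


def association(xi, xj, low_i, low_j, w1: float = 0.6, w2: float = 0.4) -> float:
    """ASSUMED form: weighted sum of raw and low-frequency correlations."""
    return w1 * pearson(zscore(xi), zscore(xj)) + w2 * pearson(low_i, low_j)


def detect_in_group(
    seqs: Dict[str, List[float]],
    lows: Dict[str, List[float]],
    threshold: Tuple[float, float] = (0.7, 1.0),
) -> List[str]:
    """Iteratively remove the sequence with the most out-of-range pairs."""
    remaining = list(seqs)
    abnormal: List[str] = []
    while len(remaining) > 1:
        counts = {name: 0 for name in remaining}
        for idx, a in enumerate(remaining):
            for b in remaining[idx + 1:]:
                z = association(seqs[a], seqs[b], lows[a], lows[b])
                if not (threshold[0] <= z <= threshold[1]):
                    counts[a] += 1
                    counts[b] += 1
        worst = max(remaining, key=lambda name: counts[name])
        if counts[worst] == 0:
            break  # every remaining pair is within the threshold range
        abnormal.append(worst)
        remaining.remove(worst)
    return abnormal
```

Note that when only two sequences remain and their association parameter is out of range, both have the same abnormal count; the sketch then removes the first one, a tie-breaking choice the embodiment does not specify.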
In one scenario, the method includes acquiring the low-frequency IMF component and the high-frequency IMF component of an industrial Internet data sequence:
performing empirical mode decomposition on the obtained industrial Internet data sequence x(m) to obtain K intrinsic mode function (IMF) components {imf_1, imf_2, imf_3, …, imf_K} and a remainder y of the industrial Internet data sequence x(m);
taking the obtained imf_1 as the high-frequency IMF component imf_g;
reconstructing the low-frequency IMF component imf_d from the obtained IMF components {imf_2, imf_3, …, imf_K} and the remainder y.
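The split into high-frequency and low-frequency components can be sketched as below, assuming the IMF components and the remainder have already been produced by an empirical mode decomposition routine (not shown). The function name `split_imfs` is hypothetical, and the pointwise summation used for the low-frequency reconstruction is an assumption, since the text only states that the component is reconstructed from imf_2 to imf_K and the remainder y.

```python
from typing import List, Sequence, Tuple


def split_imfs(
    imfs: Sequence[Sequence[float]],
    remainder: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Return (high_freq, low_freq) from EMD output.

    high_freq is imf_1; low_freq is the pointwise sum of imf_2..imf_K and
    the remainder (summation is assumed as the reconstruction rule).
    """
    if not imfs:
        raise ValueError("at least one IMF is required")
    n = len(remainder)
    if any(len(component) != n for component in imfs):
        raise ValueError("all components must have the same length")
    high = list(imfs[0])
    low = [remainder[m] + sum(c[m] for c in imfs[1:]) for m in range(n)]
    return high, low
```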
D) performing inter-group abnormal data detection on the data groups, to obtain an inter-group abnormal data detection result, including:
for each data group, calculating, for each industrial Internet data sequence, the sum of its association parameters with the other industrial Internet data sequences in the group, that is, for the i-th industrial Internet data sequence in the c-th data group, the sum of the association parameters between that sequence and every other sequence in the group; and taking the industrial Internet data sequence with the largest sum of association parameters as the feature sequence of the data group;
calculating the association parameters between the feature sequences of the data groups, and constructing a second association feature matrix from these association parameters;
wherein the element v_ab of the second association feature matrix denotes the association parameter between the feature sequence of the a-th data group and the feature sequence of the b-th data group, with a = 1, 2, …, F, b = 1, 2, …, F, and F denoting the total number of data groups;
wherein v_ab is calculated from the following quantities: u(m)_a, the m-th data in the normalized sequence corresponding to the feature sequence of the a-th data group, and the mean of the data in that normalized sequence; u(m)_b, the m-th data in the normalized sequence corresponding to the feature sequence of the b-th data group, and the mean of the data in that normalized sequence; n, the total number of data in the normalized sequence corresponding to a feature sequence; zero(imf_{a-g}), the zero-crossing rate of the high-frequency IMF component obtained from the feature sequence of the a-th data group; zero(imf_{b-g}), the zero-crossing rate of the high-frequency IMF component obtained from the feature sequence of the b-th data group; and ω3 and ω4, the association adjustment factors, where ω3 + ω4 = 1 and ω3 > 2ω4;
comparing the association parameter between the feature sequences of each pair of data groups with the corresponding association threshold range, and marking the association parameter as abnormal when it falls outside the corresponding association threshold range;
counting, for each feature sequence, the number of its association parameters marked as abnormal; marking the data group whose feature sequence has the largest number of abnormal association parameters as a problem data group; marking all industrial Internet data sequences in the problem data group as abnormal data; and adding them to the abnormal data group;
removing the data group marked as a problem data group, and repeating the inter-group abnormal data detection on the remaining data groups until the inter-group detection result is normal;
E) obtaining the abnormality detection result of the industrial Internet data sequences from the intra-group abnormal data detection result and the inter-group abnormal data detection result, including:
marking the abnormality detection result of each industrial Internet data sequence contained in the abnormal data group as abnormal, and marking the abnormality detection results of the remaining industrial Internet data sequences as normal.
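Two building blocks of the inter-group detection in step D), the selection of a feature sequence and the zero-crossing rate of a high-frequency IMF component, can be sketched in Python as follows. The function names are hypothetical, and the zero-crossing rate is computed here as the fraction of adjacent sample pairs with different signs, which is one common definition assumed for illustration.

```python
from typing import Dict, List


def zero_crossing_rate(signal: List[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ.

    Zero is treated as non-negative (an assumed convention).
    """
    if len(signal) < 2:
        return 0.0
    crossings = sum(
        1 for prev, cur in zip(signal, signal[1:]) if (prev < 0) != (cur < 0)
    )
    return crossings / (len(signal) - 1)


def feature_sequence(assoc: Dict[str, Dict[str, float]]) -> str:
    """Pick the sequence whose summed association with the others is largest.

    assoc[a][b] is the association parameter between sequences a and b;
    the diagonal entry assoc[a][a] is ignored.
    """
    return max(
        assoc,
        key=lambda a: sum(v for b, v in assoc[a].items() if b != a),
    )
```

The feature sequences chosen this way stand in for their data groups when the second association feature matrix is built, so only one sequence per group needs to be compared across groups.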
In the joint abnormal data detection method above, cleaning preprocessing is first performed on the industrial Internet data obtained from different data acquisition sources, which preliminarily improves the quality of the industrial Internet data sequences. The acquired industrial Internet data sequences are then grouped, and intra-group abnormality detection is performed on the sequences in the same data group according to the association features between those sequences. Because this takes the strongly associated data within industrial Internet data into account and detects abnormalities on the basis of multidimensional data, it effectively improves the objectivity and accuracy of abnormal data detection. When calculating the intra-group association parameters, sequences in the same data group are expected to show similar variation trends, but a traditional association parameter calculated only from data variation cannot overcome the influence of noisy data points on the variation trend. For this reason, the variation trend of the data sequence reflected by the low-frequency IMF component is specifically added as a parameter when calculating the association parameter. This effectively avoids the influence of noisy data points on the variation trend of the data sequence, improves the accuracy with which the association features between industrial Internet data sequences in a data group are expressed, and thus helps improve the accuracy of intra-group abnormal data detection.
After the intra-group abnormal data is detected, comprehensive abnormality detection is further performed on independent or weakly associated data according to the association features between data groups, so as to screen out abnormal data. When calculating the association parameters between data groups, the trend-based association between the feature sequences of different data groups is not strong; therefore, the high-frequency IMF component of each feature sequence is specifically added as a parameter reflecting the relationship between the fluctuations of the sequences. This further captures the relationship between the feature sequences and improves the accuracy of abnormal data detection.
Meanwhile, compared with traditional association-based abnormality detection schemes, the method can effectively improve the efficiency of abnormal data detection and the performance of real-time abnormality detection for the industrial Internet.
Before abnormal data detection begins, the model required for abnormal data detection must first be built. Building the model establishes a standardized representation of the association features and grouping of the data from the multiple data acquisition sources of each target, which provides a reasonable reference for the grouping criteria and for the calculation of the standard threshold range required during abnormality detection.
In one embodiment, step S232 further includes:
training the data group distribution model, including:
constructing a training set, wherein the training set includes standard industrial Internet data corresponding to different data acquisition sources for the same target and covering different time periods, and the industrial Internet data of each time period includes the standard industrial Internet data sequences corresponding to the different data acquisition sources in that time period; each standard industrial Internet data sequence is a sequence normalized to a mean of 0 and a standard deviation of 1;
In one scenario, historical data acquired by different data acquisition sources of a large intelligent production device are obtained from the data warehouse, the abnormal conditions of the data are analyzed through expert review and similar means, and the abnormal conditions are associated with the corresponding historical data to form the training set data.
In one scenario, the training set contains device operation data collected by 10 sensors for a large intelligent production device, wherein the device operation data collected by each sensor covers 3 time periods, and each time period contains 1000 data collected in time order.
calculating, from the obtained training set, the standard association parameters between different data acquisition sources, and constructing a first standard association feature matrix;
wherein Z' denotes the first standard association feature matrix, and its element H_ij denotes the standard association parameter between the i-th data acquisition source and the j-th data acquisition source, with i = 1, 2, …, N, j = 1, 2, …, N, and N denoting the total number of data acquisition sources;
wherein H_ij is calculated from the following quantities: x(t, m)_i, the m-th data of the standard industrial Internet data sequence corresponding to the i-th data acquisition source in the t-th time period, and the mean of the data in that sequence; x(t, m)_j, the m-th data of the standard industrial Internet data sequence corresponding to the j-th data acquisition source in the t-th time period, and the mean of the data in that sequence; imf_{i-d}(t, m), the m-th data in the low-frequency IMF component obtained from the standard industrial Internet data sequence corresponding to the i-th data acquisition source in the t-th time period, and the mean of the data in that low-frequency IMF component; imf_{j-d}(t, m), the m-th data in the low-frequency IMF component obtained from the standard industrial Internet data sequence corresponding to the j-th data acquisition source in the t-th time period, and the mean of the data in that low-frequency IMF component; n, the total number of data in the standard industrial Internet data sequence; and ω1 and ω2, the set association adjustment factors, where ω1 + ω2 ∈ [1, 1.1] and ω1 ≥ ω2;
The obtained first standard association feature matrix is traversed based on the set condition to obtain the standard grouping result of the data acquisition sources, wherein the adopted grouping condition function is as follows:
The standard industrial Internet data sequences corresponding to the data acquisition sources that satisfy the condition function are divided into the same data group, wherein $A$ represents the target data group; $a_i$ and $a_j$ represent the i-th and j-th data acquisition sources, respectively; $H_{ij}$ represents the standard association parameter between the i-th and j-th data acquisition sources; $\gamma$ represents a set first standard threshold, where $\gamma \in [0.7, 0.8]$; $\mathrm{num}(H_{ij} \ge \beta)$ represents the number of data acquisition sources in data group $A$ for which the standard association parameter between any two acquisition sources is not less than $\beta$, where $\beta$ represents a second standard threshold and $\beta \in [0.4, 0.6]$; and $\mathrm{num}(A)$ represents the number of data acquisition sources contained in data group $A$.
When the same data acquisition source satisfies the membership condition of several data groups at once, it is preferentially assigned to the data group that contains more data acquisition sources.
In one scenario, the method further comprises acquiring the low-frequency IMF component and the high-frequency IMF component of the standard industrial Internet data sequence. These components are acquired in the same way as the low-frequency and high-frequency IMF components of the industrial Internet data sequence in the embodiment described above.
In one embodiment, the range of standard thresholds within each standard data set is calculated based on the partitioned standard data sets.
In one scenario, the standard threshold range is based on a fixed standard threshold, wherein the standard threshold $T1 \in [0.68, 0.8]$ and the standard threshold range is $[T1, 1]$.
In one scenario, the standard threshold range is based on an adaptive standard threshold $T2 = \max(T1, H_{ij} - \alpha)$, where $T1 \in [0.68, 0.8]$ and $\alpha \in [0.08, 0.2]$; the standard threshold range is $[T2, 1]$.
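As a minimal sketch of the two threshold-range variants described above, the following Python functions return each range as a (lower, upper) tuple. The function names, the default values chosen for T1 and α, and the decision to pass H_ij in as a single number are illustrative assumptions; the text does not state which pair of data acquisition sources supplies H_ij.

```python
def fixed_threshold_range(t1: float = 0.7) -> tuple[float, float]:
    """Fixed variant: the range is [T1, 1] with T1 in [0.68, 0.8]."""
    if not 0.68 <= t1 <= 0.8:
        raise ValueError("T1 must lie in [0.68, 0.8]")
    return (t1, 1.0)


def adaptive_threshold_range(
    h_ij: float, t1: float = 0.7, alpha: float = 0.1
) -> tuple[float, float]:
    """Adaptive variant: T2 = max(T1, H_ij - alpha); the range is [T2, 1]."""
    if not 0.68 <= t1 <= 0.8:
        raise ValueError("T1 must lie in [0.68, 0.8]")
    if not 0.08 <= alpha <= 0.2:
        raise ValueError("alpha must lie in [0.08, 0.2]")
    t2 = max(t1, h_ij - alpha)
    return (t2, 1.0)
```

With these definitions the adaptive lower bound never drops below T1, so a weakly associated pair falls back to the fixed range.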
In one embodiment, for each divided data group, the sum of the standard association parameters between each data acquisition source and the other data acquisition sources in the group is computed, and the data acquisition source with the largest sum is taken as the characteristic data acquisition source of that group.
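The selection of the characteristic data acquisition source can be sketched as follows. This is an illustration only: the function name, the representation of the association parameters as a nested list `H`, and the tie-breaking rule (the first listed member wins) are assumptions, and the matrix values are made up.

```python
def characteristic_source(group, H):
    """Return the member of `group` whose summed association with the
    other members is largest (ties go to the member listed first)."""

    def total_association(i):
        return sum(H[i][j] for j in group if j != i)

    return max(group, key=total_association)


# Hypothetical 3-source association matrix, for illustration only.
H = [
    [1.00, 0.90, 0.80],
    [0.90, 1.00, 0.75],
    [0.80, 0.75, 1.00],
]
print(characteristic_source([0, 1, 2], H))  # source 0 has the largest sum (1.70)
```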
Standard association parameters among characteristic data acquisition sources of each standard data set are calculated respectively, and a second standard association characteristic matrix is constructed according to the standard association parameters:
Wherein $\Phi$ represents the second standard association feature matrix, and $V_{ab}$ represents the standard association parameter between the a-th standard data set and the b-th standard data set, where $a = 1, 2, \ldots, F$, $b = 1, 2, \ldots, F$, and $F$ represents the total number of data sets; wherein,
Wherein $u_a(t, m)$ represents the m-th data of the standard industrial Internet data sequence corresponding to the characteristic data acquisition source of the a-th standard data set at time t; $n$ represents the total number of data in the standard industrial Internet data sequence; $\bar{u}_a(t)$ represents the average value of the data of the standard industrial Internet data sequence corresponding to the characteristic data acquisition source of the a-th standard data set at time t; $u_b(t, m)$ represents the m-th data of the standard industrial Internet data sequence corresponding to the characteristic data acquisition source of the b-th standard data set at time t; $\bar{u}_b(t)$ represents the average value of the data of the standard industrial Internet data sequence corresponding to the characteristic data acquisition source of the b-th standard data set at time t; $\mathrm{zero}(\mathrm{IMF}_{a\text{-}g}(t))$ represents the zero-crossing rate of the high-frequency IMF component obtained from the standard industrial Internet data sequence corresponding to the characteristic data acquisition source of the a-th standard data set at time t, and $\mathrm{zero}(\mathrm{IMF}_{b\text{-}g}(t))$ represents the zero-crossing rate of the high-frequency IMF component obtained from the standard industrial Internet data sequence corresponding to the characteristic data acquisition source of the b-th standard data set at time t; $\omega_3$ and $\omega_4$ represent association adjustment factors, where $\omega_3 + \omega_4 = 1$ and $\omega_3 > 2\omega_4$.
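The zero-crossing rate used in the definitions above can be illustrated with a short sketch. The text does not give its exact formula, so the version below uses one common definition (the fraction of adjacent sample pairs whose signs differ) and treats a value of exactly zero as non-negative; both choices, and the function name, are assumptions.

```python
def zero_crossing_rate(component):
    """Fraction of adjacent sample pairs in `component` whose signs differ."""
    if len(component) < 2:
        return 0.0
    crossings = sum(
        1
        for prev, cur in zip(component, component[1:])
        if (prev >= 0) != (cur >= 0)  # zero counts as non-negative
    )
    return crossings / (len(component) - 1)
```

A rapidly oscillating high-frequency component yields a rate close to 1, while a component that keeps one sign yields 0.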
In one embodiment, a standard association threshold range of each data set is calculated according to the second standard association feature matrix, wherein the standard association threshold range is based on an adaptive standard association threshold and is $[\max(-1, V_{ab} - \delta), \min(V_{ab} + \delta, 1)]$, where $\delta \in [0.2, 0.4]$.
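The range above can be sketched in Python as follows. The function names, the default value of δ, and the choice to evaluate only the pairs with a < b of the matrix are illustrative assumptions rather than details given in the text.

```python
def association_threshold_range(v_ab, delta=0.3):
    """Return [max(-1, V_ab - delta), min(V_ab + delta, 1)] as a tuple."""
    if not 0.2 <= delta <= 0.4:
        raise ValueError("delta must lie in [0.2, 0.4]")
    return (max(-1.0, v_ab - delta), min(v_ab + delta, 1.0))


def association_threshold_ranges(phi, delta=0.3):
    """Apply the range to every pair (a, b) with a < b of the matrix `phi`."""
    size = len(phi)
    return {
        (a, b): association_threshold_range(phi[a][b], delta)
        for a in range(size)
        for b in range(a + 1, size)
    }
```

Clamping to [-1, 1] keeps the range inside the interval that a correlation-style parameter can take.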
Based on the above implementation, when the data group distribution model is built and trained, a training set can be constructed from the target standard industrial Internet data, and the standard data acquired by different data sources are grouped in advance to obtain the relevant standard grouping information. In this way, the data can be grouped accurately according to the association features among the industrial Internet data acquired by different data acquisition sources: strongly associated data are placed in the same data group, while weakly associated data are separated into different data groups, which supports the subsequent joint detection of abnormal data in data acquired in real time.
When calculating the association parameters of the data within a group, it is considered that the data in the same data group share a similar variation trend, whereas the traditional association parameter calculation based on data variation cannot overcome the influence of noise data points on the variation trend. Therefore, the variation trend of the data sequence reflected by the low-frequency IMF component is specifically added as a parameter when calculating the association parameters. This effectively avoids the influence of noise data points on the variation trend of the data sequence, improves the accuracy with which the association features among the industrial Internet data sequences in a data group are expressed, and thereby helps to improve the accuracy of abnormal data detection within the data group.
When calculating the association parameters between data groups, it is considered that the trend-based correlation between the characteristic sequences of different data groups is not strong. Therefore, the fluctuation relationship between the sequences, reflected by the high-frequency IMF components of the characteristic sequences, is specifically added as a parameter. This further reflects the relationship between the characteristic sequences and improves the accuracy of abnormal data detection.
In one embodiment, step S3 includes:
S31, quality inspection is carried out on industrial Internet data in the data warehouse to obtain quality inspection results, and the industrial Internet data with unqualified quality is isolated according to the quality inspection results;
S32, managing metadata associated with industrial Internet data in a data warehouse according to the set data specification;
S33, performing data lineage analysis on the industrial Internet data in the data warehouse to obtain a data lineage analysis result and generate a data lineage graph.
In one scenario, in order to improve the quality of the industrial Internet data stored in the data warehouse, the invention also performs data management on that data, including quality inspection, metadata management, data lineage analysis and other management. This helps to raise the data management level of the data warehouse and increase the value of the industrial Internet data.
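Two of the management steps above, isolating unqualified data (S31) and tracing where a data set comes from (S33), can be sketched as follows. The quality rule (a record is unqualified when a required field is missing or empty), the field names, the data set names and the dictionary-based graph are all hypothetical; the text does not prescribe a specific rule or graph representation.

```python
def inspect_quality(records, required_fields):
    """Split records into (qualified, quarantined) lists.

    Illustrative rule only: a record is unqualified if any required
    field is missing or None.
    """
    qualified, quarantined = [], []
    for record in records:
        ok = all(record.get(name) is not None for name in required_fields)
        (qualified if ok else quarantined).append(record)
    return qualified, quarantined


def upstream_of(lineage, dataset):
    """Return every data set that `dataset` is derived from, directly or
    indirectly, given a mapping of data set -> list of direct parents."""
    seen = set()
    stack = list(lineage.get(dataset, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(lineage.get(parent, []))
    return seen


# Hypothetical records and lineage edges, for illustration only.
records = [
    {"source": "sensor-1", "temperature": 41.5},
    {"source": "sensor-2", "temperature": None},
]
good, isolated = inspect_quality(records, ["source", "temperature"])

lineage = {
    "daily_report": ["cleaned_readings"],
    "cleaned_readings": ["raw_readings"],
}
ancestors = upstream_of(lineage, "daily_report")
```

Here `isolated` holds the record from `sensor-2`, and `ancestors` contains both `cleaned_readings` and `raw_readings`, which is the information a lineage graph would display.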
In one embodiment, step S4 includes:
And calling a corresponding data set from the data warehouse according to the set analysis task, and analyzing based on the set data analysis model to obtain a data analysis result.
In one embodiment, step S4 further comprises:
and visually displaying according to the obtained data analysis result.
After the data volume in the data warehouse reaches a certain scale, further big data analysis can be performed on the industrial Internet data in the data warehouse, and analyses meeting different requirements can be carried out according to the needs of enterprises in different production scenarios, which increases the utilization value of the industrial Internet data.
Meanwhile, statistics and visual display can be performed according to data in a data warehouse, so that the management level of industrial Internet data is improved.
Referring to the embodiment shown in FIG. 3, an industrial Internet data processing system based on big data comprises:
The acquisition module is used for acquiring industrial Internet data;
The preprocessing module is used for preprocessing the acquired industrial Internet data and storing the preprocessed industrial Internet data into the data warehouse;
The management module is used for carrying out data management on the industrial Internet data in the data warehouse;
And the analysis module is used for extracting corresponding industrial Internet data from the data warehouse based on the data analysis model and carrying out data analysis to obtain a data analysis result.
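The way the four modules hand data to one another can be sketched as a simple pipeline. The class name, the use of plain callables for the modules, and the in-memory list standing in for the data warehouse are assumptions made for illustration; they are not the system's actual structure.

```python
class DataProcessingSystem:
    """Chains four callables that stand in for the four modules."""

    def __init__(self, acquire, preprocess, manage, analyze):
        self.acquire = acquire
        self.preprocess = preprocess
        self.manage = manage
        self.analyze = analyze
        self.warehouse = []  # in-memory stand-in for the data warehouse

    def run(self):
        raw = self.acquire()                          # acquisition module
        self.warehouse.extend(self.preprocess(raw))   # preprocessing module
        self.warehouse = self.manage(self.warehouse)  # management module
        return self.analyze(self.warehouse)           # analysis module


# Toy modules: drop missing readings, sort them, then average them.
system = DataProcessingSystem(
    acquire=lambda: [21.0, None, 23.0],
    preprocess=lambda raw: [x for x in raw if x is not None],
    manage=lambda stored: sorted(stored),
    analyze=lambda stored: sum(stored) / len(stored),
)
result = system.run()
```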
It should be noted that the acquisition module, the preprocessing module, the management module and the analysis module included in the industrial Internet data processing system based on big data provided by the present invention are further configured to implement the specific embodiments corresponding to the steps of the industrial Internet data processing method based on big data shown in FIG. 1, which are not repeated here.
The system provided by the invention can be built based on a cloud server, an edge server, an intelligent terminal and the like.
The industrial Internet data processing system based on big data can uniformly acquire, preprocess, manage and analyze the industrial Internet data generated in the daily production process of enterprises, so that the industrial Internet data is collected and used effectively. This lays a foundation for further big data processing based on the acquired industrial Internet data and for building an industrial Internet information processing framework, and helps to improve the adaptability of modern industrial Internet construction.
It should be noted that, in each embodiment of the present invention, each functional unit/module may be integrated in one processing unit/module, or each unit/module may exist alone physically, or two or more units/modules may be integrated in one unit/module. The integrated units/modules described above may be implemented either in hardware or in software functional units/modules.
From the description of the embodiments above, it will be apparent to those skilled in the art that the embodiments described herein may be implemented in hardware, software, firmware, middleware, code, or any suitable combination thereof. For a hardware implementation, the processor may be implemented in one or more of the following units: an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a processor, a controller, a microcontroller, a microprocessor, other electronic units designed to perform the functions described herein, or a combination thereof. For a software implementation, some or all of the flow of an embodiment may be accomplished by a computer program that instructs the associated hardware. When implemented, the above-described programs may be stored in or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a computer. Computer-readable media can include, but are not limited to, RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
Finally, it should be noted that the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the scope of the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.