CN118898030A

CN118898030A - Maintenance method and system for edge computing equipment

Info

Publication number: CN118898030A
Application number: CN202411018464.5A
Authority: CN
Inventors: 周密; 焦良葆; 陈烨; 孟琳; 徐轩宇; 童心语; 张琪裕; 刘剑辉; 刘建丰
Original assignee: Nanjing Institute of Technology
Current assignee: Nanjing Institute of Technology
Priority date: 2024-07-29
Filing date: 2024-07-29
Publication date: 2024-11-05

Abstract

The invention discloses a maintenance method for edge computing equipment, belonging to the field of edge computing and intelligent monitoring. The method comprises the following steps: acquiring current system monitoring data related to maintenance of the edge computing device; preprocessing the current system monitoring data; inputting the preprocessed current system monitoring data into a pre-trained prediction model to obtain predicted system monitoring data; and matching the predicted system monitoring data against a pre-constructed maintenance decision tree, determining the fault type, and triggering a corresponding maintenance decision; wherein the prediction model is obtained by introducing a self-attention mechanism module into the encoder of the Informer model. The invention enables early warning before an edge computing device fails, so that precautionary measures can be taken, reducing the frequency of equipment shutdown and the associated maintenance cost and thereby improving overall equipment operating efficiency and reliability.

Description

Maintenance method and system for edge computing equipment

Technical Field

The invention relates to the field of edge computing and intelligent monitoring, in particular to a maintenance method and system of edge computing equipment.

Background

Early edge computing research focused mainly on how to distribute computing tasks and process data efficiently. As the technology developed, the maintenance and management of equipment began to receive attention. Traditional device management systems rely primarily on centralized monitoring solutions, which prove inadequate when dealing with edge computing devices. In addition, early systems could not predict equipment failure effectively and often reacted only after a problem had occurred, resulting in system downtime and high maintenance costs.

With the arrival of the intelligent era, device management systems began to integrate more automation tools and technologies, such as sensor technology and the Internet of Things, so that device state data could be collected, transmitted and processed in real time. However, their level of intelligence and predictive maintenance capability remain limited: most existing systems are still at the monitoring and reactive maintenance stage and lack deep data analysis and fault prediction capabilities. These systems can collect equipment operating data but lack sufficient algorithmic support to analyze it, so early failure prediction and preventive maintenance cannot be achieved. It is therefore important to develop a highly intelligent management system that integrates the latest artificial intelligence algorithms to realize predictive maintenance.

Disclosure of Invention

To solve the above technical problems, the invention provides a maintenance method and a maintenance system for edge computing equipment, which predict future system monitoring data from the preprocessed system monitoring data, judge whether a fault has occurred or is likely to occur, and execute the corresponding maintenance decision, with the aim of raising the intelligence level of equipment management and maintenance and reducing manual maintenance cost.

The invention is realized by the following technical scheme:

In a first aspect, the present invention provides a method for maintaining an edge computing device, including:

acquiring current system monitoring data related to maintenance of the edge computing device;

preprocessing the current system monitoring data;

inputting the preprocessed current system monitoring data into a pre-trained prediction model to obtain predicted system monitoring data;

matching the predicted system monitoring data against a pre-constructed maintenance decision tree, determining the fault type, and triggering a corresponding maintenance decision;

wherein the prediction model is obtained by introducing a self-attention mechanism module into the encoder of the Informer model.

Based on the prediction result for the system monitoring data, it is judged whether a fault type needs to be looked up in the pre-established maintenance decision tree; if a fault has occurred or there is a risk of a fault, the corresponding maintenance decision is executed. The method not only realizes efficient management of the edge computing equipment but also, by incorporating a deep learning model, provides predictive maintenance of the edge equipment, reduces the maintenance cost incurred by traditional corrective maintenance and preventive maintenance, and raises the intelligence level of the equipment management system.

Optionally, after the preprocessed current system monitoring data is input into the pre-trained prediction model, the self-attention mechanism module performs the following operations:

performing a dot-product operation on the input query Q and key K, and scaling by the square root of the dimension D to obtain an attention score matrix;

applying a mask to the attention score matrix and processing it with a softmax function to obtain a new matrix;

based on the new matrix, for each query Q, selecting the top n keys K with the highest attention scores together with their values V, and recalculating the attention scores for them;

finally, computing the weighted sum of the values V and outputting it.

Optionally, the maintenance decision tree includes threshold conditions under which the edge computing device may fail, the corresponding fault types, and the corresponding maintenance decisions.

Optionally, matching the predicted system monitoring data against the pre-constructed maintenance decision tree, determining the fault type and triggering the corresponding maintenance decision includes:

comparing the predicted system monitoring data with the threshold conditions, set in the maintenance decision tree, under which the edge computing device may fail, determining the fault type according to the comparison result, and triggering the corresponding maintenance decision.

Optionally, acquiring the system monitoring data of the edge computing device includes: parsing system files with a Python script to obtain the system monitoring data.

Optionally, the training method of the prediction model includes:

acquiring a system monitoring data sample related to maintenance of the edge computing device;

dividing a system monitoring data sample into a training set sample and a test set sample;

preprocessing system monitoring data in the training set sample and the testing set sample;

inputting the system monitoring data in the preprocessed training set sample into the prediction model obtained by improving the Informer model, and training the prediction model;

checking the prediction accuracy of the trained prediction model with the test set sample, and stopping training if the prediction accuracy meets the preset accuracy requirement; otherwise, continuing to train the prediction model with the training set sample.

Optionally, the preprocessing includes:

screening the acquired system monitoring data with an isolation forest outlier detection algorithm, and deleting burst abnormal data;

normalizing the screened system monitoring data.

In a second aspect, the present invention provides an intelligent maintenance unit for an edge computing device, comprising:

a storage module, used for storing the information of the edge computing device and for acquiring and storing system monitoring data related to maintenance of the edge computing device;

a preprocessing module, used for preprocessing the system monitoring data, wherein the system monitoring data comprises current system monitoring data;

an anomaly prediction module, used for inputting the preprocessed current system monitoring data into a pre-trained prediction model to obtain predicted system monitoring data;

a maintenance decision generation module, used for matching the predicted system monitoring data against a pre-constructed maintenance decision tree, determining the fault type and generating a corresponding maintenance decision;

and a maintenance instruction issuing module, used for triggering maintenance decisions.

In a third aspect, the present invention provides an intelligent maintenance system for an edge computing device, comprising an intelligent management unit and the intelligent maintenance unit described in the second aspect;

The intelligent maintenance unit is connected in series with the intelligent management unit, and the intelligent maintenance unit receives the system monitoring data of the edge equipment unit and transmits it to the intelligent management unit;

The intelligent maintenance unit further comprises an alarm module, which checks the system monitoring data in the storage module against thresholds and raises an alarm on abnormal data;

The intelligent management unit comprises a user login/registration module, a device management module, a monitoring module, an abnormal alarm list generation module and a user instruction uploading module;

After a user registers and logs in through the user login/registration module, the device management module requests the edge computing device information stored in the storage module. The monitoring module uses the edge computing device information acquired by the device management module to request the system monitoring data stored in the storage module. The abnormal alarm list generation module obtains the edge computing device information and the system monitoring data from the device management module and the monitoring module, screens the alarm information and displays it on a page. The maintenance decision generation module analyzes the alarm information displayed on the page and outputs a maintenance decision to the maintenance instruction issuing module, which converts the maintenance decision into a shell command and sends the shell command to the specific edge computing device. The user instruction uploading module sends shell commands input by the user to the maintenance instruction issuing module, which sends them to the corresponding edge computing device.

In a fourth aspect, the present invention provides a computer readable storage medium having stored thereon a computer program/instructions, characterized in that the computer program, when executed by a processor, implements the steps of the method for maintaining an edge computing device according to the first aspect.

Compared with the prior art, the maintenance method and the system for the edge computing equipment provided by the invention have the following beneficial effects:

The invention improves the Informer model by introducing a self-attention mechanism module into it, so that system monitoring data can be predicted accurately. A maintenance decision tree is established from the possible fault types of the edge computing equipment and the corresponding maintenance decisions. The data prediction result is then matched against the data in the maintenance decision tree to judge whether the equipment has failed or is at risk of failure. If so, the corresponding maintenance decision is selected and carried out. This achieves intelligent management and maintenance, raises the intelligence level of equipment management and maintenance, and reduces manual maintenance cost.

Drawings

FIG. 1 is a flow chart of one embodiment of a method for maintaining an edge computing device provided by the present invention;

FIG. 2 is a block diagram illustrating one embodiment of a maintenance system for an edge computing device provided by the present invention;

FIG. 3 is a flow chart of a method of maintaining an edge computing device of the present invention;

FIG. 4 is a flow chart of a self-attention mechanism processing data;

FIG. 5 is a schematic diagram showing the prediction effect of the original Informer model;

FIG. 6 is a schematic diagram showing the predictive effect of the improved Informer model of the present invention;

FIG. 7 is a schematic diagram of a maintenance function of an embodiment of the present invention;

FIG. 8 is a schematic diagram of a portion of a user's functionality in accordance with one embodiment of the present invention;

FIG. 9 is a diagram of part of the exported historical data according to one embodiment of the invention.

Detailed Description

Further description is provided below in connection with the drawings and the specific embodiments. In the description of the present invention, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or number of technical features indicated. Thus, a feature defining "a first", "a second", etc. may explicitly or implicitly include one or more such feature.

Embodiment 1, this embodiment describes a maintenance method for an edge computing device, which may be performed by an intelligent maintenance unit, and the intelligent maintenance unit may be configured as a separate server, as shown in fig. 1, and the method specifically includes the following steps:

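acquiring current system monitoring data related to maintenance of the edge computing device;

preprocessing the current system monitoring data;

inputting the preprocessed current system monitoring data into a pre-trained prediction model to obtain predicted system monitoring data;

matching the predicted system monitoring data against a pre-constructed maintenance decision tree, determining the fault type, and triggering a corresponding maintenance decision;

wherein the prediction model is obtained by introducing a self-attention mechanism module into the encoder of the Informer model.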

A specific implementation of each step of the method of this embodiment is described below:

1. Acquisition and processing of data sets

A Python script is deployed on the edge computing device to collect its system monitoring data, such as CPU utilization, GPU utilization, network latency, network bandwidth, hardware temperature, and fan speed. There are two ways to obtain this edge data: (1) the Python script parses system files such as /proc/cpuinfo, /proc/meminfo, and /sys/class/hwmon/hwmon/device/fan to obtain the required data; (2) the output of system commands, captured with Python's subprocess module, is processed with the re module, and the required data is extracted with the corresponding regular expressions. This embodiment adopts mode (1). After acquisition, the results are held in a Python list structure and temporarily stored in a lightweight SQLite database to await uploading. To relieve the data pressure of uploading to the server, the acquired data is processed directly at the edge in a distributed manner, which reduces upload delay and network load. For example, a threshold judgment method is applied to the data generated by the edge computing device, and local alarm processing is carried out directly when a threshold is exceeded; a feature selection algorithm is used to eliminate missing (default) values, all data is compressed into the [0,1] interval, high-dimensional data is reduced to one dimension, and the uploading frequency is dynamically adjusted according to the monitored network bandwidth and network delay to reduce network load. Before uploading, the data is converted to JSON format and then sent to the server in an HTTP POST request using urllib.
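The collection path of mode (1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used in the embodiment: the table name `samples`, the metric name `mem_util`, the function names and the placeholder URL are all hypothetical, and the sample text stands in for the contents of /proc/meminfo so that the sketch runs on any machine. The request is only built, not sent.

```python
import json
import sqlite3
import urllib.request


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def memory_utilization(fields):
    """Fraction of memory in use, derived from MemTotal and MemAvailable."""
    total = fields["MemTotal"]
    return (total - fields["MemAvailable"]) / total


def buffer_sample(conn, timestamp, metrics):
    """Store one sample in a local SQLite buffer (hypothetical schema)."""
    conn.execute("CREATE TABLE IF NOT EXISTS samples (ts REAL, payload TEXT)")
    conn.execute("INSERT INTO samples VALUES (?, ?)",
                 (timestamp, json.dumps(metrics)))
    conn.commit()


def build_upload_request(conn, url):
    """Pack all buffered samples into a JSON HTTP POST request (not sent)."""
    rows = conn.execute("SELECT ts, payload FROM samples ORDER BY ts").fetchall()
    records = [{"ts": ts, **json.loads(payload)} for ts, payload in rows]
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        url, data=body,
        headers={"Content-Type": "application/json"}, method="POST")


# Stand-in for open("/proc/meminfo").read() on the edge device.
SAMPLE = "MemTotal:  4000000 kB\nMemFree: 500000 kB\nMemAvailable: 1000000 kB\n"

conn = sqlite3.connect(":memory:")
fields = parse_meminfo(SAMPLE)
buffer_sample(conn, 1.0, {"mem_util": memory_utilization(fields)})
request = build_upload_request(conn, "http://example.invalid/api/metrics")
```

On a real device the request would be passed to `urllib.request.urlopen`, and uploaded rows would then be deleted from the buffer; both steps are omitted here.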

The prediction model must be trained on a data set before use. When the data set is constructed, the data uploaded to the server may contain missing (default) values and outliers, and the data volume is large. This embodiment therefore adopts an isolation forest outlier detection algorithm: the algorithm is trained on input samples that include outliers, the collected data is screened, and burst outlier data is deleted, which reduces noise and improves data quality. To improve the accuracy of model prediction, the screened data is normalized, and a sliding window mechanism is introduced to extract features by converting the data into a series of fixed-length data segments. The data set is then divided into a training set and a test set at a ratio of 6:4.
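The normalization, sliding-window and 6:4 split steps can be illustrated with a short sketch. It assumes min-max scaling for the normalization and a chronological (unshuffled) split, neither of which is specified in the text; the window lengths and the temperature values are made up for the example.

```python
def min_max_normalize(values):
    """Scale values linearly into the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def sliding_windows(series, input_len, pred_len):
    """Cut a series into (input segment, target segment) pairs of fixed length."""
    windows = []
    for start in range(len(series) - input_len - pred_len + 1):
        x = series[start:start + input_len]
        y = series[start + input_len:start + input_len + pred_len]
        windows.append((x, y))
    return windows


def split_6_to_4(items, train_parts=6, test_parts=4):
    """Split items in order into training and test sets at a 6:4 ratio."""
    cut = len(items) * train_parts // (train_parts + test_parts)
    return items[:cut], items[cut:]


# Illustrative hardware temperature readings (made-up values).
temps = [40.0, 42.0, 45.0, 50.0, 48.0, 46.0, 44.0, 60.0, 55.0, 50.0]
scaled = min_max_normalize(temps)
windows = sliding_windows(scaled, input_len=4, pred_len=2)
train_set, test_set = split_6_to_4(windows)
```

With ten readings, an input length of 4 and a prediction length of 2, this yields five windows, three for training and two for testing.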

When the prediction model is actually used, the collected current system monitoring data is preprocessed in the same way before being input into the prediction model. Specifically: the acquired system monitoring data is screened with the isolation forest outlier detection algorithm and burst abnormal data is deleted; the screened system monitoring data is then normalized.
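To make the screening step concrete, the following is a simplified pure-Python sketch of isolation-forest scoring. It is an illustration under assumptions, not the embodiment's implementation: every tree is built on all points rather than on a random subsample, and the threshold of 0.7, the tree count, the seed and the sample data are hypothetical choices. At least two points are required.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """Expected path length of an unsuccessful binary-search-tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively partition points with random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, point, depth=0):
    """Depth at which the point reaches a leaf, adjusted for leaf size."""
    if tree[0] == "leaf":
        return depth + average_path_length(tree[1])
    _, dim, split, left, right = tree
    branch = left if point[dim] < split else right
    return path_length(branch, point, depth + 1)


def anomaly_scores(points, n_trees=100, seed=0):
    """Score in (0, 1]; points that are isolated quickly score close to 1."""
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(max(len(points), 2)))
    trees = [build_tree(points, 0, max_depth, rng) for _ in range(n_trees)]
    norm = average_path_length(len(points))
    scores = []
    for p in points:
        mean_depth = sum(path_length(t, p) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / norm))
    return scores


def drop_burst_outliers(points, threshold=0.7, **kwargs):
    """Keep only the points whose anomaly score does not exceed the threshold."""
    scores = anomaly_scores(points, **kwargs)
    return [p for p, s in zip(points, scores) if s <= threshold]


# (CPU utilization %, hardware temperature) pairs with one burst reading.
normal = [(30.0 + i % 4, 45.0 + i % 5) for i in range(12)]
burst = (99.0, 95.0)
samples = normal + [burst]
scores = anomaly_scores(samples)
cleaned = drop_burst_outliers(samples)
```

The burst reading is separated from the cluster by almost every first split, so its average path is short and its score is the highest in the set.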

2. Construction and training of predictive models

2.1 Construction of the model

This embodiment improves the Informer model for long time-series prediction so that it is suited to the task of predicting the system monitoring data of edge computing equipment. The improved Informer model comprises an input embedding layer, an encoder, a decoder and an output layer arranged in sequence, with a self-attention mechanism module introduced into the encoder.

The input embedding layer is used for receiving and processing the current system monitoring data, converting the current system monitoring data into a numerical form which can be understood by a model, and providing a proper input representation for a subsequent encoder part;

The encoder is used for receiving data from the input embedding layer and extracting features from the data through transformation operations. The introduced self-attention mechanism module lets the encoder take other time steps in the time series into account when processing the data of the current time step, so that long-term dependencies in the data are captured better. The self-attention mechanism module may be fused with the original Informer model through a specific interface; in this embodiment, the self-attention mechanism is inserted at the beginning of the encoder layer, which ensures that data undergoes self-attention processing as it flows through the encoder;

The decoder is used for receiving the output of the encoder and predicting future system monitoring data based on the extracted characteristics;

The output layer is used for converting the system monitoring data output by the decoder, namely the prediction system monitoring data into the same format or dimension as the current system monitoring data so as to be directly compared with the actual situation or used for further decision making.

The flow chart of the self-attention mechanism module processing data is shown in fig. 4, and the following specific operations are performed:

A dot-product operation is performed on the input query Q and key K, and the result is scaled by the square root of the dimension D to obtain an attention score matrix; a mask is applied to the attention score matrix, which is then processed with a softmax function to obtain a new matrix; based on the new matrix, for each query Q, the top n keys K with the highest attention scores are selected together with their values V, and the attention scores are recalculated for them; the values V are then weighted and summed according to the recalculated attention scores, and the result is output.
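The four operations above can be traced in a small pure-Python sketch. It is written under one explicit assumption: "recalculating the attention score" is interpreted here as re-applying softmax over only the n selected keys. The function names and the toy matrices are illustrative, and each query is assumed to have at least one unmasked key.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_n_attention(Q, K, V, n, mask=None):
    """Scaled dot-product attention that keeps only the top-n keys per query.

    Q, K, V are lists of row vectors; mask[i][j] is True where query i may
    attend to key j.
    """
    d = len(K[0])
    outputs = []
    for i, q in enumerate(Q):
        # Step 1: dot product of Q and K, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        # Step 2: apply the mask, then softmax.
        if mask is not None:
            scores = [s if mask[i][j] else -math.inf
                      for j, s in enumerate(scores)]
        weights = softmax(scores)
        # Step 3: select the n keys with the highest attention and
        # recompute their weights (assumed: softmax over the selection).
        keep = sorted(range(len(K)), key=lambda j: weights[j], reverse=True)[:n]
        renormalized = softmax([scores[j] for j in keep])
        # Step 4: weighted sum of the selected values.
        outputs.append([sum(w * V[j][c] for w, j in zip(renormalized, keep))
                        for c in range(len(V[0]))])
    return outputs


Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
V = [[1.0], [2.0], [3.0]]
top1 = top_n_attention(Q, K, V, n=1)
top2 = top_n_attention(Q, K, V, n=2)
masked = top_n_attention(Q, K, V, n=2, mask=[[True, False, False]])
```

With n equal to the number of keys the function reduces to ordinary masked softmax attention; smaller n discards the keys that contribute least to each query.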

The evaluation indices of the improved Informer model and the original Informer model are compared as shown in Table 1. Mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE) and mean absolute percentage error (MAPE) were selected as evaluation indices. MAE is the average of the absolute differences between actual and predicted values; MSE is the average of the squared differences between actual and predicted values; RMSE is the square root of MSE and is more sensitive to larger errors; MAPE expresses the error magnitude as a ratio of the actual value. The smaller these metrics, the more accurate the model's predictions.
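For reference, the four indices can be computed as in the sketch below. The sample values are made up for illustration and are not results from the embodiment; MAPE is reported as a percentage and assumes that no actual value is zero.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(mse(actual, predicted))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values nonzero)."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Made-up actual and predicted readings.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
metrics = {
    "MAE": mae(actual, predicted),
    "MSE": mse(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "MAPE": mape(actual, predicted),
}
```

Here the absolute errors are 10, 20 and 0, so MAE is 10 while RMSE is about 12.9, showing how the squared terms weight the larger error more heavily.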

2.2 Training of models

The training set is input into the improved Informer model for training, and after each round of training the test set is input into the model for evaluation. This process is repeated while the parameters of the improved Informer model are continuously optimized. For example: the hidden-layer feature dimension is set to 512, the number of attention heads to 10, the number of stacked layers of the encoder and the decoder to 2, the fully connected layer dimension to 2048, the sampling factor to 5, the padding value to 0, the dropout rate to 0.05, and the learning rate to 0.0001. During training, the encoder input length is 96, the decoder input length is 48, the prediction sequence length is 12, and the number of epochs is set to 10; if the loss does not improve for 3 consecutive epochs, training is stopped to prevent overfitting. Adam is selected as the optimizer, which adaptively adjusts the learning strategy and improves training efficiency, and MAE is selected as the loss function. The trained model is finally obtained.
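The hyperparameters and the stopping rule above can be summarized in a framework-independent sketch. The dictionary keys are hypothetical names chosen for readability, the `train_one_epoch` and `evaluate` callbacks stand in for a real training framework, and the loss sequence is simulated. Monitoring the per-epoch evaluation loss for the stopping rule is an assumption, since the text does not say which loss is tracked.

```python
# Hyperparameters listed in the embodiment (key names are illustrative).
CONFIG = {
    "d_model": 512,
    "n_heads": 10,
    "encoder_layers": 2,
    "decoder_layers": 2,
    "d_ff": 2048,
    "sampling_factor": 5,
    "padding": 0,
    "dropout": 0.05,
    "learning_rate": 1e-4,
    "encoder_input_len": 96,
    "decoder_input_len": 48,
    "prediction_len": 12,
    "max_epochs": 10,
    "patience": 3,
}


def train_with_early_stopping(train_one_epoch, evaluate, max_epochs, patience):
    """Train for up to max_epochs; stop after `patience` epochs without improvement."""
    best_loss = float("inf")
    stale_epochs = 0
    history = []
    for _ in range(max_epochs):
        train_one_epoch()
        loss = evaluate()
        history.append(loss)
        if loss < best_loss:
            best_loss = loss
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break
    return best_loss, history


# Simulated per-epoch losses in place of a real model.
simulated = iter([0.50, 0.40, 0.41, 0.42, 0.43, 0.30])
best, history = train_with_early_stopping(
    train_one_epoch=lambda: None,
    evaluate=lambda: next(simulated),
    max_epochs=CONFIG["max_epochs"],
    patience=CONFIG["patience"],
)
```

In the simulated run the loss stops improving after the second epoch, so training halts after the fifth epoch and the sixth value is never reached.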

The training effect is shown in figs. 5 and 6. The prediction effect of the original Informer model is shown in fig. 5, and the prediction effect of the improved Informer model is shown in fig. 6. In figs. 5 and 6, the blue curve represents the real data and the red curve represents the data predicted by the respective model. The comparison shows that the data predicted by the improved Informer model is closer to the real data, so its accuracy is higher.

The system monitoring data to be predicted is input into the trained model, which predicts from it and outputs the system monitoring data that may appear in the future.

3. Intelligent maintenance and management

3.1 Building the maintenance decision tree

A maintenance decision tree is constructed in advance from the threshold conditions under which the edge computing equipment may fail, the fault types, and the corresponding maintenance decisions. First, historical system monitoring data of the edge computing equipment is obtained; this historical data may include feature values such as CPU utilization, GPU utilization, network latency, network bandwidth, hardware temperature, and fan speed. Thresholds for the respective feature values are defined and stored in advance based on this historical data and on environmental characteristics. The thresholds of the various performance indices are preset in a Spring Boot service class and integrated into a Spring Boot controller. The fault type may be a network abnormality, excessive temperature, insufficient memory, and so on; the corresponding maintenance decision may be to reconnect the network, increase the fan speed, shut down unnecessary processes, and so on.
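A possible shape for such a decision structure is sketched below. The embodiment implements it in a Spring Boot service class; this sketch uses Python only to show the data layout and the matching step. The tree is flattened to one threshold rule per fault type, and the metric names and numeric thresholds are hypothetical, since the text gives no concrete values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    metric: str        # name of the monitored feature value
    threshold: float   # value above which a fault is expected
    fault_type: str
    decision: str


# Hypothetical thresholds; real values would come from historical data.
RULES = [
    Rule("hardware_temp_c", 75.0, "temperature too high", "increase fan speed"),
    Rule("network_latency_ms", 200.0, "network abnormality", "reconnect network"),
    Rule("memory_util", 0.9, "insufficient memory", "shut down unnecessary processes"),
]


def match_rules(predicted, rules=RULES):
    """Return (fault type, decision) for each rule exceeded by any predicted step."""
    hits = []
    for rule in rules:
        values = predicted.get(rule.metric, [])
        if any(v > rule.threshold for v in values):
            hits.append((rule.fault_type, rule.decision))
    return hits


# Predicted values for the next three steps (made-up numbers).
predicted = {
    "hardware_temp_c": [70.0, 74.0, 78.0],
    "network_latency_ms": [20.0, 30.0, 25.0],
    "memory_util": [0.50, 0.55, 0.60],
}
triggered = match_rules(predicted)
```

Because the rules are matched against predicted rather than current values, a decision can be triggered before the threshold is actually crossed on the device.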

3.2 Intelligent management

Whether a maintenance decision needs to be retrieved from the maintenance decision tree is evaluated according to the prediction result of the improved Informer model: the prediction result is matched against the conditions in the decision tree, and if a feature value exceeds its preset threshold, the corresponding alarm mechanism and maintenance decision are triggered. The Spring Boot controller extracts the key performance indices by analyzing the output data of the model and activates the corresponding alarm mechanism and maintenance decision through logical judgment. Finally, the server converts the maintenance decision into a shell command, establishes a connection with the edge computing device through the JSch library, and executes the shell command.

Taking the Jetson Nano as an example: if the hardware temperature is detected to be high, the fan speed is increased, and the corresponding shell command is "sudo sh -c 'echo 255 > /sys/devices/pwm-fan/target_pwm'"; if the network condition of the equipment is detected to be abnormal, the network connection is checked and the network is reconnected, and the corresponding shell command is "sudo nmcli dev wifi connect 'wifi_name' password 'wifi_password' ifname wlan0", where wifi_name and wifi_password are the preset Wi-Fi name and password.
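The conversion from a maintenance decision to a shell command might look like the sketch below. It only builds the command strings; the SSH transport, which the embodiment handles with the JSch library on the server, is left out. The function names and the decision labels are hypothetical, and the commands assume the standard lowercase nmcli syntax and the Jetson Nano fan path given in the example.

```python
import shlex


def fan_command(pwm):
    """Shell command that sets the Jetson Nano fan PWM target (0 to 255)."""
    if not 0 <= pwm <= 255:
        raise ValueError("pwm must be between 0 and 255")
    return f"sudo sh -c 'echo {pwm} > /sys/devices/pwm-fan/target_pwm'"


def wifi_command(ssid, password, ifname="wlan0"):
    """Shell command that reconnects Wi-Fi; arguments are shell-quoted."""
    return ("sudo nmcli dev wifi connect {} password {} ifname {}"
            .format(shlex.quote(ssid), shlex.quote(password), shlex.quote(ifname)))


def to_shell_command(decision, **params):
    """Map a maintenance decision label to its shell command."""
    if decision == "increase fan speed":
        return fan_command(params.get("pwm", 255))
    if decision == "reconnect network":
        return wifi_command(params["ssid"], params["password"])
    raise KeyError(f"no shell command defined for decision: {decision}")


fan = to_shell_command("increase fan speed")
wifi = to_shell_command("reconnect network", ssid="lab wifi", password="secret")
```

Quoting the SSID and password with `shlex.quote` keeps names that contain spaces or shell metacharacters from being split or interpreted by the remote shell.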

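Embodiment 2, which is based on the same inventive concept as embodiment 1, introduces an intelligent maintenance unit of an edge computing device, including:

a storage module, used for storing the information of the edge computing device and for acquiring and storing system monitoring data related to maintenance of the edge computing device;

a preprocessing module, used for preprocessing the system monitoring data, wherein the system monitoring data comprises current system monitoring data;

an anomaly prediction module, used for inputting the preprocessed current system monitoring data into a pre-trained prediction model to obtain predicted system monitoring data;

a maintenance decision generation module, used for matching the predicted system monitoring data against a pre-constructed maintenance decision tree, determining the fault type and generating a corresponding maintenance decision;

and a maintenance instruction issuing module, used for triggering maintenance decisions.

For the implementation steps of the method of this embodiment, refer to fig. 3.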

Embodiment 3, based on the same inventive concept as embodiment 2, introduces an intelligent maintenance system for an edge computing device. As shown in fig. 2, the system includes an intelligent maintenance unit and an intelligent management unit connected in series; the intelligent maintenance unit receives the system monitoring data of the edge equipment unit and transmits it to the intelligent management unit.

The intelligent maintenance unit consists of a storage module, an alarm module, an anomaly prediction module, a maintenance decision generation module and a maintenance instruction issuing module. The storage module receives system monitoring data from the edge computing equipment. The alarm module checks the system monitoring data in the storage module against thresholds and raises an alarm on abnormal data. The anomaly prediction module inputs the data in the storage module into the model and outputs a prediction result. The maintenance decision generation module analyzes the result output by the anomaly prediction module and outputs a maintenance decision to the maintenance instruction issuing module, which converts the maintenance decision into a shell command and sends the shell command to the specific edge computing equipment.

The intelligent management unit consists of a user login/registration module, a device management module, a monitoring module, an abnormal alarm list generation module and a user instruction uploading module. After the user registers and logs in through the user login/registration module, the device management module, the monitoring module and the abnormal alarm list generation module request the system monitoring data in the storage module of the intelligent maintenance unit and display it on a page. The user inputs a shell command in the page; the command is sent to the server, and the server forwards it to the corresponding edge computing device.

The intelligent maintenance unit and the intelligent management unit are presented in the form of Web pages, partial screenshots of which are shown in figs. 7-9. They provide the user with an intuitive, easy-to-operate interface for monitoring and managing the edge equipment. A well-designed user interface, with clear interface elements, menus and toolbars, ensures that the user can quickly and intuitively find and use the desired functions. In addition, data visualization techniques such as charts and reports let the user see the running trend and performance indices of the equipment at a glance, which improves monitoring efficiency.

Fig. 7 shows the monitoring module and the abnormal alarm list generation module of the intelligent management unit. They display information such as the CPU utilization, memory utilization, fan speed, hardware temperature trend and network condition of the equipment in visual forms such as pie charts and line graphs, and the alarm display comprises an alarm list and a pop-up window with the specific alarm information. This design integrates real-time data monitoring with abnormal alarms and helps the user quickly understand the condition of the equipment.

Fig. 8 shows the user interface of the system. After registering and logging in, the user can manage the edge computing devices in the console and perform operations such as adding, deleting and renaming devices, viewing real-time data, and exporting historical data. The user may choose to export the historical data of a device for the last day, week, month, half year or year as an Excel table; part of the exported historical data is shown in fig. 9.

Embodiment 4, this embodiment describes a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the maintenance method of an edge computing device as described in embodiment 1.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present invention and the scope of the claims, which are all within the protection of the present invention.

Claims

1. A method of maintaining an edge computing device, comprising:

Preprocessing the current system monitoring data;

2. The method of claim 1, wherein after inputting the preprocessed current system monitoring data into the pre-trained predictive model, the following is performed using the self-attention mechanism module:

Finally, the value V is weighted and summed and output.

3. The method of claim 1, wherein the maintenance decision tree comprises a threshold condition for which an edge computing device may fail, a corresponding failure type, and a corresponding maintenance decision.

4. The method of claim 1, wherein said matching said predictive system monitoring data with a pre-built maintenance decision tree, determining a fault type and triggering a corresponding maintenance decision, comprises,

5. The method of claim 1, wherein the obtaining system monitoring data of the edge computing device comprises: parsing system files with a Python script to obtain the system monitoring data.

6. The method of claim 1, wherein the training method of the predictive model comprises:

7. The method according to claim 1 or 6, wherein the preprocessing comprises:

8. An intelligent maintenance unit for an edge computing device, comprising

9. An intelligent maintenance system of an edge computing device, comprising an intelligent management unit and the intelligent maintenance unit of claim 8;

After a user registers and logs in through the user login/registration module, the device management module requests to acquire the edge computing device information stored in the storage module; the monitoring module requests to acquire system monitoring data stored in the storage module through the edge computing equipment information acquired by the equipment management module; the abnormal alarm list generation module acquires the edge computing equipment information and the system monitoring data in the equipment management module and the monitoring module, screens the alarm information and displays the alarm information on a page; the user command uploading module sends shell commands corresponding to the maintenance decisions, which are input by a user, to the maintenance command issuing module, and the maintenance command issuing module sends the shell commands to the corresponding edge computing devices.

10. A computer readable storage medium having stored thereon a computer program/instructions, which when executed by a processor, performs the steps of the method of maintaining an edge computing device as claimed in any of claims 1-7.