WO2019137052A1 - Method and device for network operation and maintenance - Google Patents
- Publication number
- WO2019137052A1 (PCT/CN2018/109903)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- fault information
- fault
- maintenance
- information
- network
- Prior art date
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/069—Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
Definitions
- the present application relates to the field of network technologies, and in particular, to a method and apparatus for network operation and maintenance.
- User experience is the core of a service.
- A stable, reliable network and a good user experience help operators develop services rapidly.
- Network operation and maintenance is used to ensure the safe operation of the network and its services, and how to carry out network operation and maintenance so as to safeguard user experience is an important issue.
- A network operation and maintenance method in the related art first uses an unsupervised learning model to perform anomaly detection on the network data of a service and presents the detection results to the staff. The staff judge the accuracy of the detection results, and the correct detection results are taken as training samples. A supervised learning model is then trained on those samples and used to perform anomaly detection on the network data of the service.
- Embodiments of the invention provide a method and a device for network operation and maintenance, which can solve the problem in the related art that the network operation and maintenance mode cannot comprehensively process multiple services. The technical solution is as follows:
- In a first aspect, a method for network operation and maintenance is provided. The server first acquires network data of m types of services, m ≥ 2, and then determines n pieces of first fault information according to that network data, where each piece of first fault information indicates that the corresponding service has a network fault, 1 ≤ n ≤ m. The server then divides some or all of the n pieces of first fault information into k groups of fault information, 1 ≤ k ≤ n. Within each group, the network faults indicated by the first fault information share the same superior fault, where the superior fault of a network fault is the fault that causes that network fault. After that, the server outputs the k groups of fault information and the k superior faults, which correspond one-to-one with the k groups of fault information.
- the m services may include a predictive service, an alarm compression service, and an abnormality detection service.
- the server can display k sets of fault information and k superior faults.
- the server can determine the first fault information and the superior fault according to the network data of the multiple services, so that the staff can perform fault processing. Further, the worker can also obtain potential faults in the network according to the superior fault and the first fault information output by the server, and process the potential faults.
- The method may further include: the server acquires, according to the k superior faults and the first fault information corresponding to each superior fault, the associated network data related to each superior fault, and then predicts, according to the associated network data, second fault information related to each superior fault, where the second fault information is different from the first fault information.
- the server outputs k superior faults, k sets of fault information, and all predicted second fault information.
- the network fault indicated by the second fault information related to the superior fault refers to a network fault that can be caused by the superior fault.
- the server may display k upper faults, k sets of fault information, and all predicted second fault information.
- When the server obtains the superior fault and the first fault information, it may predict the remaining network faults that may be caused by the superior fault. This enables the staff to deal with faults and potential faults in the network in a timely manner according to the superior fault, the first fault information, and the second fault information, thereby improving the stability of the network and ensuring its normal operation.
- The method may further include: the server receives a first labeling instruction, which indicates the correctly predicted first fault information among the k groups of fault information and the correctly predicted superior faults among the k superior faults.
- the server acquires a first sample set based on the first annotation instruction, the first sample set including information indicated by the first annotation instruction.
- The server acquires, according to the first sample set, the associated network data related to each superior fault in the first sample set, and then predicts the second fault information related to each superior fault according to the associated network data, where the second fault information is different from the first fault information.
- the server then outputs the first sample set and all of the predicted second failure information.
- The server may send a prompt message prompting the staff to use a first annotation symbol to mark the superior faults and first fault information that the server predicted correctly, and a second annotation symbol to mark the superior faults and first fault information that the server predicted incorrectly.
- the server may display the first sample set and all the second fault information predicted.
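- As an illustration of how a first sample set could be derived from the staff's marks, the following is a minimal sketch. The data layout, the function name `build_first_sample_set`, and the string values used as annotation symbols are assumptions made for this example and are not specified in the source.

```python
# Hypothetical annotation symbols; the source only says "first" and "second" symbol.
CORRECT = "correct"
WRONG = "wrong"


def build_first_sample_set(groups, marks):
    """Keep only the items the staff marked as correctly predicted.

    groups: dict mapping a superior fault to its list of first fault information.
    marks:  dict mapping a superior fault or a first fault information to a symbol.
    """
    sample = {"superior_faults": [], "first_faults": []}
    for superior, faults in groups.items():
        if marks.get(superior) == CORRECT:
            sample["superior_faults"].append(superior)
        for fault in faults:
            if marks.get(fault) == CORRECT:
                sample["first_faults"].append(fault)
    return sample


groups = {
    "base station equipment fault": [
        "cell 231 service degradation",
        "ETH link connection abnormality",
        "high CPU usage",
    ]
}
marks = {
    "base station equipment fault": CORRECT,
    "cell 231 service degradation": CORRECT,
    "ETH link connection abnormality": CORRECT,
    "high CPU usage": WRONG,
}
first_sample_set = build_first_sample_set(groups, marks)
```

Items without a mark are treated the same as items marked wrong, which is a design choice of this sketch rather than something the source states.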
- Since the server obtains the second fault information according to the correctly predicted first fault information in the k groups of fault information and the correctly predicted superior faults among the k superior faults, the accuracy of the second fault information is higher.
- The server can predict the network faults that may be caused by a correctly predicted superior fault according to the staff's labeling instruction, so that the staff can process faults and potential faults in the network in a timely manner according to the correctly predicted first fault information, the correctly predicted superior faults, and all the predicted second fault information. Moreover, since the accuracy of the second fault information is high, the processing efficiency of faults is also improved.
- The network data of the m types of services corresponds to the m operation and maintenance models, and the m operation and maintenance models are different from each other. Each operation and maintenance model is used to make a prediction on the network data of the corresponding service and to output fault information or non-fault information.
- After outputting the first sample set and the predicted second fault information, the method may further include: the server determines all the predicted second fault information as the sample set to be labeled, and then receives a second labeling instruction, which indicates the correctly predicted second fault information within the sample set to be labeled.
- The server acquires a second sample set based on the second labeling instruction, the second sample set including the information indicated by the second labeling instruction. The server then determines the first sample set and the second sample set as the target sample set, and determines the evaluation index of a first operation and maintenance model according to the target sample set, the first operation and maintenance model being any one of the m operation and maintenance models.
- the server uses the target sample set to update the first operation and maintenance model.
- the evaluation index of the first operation and maintenance model may be the accuracy, the precision, or the false discovery rate of the first operation and maintenance model.
- the specified evaluation index range can be determined according to the determined evaluation index of the first operation and maintenance model.
- The server may obtain the correctly predicted second fault information according to the staff's labeling instruction and then, according to the correctly predicted first fault information, superior faults, and second fault information, update any operation and maintenance model whose evaluation index does not meet the service requirements. This improves the accuracy of fault prediction and thereby the processing efficiency of faults.
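- To make the evaluation index concrete, the following is a minimal sketch that computes precision and false discovery rate from staff labels and checks the result against a specified range. It uses the standard definitions precision = TP / (TP + FP) and false discovery rate = FP / (TP + FP). The function names and the example range of 0.9 to 1.0 are assumptions for illustration. Accuracy is not computed here because it would also require labels for the non-fault outputs.

```python
def precision_and_fdr(labels):
    """Compute (precision, false discovery rate) for predicted faults.

    labels: list of bools, True where the staff marked a predicted fault as correct.
    """
    if not labels:
        raise ValueError("at least one labeled prediction is required")
    true_positives = sum(1 for ok in labels if ok)
    false_positives = len(labels) - true_positives
    total = true_positives + false_positives
    return true_positives / total, false_positives / total


def needs_update(index_value, low, high):
    """Return True when the evaluation index falls outside the specified range."""
    return not (low <= index_value <= high)


# Four predicted faults in the target sample set, three of them marked correct.
labels = [True, True, True, False]
precision, fdr = precision_and_fdr(labels)

# Hypothetical specified range for precision: [0.9, 1.0].
update_model = needs_update(precision, 0.9, 1.0)
```

In this example the precision falls below the assumed lower bound, so the model would be a candidate for updating with the target sample set.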
- The network data of the m types of services corresponds to the m operation and maintenance models, and the m operation and maintenance models are different from each other. Each operation and maintenance model is used to make a prediction on the network data of the corresponding service and to output fault information or non-fault information.
- The server predicting the second fault information related to each superior fault according to the associated network data may include: the server first inputs the associated network data into the associated operation and maintenance model to obtain the information output by that model, where the associated operation and maintenance model is the one among the m operation and maintenance models that corresponds to the associated network data. When the information output by the associated operation and maintenance model is fault information, the server determines that information as the second fault information related to each superior fault.
- the network data of the m types of services is in one-to-one correspondence with the m operation and maintenance models, and the m operation and maintenance models are different from each other.
- The server determining the n pieces of first fault information according to the network data of the m types of services may include: the server inputs the network data of the corresponding service into each of the m operation and maintenance models to obtain the information output by the m models, where the information output by each model is fault information or non-fault information and the information output by the m models includes n pieces of fault information. The server determines the n pieces of fault information as the n pieces of first fault information.
- In a second aspect, a device for network operation and maintenance is provided, which includes at least one module used to implement the network operation and maintenance method described in the first aspect.
- In a third aspect, an apparatus for network operation and maintenance is provided, comprising a processor, a memory, a network interface, and a bus.
- the bus is used to connect the processor, memory and network interface.
- the network interface is used to implement a communication connection between the server and the communication device.
- the processor is configured to execute a program stored in a memory to implement the method of network operation and maintenance described in the first aspect.
- In a fourth aspect, a computer readable storage medium is provided, storing instructions that, when run on a computer, cause the computer to perform the method of network operation and maintenance described in the first aspect.
- a computer program product comprising instructions for causing a computer to perform the method of network operation and maintenance described in the first aspect when the computer program product is run on a computer is provided.
- The server can determine n (1 ≤ n ≤ m) pieces of first fault information according to the network data of m (m ≥ 2) services, and then divide some or all of the n pieces of first fault information into k (1 ≤ k ≤ n) groups of fault information, where the network faults indicated by the first fault information in each group share the same superior fault.
- The server outputs the k groups of fault information and the k superior faults, which correspond one-to-one with the k groups of fault information.
- The staff can process faults and potential faults in the network in time.
- Various services can be comprehensively processed, and an operation and maintenance model whose evaluation index does not meet the service requirements can be automatically updated, which improves the accuracy of fault prediction and the processing efficiency of faults.
- FIG. 1 is a schematic diagram of an implementation environment according to an embodiment of the present invention.
- FIG. 2 is a flowchart of a network operation and maintenance method provided by an embodiment of the present invention.
- FIG. 3 is a flowchart of a method for determining first fault information provided by an embodiment of the present invention.
- FIG. 4 is a schematic diagram of a first fault information and a superior fault provided by the implementation of the present invention.
- FIG. 5 is a flowchart of a method for predicting second fault information according to an embodiment of the present invention.
- FIG. 6 is a flowchart of a method for another network operation and maintenance method provided by the implementation of the present invention.
- FIG. 7 is a schematic diagram showing the marking of the upper fault and the first fault information diagram shown in FIG. 4 according to the implementation of the present invention.
- FIG. 8 is a schematic diagram showing the marking of the upper fault and the first fault information diagram shown in FIG. 4 according to the implementation of the present invention.
- FIG. 9 is a schematic structural diagram of an apparatus for network operation and maintenance according to an embodiment of the present invention.
- FIG. 10 is a schematic structural diagram of another apparatus for network operation and maintenance provided by an embodiment of the present invention.
- FIG. 11 is a schematic structural diagram of another apparatus for network operation and maintenance provided by an embodiment of the present invention.
- FIG. 12 is a schematic structural diagram of an apparatus for network operation and maintenance according to an embodiment of the present invention.
- the implementation environment may include a server 001 and a communication device 002.
- the communication device 002 may be a base station.
- the base station is configured to enable the terminal 10 in the cell to communicate, and the server 001 can acquire network data of multiple services from the base station.
- Server 001 can be a server, or a server cluster consisting of several servers, or a cloud computing service center.
- The server 001 is configured to obtain network data of m (m ≥ 2) services, determine n (1 ≤ n ≤ m) pieces of first fault information according to that network data, and divide some or all of the n pieces of first fault information into k (1 ≤ k ≤ n) groups of fault information, where the network faults indicated by the first fault information in each group share the same superior fault. It then outputs the k groups of fault information and the k superior faults, which correspond one-to-one, thereby enabling the staff to perform fault processing.
- In one implementation, the server may also predict potential faults according to the superior fault and the first fault information. In another implementation, in order to improve the accuracy of fault prediction, the server may determine the correctly predicted superior faults and the correctly predicted first fault information according to the staff's labeling instructions, and then predict potential faults based on them.
- the method for network operation and maintenance provided by the embodiment of the present invention is described below by taking the two implementations as an example.
- the method for network operation and maintenance provided by the embodiment of the present invention is as shown in FIG. 2, and may include:
- Step 201: The server acquires network data of m types of services, where m ≥ 2.
- the server obtains network data of the service from the communication device.
- the server may acquire network data of the service from the base station.
- the m services acquired by the server may include a predictive service, an alarm compression service, and an abnormality detection service.
- the predictive service may include a hardware failure prediction service, a performance prediction service, and a resource prediction service.
- The alarm compression service may include a single-domain alarm compression service, a cross-domain alarm compression service, and a root cause alarm analysis service. The anomaly detection service may include a key performance indicator (KPI) anomaly detection service and a service degradation anomaly detection service. A brief description of each service is provided below.
- the hardware failure prediction service is used to predict the hardware that is about to fail, and then replace or repair the hardware that is about to fail in time.
- The prediction can be based on hardware performance data and hardware data collected by sensors.
- the hardware can be a single board, a hard disk, or an optical module.
- Performance prediction services are used to predict network performance metrics such as bandwidth, throughput, and latency.
- the resource prediction service is used to predict network resources (such as the central processing unit (CPU) occupancy rate, etc.).
- the alarm compression service is used to compress a large amount of alarm data generated in the network to obtain important alarm data that affects the network.
- the single-domain alarm compression service in the alarm compression service is used to compress alarm data in the same product domain.
- the network devices of the access layer can be regarded as communication devices of the same product domain.
- The cross-domain alarm compression service is used to compress alarm data of different product domains.
- the root cause alarm analysis service is used to analyze the basic alarm data that affects the network.
- the anomaly detection service is used to monitor various indicators in the network and report abnormal information.
- the KPI anomaly detection service in the anomaly detection service is used to monitor KPIs (such as KPIs of packet loss rate and KPIs of call quality) in real time.
- the service degradation anomaly detection service is used to monitor key quality indicators (KQI) in real time. Among them, KPI is used to monitor the running status of the network, and KQI is used to measure the quality of the business.
- The network data acquired by the server for each service may include the following:
- hardware failure prediction service: related performance indicators of the hardware and hardware data collected by sensors;
- performance prediction service: data such as network performance indicators;
- resource prediction service: data such as network resources;
- single-domain alarm compression service: alarm data in the same product domain;
- cross-domain alarm compression service: alarm data of different product domains;
- KPI anomaly detection service: data such as KPIs;
- service degradation anomaly detection service: data such as KQIs.
- the period in which the server obtains the network data of each service may be determined according to the corresponding service requirement, for example, the period may be 20 minutes or 1 hour.
- Step 202: The server determines, according to the network data of the m types of services, n pieces of first fault information, where each piece of first fault information indicates that a network fault occurs in the corresponding service, and 1 ≤ n ≤ m.
- the network data of the m types of services are in one-to-one correspondence with the m operation and maintenance models, and the m operation and maintenance models are different from each other.
- the step 202 may include:
- Step 2021: The server inputs the network data of the corresponding service into each of the m operation and maintenance models to obtain the information output by the m models. The information output by each operation and maintenance model is fault information or non-fault information, and the information output by the m models includes n pieces of fault information.
- The server may use the operation and maintenance models to determine the first fault information according to the network data of the services. Assuming that the types of services in step 201 include the predictive service, the alarm compression service, and the anomaly detection service, the operation and maintenance models for determining the first fault information may include a prediction class model, an alarm compression class model, and an anomaly detection class model.
- For example, the server obtains network data of eight types of services: the hardware failure prediction service, performance prediction service, resource prediction service, single-domain alarm compression service, cross-domain alarm compression service, root cause alarm analysis service, KPI anomaly detection service, and service degradation anomaly detection service. In that case, the prediction class model may include a hardware failure prediction model, a performance prediction model, and a resource prediction model; the alarm compression class model may include a single-domain alarm compression model, a cross-domain alarm compression model, and a root cause alarm analysis model; and the anomaly detection class model may include a KPI anomaly detection model and a service degradation anomaly detection model. The total number of operation and maintenance models is eight.
- the network data of the eight kinds of services corresponds to the eight operation and maintenance models one by one. The eight operation and maintenance models are different from each other.
- the server inputs the network data of the corresponding service to the eight operation and maintenance models to obtain the information outputted by the eight operation and maintenance models.
- For example, the server inputs the network data of the hardware failure prediction service into the hardware failure prediction model and obtains the fault information output by the hardware failure prediction model.
- the server inputs the network data of the performance prediction service to the performance prediction model, and obtains the failure information output by the performance prediction model.
- Step 2022 The server determines n pieces of fault information as n pieces of first fault information.
- In this way, the server obtains the n pieces of first fault information.
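- Steps 2021 and 2022 can be sketched as follows. This is a minimal illustration, not the models described in the source: each model is assumed to be a plain function that returns a fault description or `None` for non-fault information, and the thresholds, field names, and toy models are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class FaultInfo:
    service: str      # service whose network data produced the fault
    description: str  # fault description output by the model


# Assumed model interface: network data in, fault description or None out.
Model = Callable[[dict], Optional[str]]


def determine_first_fault_info(models: Dict[str, Model],
                               network_data: Dict[str, dict]) -> List[FaultInfo]:
    """Feed each service's data to its own model and keep only fault outputs."""
    faults = []
    for service, model in models.items():
        output = model(network_data[service])
        if output is not None:  # None stands for non-fault information
            faults.append(FaultInfo(service, output))
    return faults


# Two toy models with made-up thresholds (m = 2).
def resource_model(data: dict) -> Optional[str]:
    return "high CPU usage" if data["cpu"] > 0.9 else None


def degradation_model(data: dict) -> Optional[str]:
    return f"cell {data['cell']} service degradation" if data["kqi"] < 0.5 else None


models = {
    "resource prediction": resource_model,
    "service degradation anomaly detection": degradation_model,
}
network_data = {
    "resource prediction": {"cpu": 0.95},
    "service degradation anomaly detection": {"cell": 231, "kqi": 0.8},
}
first_faults = determine_first_fault_info(models, network_data)
```

Here only the resource prediction model reports a fault, so n = 1 while m = 2, which is consistent with 1 ≤ n ≤ m.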
- Step 203: The server divides some or all of the n pieces of first fault information into k groups of fault information, 1 ≤ k ≤ n. The network faults indicated by the first fault information in each group share the same superior fault, where the superior fault of the network fault indicated by any piece of first fault information is the fault that causes that network fault.
- the upper fault of the network fault indicated by the first fault information may be a base station equipment fault.
- the cell managed by the base station includes a cell 231.
- the network data and eight operation and maintenance models of the eight services in step 2021 are taken as an example.
- The server inputs the network data of the corresponding service into each of the eight operation and maintenance models. Assuming that the information output by all eight models is fault information, the server obtains eight pieces of first fault information.
- The server groups all eight pieces of first fault information, for example into two groups of fault information. The first group includes three pieces of first fault information, and the superior fault of the network faults they indicate is a base station equipment fault. The second group includes five pieces of first fault information, and the superior fault of the network faults they indicate is a transmission equipment fault.
- FIG. 4 exemplarily shows one group of fault information and the superior fault corresponding to that group. The group includes three pieces of first fault information: "cell 231 service degradation", "Ethernet (ETH) link connection abnormality", and "high CPU usage".
- "Cell 231 service degradation" is the fault information output by the service degradation anomaly detection model after the server inputs the network data of the corresponding service into it.
- "ETH link connection abnormality" is the fault information output by the KPI anomaly detection model after the server inputs the network data of the corresponding service into it.
- "High CPU usage" is the fault information output by the resource prediction model after the server inputs the network data of the corresponding service into it.
- the superior fault of the network fault indicated by the three first fault information is a base station equipment fault.
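- The grouping in step 203 can be sketched as follows. The source does not state how the superior fault of a network fault is determined, so this example assumes a hypothetical lookup table `SUPERIOR_FAULT`; the entry "alarm storm on transport link" is also invented for illustration. First fault information without a known superior fault is left ungrouped, which matches dividing "some or all" of the n pieces.

```python
from collections import defaultdict

# Hypothetical mapping from first fault information to its superior fault.
SUPERIOR_FAULT = {
    "cell 231 service degradation": "base station equipment fault",
    "ETH link connection abnormality": "base station equipment fault",
    "high CPU usage": "base station equipment fault",
    "alarm storm on transport link": "transmission equipment fault",
}


def group_by_superior_fault(first_faults):
    """Return a dict: superior fault -> first fault information sharing it."""
    groups = defaultdict(list)
    for fault in first_faults:
        superior = SUPERIOR_FAULT.get(fault)
        if superior is not None:  # skip faults with no known superior fault
            groups[superior].append(fault)
    return dict(groups)


first_faults = [
    "cell 231 service degradation",
    "ETH link connection abnormality",
    "high CPU usage",
    "alarm storm on transport link",
    "unclassified fault",
]
groups = group_by_superior_fault(first_faults)
k = len(groups)  # number of groups, one per superior fault
```

Because each dictionary key is a superior fault and each value is its group, the one-to-one correspondence between the k superior faults and the k groups holds by construction.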
- Step 204 The server outputs k sets of fault information and k upper faults, and the k upper faults are in one-to-one correspondence with the k sets of fault information.
- the server outputs k sets of fault information and k superior faults, so that the staff can perform fault processing according to the k sets of fault information and k superior faults. Further, the worker can also obtain potential faults in the network according to the superior fault and the first fault information output by the server, and process the potential faults.
- the server can display k sets of fault information and k superior faults.
- One group of fault information and the corresponding superior fault, as displayed by the server, may be as shown in FIG. 4.
- Step 205 The server acquires associated network data related to each superior fault according to the k upper faults and the first fault information corresponding to each upper fault.
- the server can further determine the potential fault in the network. In order to identify potential failures, the server may first obtain associated network data related to the superior failure.
- For example, the server determines eight pieces of first fault information according to the network data of the eight types of services and divides them into two groups of fault information. The first group includes three pieces of first fault information, x1, x2, and x3, and the superior fault of the network faults they indicate is A11. The second group includes five pieces of first fault information, y1, y2, y3, y4, and y5, and the superior fault of the network faults they indicate is B11.
- the server obtains the associated network data related to A11 and the associated network data related to B11.
- For example, if the first fault information is "cell 231 service degradation" and the superior fault of the network fault indicated by the first fault information is a base station equipment fault, the associated network data related to the superior fault obtained by the server may be the KQI of cell 232.
- the cell managed by the base station includes a cell 232 and a cell 231.
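- One way to read step 205 in this example is that the server looks at other elements under the same superior fault, here the other cells managed by the faulty base station. The sketch below illustrates that reading only; the topology table, the KQI values, and the function name are assumptions and do not come from the source.

```python
# Hypothetical topology: base station -> cells it manages.
TOPOLOGY = {"base station A": [231, 232]}

# Hypothetical latest KQI value per cell.
KQI = {231: 0.42, 232: 0.47}


def associated_network_data(base_station, faulty_cells):
    """Collect the KQI of cells under the faulty base station that have
    not already been reported in the first fault information."""
    return {
        cell: KQI[cell]
        for cell in TOPOLOGY[base_station]
        if cell not in faulty_cells
    }


# Cell 231 already appears in the first fault information.
associated = associated_network_data("base station A", {231})
```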
- Step 206 The server predicts second fault information related to each superior fault according to the associated network data, where the second fault information is different from the first fault information.
- the network failure indicated by the second failure information related to the superior failure refers to a network failure that can be caused by the superior failure.
- the present step is described by taking the upper faults A11 and B11 in step 205 as an example.
- The server acquires the associated network data p1 related to A11 and then predicts the second fault information related to A11 according to p1. Likewise, the server acquires the associated network data p2 related to B11 and then predicts the second fault information related to B11 according to p2.
- the first fault information is: “cell 231 service degradation”
- the upper fault of the network fault indicated by the first fault information is: the base station equipment fault
- the associated network data related to the superior fault acquired by the server is: the cell 232 KQI
- the second fault information related to the superior fault predicted by the server according to the associated network data may be: “Cell 232 service degradation”.
- the cell managed by the base station includes a cell 232 and a cell 231.
- step 206 may include:
- Step 2061 The server inputs the associated network data to the associated operation and maintenance model to obtain information output by the associated operation and maintenance model, where the associated operation and maintenance model is an operation and maintenance model corresponding to the associated network data in the m operation and maintenance models.
- Step 2062 When the information output by the associated operation and maintenance model is fault information, the server determines the information output by the associated operation and maintenance model as the second fault information related to each superior fault.
- the eight services are: hardware failure prediction service, performance prediction service, resource prediction service, single domain alarm compression service, cross-domain alarm compression service, root cause alarm analysis service, KPI abnormality detection service, and service degradation abnormality detection service.
- the eight operation and maintenance models are: hardware failure prediction model, performance prediction model, resource prediction model, single domain alarm compression model, cross-domain alarm compression model, root cause alarm analysis model, KPI anomaly detection model, and service degradation anomaly detection model.
- the upper-level fault and the first fault information shown in FIG. 4 are taken as an example.
- the superior fault is a base station equipment fault, and the three pieces of first fault information are: “cell 231 service degradation”, “ETH link connection abnormality”, and “high CPU usage”.
- the associated network data obtained by the server related to the superior fault may be: KQI of the cell 232.
- the server inputs the associated network data to the corresponding service degradation anomaly detection model and obtains the fault information output by that model: “cell 232 service degradation”, after which the server determines “cell 232 service degradation” as the second fault information.
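Steps 2061 and 2062 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Model` type, the `kind` key used to select the associated model, the toy `kqi_degradation_model`, and its 0.8 threshold are all assumptions introduced here.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical model type: takes associated network data and returns fault
# information as a string, or None to stand for non-fault information.
Model = Callable[[dict], Optional[str]]


def kqi_degradation_model(data: dict) -> Optional[str]:
    """Toy stand-in for the service degradation anomaly detection model."""
    # Assumed rule: a KQI below 0.8 counts as service degradation.
    if data["kqi"] < 0.8:
        return f"cell {data['cell']} service degradation"
    return None


def predict_second_faults(associated: List[dict],
                          models: Dict[str, Model],
                          first_faults: List[str]) -> List[str]:
    """Feed associated data to its model and keep new fault outputs only."""
    second = []
    for item in associated:
        model = models[item["kind"]]  # step 2061: pick the associated model
        output = model(item)
        # Step 2062: keep fault information; second fault information must
        # differ from the first fault information already reported.
        if output is not None and output not in first_faults:
            second.append(output)
    return second


models = {"kqi": kqi_degradation_model}
associated = [{"kind": "kqi", "cell": 232, "kqi": 0.55}]
first_faults = ["cell 231 service degradation"]
second_faults = predict_second_faults(associated, models, first_faults)
print(second_faults)
```

With the made-up KQI value above, the sketch reproduces the example in the text: the KQI of cell 232 leads to “cell 232 service degradation” as second fault information.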
- Step 207 The server outputs k upper faults, k sets of fault information, and all predicted second fault information.
- the server outputs k upper-level faults, k-group fault information, and all predicted second fault information, so that the worker performs fault processing according to k upper-level faults, k-group fault information, and all predicted second fault information.
- the server may display k upper faults, k sets of fault information, and all predicted second fault information.
- the server may predict, according to the superior fault and the first fault information, the remaining network faults that may be caused by the superior fault.
- the superior diffusion labeling selection method provided by the embodiment of the invention enables the staff to handle faults and potential faults in the network according to the superior fault, the first fault information, and the second fault information, thereby improving the stability of the network and ensuring its normal operation.
- the server can determine n (1 ≤ n ≤ m) pieces of first fault information according to the network data of m (m ≥ 2) types of services, and then divide some or all of the n pieces of first fault information into k (1 ≤ k ≤ n) groups of fault information, where the network faults indicated by the first fault information in each group have the same superior fault. The server then outputs the k groups of fault information and k superior faults.
- the k superior faults are in one-to-one correspondence with the k groups of fault information, so that the staff can handle faults and potential faults in the network in time, and the method can comprehensively process multiple services.
- the method for network operation and maintenance provided by the embodiment of the present invention may include:
- Step 601 The server acquires network data of m types of services, where m ≥ 2.
- Step 601 can refer to step 201.
- Step 602 The server determines, according to network data of the m types of services, n first fault information.
- Each piece of first fault information is used to indicate that a network fault occurs in the corresponding service, 1 ≤ n ≤ m.
- the network data of the m types of services is in one-to-one correspondence with the m operation and maintenance models, and the m operation and maintenance models are different from each other.
- the step 602 may include: the server inputs the network data of the corresponding service to the m operation and maintenance models.
- the information output by each operation and maintenance model is fault information or non-fault information, and the information output by the m operation and maintenance models includes n pieces of fault information; the server determines the n pieces of fault information as the n pieces of first fault information.
- Step 602 can refer to step 202.
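Step 602 can be sketched as below, under stated assumptions: each service's network data is a dictionary, each operation and maintenance model is a callable that returns a fault string or `None` (standing for non-fault information), and the three toy models with their thresholds are invented for illustration.

```python
from typing import Callable, Dict, Optional

# Hypothetical model type: maps one service's network data to fault
# information (a string) or to None, which stands for non-fault information.
Model = Callable[[dict], Optional[str]]


def determine_first_faults(network_data: Dict[str, dict],
                           models: Dict[str, Model]) -> Dict[str, str]:
    """Run each service's data through its own model and keep fault outputs."""
    if network_data.keys() != models.keys():
        raise ValueError("network data and models must correspond one to one")
    first_faults = {}
    for service, data in network_data.items():
        output = models[service](data)
        if output is not None:  # non-fault information is discarded
            first_faults[service] = output
    return first_faults


# Toy models with assumed thresholds, for illustration only.
models = {
    "kpi_anomaly_detection": lambda d: "high CPU usage" if d["cpu"] > 0.9 else None,
    "service_degradation_detection": lambda d: (
        f"cell {d['cell']} service degradation" if d["kqi"] < 0.8 else None),
    "resource_prediction": lambda d: "disk nearly full" if d["disk"] > 0.95 else None,
}
network_data = {
    "kpi_anomaly_detection": {"cpu": 0.97},
    "service_degradation_detection": {"cell": 231, "kqi": 0.6},
    "resource_prediction": {"disk": 0.40},
}

first_faults = determine_first_faults(network_data, models)
m, n = len(models), len(first_faults)
print(first_faults, m, n)
```

Because each model contributes at most one piece of fault information, the constraint 1 ≤ n ≤ m holds whenever at least one model reports a fault.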
- Step 603 The server divides part or all of the n first fault information into k sets of fault information.
- the superior fault of the network fault indicated by the first fault information in each set of fault information is the same.
- the superior fault of the network fault indicated by any piece of first fault information is the fault that causes that network fault, 1 ≤ k ≤ n.
- Step 603 can refer to step 203.
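The grouping in step 603 can be illustrated with a short sketch. How the server finds the superior fault of a given network fault is not described in this passage, so the `superior_of` lookup table below is purely an assumption; first fault information without a known superior fault is left ungrouped, which matches "part or all".

```python
from collections import defaultdict
from typing import Dict, List


def group_by_superior(first_faults: List[str],
                      superior_of: Dict[str, str]) -> Dict[str, List[str]]:
    """Group first fault information by the superior fault that causes it."""
    groups = defaultdict(list)
    for fault in first_faults:
        superior = superior_of.get(fault)  # hypothetical lookup
        if superior is not None:
            groups[superior].append(fault)
    return dict(groups)


first_faults = ["x1", "x2", "x3", "y1", "y2", "y3", "y4", "y5"]
superior_of = {"x1": "A11", "x2": "A11", "x3": "A11",
               "y1": "B11", "y2": "B11", "y3": "B11", "y4": "B11", "y5": "B11"}

groups = group_by_superior(first_faults, superior_of)
k = len(groups)
print(k, groups)
```

Keying the result by superior fault gives the one-to-one correspondence between the k superior faults and the k groups directly.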
- Step 604 The server outputs k sets of fault information and k upper faults.
- the k upper level faults correspond to the k group fault information one by one.
- the server outputs k sets of fault information and k superior faults, so that the staff can perform fault processing according to k upper faults and k sets of fault information. Further, the worker can also obtain potential faults in the network according to the superior fault and the first fault information output by the server, and process the potential faults.
- the server can display k sets of fault information and k superior faults.
- Step 605 The server receives a first labeling instruction, where the first labeling instruction is used to indicate the correctly predicted first fault information in the k groups of fault information and the correctly predicted superior faults among the k superior faults.
- the staff can mark the first fault information and the superior faults displayed by the server according to the actual fault condition of the network, marking the first fault information and the superior faults that the server predicted correctly.
- For example, the server may issue a prompt message prompting the staff to use a first annotation symbol to mark the correctly predicted first fault information and superior faults, and a second annotation symbol to mark the incorrectly predicted first fault information and superior faults. The staff then marks the items accordingly.
- the first label symbol and the second label symbol are different.
- the first annotation symbol may be a check mark “√”
- the second annotation symbol may be a cross mark “×”.
- Step 606 The server acquires a first sample set according to the first annotation instruction, where the first sample set includes information indicated by the first annotation instruction.
- the server acquires a first sample set based on the first annotation instruction in step 605, the first sample set including the first fault information that is correctly predicted within the k sets of fault information and the upper fault that is correctly predicted by the k upper faults.
- k is equal to 2
- the first group of fault information includes three pieces of first fault information: x1, x2, and x3, and the superior fault of the network faults indicated by these three pieces is A11.
- the second group of fault information includes five pieces of first fault information: y1, y2, y3, y4, and y5, and the superior fault of the network faults indicated by these five pieces is B11.
- the first annotation instruction indicates that x1 and x2 in the first group of fault information, y4 and y5 in the second group of fault information, and the superior fault A11 are correctly predicted.
- the information included in the first sample set is: x1, x2, y4, y5, and A11.
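The example for step 606 can be reproduced with a small sketch. Modeling the first annotation instruction as a set of items marked correct is an assumption made here; the text does not specify the instruction's format.

```python
from typing import Dict, List, Set


def build_first_sample_set(groups: Dict[str, List[str]],
                           marked_correct: Set[str]) -> List[str]:
    """Collect the first fault information and superior faults marked correct."""
    samples = []
    for faults in groups.values():
        samples.extend(f for f in faults if f in marked_correct)
    # Superior faults are the keys of the grouping.
    samples.extend(s for s in groups if s in marked_correct)
    return samples


groups = {"A11": ["x1", "x2", "x3"],
          "B11": ["y1", "y2", "y3", "y4", "y5"]}
# Hypothetical encoding of the first annotation instruction.
marked_correct = {"x1", "x2", "y4", "y5", "A11"}

first_sample_set = build_first_sample_set(groups, marked_correct)
print(first_sample_set)
```

Note that B11 is excluded even though y4 and y5 are kept, since the staff did not mark B11 itself as correctly predicted.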
- Step 607 The server acquires, according to the first sample set, associated network data related to each superior fault in the first sample set.
- the server may acquire associated network data related to the superior fault A11 according to the first sample set. For example, if A11 is a base station equipment fault, the associated network data related to A11 may be: the KQI of cell 232.
- the cell managed by the base station includes a cell 232.
- Step 608 The server predicts second fault information related to each superior fault according to the associated network data, where the second fault information is different from the first fault information.
- the network failure indicated by the second failure information related to the superior failure refers to a network failure that can be caused by the superior failure.
- since the server obtains the second fault information according to the correctly predicted first fault information in the k groups of fault information and the correctly predicted superior faults among the k superior faults, the accuracy of the second fault information is higher.
- step 608 may include: the server inputs the associated network data to the associated operation and maintenance model to obtain the information output by the associated operation and maintenance model, where the associated operation and maintenance model is the operation and maintenance model, among the m operation and maintenance models, that corresponds to the associated network data; when the information output by the associated operation and maintenance model is fault information, the server determines that information as the second fault information related to each superior fault.
- Step 608 can refer to step 206.
- Step 609 The server outputs the first sample set and all the predicted second fault information.
- the first sample set includes the first fault information that is correctly predicted within the k sets of fault information and the upper fault that is correctly predicted by the k upper faults.
- the server outputs the first sample set and all the predicted second fault information, so that the staff can perform fault processing according to the correctly predicted first fault information, the correctly predicted superior faults, and all the predicted second fault information.
- the server may display the first sample set and all the predicted second fault information.
- the server may predict, according to the labeling instruction of the staff, the network faults that may be caused by the correctly predicted superior faults, so that the staff can handle faults and potential faults in the network in time according to the correctly predicted first fault information, the correctly predicted superior faults, and all the predicted second fault information. Moreover, since the accuracy of the second fault information is high, the fault processing efficiency is also improved.
- Step 610 The server determines all the predicted second fault information as a sample set to be labeled.
- the network data of the m types of services corresponds to the m operation and maintenance models, and the m operation and maintenance models are different from each other.
- Each operation and maintenance model is used to make predictions on the network data of the corresponding service and output fault information or non-fault information.
- the server may determine the second fault information predicted by the operation and maintenance models in step 608 as the sample set to be labeled, so that the staff labels the sample set to be labeled to obtain the correctly predicted second fault information.
- Step 611 The server receives a second labeling instruction, where the second labeling instruction is used to indicate that the second fault information is correctly predicted in the sample set to be labeled.
- the staff can mark the second fault information displayed by the server according to the actual fault condition of the network, and mark the second fault information that the server predicts correctly. Refer to Figure 7 and Figure 8 in step 605 for the labeling method.
- Step 612 The server acquires a second sample set according to the second annotation instruction, where the second sample set includes information indicated by the second annotation instruction.
- the server acquires a second sample set based on the second annotation instruction in step 611, the second sample set includes second fault information that is correctly predicted within the sample set to be labeled.
- all of the second fault information predicted in step 608 includes z1, z2, z3, and z4. Assuming that the second annotation instruction is used to indicate that the predictions of z1 and z2 are correct, then the information included in the second sample set is: z1 and z2.
- Step 613 The server determines the first sample set and the second sample set as the target sample set.
- the first sample set includes the correctly predicted first fault information in the k groups of fault information and the correctly predicted superior faults among the k superior faults.
- the second sample set includes the correctly predicted second fault information in the sample set to be labeled.
- the server determines the first sample set and the second sample set as a target sample set, and the target sample set is used to update any operation and maintenance model whose evaluation index does not satisfy the business requirement.
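Steps 610 to 613 amount to filtering the predicted second fault information by the second labeling instruction and then combining the two sample sets. A minimal sketch using the example values from the text follows; representing the labeling instruction as a set of items marked correct is an assumption.

```python
# First sample set from step 606 (values from the example in the text).
first_sample_set = ["x1", "x2", "y4", "y5", "A11"]

# Step 610: all predicted second fault information forms the set to be labeled.
samples_to_label = ["z1", "z2", "z3", "z4"]

# Steps 611 and 612: hypothetical encoding of the second labeling instruction.
second_marked_correct = {"z1", "z2"}
second_sample_set = [z for z in samples_to_label if z in second_marked_correct]

# Step 613: the target sample set combines both sample sets.
target_sample_set = first_sample_set + second_sample_set
print(target_sample_set)
```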
- Step 614 The server determines, according to the target sample set, an evaluation index of the first operation and maintenance model, where the first operation and maintenance model is any one of the m operation and maintenance models.
- the server determines the evaluation index of the first operation and maintenance model according to the correctly predicted first fault information, the correctly predicted superior faults, and the correctly predicted second fault information.
- the evaluation index of the first operation and maintenance model may be the accuracy of the first operation and maintenance model.
- the accuracy of the model is the ratio of the number of correct results predicted by the model to the total number of predicted results. The higher the accuracy of the model, the better the prediction effect of the model.
- Step 615 When the evaluation index of the first operation and maintenance model does not belong to the specified evaluation index range, the server updates the first operation and maintenance model by using the target sample set.
- when the evaluation index is the accuracy, the corresponding specified evaluation index range may be [f, 1]. For example, f may be equal to 0.4; when the evaluation index of the first operation and maintenance model is less than 0.4, the server updates the first operation and maintenance model by using the target sample set.
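Steps 614 and 615 can be sketched as an accuracy computation followed by a range check. The assumption here is that a model output counts as correct exactly when it appears in the target sample set; the function names and sample outputs are illustrative.

```python
from typing import Iterable, List


def accuracy(model_outputs: List[str], target_samples: Iterable[str]) -> float:
    """Ratio of correctly predicted results to all predicted results."""
    if not model_outputs:
        raise ValueError("the model produced no predictions to evaluate")
    correct_items = set(target_samples)
    correct = sum(1 for out in model_outputs if out in correct_items)
    return correct / len(model_outputs)


def needs_update(index: float, lower: float = 0.4, upper: float = 1.0) -> bool:
    """True when the evaluation index falls outside the range [lower, upper]."""
    return not (lower <= index <= upper)


target_sample_set = ["x1", "x2", "y4", "y5", "A11", "z1", "z2"]

acc_a = accuracy(["x1", "x2", "x3"], target_sample_set)        # 2 of 3 correct
acc_b = accuracy(["y1", "y2", "y3", "y4"], target_sample_set)  # 1 of 4 correct

print(acc_a, needs_update(acc_a))
print(acc_b, needs_update(acc_b))
```

With f = 0.4, the first hypothetical model stays as it is, while the second falls below the range and would be updated with the target sample set.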
- the supervised learning algorithm in the machine learning algorithm can be used to train the first operation and maintenance model.
- the model training process can refer to related technologies, and details are not described herein.
- the evaluation index of the first operation and maintenance model may also be the precision of the first operation and maintenance model; the higher the precision of the model, the better the prediction effect of the model.
- the evaluation index of the first operation and maintenance model may also be the false discovery rate; the lower the false discovery rate of the model, the better the prediction effect of the model.
- the evaluation index of the first operation and maintenance model may also be the false omission rate, or the like.
- the embodiment of the present invention does not limit the evaluation index of the first operation and maintenance model, and the specified evaluation index range may be determined according to the evaluation index chosen for the first operation and maintenance model.
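For reference, the evaluation indexes named above have standard confusion-matrix definitions. The symbols TP, FP, TN, and FN (true/false positives/negatives, with "positive" meaning the model output fault information) are not in the source and are introduced here only for illustration:

```latex
\begin{aligned}
\text{accuracy} &= \frac{TP + TN}{TP + FP + TN + FN}, \\
\text{precision} &= \frac{TP}{TP + FP}, \\
\text{false discovery rate} &= \frac{FP}{TP + FP} = 1 - \text{precision}, \\
\text{false omission rate} &= \frac{FN}{FN + TN}.
\end{aligned}
```

Accuracy and precision are better when higher, whereas the false discovery rate and false omission rate are better when lower, so a specified range for the latter two would bound the index from above rather than from below.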
- each of the m operation and maintenance models is managed by a pair consisting of an application unit and a model trainer. The application unit is configured to determine the evaluation index of the first operation and maintenance model according to the target sample set and, when the evaluation index of the first operation and maintenance model does not belong to the specified evaluation index range, to send a model update request to the model trainer.
- the model trainer is configured to update the first operation and maintenance model by using the target sample set according to the model update request sent by the application unit.
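The division of work between the application unit and the model trainer might look like the following sketch. The class names, method names, and the choice to record requests instead of actually retraining are assumptions made for illustration; the text only states which component evaluates and which one updates.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ModelTrainer:
    """Hypothetical trainer: records update requests instead of retraining."""
    handled: List[Tuple[str, List[str]]] = field(default_factory=list)

    def handle_update_request(self, model_name: str,
                              target_samples: List[str]) -> None:
        # A real trainer would retrain the model on the target sample set here.
        self.handled.append((model_name, list(target_samples)))


@dataclass
class ApplicationUnit:
    """Hypothetical application unit paired with one trainer for one model."""
    model_name: str
    trainer: ModelTrainer
    lower_bound: float = 0.4  # f in the specified range [f, 1]

    def review(self, model_outputs: List[str],
               target_samples: List[str]) -> bool:
        """Evaluate the model; return True if an update request was sent."""
        # Accuracy is used as the evaluation index; outputs must be non-empty.
        correct = sum(1 for out in model_outputs if out in target_samples)
        index = correct / len(model_outputs)
        if index < self.lower_bound:
            self.trainer.handle_update_request(self.model_name, target_samples)
            return True
        return False


trainer = ModelTrainer()
unit = ApplicationUnit("service_degradation_detection", trainer)
target = ["x1", "x2", "y4", "y5", "A11", "z1", "z2"]
sent = unit.review(["y1", "y2", "y3", "y4"], target)  # 1 of 4 correct
print(sent, len(trainer.handled))
```

Keeping evaluation and training in separate objects mirrors the pairing described above: the application unit only decides whether an update is needed, and the trainer only acts on requests.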
- the server may obtain the correctly predicted second fault information according to the labeling instruction of the staff, and then use the correctly predicted first fault information, the correctly predicted superior faults, and the correctly predicted second fault information to update any operation and maintenance model whose evaluation index does not meet the business requirements, which improves the accuracy of fault prediction and further improves fault processing efficiency.
- the embodiment of the present invention effectively predicts faults and potential faults in the network by using the operation and maintenance experience of the staff.
- the server can update the operation and maintenance models in time, achieving timely and accurate prediction, reducing labor costs, and improving fault processing efficiency.
- the staff can quickly learn the running state of the network and handle faults and potential faults in time, which improves the stability of the network and ensures its normal operation.
- the server can determine n (1 ≤ n ≤ m) pieces of first fault information according to the network data of m (m ≥ 2) types of services, and then divide some or all of the n pieces of first fault information into k (1 ≤ k ≤ n) groups of fault information, where the network faults indicated by the first fault information in each group have the same superior fault. The server then outputs the k groups of fault information and k superior faults.
- the k superior faults are in one-to-one correspondence with the k groups of fault information, so that the staff can handle faults and potential faults in the network in time. The method can comprehensively process multiple services and automatically update any operation and maintenance model whose evaluation index does not meet the business requirements, which improves the accuracy of fault prediction and the efficiency of fault processing.
- the embodiment of the present invention provides a device for network operation and maintenance.
- the network operation and maintenance device can be used for the server shown in FIG. 1.
- the network operation and maintenance device 900 includes:
- the first obtaining module 910 is configured to perform step 201 or step 601 in the foregoing embodiment.
- the first determining module 920 is configured to perform step 202 or step 602 in the foregoing embodiment.
- the dividing module 930 is configured to perform step 203 or step 603 in the foregoing embodiment.
- the first output module 940 is configured to perform step 204 or step 604 in the foregoing embodiment.
- the network data of the m types of services is in one-to-one correspondence with the m operation and maintenance models, and the m operation and maintenance models are different from each other.
- the first determining module 920 is configured to perform step 2021 or step 2022 in the foregoing embodiment.
- the network operation and maintenance apparatus 900 may further include:
- the second obtaining module 950 is configured to perform step 205 in the foregoing embodiment.
- the first prediction module 960 is configured to perform step 206 in the foregoing embodiment.
- the second output module 970 is configured to perform step 207 in the foregoing embodiment.
- FIG. 10 The meaning of other marks in FIG. 10 can be referred to FIG.
- the device 900 of the network operation and maintenance may further include:
- the first receiving module 980 is configured to perform step 605 in the foregoing embodiment.
- the third obtaining module 990 is configured to perform step 606 in the foregoing embodiment.
- the fourth obtaining module 991 is configured to perform step 607 in the foregoing embodiment.
- the second prediction module 992 is configured to perform step 608 in the foregoing embodiment.
- the third output module 993 is configured to perform step 609 in the foregoing embodiment.
- the network data of the m types of services corresponds to the m operation and maintenance models, and the m operation and maintenance models are different from each other.
- Each operation and maintenance model is used to make predictions on the network data of the corresponding service and output fault information or non-fault information.
- the device 900 of the network operation and maintenance may further include:
- the second determining module 994 is configured to perform step 610 in the foregoing embodiment.
- the second receiving module 995 is configured to perform step 611 in the foregoing embodiment.
- the fifth obtaining module 996 is configured to perform step 612 in the foregoing embodiment.
- the third determining module 997 is configured to perform step 613 in the foregoing embodiment.
- the fourth determining module 998 is configured to perform step 614 in the foregoing embodiment.
- the update module 999 is configured to perform step 615 in the above embodiment.
- the network data of the m types of services corresponds to the m operation and maintenance models, and the m operation and maintenance models are different from each other.
- Each operation and maintenance model is used to make predictions on the network data of the corresponding service and output fault information or non-fault information. The first prediction module 960 in FIG. 10 or the second prediction module 992 in FIG. 11 is configured to perform step 2061 and step 2062 in the foregoing embodiment, including:
- inputting the associated network data to the associated operation and maintenance model to obtain the information output by the associated operation and maintenance model;
- when the information output by the associated operation and maintenance model is fault information, determining that information as the second fault information related to each superior fault.
- the network operation and maintenance device provided by the embodiment of the present invention can determine n (1 ≤ n ≤ m) pieces of first fault information according to the network data of m (m ≥ 2) types of services, and then divide some or all of the n pieces of first fault information into k (1 ≤ k ≤ n) groups of fault information, where the network faults indicated by the first fault information in each group have the same superior fault. The device then outputs the k groups of fault information and k superior faults.
- the k superior faults are in one-to-one correspondence with the k groups of fault information, so that the staff can handle faults and potential faults in the network in time. The device can comprehensively process multiple services and can also update any operation and maintenance model whose evaluation index does not meet the business requirements, which improves the accuracy of fault prediction and the efficiency of fault processing.
- FIG. 12 is a schematic structural diagram of an apparatus for network operation and maintenance provided by an embodiment of the present invention, and the apparatus may be used in the server shown in FIG. 1.
- the apparatus includes a processor 1201 (such as a CPU), a memory 1202, a network interface 1203, and a bus 1204.
- the bus 1204 is used to connect the processor 1201, the memory 1202, and the network interface 1203.
- the memory 1202 may include a random access memory (RAM), and may also include a non-volatile memory, such as at least one disk storage.
- the communication connection between the server and the communication device is implemented through a network interface 1203, which may be wired or wireless.
- the program 12021 is stored in the memory 1202.
- the program 12021 is used to implement various application functions.
- the processor 1201 is configured to execute the program 12021 stored in the memory 1202 to implement the network operation and maintenance method shown in FIG. 2 or FIG. 6.
- the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
- When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product comprising one or more computer instructions.
- When the computer program instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the present invention are generated in whole or in part.
- the computer can be a general purpose computer, a computer network, or other programmable device.
- the computer instructions can be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another; for example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, radio, microwave).
- the computer readable storage medium can be any available media that can be accessed by a computer or a data storage device such as a server, data center, or the like that includes one or more available media.
- the usable medium may be a magnetic medium (eg, a floppy disk, a hard disk, a magnetic tape), an optical medium, or a semiconductor medium (eg, a solid state hard disk) or the like.
- the disclosed apparatus and method may be implemented in other manners.
- the device embodiments described above are merely illustrative.
- the division of the modules is only a logical function division.
- there may be another division manner; for example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed.
- the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
- a person skilled in the art may understand that all or part of the steps of the above embodiments may be completed by hardware, or by a program instructing related hardware, and the program may be stored in a computer readable storage medium.
- the storage medium mentioned may be a read only memory, a magnetic disk or an optical disk or the like.
Description
The present application claims priority to Chinese Patent Application No. 201810026962.2, filed on January 11, 2018 and entitled "Method and device for network operation and maintenance", which is incorporated herein by reference in its entirety.
The present application relates to the field of network technologies, and in particular, to a method and apparatus for network operation and maintenance.
In the era of data services, user experience is at the core of service. A stable and reliable network combined with a good user experience helps operators grow their services quickly. Network operation and maintenance ensures that networks and services run safely and effectively, so how to carry out network operation and maintenance and safeguard the user experience is an important problem.
The related art includes a network operation and maintenance method in which an unsupervised learning model is first used to perform anomaly detection on the network data of a service, and the detection results are then presented to staff. The staff judge the accuracy of the detection results, and the correct detection results are used as training samples. The training samples are then used to train a supervised learning model, which is subsequently used to perform anomaly detection on the network data of the service.
However, this method can process only one type of service and cannot comprehensively process multiple services. With the rapid development of network technology, network services are becoming increasingly diverse, and a network operation and maintenance method that comprehensively processes multiple services is urgently needed.
Summary of the invention
The embodiments of the present invention provide a method and apparatus for network operation and maintenance, which can solve the problem in the related art that network operation and maintenance methods cannot comprehensively process multiple services. The technical solutions are as follows:
According to a first aspect, a method for network operation and maintenance is provided. The method includes: a server first acquires network data of m types of services, where m ≥ 2, and then determines n pieces of first fault information according to the network data of the m types of services, where each piece of first fault information indicates that a network fault occurs in the corresponding service, and 1 ≤ n ≤ m. The server then divides some or all of the n pieces of first fault information into k groups of fault information, where the network faults indicated by the first fault information in each group have the same superior fault, the superior fault of the network fault indicated by any piece of first fault information is the fault that causes that network fault, and 1 ≤ k ≤ n. The server then outputs the k groups of fault information and k superior faults, where the k superior faults are in one-to-one correspondence with the k groups of fault information.
Optionally, the m types of services may include prediction services, alarm compression services, anomaly detection services, and the like.
Optionally, the server may display the k groups of fault information and the k superior faults.
In this embodiment of the present invention, the server can determine the first fault information and the superior faults according to the network data of multiple services, so that staff can handle the faults. Further, staff can also derive potential faults in the network from the superior faults and the first fault information output by the server, and handle those potential faults.
Optionally, after outputting the k groups of fault information and the k superior faults, the method may further include: the server acquires, according to the k superior faults and the first fault information corresponding to each superior fault, associated network data related to each superior fault, and then predicts, according to the associated network data, second fault information related to each superior fault, where the second fault information is different from the first fault information. The server then outputs the k superior faults, the k groups of fault information, and all the predicted second fault information.
In this embodiment of the present invention, the network fault indicated by the second fault information related to a superior fault is a network fault that the superior fault can cause.
Optionally, the server may display the k superior faults, the k groups of fault information, and all the predicted second fault information.
Because network data is strongly correlated, in this embodiment of the present invention, once the server has obtained the superior faults and the first fault information, it can use them to predict the other network faults that each superior fault may cause. This superior-diffusion labeling selection approach enables staff to handle faults and potential faults in the network promptly based on the superior faults, the first fault information, and the second fault information, which improves network stability and keeps the network running normally.
Optionally, after outputting the k groups of fault information and the k superior faults, the method may further include: the server receives a first labeling instruction, which indicates the correctly predicted first fault information within the k groups of fault information and the correctly predicted superior faults among the k superior faults. The server then obtains a first sample set based on the first labeling instruction, where the first sample set includes the information indicated by the first labeling instruction. Next, the server acquires, according to the first sample set, associated network data related to each superior fault in the first sample set, and predicts, according to the associated network data, second fault information related to each superior fault, where the second fault information is different from the first fault information. The server then outputs the first sample set and all the predicted second fault information.
Optionally, the server may issue a prompt asking staff to mark the superior faults and first fault information that the server predicted correctly with a first labeling symbol, and to mark the superior faults and first fault information that the server predicted incorrectly with a second labeling symbol.
Optionally, the server may display the first sample set and all the predicted second fault information.
In this embodiment of the present invention, because the server obtains the second fault information from the correctly predicted first fault information within the k groups of fault information and the correctly predicted superior faults among the k superior faults, the second fault information is more accurate.
In this embodiment of the present invention, the server can predict, according to the staff's labeling instruction, the network faults that the correctly predicted superior faults may cause, so that staff can promptly handle faults and potential faults in the network based on the correctly predicted first fault information, the correctly predicted superior faults, and all the predicted second fault information. Because the second fault information is more accurate, fault handling also becomes more efficient.
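The construction of the first sample set from the first labeling instruction can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation of the embodiment: the function name `build_first_sample_set`, the dictionary layout, and the representation of the labeling instruction as two sets of confirmed items are all hypothetical.

```python
def build_first_sample_set(groups, confirmed_superiors, confirmed_faults):
    """Keep only the items that the first labeling instruction marks as correct.

    groups: dict mapping each superior fault to its list of first fault information.
    confirmed_superiors / confirmed_faults: sets taken from the labeling
    instruction (hypothetical representation).
    """
    superiors = [s for s in groups if s in confirmed_superiors]
    faults = [f for fs in groups.values() for f in fs if f in confirmed_faults]
    return {"superior_faults": superiors, "first_fault_info": faults}


# Example data modeled on the group described for FIG. 4.
groups = {
    "base station equipment fault": [
        "cell 231 service degradation",
        "ETH link connection abnormal",
        "CPU usage high",
    ]
}
first_sample_set = build_first_sample_set(
    groups,
    confirmed_superiors={"base station equipment fault"},
    confirmed_faults={"cell 231 service degradation", "CPU usage high"},
)
```

In this sketch, items that staff marked as incorrectly predicted simply do not appear in the sample set, so any later prediction of second fault information starts only from confirmed items.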
Optionally, the network data of the m types of services corresponds one-to-one to m operation and maintenance models, the m operation and maintenance models are different from one another, and each operation and maintenance model is used to make a prediction on the network data of its corresponding service and output fault information or non-fault information. After outputting the first sample set and the predicted second fault information, the method may further include: the server determines all the predicted second fault information as a to-be-labeled sample set, and then receives a second labeling instruction, which indicates the correctly predicted second fault information in the to-be-labeled sample set. The server then obtains a second sample set based on the second labeling instruction, where the second sample set includes the information indicated by the second labeling instruction. Next, the server determines the first sample set and the second sample set as a target sample set, and determines an evaluation index of a first operation and maintenance model according to the target sample set, where the first operation and maintenance model is any one of the m operation and maintenance models. When the evaluation index of the first operation and maintenance model falls outside a specified evaluation index range, the server updates the first operation and maintenance model using the target sample set.
Optionally, the evaluation index of the first operation and maintenance model may be its accuracy, precision, false discovery rate, or the like. The specified evaluation index range may be determined according to the evaluation index determined for the first operation and maintenance model.
In this embodiment of the present invention, the server can obtain the correctly predicted second fault information according to the staff's labeling instruction, and then use the correctly predicted first fault information, the correctly predicted superior faults, and the correctly predicted second fault information to update any operation and maintenance model whose evaluation index does not meet service requirements. This improves the accuracy of fault prediction and, in turn, the efficiency of fault handling.
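The check that decides whether a model is updated can be illustrated with a small sketch. The text does not state how the evaluation index is computed, so the following assumes one plausible reading: precision is the fraction of a model's fault outputs that staff marked as correct, and the false discovery rate is its complement. The function names, the label list, and the range [0.9, 1.0] are hypothetical.

```python
def precision(labels):
    """Fraction of a model's fault outputs that staff marked as correct."""
    if not labels:
        raise ValueError("no labeled samples for this model")
    return sum(labels) / len(labels)


def needs_update(value, low, high):
    """True when the evaluation index falls outside the specified range [low, high]."""
    return not (low <= value <= high)


# Hypothetical labels for one model's outputs in the target sample set:
# True = marked as correctly predicted, False = marked as incorrectly predicted.
labels = [True, True, False, True]
p = precision(labels)               # precision of this model
fdr = 1 - p                         # false discovery rate
update = needs_update(p, 0.9, 1.0)  # range chosen only for illustration
```

Here the model's precision is below the assumed lower bound, so the sketch flags it for an update using the target sample set.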
Optionally, the network data of the m types of services corresponds one-to-one to m operation and maintenance models, the m operation and maintenance models are different from one another, and each operation and maintenance model is used to make a prediction on the network data of its corresponding service and output fault information or non-fault information. The server predicting, according to the associated network data, the second fault information related to each superior fault may include: the server first inputs the associated network data into an associated operation and maintenance model to obtain the information output by that model, where the associated operation and maintenance model is the one among the m operation and maintenance models that corresponds to the associated network data. When the information output by the associated operation and maintenance model is fault information, the server determines that information as the second fault information related to each superior fault.
Optionally, the network data of the m types of services corresponds one-to-one to m operation and maintenance models, and the m operation and maintenance models are different from one another. The server determining the n pieces of first fault information according to the network data of the m types of services may include: the server inputs the network data of each service into the corresponding one of the m operation and maintenance models to obtain the information output by the m models, where the information output by each model is fault information or non-fault information, and the information output by the m models includes n pieces of fault information. The server then determines the n pieces of fault information as the n pieces of first fault information.
According to a second aspect, a device for network operation and maintenance is provided. The device includes at least one module, and the at least one module is configured to implement the method for network operation and maintenance according to the first aspect.
According to a third aspect, a device for network operation and maintenance is provided. The device includes a processor, a memory, a network interface, and a bus. The bus connects the processor, the memory, and the network interface. The network interface is used to implement a communication connection between the server and a communication device. The processor is configured to execute a program stored in the memory to implement the method for network operation and maintenance according to the first aspect.
According to a fourth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores instructions that, when the computer-readable storage medium runs on a computer, cause the computer to perform the method for network operation and maintenance according to the first aspect.
According to a fifth aspect, a computer program product containing instructions is provided. When the computer program product runs on a computer, it causes the computer to perform the method for network operation and maintenance according to the first aspect.
The technical effects achieved by the second to fifth aspects are similar to those achieved by the corresponding technical means in the first aspect, and are not described again here.
The technical solutions provided by the embodiments of the present invention bring the following beneficial effects:
The server can determine n (1 ≤ n ≤ m) pieces of first fault information according to the network data of m (m ≥ 2) types of services, and then divide some or all of the n pieces of first fault information into k (1 ≤ k ≤ n) groups of fault information, where the network faults indicated by the first fault information in each group share the same superior fault. The server then outputs the k groups of fault information and k superior faults, where the k superior faults correspond one-to-one to the k groups of fault information, so that staff can promptly handle faults and potential faults in the network. The embodiments of the present invention make it possible to process multiple services comprehensively and to automatically update any operation and maintenance model whose evaluation index does not meet service requirements, which improves the accuracy of fault prediction and the efficiency of fault handling.
FIG. 1 is a schematic diagram of an implementation environment according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for network operation and maintenance provided by an embodiment of the present invention;
FIG. 3 is a flowchart of a method for determining first fault information provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of first fault information and a superior fault provided by an embodiment of the present invention;
FIG. 5 is a flowchart of a method for predicting second fault information provided by an embodiment of the present invention;
FIG. 6 is a flowchart of another method for network operation and maintenance provided by an embodiment of the present invention;
FIG. 7 is a schematic diagram of labeling applied to the superior fault and first fault information shown in FIG. 4, provided by an embodiment of the present invention;
FIG. 8 is a schematic diagram of labeling applied to the superior fault and first fault information shown in FIG. 4, provided by an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a device for network operation and maintenance provided by an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of another device for network operation and maintenance provided by an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of yet another device for network operation and maintenance provided by an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of a device for network operation and maintenance provided by an embodiment of the present invention.
To make the objectives, technical solutions, and advantages of this application clearer, the implementations of this application are described in further detail below with reference to the accompanying drawings.
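FIG. 1 is a schematic diagram of an implementation environment according to an embodiment of the present invention. As shown in FIG. 1, the implementation environment may include a server 001 and a communication device 002. For example, the communication device 002 may be a base station. The base station enables terminals 10 in a cell to communicate, and the server 001 can acquire network data of multiple services from the base station. The server 001 may be a single server, a server cluster consisting of several servers, or a cloud computing service center.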
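In this embodiment of the present invention, the server 001 is configured to acquire network data of m (m ≥ 2) types of services, determine n (1 ≤ n ≤ m) pieces of first fault information according to the network data of these services, and then divide some or all of the n pieces of first fault information into k (1 ≤ k ≤ n) groups of fault information, where the network faults indicated by the first fault information in each group share the same superior fault. The server then outputs the k groups of fault information and k superior faults, where the k superior faults correspond one-to-one to the k groups of fault information, so that staff can handle the faults. Further, in one implementable manner, to prevent potential faults from affecting the network, the server may also predict potential faults according to the superior faults and the first fault information. In another implementable manner, to improve the accuracy of fault prediction, the server may first determine the correctly predicted superior faults and the correctly predicted first fault information according to the staff's labeling instruction, and then predict potential faults based on them. The method for network operation and maintenance provided by the embodiments of the present invention is described below using these two implementable manners as examples.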
In one implementable manner, the method for network operation and maintenance provided by this embodiment of the present invention, as shown in FIG. 2, may include:
Step 201: The server acquires network data of m types of services, where m ≥ 2.
Referring to FIG. 1, the server acquires the network data of the services from the communication device. For example, the server may acquire the network data of the services from the base station.
For example, the m types of services acquired by the server may include prediction services, alarm compression services, anomaly detection services, and the like. The prediction services may include a hardware failure prediction service, a performance prediction service, a resource prediction service, and the like. The alarm compression services may include a single-domain alarm compression service, a cross-domain alarm compression service, a root cause alarm analysis service, and the like. The anomaly detection services may include a key performance indicator (KPI) anomaly detection service and a service degradation anomaly detection service. Each service is briefly described below.
The hardware failure prediction service predicts hardware that is about to fail so that it can be replaced or repaired in time. For example, the prediction may be based on the hardware's relevant performance indicators and on hardware data collected by sensors installed on the hardware; the hardware being predicted may be, for example, a board, a hard disk, or an optical module. The performance prediction service predicts network performance indicators (such as bandwidth, throughput, and latency). The resource prediction service predicts network resources (such as central processing unit (CPU) usage). The alarm compression services compress the large volume of alarm data generated in the network to obtain the important alarm data that affects the network. Among them, the single-domain alarm compression service compresses alarm data within a single product domain; for example, the network devices at the access layer may be regarded as communication devices of the same product domain. The cross-domain alarm compression service compresses alarm data from different product domains. The root cause alarm analysis service analyzes the basic alarm data that affects the network. The anomaly detection services monitor various indicators in the network in real time and report anomaly information. Among them, the KPI anomaly detection service monitors KPIs (such as a packet loss rate KPI and a call quality KPI) in real time, and the service degradation anomaly detection service monitors key quality indicators (KQIs) in real time. KPIs are used to monitor the running state of the network, and KQIs are used to measure service quality.
For example, the network data that the server acquires for the hardware failure prediction service may include the hardware's relevant performance indicators and hardware data collected by sensors; the network data for the performance prediction service may include data such as network performance indicators; the network data for the resource prediction service may include data such as network resources; the network data for the single-domain alarm compression service may include alarm data within a single product domain; the network data for the cross-domain alarm compression service may include alarm data from different product domains; the network data for the KPI anomaly detection service may include data such as KPIs; and the network data for the service degradation anomaly detection service may include data such as KQIs.
It should be noted that the period at which the server acquires the network data of each service may be determined according to the needs of that service. For example, the period may be 20 minutes or 1 hour.
Step 202: The server determines n pieces of first fault information according to the network data of the m types of services, where each piece of first fault information indicates that a network fault has occurred in the corresponding service, and 1 ≤ n ≤ m.
Optionally, in this embodiment of the present invention, the network data of the m types of services corresponds one-to-one to m operation and maintenance models, and the m operation and maintenance models are different from one another. Correspondingly, as shown in FIG. 3, step 202 may include:
Step 2021: The server inputs the network data of each service into the corresponding one of the m operation and maintenance models to obtain the information output by the m models. The information output by each model is fault information or non-fault information, and the information output by the m models includes n pieces of fault information.
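In this embodiment of the present invention, the server may use operation and maintenance models to determine the first fault information according to the network data of the services. Assuming that the m types of services in step 201 include prediction services, alarm compression services, and anomaly detection services, the operation and maintenance models used to determine the first fault information may include prediction models, alarm compression models, and anomaly detection models. Assume that in step 201 the server acquires the network data of 8 types of services: a hardware failure prediction service, a performance prediction service, a resource prediction service, a single-domain alarm compression service, a cross-domain alarm compression service, a root cause alarm analysis service, a KPI anomaly detection service, and a service degradation anomaly detection service. The prediction models may then include a hardware failure prediction model, a performance prediction model, and a resource prediction model; the alarm compression models may include a single-domain alarm compression model, a cross-domain alarm compression model, and a root cause alarm analysis model; and the anomaly detection models may include a KPI anomaly detection model and a service degradation anomaly detection model, for a total of 8 operation and maintenance models. The network data of the 8 types of services corresponds one-to-one to the 8 operation and maintenance models, and the 8 models are different from one another.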
The server inputs the network data of each service into the corresponding one of these 8 operation and maintenance models to obtain the information output by the 8 models. For example, the server inputs the network data of the hardware failure prediction service into the hardware failure prediction model and obtains the fault information output by that model. As another example, the server inputs the network data of the performance prediction service into the performance prediction model and obtains the fault information output by that model.
Step 2022: The server determines the n pieces of fault information as the n pieces of first fault information.
If the information output by every operation and maintenance model is fault information, the server obtains m pieces of first fault information.
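Steps 2021 and 2022 can be sketched as a loop over the per-service models. The following Python example is only an illustration under assumptions: each operation and maintenance model is represented as a callable that returns a fault string or `None` for non-fault information, and the three toy models, their thresholds, and the field names (`cpu`, `loss`, `latency_ms`) are invented for the example.

```python
def determine_first_fault_info(models, network_data):
    """Run each service's model on that service's data and keep the fault outputs.

    models: dict service name -> callable returning a fault string or None.
    network_data: dict service name -> network data for that service.
    """
    first_fault_info = []
    for service, model in models.items():
        output = model(network_data[service])
        if output is not None:  # None stands for non-fault information
            first_fault_info.append(output)
    return first_fault_info


# Three toy models with made-up thresholds (m = 3).
models = {
    "resource prediction": lambda d: "CPU usage high" if d["cpu"] > 0.9 else None,
    "KPI anomaly detection": lambda d: "ETH link connection abnormal" if d["loss"] > 0.05 else None,
    "performance prediction": lambda d: "latency high" if d["latency_ms"] > 100 else None,
}
network_data = {
    "resource prediction": {"cpu": 0.95},
    "KPI anomaly detection": {"loss": 0.10},
    "performance prediction": {"latency_ms": 20},
}
faults = determine_first_fault_info(models, network_data)
```

Because each model contributes at most one output, the number of first fault information items n never exceeds the number of models m in this sketch.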
Step 203: The server divides some or all of the n pieces of first fault information into k groups of fault information, where 1 ≤ k ≤ n. The network faults indicated by the first fault information in each group share the same superior fault; the superior fault of the network fault indicated by any piece of first fault information is the fault that causes that network fault.
For example, when a piece of first fault information is "cell 231 service degradation", the superior fault of the network fault it indicates may be a base station equipment fault, where the cells managed by that base station include cell 231.
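The following description takes the network data of the 8 types of services and the 8 operation and maintenance models in step 2021 as an example. The server inputs the network data of each service into the corresponding one of the 8 operation and maintenance models. Assume that all 8 models output fault information, so the server obtains 8 pieces of first fault information. Assume that the server groups all 8 pieces; for example, they are divided into 2 groups of fault information. The first group includes 3 pieces of first fault information, and the superior fault of the network faults they indicate is a base station equipment fault. The second group includes 5 pieces of first fault information, and the superior fault of the network faults they indicate is a fault of another transmission device.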
FIG. 4 shows an example of one group of fault information and the superior fault corresponding to that group. The group includes 3 pieces of first fault information: "cell 231 service degradation", "Ethernet (ETH) link connection abnormal", and "CPU usage high". "Cell 231 service degradation" is the fault information output by the service degradation anomaly detection model after the server inputs the network data of the corresponding service into that model. "ETH link connection abnormal" is the fault information output by the KPI anomaly detection model after the server inputs the network data of the corresponding service into that model. "CPU usage high" is the fault information output by the resource prediction model after the server inputs the network data of the corresponding service into that model. The superior fault of the network faults indicated by these 3 pieces of first fault information is a base station equipment fault.
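The grouping in step 203 can be sketched as follows. The text above does not say how the server finds the superior fault of a given piece of first fault information, so this Python example assumes a hypothetical lookup table `superior_of`; the function name and the extra item "latency high" are likewise invented for illustration.

```python
from collections import defaultdict


def group_by_superior(first_fault_info, superior_of):
    """Group first fault information that shares the same superior fault.

    superior_of: dict first fault information -> superior fault (hypothetical lookup).
    Items with no known superior fault are left ungrouped, which corresponds
    to dividing only some of the n items.
    """
    groups = defaultdict(list)
    for fault in first_fault_info:
        superior = superior_of.get(fault)
        if superior is not None:
            groups[superior].append(fault)
    return dict(groups)


superior_of = {
    "cell 231 service degradation": "base station equipment fault",
    "ETH link connection abnormal": "base station equipment fault",
    "CPU usage high": "base station equipment fault",
}
first_fault_info = [
    "cell 231 service degradation",
    "ETH link connection abnormal",
    "CPU usage high",
    "latency high",  # no known superior fault, so it stays ungrouped
]
groups = group_by_superior(first_fault_info, superior_of)
k = len(groups)
```

The dictionary keys are the k superior faults and the values are the k groups, so the one-to-one correspondence between superior faults and groups holds by construction.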
Step 204: The server outputs the k groups of fault information and the k superior faults, where the k superior faults correspond one-to-one to the k groups of fault information.
The server outputs the k groups of fault information and the k superior faults so that staff can handle faults according to them. Further, staff can also derive potential faults in the network from the superior faults and the first fault information output by the server, and handle those potential faults.
Optionally, the server may display the k groups of fault information and the k superior faults. For example, one group of fault information and its corresponding superior fault, as displayed by the server, may be as shown in FIG. 4.
Step 205: The server acquires, according to the k superior faults and the first fault information corresponding to each superior fault, associated network data related to each superior fault.
Network data is strongly correlated. For example, if a base station manages 3 cells, all the cells it manages may be affected when the base station fails. Therefore, once the server has obtained the superior faults and the first fault information, it can go on to determine potential faults in the network. To do so, the server may first acquire the associated network data related to each superior fault.
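Assume that in step 202 the server determines 8 pieces of first fault information according to the network data of 8 types of services, and that in step 203 the server divides these 8 pieces into 2 groups of fault information. The first group includes 3 pieces of first fault information, x1, x2, and x3, and the superior fault of the network faults they indicate is A11. The second group includes 5 pieces of first fault information, y1, y2, y3, y4, and y5, and the superior fault of the network faults they indicate is B11. The server then separately acquires the associated network data related to A11 and the associated network data related to B11.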
For example, if the first fault information is "cell 231 service degradation" and the superior fault of the network fault it indicates is a base station equipment fault, the associated network data that the server acquires for that superior fault may be the KQI of cell 232, where the cells managed by the base station include cell 232 and cell 231.
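The cell 231 and cell 232 example can be sketched as a lookup over the network topology. The source does not describe the data structures involved, so everything here is an assumption made for illustration: the base station identifier `BS-1`, the `cells_of` topology table, the `kqi_of` values, and the rule that associated data covers the managed cells not already reported in the first fault information.

```python
def associated_network_data(base_station, reported_cells, cells_of, kqi_of):
    """Collect KQI data for cells under the faulty base station not yet reported."""
    return {
        cell: kqi_of[cell]
        for cell in cells_of[base_station]
        if cell not in reported_cells
    }


cells_of = {"BS-1": [231, 232]}   # hypothetical: base station -> managed cells
kqi_of = {231: 0.42, 232: 0.55}   # hypothetical KQI value per cell
data = associated_network_data("BS-1", {231}, cells_of, kqi_of)
```

With cell 231 already covered by the first fault information, the sketch returns only the KQI of cell 232 as the associated network data.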
Step 206: The server predicts, according to the associated network data, second fault information related to each superior fault, where the second fault information is different from the first fault information.
The network fault indicated by the second fault information related to a superior fault is a network fault that the superior fault can cause.
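This step is described using the superior faults A11 and B11 in step 205 as an example. The server acquires the associated network data p1 related to A11, and then predicts the second fault information related to A11 according to p1. Likewise, the server acquires the associated network data p2 related to B11, and then predicts the second fault information related to B11 according to p2.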
For example, if the first fault information is "cell 231 service degradation", the superior fault of the network fault it indicates is a base station equipment fault, and the associated network data that the server acquires for that superior fault is the KQI of cell 232, then the second fault information that the server predicts for that superior fault according to the associated network data may be "cell 232 service degradation". The cells managed by the base station include cell 232 and cell 231.
Optionally, the network data of the m types of services corresponds one-to-one to m operation and maintenance models, the m operation and maintenance models are different from one another, and each operation and maintenance model is used to make a prediction on the network data of its corresponding service and output fault information or non-fault information. Correspondingly, as shown in FIG. 5, step 206 may include:
步骤2061、服务器向关联运维模型输入关联网络数据,以得到该关联运维模型输出的信息,该关联运维模型为m个运维模型中与关联网络数据相对应的运维模型。Step 2061: The server inputs the associated network data to the associated operation and maintenance model to obtain information output by the associated operation and maintenance model, where the associated operation and maintenance model is an operation and maintenance model corresponding to the associated network data in the m operation and maintenance models.
步骤2062、当关联运维模型输出的信息为故障信息时,服务器将关联运维模型输出的信息确定为与每个上级故障相关的第二故障信息。Step 2062: When the information output by the associated operation and maintenance model is fault information, the server determines the information output by the associated operation and maintenance model as the second fault information related to each superior fault.
假设8种业务分别为:硬件失效预测业务、性能预测业务、资源预测业务、单域告警压缩业务、跨域告警压缩业务、根因告警分析业务、KPI异常检测业务和业务劣化异常检测业务,那么可以存在8个运维模型,这8个运维模型分别为:硬件失效预测模型、性能预测模型、资源预测模型、单域告警压缩模型、跨域告警压缩模型、根因告警分析模型、KPI异常检测模型和业务劣化异常检测模型。Assume that the eight services are: hardware failure prediction service, performance prediction service, resource prediction service, single domain alarm compression service, cross-domain alarm compression service, root cause alarm analysis service, KPI abnormality detection service, and service degradation abnormality detection service. There are 8 operation and maintenance models, which are: hardware failure prediction model, performance prediction model, resource prediction model, single domain alarm compression model, cross-domain alarm compression model, root cause alarm analysis model, KPI anomaly Detection model and business degradation anomaly detection model.
现以图4所示的上级故障和第一故障信息为例进行说明,上级故障为基站设备故障,3个第一故障信息分别为:“小区231业务劣化”,“ETH链路连接异常”,以及“CPU占用率较高”。服务器获取的与该上级故障相关的关联网络数据可以为:小区232的KQI。然后,服务器向对应的业务劣化异常检测模型输入该关联网络数据,得到该业务劣化异常检测模型输出的故障信息:“小区232业务劣化”,之后,服务器将“小区232业务劣化”确定为第二故障信息。The upper-level fault and the first fault information shown in FIG. 4 are taken as an example. The upper-level fault is a fault of the base station equipment, and the three first fault information are: “cell 231 service degradation” and “ETH link connection abnormality”. And "high CPU usage." The associated network data obtained by the server related to the superior fault may be: KQI of the cell 232. Then, the server inputs the associated network data to the corresponding service degradation anomaly detection model, and obtains the fault information output by the service degradation anomaly detection model: “cell 232 service degradation”, after which the server determines “cell 232 service degradation” as the second. accident details.
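Steps 2061 and 2062 can be sketched in Python as follows. This is only an illustrative sketch: the function names, the dictionary layout of the network data, and the KQI threshold of 0.5 are assumptions made for the example and are not taken from the embodiment.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical model type: takes the network data of one service and returns
# a fault message, or None when the model outputs non-fault information.
OpsModel = Callable[[dict], Optional[str]]


def service_degradation_model(data: dict) -> Optional[str]:
    """Toy stand-in for a service degradation anomaly detection model."""
    # Assumed rule for illustration: a KQI below 0.5 means degraded service.
    if data["kqi"] < 0.5:
        return f"cell {data['cell']} service degradation"
    return None


def predict_second_faults(
    associated_data: List[dict],
    models: Dict[str, OpsModel],
    first_faults: List[str],
) -> List[str]:
    """Return fault messages caused by a superior fault but not yet reported."""
    second_faults = []
    for data in associated_data:
        # Step 2061: feed the associated data to the model for its service.
        output = models[data["service"]](data)
        # Step 2062: keep the output only if it is fault information, and
        # only if it differs from the first fault information.
        if output is not None and output not in first_faults:
            second_faults.append(output)
    return second_faults


models = {"service_degradation": service_degradation_model}
first_faults = [
    "cell 231 service degradation",
    "ETH link connection abnormal",
    "high CPU usage",
]
# Associated network data for the base station equipment fault: KQI of cell 232.
associated_data = [{"service": "service_degradation", "cell": 232, "kqi": 0.3}]
second_faults = predict_second_faults(associated_data, models, first_faults)
print(second_faults)
```

With the assumed KQI value of 0.3, the toy model reports degradation for cell 232, which matches the FIG. 4 example above.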
Step 207: The server outputs the k superior faults, the k groups of fault information, and all predicted second fault information.
The server outputs the k superior faults, the k groups of fault information, and all predicted second fault information so that staff can handle faults according to them.
Optionally, the server may display the k superior faults, the k groups of fault information, and all predicted second fault information.
In this embodiment of the present invention, the server can predict, according to the superior fault and the first fault information, the other network faults that the superior fault may cause. This superior-diffusion labeling selection approach enables staff to handle faults and potential faults in the network according to the superior fault, the first fault information, and the second fault information, which improves network stability and keeps the network running normally.
In summary, in the network operation and maintenance method provided in this embodiment of the present invention, the server can determine n (1≤n≤m) pieces of first fault information according to the network data of m (m≥2) services, and then divide some or all of the n pieces of first fault information into k (1≤k≤n) groups of fault information, where the network faults indicated by the first fault information in each group have the same superior fault. The server then outputs the k groups of fault information and the k superior faults, which correspond one-to-one to the k groups, so that staff can handle faults and potential faults in the network in time. With this method, multiple services can be processed in an integrated manner.
In a second implementable manner, as shown in FIG. 6, the network operation and maintenance method provided in this embodiment of the present invention may include:
Step 601: The server obtains network data of m services, where m≥2.
For step 601, refer to step 201.
Step 602: The server determines n pieces of first fault information according to the network data of the m services.
Each piece of first fault information indicates that a network fault has occurred in the corresponding service, where 1≤n≤m.
Optionally, the network data of the m services corresponds one-to-one to m operation and maintenance models, and the m operation and maintenance models differ from one another. Correspondingly, step 602 may include: the server inputs the network data of the corresponding service into each of the m operation and maintenance models to obtain the information output by the m models, where the information output by each model is fault information or non-fault information, and the information output by the m models includes n pieces of fault information; the server determines these n pieces of fault information as the n pieces of first fault information.
For step 602, refer to step 202.
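As a rough illustration of step 602, the sketch below runs one model per service and keeps only the fault outputs. The three services, the toy threshold rules, and all names are assumptions chosen for the example; real operation and maintenance models would be trained models rather than fixed rules.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical model type: returns a fault message, or None for non-fault output.
OpsModel = Callable[[dict], Optional[str]]


def determine_first_faults(
    network_data: Dict[str, dict], models: Dict[str, OpsModel]
) -> List[str]:
    """Feed each service's data to its own model and collect fault outputs."""
    first_faults = []
    for service, data in network_data.items():
        output = models[service](data)  # one-to-one: service -> model
        if output is not None:          # non-fault information is dropped
            first_faults.append(output)
    return first_faults


# m = 3 toy models with assumed threshold rules.
models: Dict[str, OpsModel] = {
    "service_degradation": lambda d: (
        "cell 231 service degradation" if d["kqi"] < 0.5 else None
    ),
    "kpi_anomaly": lambda d: "high CPU usage" if d["cpu"] > 0.9 else None,
    "hardware_failure": lambda d: (
        "hardware failure predicted" if d["errors"] > 100 else None
    ),
}
network_data = {
    "service_degradation": {"kqi": 0.3},
    "kpi_anomaly": {"cpu": 0.95},
    "hardware_failure": {"errors": 2},
}
first_faults = determine_first_faults(network_data, models)
print(len(models), len(first_faults), first_faults)
```

Here m is 3 and n is 2, so the constraint 1≤n≤m holds for this input.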
Step 603: The server divides some or all of the n pieces of first fault information into k groups of fault information.
The network faults indicated by the first fault information in each group have the same superior fault. The superior fault of the network fault indicated by any piece of first fault information is the fault that causes that network fault, where 1≤k≤n.
For step 603, refer to step 203.
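The grouping in step 603 amounts to bucketing first fault information by superior fault. The sketch below assumes, purely for illustration, that the superior fault of each piece of first fault information is available from a lookup table named `superior_of`; how the server actually determines superior faults is described under step 203 and is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_superior(
    first_faults: List[str], superior_of: Dict[str, str]
) -> Dict[str, List[str]]:
    """Group first fault information so each group shares one superior fault."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for fault in first_faults:
        superior = superior_of.get(fault)
        # "Some or all": faults with no known superior fault stay ungrouped.
        if superior is not None:
            groups[superior].append(fault)
    return dict(groups)


# Hypothetical lookup table; w1 has no known superior fault.
superior_of = {
    "x1": "A11", "x2": "A11", "x3": "A11",
    "y1": "B11", "y2": "B11", "y3": "B11", "y4": "B11", "y5": "B11",
}
first_faults = ["x1", "x2", "x3", "y1", "y2", "y3", "y4", "y5", "w1"]
groups = group_by_superior(first_faults, superior_of)
print(groups)
```

In this example n is 9 and k is 2, and the keys of `groups` are the k superior faults that correspond one-to-one to the k groups.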
Step 604: The server outputs the k groups of fault information and the k superior faults.
The k superior faults correspond one-to-one to the k groups of fault information.
The server outputs the k groups of fault information and the k superior faults so that staff can handle faults according to them. Further, staff can also identify potential faults in the network according to the superior faults and the first fault information output by the server, and handle those potential faults.
Optionally, the server may display the k groups of fault information and the k superior faults.
Step 605: The server receives a first labeling instruction, which indicates the correctly predicted first fault information among the k groups of fault information and the correctly predicted superior faults among the k superior faults.
For example, after the server displays the k groups of fault information and the k superior faults, staff can label the displayed first fault information and superior faults according to the actual fault conditions in the network, marking the first fault information and superior faults that the server predicted correctly. For instance, the server may issue a prompt asking staff to use a first label symbol to mark the correctly predicted first fault information and superior faults, and a second label symbol to mark the incorrectly predicted first fault information and superior faults. Staff then label the predictions accordingly. The first label symbol and the second label symbol are different. For example, the first label symbol may be a check mark “√” and the second label symbol may be a cross “×”.
Taking the superior fault and the group of fault information shown in FIG. 4 as an example, suppose staff determine that the server's predictions of the base station equipment fault, the abnormal ETH link connection, and the high CPU usage are correct, and that its prediction of service degradation in cell 231 is wrong. Staff can then mark the three predictions “base station equipment fault”, “ETH link connection abnormal”, and “high CPU usage” with “√”, and mark the prediction “cell 231 service degradation” with “×”. The labeling result is shown in FIG. 7.
Suppose instead that staff determine that the server's prediction of the abnormal ETH link connection is correct and that the other three predictions are wrong. Staff can then mark the prediction “ETH link connection abnormal” with “√” and mark the other three predictions with “×”. The labeling result is shown in FIG. 8.
Step 606: The server obtains a first sample set based on the first labeling instruction, where the first sample set includes the information indicated by the first labeling instruction.
The server obtains the first sample set based on the first labeling instruction from step 605. The first sample set includes the correctly predicted first fault information among the k groups of fault information and the correctly predicted superior faults among the k superior faults.
For example, k equals 2. The first group of fault information includes three pieces of first fault information, x1, x2, and x3, and the superior fault of the network faults they indicate is A11. The second group of fault information includes five pieces of first fault information, y1, y2, y3, y4, and y5, and the superior fault of the network faults they indicate is B11. Suppose the first labeling instruction indicates that the predictions of x1 and x2 in the first group, y4 and y5 in the second group, and the superior fault A11 are correct. The first sample set then includes x1, x2, y4, y5, and A11.
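The k = 2 example above can be reproduced with a few lines of Python. The representation of the labeling instruction as a set of confirmed items, and the function name, are assumptions made for this sketch.

```python
from typing import Dict, List, Set


def build_first_sample_set(
    groups: Dict[str, List[str]], confirmed: Set[str]
) -> List[str]:
    """Keep only the first fault information and superior faults marked correct."""
    samples: List[str] = []
    # Correctly predicted first fault information, group by group.
    for faults in groups.values():
        samples.extend(f for f in faults if f in confirmed)
    # Correctly predicted superior faults.
    samples.extend(s for s in groups if s in confirmed)
    return samples


groups = {
    "A11": ["x1", "x2", "x3"],
    "B11": ["y1", "y2", "y3", "y4", "y5"],
}
# Assumed encoding of the first labeling instruction: everything marked "√".
confirmed = {"x1", "x2", "y4", "y5", "A11"}
first_sample_set = build_first_sample_set(groups, confirmed)
print(first_sample_set)
```

Because B11 was not marked correct, it is left out of the sample set even though y4 and y5 from its group are kept.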
Step 607: The server obtains, according to the first sample set, associated network data related to each superior fault in the first sample set.
Suppose the first sample set obtained in step 606 includes x1, x2, y4, y5, and A11. The server can obtain, according to this first sample set, associated network data related to the superior fault A11. For example, if A11 is a base station equipment fault, the associated network data related to A11 may be the KQI of cell 232. The cells managed by the base station include cell 232.
Step 608: The server predicts, according to the associated network data, second fault information related to each superior fault, where the second fault information is different from the first fault information.
The network fault indicated by the second fault information related to a superior fault is a network fault that the superior fault can cause. In this embodiment of the present invention, because the server derives the second fault information from the correctly predicted first fault information among the k groups of fault information and the correctly predicted superior faults among the k superior faults, the second fault information is more accurate.
Optionally, the network data of the m services corresponds one-to-one to m operation and maintenance models, and the m operation and maintenance models differ from one another. Each operation and maintenance model is used to make predictions on the network data of its corresponding service and to output fault information or non-fault information. Correspondingly, step 608 may include: the server inputs the associated network data into the associated operation and maintenance model to obtain the information output by that model, where the associated operation and maintenance model is the one, among the m operation and maintenance models, that corresponds to the associated network data; when the information output by the associated operation and maintenance model is fault information, the server determines that information as the second fault information related to each superior fault.
For step 608, refer to step 206.
Step 609: The server outputs the first sample set and all predicted second fault information.
The first sample set includes the correctly predicted first fault information among the k groups of fault information and the correctly predicted superior faults among the k superior faults. The server outputs the first sample set and all predicted second fault information so that staff can handle faults according to the correctly predicted first fault information, the correctly predicted superior faults, and all predicted second fault information.
Optionally, the server may display the first sample set and all predicted second fault information.
In this embodiment of the present invention, through steps 605 to 609, the server can predict, according to the staff's labeling instruction, the network faults that the correctly predicted superior faults may cause, so that staff can handle faults and potential faults in the network in time according to the correctly predicted first fault information, the correctly predicted superior faults, and all predicted second fault information. Because the second fault information is more accurate, fault handling efficiency is also improved.
Step 610: The server determines all predicted second fault information as a to-be-labeled sample set.
Optionally, the network data of the m services corresponds one-to-one to m operation and maintenance models, and the m operation and maintenance models differ from one another. Each operation and maintenance model is used to make predictions on the network data of its corresponding service and to output fault information or non-fault information. Further, in this embodiment of the present invention, to update any operation and maintenance model whose evaluation index does not meet service requirements and thereby further improve fault prediction accuracy, the server may determine all the second fault information predicted by the operation and maintenance models in step 608 as the to-be-labeled sample set, so that staff can label this sample set and obtain the correctly predicted second fault information.
Step 611: The server receives a second labeling instruction, which indicates the correctly predicted second fault information in the to-be-labeled sample set.
For example, after the server displays all predicted second fault information, staff can label the displayed second fault information according to the actual fault conditions in the network, marking the second fault information that the server predicted correctly. For the labeling manner, refer to FIG. 7 and FIG. 8 described in step 605.
Step 612: The server obtains a second sample set based on the second labeling instruction, where the second sample set includes the information indicated by the second labeling instruction.
The server obtains the second sample set based on the second labeling instruction from step 611. The second sample set includes the correctly predicted second fault information in the to-be-labeled sample set.
For example, all the second fault information predicted in step 608 includes z1, z2, z3, and z4. Suppose the second labeling instruction indicates that the predictions of z1 and z2 are correct. The second sample set then includes z1 and z2.
Step 613: The server determines the first sample set and the second sample set as a target sample set.
The first sample set includes the correctly predicted first fault information among the k groups of fault information and the correctly predicted superior faults among the k superior faults, and the second sample set includes the correctly predicted second fault information in the to-be-labeled sample set. The server determines the first sample set and the second sample set as the target sample set, which is used to update any operation and maintenance model whose evaluation index does not meet service requirements.
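Steps 610 to 613 can be sketched by continuing the x/y/z example. The helper name and the use of plain lists and sets are assumptions for illustration only.

```python
from typing import List, Set


def keep_confirmed(candidates: List[str], confirmed: Set[str]) -> List[str]:
    """Return the candidates that staff marked as correctly predicted."""
    return [c for c in candidates if c in confirmed]


# First sample set from step 606 (example values from the text).
first_sample_set = ["x1", "x2", "y4", "y5", "A11"]

# Step 610: all predicted second fault information is the to-be-labeled set.
to_be_labeled = ["z1", "z2", "z3", "z4"]

# Steps 611 and 612: the second labeling instruction confirms z1 and z2.
second_sample_set = keep_confirmed(to_be_labeled, {"z1", "z2"})

# Step 613: the target sample set is the first and second sample sets together.
target_sample_set = first_sample_set + second_sample_set
print(target_sample_set)
```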
Step 614: The server determines an evaluation index of a first operation and maintenance model according to the target sample set, where the first operation and maintenance model is any one of the m operation and maintenance models.
The server determines the evaluation index of the first operation and maintenance model according to the correctly predicted first fault information, the correctly predicted superior faults, and the correctly predicted second fault information.
Optionally, the evaluation index of the first operation and maintenance model may be its accuracy. A model's accuracy is the ratio of the number of results it predicted correctly to the total number of results it predicted. The higher the accuracy, the better the model's predictions.
Step 615: When the evaluation index of the first operation and maintenance model does not fall within a specified evaluation index range, the server updates the first operation and maintenance model using the target sample set.
When the evaluation index is the accuracy of the first operation and maintenance model, the corresponding specified evaluation index range may be [f, 1]. For example, f may equal 0.4, in which case the server updates the first operation and maintenance model using the target sample set when its evaluation index is less than 0.4. For instance, a supervised learning algorithm from machine learning may be used to train the first operation and maintenance model. For the model training process, refer to the related art; details are not described here.
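The accuracy check in steps 614 and 615 can be sketched as below. The sketch assumes that the predictions attributed to one model can be compared against the confirmed items in the target sample set; how predictions are attributed to a specific model is not detailed in the text, so that part is an assumption, as are the function names.

```python
from typing import List, Set


def accuracy(predicted: List[str], confirmed: Set[str]) -> float:
    """Ratio of correctly predicted results to all predicted results."""
    if not predicted:
        raise ValueError("the model made no predictions")
    correct = sum(1 for p in predicted if p in confirmed)
    return correct / len(predicted)


def needs_update(index: float, lower: float = 0.4, upper: float = 1.0) -> bool:
    """True when the evaluation index falls outside the range [lower, upper]."""
    return not (lower <= index <= upper)


predicted = ["z1", "z2", "z3", "z4"]

# Two of four predictions confirmed: accuracy 0.5, inside [0.4, 1].
acc_ok = accuracy(predicted, {"z1", "z2"})
# One of four predictions confirmed: accuracy 0.25, below f = 0.4.
acc_low = accuracy(predicted, {"z1"})

print(acc_ok, needs_update(acc_ok))
print(acc_low, needs_update(acc_low))
```

Only the second case would trigger an update of the model with the target sample set.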
Optionally, the evaluation index of the first operation and maintenance model may also be its precision; the higher the precision, the better the model's predictions. The evaluation index may also be the false discovery rate; the lower the false discovery rate, the better the model's predictions. The evaluation index may also be the false omission rate, and so on. This embodiment of the present invention does not limit the evaluation index of the first operation and maintenance model, and the specified evaluation index range can be determined according to the evaluation index chosen for the first operation and maintenance model.
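The text names these alternative indices without defining them. The sketch below uses their standard confusion-matrix definitions, which is an assumption about what the embodiment intends; the counts are made-up example values.

```python
def precision(tp: int, fp: int) -> float:
    """Share of reported faults that were real faults: TP / (TP + FP)."""
    return tp / (tp + fp)


def false_discovery_rate(tp: int, fp: int) -> float:
    """Share of reported faults that were not real: FP / (TP + FP)."""
    return fp / (tp + fp)


def false_omission_rate(fn: int, tn: int) -> float:
    """Share of non-fault outputs that hid a real fault: FN / (FN + TN)."""
    return fn / (fn + tn)


# Assumed example counts for one model:
# 3 true positives, 1 false positive, 1 false negative, 5 true negatives.
tp, fp, fn, tn = 3, 1, 1, 5
p = precision(tp, fp)
fdr = false_discovery_rate(tp, fp)
fomr = false_omission_rate(fn, tn)
print(p, fdr, fomr)
```

Under these definitions the false discovery rate is one minus the precision, so a range such as [f, 1] suits precision while a range with an upper bound suits the two error rates.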
Optionally, each of the m operation and maintenance models is managed by a pair consisting of an application unit and a model trainer. The application unit determines the evaluation index of the first operation and maintenance model according to the target sample set and, when the evaluation index does not fall within the specified evaluation index range, sends a model update request to the model trainer. The model trainer updates the first operation and maintenance model using the target sample set according to the model update request sent by the application unit.
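The division of work between the application unit and the model trainer might look like the following sketch. The class and method names are hypothetical, accuracy is assumed as the evaluation index, and the trainer only records the request instead of actually retraining a model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelTrainer:
    """Hypothetical trainer that receives model update requests."""
    update_requests: List[List[str]] = field(default_factory=list)

    def handle_update_request(self, target_sample_set: List[str]) -> None:
        # A real trainer would retrain the model on the samples here;
        # this sketch only records that a request arrived.
        self.update_requests.append(list(target_sample_set))


@dataclass
class ApplicationUnit:
    """Hypothetical unit that evaluates one model and requests updates."""
    trainer: ModelTrainer
    lower_bound: float = 0.4  # f in the range [f, 1]

    def evaluate(self, predicted: List[str], target_sample_set: List[str]) -> float:
        # Evaluation index assumed to be accuracy: correct / total predicted.
        correct = sum(1 for p in predicted if p in target_sample_set)
        index = correct / len(predicted)
        if not (self.lower_bound <= index <= 1.0):
            self.trainer.handle_update_request(target_sample_set)
        return index


trainer = ModelTrainer()
unit = ApplicationUnit(trainer)
score = unit.evaluate(["z1", "z2", "z3", "z4"], ["x1", "A11", "z1"])
print(score, len(trainer.update_requests))
```

Keeping evaluation and training in separate objects mirrors the pairing described above: the application unit decides when an update is needed, and the trainer decides how to carry it out.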
In this embodiment of the present invention, through steps 610 to 615, the server can obtain the correctly predicted second fault information according to the staff's labeling instruction, and then update any operation and maintenance model whose evaluation index does not meet service requirements according to the correctly predicted first fault information, the correctly predicted superior faults, and the correctly predicted second fault information. This improves fault prediction accuracy and, in turn, fault handling efficiency.
This embodiment of the present invention draws on the staff's operation and maintenance experience to predict faults and potential faults in the network effectively. In this embodiment, the server can update the operation and maintenance models in time, which achieves timely and accurate prediction, reduces labor costs, and improves fault handling efficiency. With the network operation and maintenance method provided in this embodiment, which combines active prevention with passive handling, staff can quickly learn the running state of the network and handle faults and potential faults in the network in time, which improves network stability and keeps the network running normally.
In summary, in the network operation and maintenance method provided in this embodiment of the present invention, the server can determine n (1≤n≤m) pieces of first fault information according to the network data of m (m≥2) services, and then divide some or all of the n pieces of first fault information into k (1≤k≤n) groups of fault information, where the network faults indicated by the first fault information in each group have the same superior fault. The server then outputs the k groups of fault information and the k superior faults, which correspond one-to-one to the k groups, so that staff can handle faults and potential faults in the network in time. With this method, multiple services can be processed in an integrated manner, and any operation and maintenance model whose evaluation index does not meet service requirements can be updated automatically, which improves fault prediction accuracy and fault handling efficiency.
It should be noted that the order of the steps of the network operation and maintenance method provided in this embodiment of the present invention may be adjusted appropriately, and steps may be added or removed as circumstances require. Any variation readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application, and is therefore not described further.
An embodiment of the present invention provides a network operation and maintenance apparatus, which may be used in the server shown in FIG. 1. As shown in FIG. 9, the network operation and maintenance apparatus 900 includes:
A first obtaining module 910, configured to perform step 201 or step 601 in the foregoing embodiments.
A first determining module 920, configured to perform step 202 or step 602 in the foregoing embodiments.
A dividing module 930, configured to perform step 203 or step 603 in the foregoing embodiments.
A first output module 940, configured to perform step 204 or step 604 in the foregoing embodiments.
Optionally, the network data of the m services corresponds one-to-one to m operation and maintenance models, the m operation and maintenance models differ from one another, and the first determining module 920 is configured to perform step 2021 or step 2022 in the foregoing embodiments.
Further, as shown in FIG. 10, the network operation and maintenance apparatus 900 may also include:
A second obtaining module 950, configured to perform step 205 in the foregoing embodiments.
A first prediction module 960, configured to perform step 206 in the foregoing embodiments.
A second output module 970, configured to perform step 207 in the foregoing embodiments.
For the meanings of the other reference signs in FIG. 10, refer to FIG. 9.
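Further, as shown in FIG. 11, the network operation and maintenance apparatus 900 may also include:
A first receiving module 980, configured to perform step 605 in the foregoing embodiments.
A third obtaining module 990, configured to perform step 606 in the foregoing embodiments.
A fourth obtaining module 991, configured to perform step 607 in the foregoing embodiments.
A second prediction module 992, configured to perform step 608 in the foregoing embodiments.
A third output module 993, configured to perform step 609 in the foregoing embodiments.
Optionally, the network data of the m services corresponds one-to-one to m operation and maintenance models, and the m operation and maintenance models differ from one another. Each operation and maintenance model is used to make predictions on the network data of its corresponding service and to output fault information or non-fault information. Further, as shown in FIG. 11, the network operation and maintenance apparatus 900 may also include:
A second determining module 994, configured to perform step 610 in the foregoing embodiments.
A second receiving module 995, configured to perform step 611 in the foregoing embodiments.
A fifth obtaining module 996, configured to perform step 612 in the foregoing embodiments.
A third determining module 997, configured to perform step 613 in the foregoing embodiments.
A fourth determining module 998, configured to perform step 614 in the foregoing embodiments.
An update module 999, configured to perform step 615 in the foregoing embodiments.
For the meanings of the other reference signs in FIG. 11, refer to FIG. 9.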
Optionally, the network data of the m services corresponds one-to-one to m operation and maintenance models, and the m operation and maintenance models differ from one another. Each operation and maintenance model is used to make predictions on the network data of its corresponding service and to output fault information or non-fault information. The first prediction module 960 in FIG. 10 or the second prediction module 992 in FIG. 11 is configured to perform step 2061 and step 2062 in the foregoing embodiments, including:
inputting the associated network data into the associated operation and maintenance model to obtain the information output by that model, where the associated operation and maintenance model is the one, among the m operation and maintenance models, that corresponds to the associated network data;
when the information output by the associated operation and maintenance model is fault information, determining that information as the second fault information related to each superior fault.
In summary, with the network operation and maintenance apparatus provided in this embodiment of the present invention, the server can determine n (1≤n≤m) pieces of first fault information according to the network data of m (m≥2) services, and then divide some or all of the n pieces of first fault information into k (1≤k≤n) groups of fault information, where the network faults indicated by the first fault information in each group have the same superior fault. The server then outputs the k groups of fault information and the k superior faults, which correspond one-to-one to the k groups, so that staff can handle faults and potential faults in the network in time. With this apparatus, multiple services can be processed in an integrated manner, and any operation and maintenance model whose evaluation index does not meet service requirements can be updated, which improves fault prediction accuracy and fault handling efficiency.
FIG. 12 is a schematic structural diagram of a network operation and maintenance apparatus provided in an embodiment of the present invention. The apparatus may be used in the server shown in FIG. 1. As shown in FIG. 12, the apparatus includes a processor 1201 (such as a CPU), a memory 1202, a network interface 1203, and a bus 1204. The bus 1204 connects the processor 1201, the memory 1202, and the network interface 1203. The memory 1202 may include random access memory (RAM) and may also include non-volatile memory, for example at least one disk memory. The communication connection between the server and a communication device is implemented through the network interface 1203, which may be wired or wireless. The memory 1202 stores a program 12021 for implementing various application functions, and the processor 1201 executes the program 12021 stored in the memory 1202 to implement the network operation and maintenance method shown in FIG. 2 or FIG. 6.
A person skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the apparatus and modules described above can be found in the corresponding processes of the foregoing method embodiments, and are not described again here.
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现,所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本发明实施例所述的流程或功能。所述计算机可以是通用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机的可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线)或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质,或者半导体介质(例如固态硬盘)等。In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product comprising one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in accordance with embodiments of the present invention are generated in whole or in part. The computer can be a general purpose computer, a computer network, or other programmable device. The computer instructions can be stored in a readable storage medium of a computer or transferred from one computer readable storage medium to another computer readable storage medium, for example, the computer instructions can be from a website site, computer, server or data The center transmits to another website site, computer, server, or data center by wire (eg, coaxial cable, fiber optic, digital subscriber line) or wireless (eg, infrared, wireless, microwave, etc.). The computer readable storage medium can be any available media that can be accessed by a computer or a data storage device such as a server, data center, or the like that includes one or more available media. The usable medium may be a magnetic medium (eg, a floppy disk, a hard disk, a magnetic tape), an optical medium, or a semiconductor medium (eg, a solid state hard disk) or the like.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into modules is only a division by logical function; in an actual implementation there may be other divisions, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
A person of ordinary skill in the art can understand that all or some of the steps of the above embodiments may be performed by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are merely optional embodiments of this application and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within the protection scope of this application.
Claims (14)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810026962.2A CN109905268B (en) | 2018-01-11 | 2018-01-11 | Method and device for network operation and maintenance |
CN201810026962.2 | 2018-01-11 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019137052A1 (en) | 2019-07-18 |
Family
ID=66943236
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/109903 WO2019137052A1 (en) | 2018-01-11 | 2018-10-11 | Method and device for network operation and maintenance |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109905268B (en) |
WO (1) | WO2019137052A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111026624A (en) * | 2019-11-11 | 2020-04-17 | 国网甘肃省电力公司信息通信公司 | Fault prediction method and device for power grid information system |
CN112884159A (en) * | 2019-11-30 | 2021-06-01 | 华为技术有限公司 | Model updating system, model updating method and related equipment |
CN114978862A (en) * | 2022-06-21 | 2022-08-30 | 浪潮通信信息系统有限公司 | Fault risk analysis method and device of transmission network and electronic equipment |
CN118042492A (en) * | 2024-04-11 | 2024-05-14 | 深圳市友恺通信技术有限公司 | Network data operation and maintenance management system and method based on 5G communication |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114745731A (en) * | 2020-12-24 | 2022-07-12 | 中国移动通信集团北京有限公司 | Data analysis method, device, equipment and storage medium |
CN116684327B (en) * | 2023-08-03 | 2023-10-27 | 中维建技术有限公司 | Mountain area communication network fault monitoring and evaluating method based on cloud computing |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102164051A (en) * | 2011-05-18 | 2011-08-24 | 西安交通大学 | Service-Oriented Fault Detection and Location Method |
US20130286852A1 (en) * | 2012-04-27 | 2013-10-31 | General Instrument Corporation | Estimating Physical Locations of Network Faults |
CN104348667A (en) * | 2014-11-11 | 2015-02-11 | 上海新炬网络技术有限公司 | Fault positioning method based on warning information |
CN106998256A (en) * | 2016-01-22 | 2017-08-01 | 腾讯科技(深圳)有限公司 | A kind of communication failure localization method and server |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9071514B1 (en) * | 2012-12-17 | 2015-06-30 | Juniper Networks, Inc. | Application-specific connectivity loss detection for multicast virtual private networks |
CN106603293A (en) * | 2016-12-20 | 2017-04-26 | 南京邮电大学 | Network fault diagnosis method based on deep learning in virtual network environment |
CN107171831B (en) * | 2017-04-28 | 2020-09-11 | 华为技术有限公司 | Network deployment method and device |
CN107528832B (en) * | 2017-08-04 | 2020-07-07 | 北京中晟信达科技有限公司 | Baseline construction and unknown abnormal behavior detection method for system logs |
2018
- 2018-01-11 CN CN201810026962.2A patent/CN109905268B/en active Active
- 2018-10-11 WO PCT/CN2018/109903 patent/WO2019137052A1/en active Application Filing
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111026624A (en) * | 2019-11-11 | 2020-04-17 | 国网甘肃省电力公司信息通信公司 | Fault prediction method and device for power grid information system |
CN111026624B (en) * | 2019-11-11 | 2023-06-02 | 国网甘肃省电力公司信息通信公司 | Fault prediction method and device of power grid information system |
CN112884159A (en) * | 2019-11-30 | 2021-06-01 | 华为技术有限公司 | Model updating system, model updating method and related equipment |
EP4050528A4 (en) * | 2019-11-30 | 2022-12-28 | Huawei Technologies Co., Ltd. | Model update system, model update method, and related device |
CN114978862A (en) * | 2022-06-21 | 2022-08-30 | 浪潮通信信息系统有限公司 | Fault risk analysis method and device of transmission network and electronic equipment |
CN114978862B (en) * | 2022-06-21 | 2024-03-12 | 浪潮通信信息系统有限公司 | Fault risk analysis method and device for transmission network and electronic equipment |
CN118042492A (en) * | 2024-04-11 | 2024-05-14 | 深圳市友恺通信技术有限公司 | Network data operation and maintenance management system and method based on 5G communication |
Also Published As
Publication number | Publication date |
---|---|
CN109905268A (en) | 2019-06-18 |
CN109905268B (en) | 2020-11-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019137052A1 (en) | Method and device for network operation and maintenance | |
CN113328872B (en) | Fault repairing method, device and storage medium | |
Jin et al. | Nevermind, the problem is already fixed: proactively detecting and troubleshooting customer dsl problems | |
WO2014040633A1 (en) | Identifying fault category patterns in a communication network | |
US11985523B2 (en) | Method and device for managing multiple remote radio heads in communication network | |
WO2022061900A1 (en) | Method for determining fault autonomy capability and related device | |
CN116757447B (en) | Test task allocation method and system of intelligent quick-checking device | |
US10841820B2 (en) | Method and test system for mobile network testing as well as prediction system | |
CN102684902A (en) | Network fault positioning method based on probe prediction | |
CN116074178A (en) | Digital twin architecture of network, network session processing method and device | |
US10805186B2 (en) | Mobile communication network failure monitoring system and method | |
CN115334560A (en) | Method, device and equipment for monitoring base station abnormity and computer readable storage medium | |
CN113656252B (en) | Fault positioning method, device, electronic equipment and storage medium | |
CN115865611A (en) | Method, device and electronic device for troubleshooting network equipment | |
CN111901156B (en) | Method and device for monitoring faults | |
CN110337118A (en) | Customer complaint immediate processing method and device | |
CN115086143A (en) | Fault early warning method and device | |
CN104284353B (en) | A kind of wireless local area network service performance test methods and system | |
CN101291255B (en) | Heuristic fault positioning method for network of next generation | |
CN118826298B (en) | Monitoring method, monitoring device and monitoring equipment for electric meter box | |
CN113487085B (en) | Method and device for predicting service life of equipment based on joint learning framework, computer equipment and computer readable storage medium | |
CN112291804B (en) | Service fault diagnosis method for noise network under 5G network slice | |
TWI763177B (en) | Management system and method for a plurality of network devices and computer readable medium | |
CN118740688A (en) | A gateway device testing method, system, computer and readable storage medium | |
CN118200949A (en) | Fault monitoring system and method for communication equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 18900259; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | EP: PCT application non-entry in European phase | Ref document number: 18900259; Country of ref document: EP; Kind code of ref document: A1 |