WO2019137052A1 - Method and device for network operation and maintenance

Info

Publication number: WO2019137052A1
Application number: PCT/CN2018/109903
Authority: WO
Inventors: 潘璐伽; 张家劲; 张建锋; 叶君健
Original assignee: 华为技术有限公司
Priority date: 2018-01-11
Filing date: 2018-10-11
Publication date: 2019-07-18
Also published as: CN109905268A; CN109905268B

Abstract

The present application provides a method and device for network operation and maintenance, and belongs to the technical field of networks. The method comprises: a server acquires network data of m types of services, where m ≥ 2; the server then determines n pieces of first fault information according to the network data of the m types of services, each piece of first fault information indicating that a corresponding service has encountered a network fault, where 1 ≤ n ≤ m; the server subsequently divides part or all of the n pieces of first fault information into k sets of fault information, the network faults indicated by the first fault information in each set sharing the same superordinate fault, where 1 ≤ k ≤ n; and the server outputs the k sets of fault information and the k superordinate faults, which correspond one-to-one to the k sets of fault information. Further, the server can also predict potential faults in the network. The present application resolves the problem that existing network operation and maintenance modes cannot comprehensively process a plurality of services, realizes comprehensive processing of a plurality of services, improves the accuracy of fault prediction, increases the efficiency of fault processing, and is applicable to network operation and maintenance.

Description

Network operation and maintenance method and device

The present application claims priority to Chinese Patent Application No. 201810026962.2, filed on January 11, 2018 and entitled "Method and device for network operation and maintenance", which is incorporated herein by reference in its entirety.

Technical field

The present application relates to the field of network technologies, and in particular to a method and device for network operation and maintenance.

Background

In the era of data services, user experience is the core of service. A stable, reliable network combined with a good user experience helps operators grow their business quickly. Network operation and maintenance ensures that networks and services run safely and effectively, so how to carry out network operation and maintenance in a way that safeguards user experience is an important problem.

The related art includes a network operation and maintenance mode in which an unsupervised learning model first performs anomaly detection on the network data of a service, and the detection results are then presented to staff. The staff judge the accuracy of the detection results, and the correct detection results are used as training samples. A supervised learning model is then obtained by training on those samples, and that supervised learning model is subsequently used to perform anomaly detection on the network data of the service.

However, this network operation and maintenance mode can process only one type of service and cannot comprehensively process multiple services. With the rapid development of network technology, network services are becoming increasingly diverse, and a network operation and maintenance mode that comprehensively processes multiple services is urgently needed.

Summary of the invention

Embodiments of the present invention provide a method and device for network operation and maintenance, which can solve the problem in the related art that network operation and maintenance modes cannot comprehensively process multiple services. The technical solutions are as follows:

In a first aspect, a method for network operation and maintenance is provided. The method includes: a server first acquires network data of m types of services, where m ≥ 2, and then determines n pieces of first fault information according to the network data of the m types of services, each piece of first fault information indicating that the corresponding service has a network fault, where 1 ≤ n ≤ m. The server then divides part or all of the n pieces of first fault information into k groups of fault information, where the network faults indicated by the first fault information in each group have the same superior fault, the superior fault of the network fault indicated by any piece of first fault information being the fault that causes that network fault, and 1 ≤ k ≤ n. The server then outputs the k groups of fault information and the k superior faults, the k superior faults corresponding one-to-one to the k groups of fault information.

Optionally, the m types of services may include prediction services, alarm compression services, anomaly detection services, and the like.

Optionally, the server may display the k groups of fault information and the k superior faults.

In the embodiments of the present invention, the server can determine the first fault information and the superior faults according to the network data of multiple services, which makes it easier for staff to handle faults. Further, the staff can derive potential faults in the network from the superior faults and the first fault information output by the server, and handle those potential faults.

Optionally, after outputting the k groups of fault information and the k superior faults, the method may further include: the server acquires, according to the k superior faults and the first fault information corresponding to each superior fault, associated network data related to each superior fault, and then predicts, according to the associated network data, second fault information related to each superior fault, the second fault information being different from the first fault information. The server then outputs the k superior faults, the k groups of fault information, and all the predicted second fault information.

In the embodiments of the present invention, the network fault indicated by second fault information related to a superior fault is a network fault that the superior fault can cause.

Optionally, the server may display the k superior faults, the k groups of fault information, and all the predicted second fault information.

Because network data is strongly correlated, in the embodiments of the present invention, once the server has obtained the superior faults and the first fault information, it can use them to predict the other network faults that the superior faults may cause. This superior-diffusion labeling selection approach enables staff to handle faults and potential faults in the network in a timely manner according to the superior faults, the first fault information, and the second fault information, which improves network stability and keeps the network running normally.

Optionally, after outputting the k groups of fault information and the k superior faults, the method may further include: the server receives a first labeling instruction, which indicates the correctly predicted first fault information within the k groups of fault information and the correctly predicted superior faults among the k superior faults. The server then acquires a first sample set based on the first labeling instruction, the first sample set including the information indicated by the first labeling instruction. Next, the server acquires, according to the first sample set, associated network data related to each superior fault in the first sample set, and predicts, according to the associated network data, second fault information related to each superior fault, the second fault information being different from the first fault information. The server then outputs the first sample set and all the predicted second fault information.
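The step of turning a first labeling instruction into a first sample set can be illustrated with a minimal sketch. The `SampleSet` type, the `build_first_sample_set` function, the fault strings, and the representation of the labeling instruction as a set of confirmed items are all assumptions made for illustration; the source does not prescribe a data format.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class SampleSet:
    superior_faults: List[str]  # superior faults marked as correctly predicted
    first_faults: List[str]     # first fault information marked as correct


def build_first_sample_set(
    groups: Dict[str, List[str]], marked_correct: Set[str]
) -> SampleSet:
    """Keep only the items that the labeling instruction marks as correct."""
    superiors = [s for s in groups if s in marked_correct]
    firsts = [
        f for members in groups.values() for f in members if f in marked_correct
    ]
    return SampleSet(superiors, firsts)


# Output of the grouping step: superior fault -> its group of first faults.
groups = {
    "base station equipment fault": [
        "cell 231 service degradation",
        "cell 232 KPI anomaly",
    ],
    "transmission link fault": ["cross-domain alarm burst"],
}
# Hypothetical first labeling instruction: the items staff marked as correct.
marked_correct = {"base station equipment fault", "cell 231 service degradation"}

first_sample_set = build_first_sample_set(groups, marked_correct)
```

In this sketch, only the confirmed superior fault and the confirmed first fault information survive into the sample set; everything the staff did not confirm is dropped before second fault information is predicted.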

Optionally, the server may issue prompt information that prompts staff to mark the superior faults and first fault information that the server predicted correctly with a first labeling symbol, and to mark the superior faults and first fault information that the server predicted incorrectly with a second labeling symbol.

Optionally, the server may display the first sample set and all the predicted second fault information.

In the embodiments of the present invention, because the server obtains the second fault information from the correctly predicted first fault information within the k groups of fault information and the correctly predicted superior faults among the k superior faults, the second fault information is more accurate.

In the embodiments of the present invention, the server can, according to the staff's labeling instruction, predict the network faults that the correctly predicted superior faults may cause, so that staff can handle faults and potential faults in the network in a timely manner according to the correctly predicted first fault information, the correctly predicted superior faults, and all the predicted second fault information. Because the second fault information is more accurate, fault handling efficiency is also improved.

Optionally, the network data of the m types of services corresponds one-to-one to m operation and maintenance models, and the m models differ from one another. Each operation and maintenance model is used to make a prediction on the network data of its corresponding service and to output fault information or non-fault information. After outputting the first sample set and the predicted second fault information, the method may further include: the server determines all the predicted second fault information as a to-be-labeled sample set and then receives a second labeling instruction, which indicates the correctly predicted second fault information within the to-be-labeled sample set. The server then acquires a second sample set based on the second labeling instruction, the second sample set including the information indicated by the second labeling instruction. Next, the server determines the first sample set and the second sample set as a target sample set, and determines, according to the target sample set, an evaluation index of a first operation and maintenance model, the first operation and maintenance model being any one of the m operation and maintenance models. When the evaluation index of the first operation and maintenance model falls outside a specified evaluation index range, the server updates the first operation and maintenance model using the target sample set.

Optionally, the evaluation index of the first operation and maintenance model may be its accuracy, precision, false discovery rate, or the like. The specified evaluation index range may be determined according to the evaluation index chosen for the first operation and maintenance model.
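As a rough illustration of the evaluation indices named above, the following sketch computes precision and false discovery rate using their standard definitions, precision = TP / (TP + FP) and false discovery rate = FP / (TP + FP), and checks the result against a range. The sample format (pairs of model prediction and staff-confirmed outcome), the function names, and the example range are assumptions for illustration; the source does not specify how the target sample set is encoded or what range is used.

```python
from typing import List, Tuple

# Each sample: (model predicted a fault, staff confirmed a real fault).
Sample = Tuple[bool, bool]


def _predicted_positive_counts(samples: List[Sample]) -> Tuple[int, int]:
    """Return (true positives, false positives) among predicted faults."""
    tp = sum(1 for predicted, actual in samples if predicted and actual)
    fp = sum(1 for predicted, actual in samples if predicted and not actual)
    return tp, fp


def precision(samples: List[Sample]) -> float:
    tp, fp = _predicted_positive_counts(samples)
    return tp / (tp + fp) if (tp + fp) else 0.0


def false_discovery_rate(samples: List[Sample]) -> float:
    tp, fp = _predicted_positive_counts(samples)
    return fp / (tp + fp) if (tp + fp) else 0.0


def needs_update(value: float, lower: float, upper: float) -> bool:
    """True when the evaluation index falls outside the specified range."""
    return not (lower <= value <= upper)


# Three confirmed faults, one false alarm, one correct non-fault output.
samples = [(True, True)] * 3 + [(True, False), (False, False)]

model_precision = precision(samples)       # 3 / 4
model_fdr = false_discovery_rate(samples)  # 1 / 4
# Hypothetical service requirement: precision must lie in [0.9, 1.0].
update_required = needs_update(model_precision, 0.9, 1.0)
```

With these numbers the precision falls below the assumed lower bound, so the model would be retrained on the target sample set.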

In the embodiments of the present invention, the server can obtain the correctly predicted second fault information according to the staff's labeling instruction, and then use the correctly predicted first fault information, the correctly predicted superior faults, and the correctly predicted second fault information to update any operation and maintenance model whose evaluation index does not meet service requirements. This improves the accuracy of fault prediction and, in turn, fault handling efficiency.

Optionally, the network data of the m types of services corresponds one-to-one to m operation and maintenance models, the m models differ from one another, and each model is used to make a prediction on the network data of its corresponding service and to output fault information or non-fault information. The server predicting, according to the associated network data, the second fault information related to each superior fault may include: the server first inputs the associated network data into an associated operation and maintenance model to obtain the information output by that model, the associated operation and maintenance model being the one among the m operation and maintenance models that corresponds to the associated network data; when the information output by the associated operation and maintenance model is fault information, the server determines that information as the second fault information related to the superior fault.
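The prediction of second fault information can be sketched as follows, assuming that each operation and maintenance model is a callable returning a fault description or `None` for non-fault information, and that associated network data is keyed by service name. The function name `predict_second_faults`, the toy threshold models, and the field names are hypothetical; real models would be trained rather than rule-based.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical model signature: network data in, fault description or None out.
Model = Callable[[dict], Optional[str]]


def predict_second_faults(
    first_faults: List[str],
    associated_data: Dict[str, dict],
    models: Dict[str, Model],
) -> List[str]:
    """Run each associated model and keep fault outputs not already known."""
    second_faults = []
    for service, data in associated_data.items():
        output = models[service](data)  # the associated model for this data
        # Second fault information must differ from first fault information.
        if output is not None and output not in first_faults:
            second_faults.append(output)
    return second_faults


# Toy stand-ins for two of the m operation and maintenance models.
def kqi_model(data: dict) -> Optional[str]:
    if data["video_stall_rate"] > 0.1:
        return "cell 233 service degradation"
    return None


def hardware_model(data: dict) -> Optional[str]:
    if data["rx_power_dbm"] < -20:
        return "optical module about to fail"
    return None


models = {"service_degradation": kqi_model, "hardware_failure": hardware_model}
# Associated network data gathered for one superior fault.
associated_data = {
    "service_degradation": {"video_stall_rate": 0.3},
    "hardware_failure": {"rx_power_dbm": -8.0},
}
second_faults = predict_second_faults(
    ["cell 231 service degradation"], associated_data, models
)
```

Only the service degradation model reports a fault here, so a single piece of second fault information is produced for the superior fault.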

Optionally, the network data of the m types of services corresponds one-to-one to m operation and maintenance models, and the m models differ from one another. The server determining the n pieces of first fault information according to the network data of the m types of services may include: the server inputs the network data of each service into the corresponding one of the m operation and maintenance models to obtain the information output by the m models, where the information output by each model is fault information or non-fault information, and the information output by the m models includes n pieces of fault information. The server then determines the n pieces of fault information as the n pieces of first fault information.

In a second aspect, a device for network operation and maintenance is provided. The device includes at least one module, and the at least one module is configured to implement the method for network operation and maintenance described in the first aspect.

In a third aspect, a device for network operation and maintenance is provided. The device includes a processor, a memory, a network interface, and a bus. The bus connects the processor, the memory, and the network interface. The network interface implements the communication connection between the server and a communication device. The processor executes a program stored in the memory to implement the method for network operation and maintenance described in the first aspect.

In a fourth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores instructions that, when the computer-readable storage medium is run on a computer, cause the computer to perform the method for network operation and maintenance described in the first aspect.

In a fifth aspect, a computer program product containing instructions is provided. When the computer program product is run on a computer, it causes the computer to perform the method for network operation and maintenance described in the first aspect.

The technical effects obtained by the second to fifth aspects are similar to those obtained by the corresponding technical means in the first aspect, and are not described again here.

The beneficial effects of the technical solutions provided by the embodiments of the present invention are as follows:

The server can determine n (1 ≤ n ≤ m) pieces of first fault information according to the network data of m (m ≥ 2) types of services, and then divide part or all of the n pieces of first fault information into k (1 ≤ k ≤ n) groups of fault information, where the network faults indicated by the first fault information in each group have the same superior fault. The server then outputs the k groups of fault information and the k superior faults, which correspond one-to-one to the k groups of fault information, so that staff can handle faults and potential faults in the network in a timely manner. The embodiments of the present invention make it possible to process multiple services comprehensively and to automatically update any operation and maintenance model whose evaluation index does not meet service requirements, which improves the accuracy of fault prediction and the efficiency of fault handling.

Brief description of the drawings

FIG. 1 is a schematic diagram of an implementation environment involved in an embodiment of the present invention;

FIG. 2 is a flowchart of a method for network operation and maintenance provided by an embodiment of the present invention;

FIG. 3 is a flowchart of a method for determining first fault information provided by an embodiment of the present invention;

FIG. 4 is a schematic diagram of first fault information and superior faults provided by an embodiment of the present invention;

FIG. 5 is a flowchart of a method for predicting second fault information provided by an embodiment of the present invention;

FIG. 6 is a flowchart of another method for network operation and maintenance provided by an embodiment of the present invention;

FIG. 7 is a schematic diagram of labeling the diagram of superior faults and first fault information shown in FIG. 4, provided by an embodiment of the present invention;

FIG. 8 is a schematic diagram of labeling the diagram of superior faults and first fault information shown in FIG. 4, provided by an embodiment of the present invention;

FIG. 9 is a schematic structural diagram of a device for network operation and maintenance provided by an embodiment of the present invention;

FIG. 10 is a schematic structural diagram of another device for network operation and maintenance provided by an embodiment of the present invention;

FIG. 11 is a schematic structural diagram of yet another device for network operation and maintenance provided by an embodiment of the present invention;

FIG. 12 is a schematic structural diagram of a device for network operation and maintenance provided by an embodiment of the present invention.

Detailed description

To make the objectives, technical solutions, and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.

FIG. 1 is a schematic diagram of an implementation environment involved in an embodiment of the present invention. As shown in FIG. 1, the implementation environment may include a server 001 and a communication device 002. For example, the communication device 002 may be a base station. The base station enables the terminals 10 in a cell to communicate, and the server 001 can acquire network data of multiple services from the base station. The server 001 may be a single server, a server cluster composed of several servers, or a cloud computing service center.

In the embodiments of the present invention, the server 001 is configured to acquire network data of m (m ≥ 2) types of services, determine n (1 ≤ n ≤ m) pieces of first fault information according to that network data, and then divide part or all of the n pieces of first fault information into k (1 ≤ k ≤ n) groups of fault information, where the network faults indicated by the first fault information in each group have the same superior fault. The server then outputs the k groups of fault information and the k superior faults, which correspond one-to-one to the k groups of fault information, so that staff can handle the faults. Further, in one implementation, to prevent potential faults from affecting the network, the server may also predict potential faults according to the superior faults and the first fault information. In another implementation, to improve the accuracy of fault prediction, the server may determine the correctly predicted superior faults and the correctly predicted first fault information according to the staff's labeling instruction, and then predict potential faults based on them. The method for network operation and maintenance provided by the embodiments of the present invention is described below using these two implementations as examples.

In one implementation, the method for network operation and maintenance provided by the embodiments of the present invention is shown in FIG. 2 and may include:

Step 201: The server acquires network data of m types of services, where m ≥ 2.

Referring to FIG. 1, the server acquires the network data of the services from the communication device. For example, the server may acquire the network data of the services from the base station.

For example, the m types of services acquired by the server may include prediction services, alarm compression services, anomaly detection services, and the like. Prediction services may include a hardware failure prediction service, a performance prediction service, a resource prediction service, and the like. Alarm compression services may include a single-domain alarm compression service, a cross-domain alarm compression service, a root cause alarm analysis service, and the like. Anomaly detection services may include a key performance indicator (KPI) anomaly detection service and a service degradation anomaly detection service. Each service is briefly described below.

The hardware failure prediction service predicts hardware that is about to fail so that it can be replaced or repaired in time. For example, the prediction can be based on the relevant performance indicators of the hardware and on hardware data collected by sensors installed on the hardware, and the hardware being predicted may be a board, a hard disk, or an optical module. The performance prediction service predicts network performance indicators (such as bandwidth, throughput, and latency). The resource prediction service predicts network resources (such as central processing unit (CPU) usage). Alarm compression services compress the large volume of alarm data generated in the network to obtain the important alarm data that affects the network. Among them, the single-domain alarm compression service compresses alarm data within the same product domain; for example, the network devices of the access layer can be regarded as communication devices of the same product domain. The cross-domain alarm compression service compresses alarm data from different product domains. The root cause alarm analysis service analyzes the basic alarm data that affects the network. Anomaly detection services monitor various indicators in the network in real time and report anomaly information. Among them, the KPI anomaly detection service monitors KPIs (such as the packet loss rate KPI and the call quality KPI) in real time, and the service degradation anomaly detection service monitors key quality indicators (KQIs) in real time. KPIs are used to monitor the running state of the network, and KQIs are used to measure service quality.

For example, the network data that the server acquires for the hardware failure prediction service may include the relevant performance indicators of the hardware and hardware data collected by sensors; the network data for the performance prediction service may include data such as network performance indicators; the network data for the resource prediction service may include data such as network resources; the network data for the single-domain alarm compression service may include alarm data within the same product domain; the network data for the cross-domain alarm compression service may include alarm data from different product domains; the network data for the KPI anomaly detection service may include data such as KPIs; and the network data for the service degradation anomaly detection service may include data such as KQIs.

It should be noted that the period at which the server acquires the network data of each service may be determined according to the needs of that service. For example, the period may be 20 minutes or 1 hour.

Step 202: The server determines n pieces of first fault information according to the network data of the m types of services, each piece of first fault information indicating that the corresponding service has a network fault, where 1 ≤ n ≤ m.

Optionally, in the embodiments of the present invention, the network data of the m types of services corresponds one-to-one to m operation and maintenance models, and the m models differ from one another. Accordingly, as shown in FIG. 3, step 202 may include:

Step 2021: The server inputs the network data of each service into the corresponding one of the m operation and maintenance models to obtain the information output by the m models, where the information output by each model is fault information or non-fault information, and the information output by the m models includes n pieces of fault information.

In the embodiments of the present invention, the server may use operation and maintenance models to determine the first fault information according to the network data of the services. Assuming that the m types of services in step 201 include prediction services, alarm compression services, and anomaly detection services, the operation and maintenance models used to determine the first fault information may include prediction models, alarm compression models, and anomaly detection models. Assume that in step 201 the server acquires network data of eight services: the hardware failure prediction service, the performance prediction service, the resource prediction service, the single-domain alarm compression service, the cross-domain alarm compression service, the root cause alarm analysis service, the KPI anomaly detection service, and the service degradation anomaly detection service. In that case, the prediction models may include a hardware failure prediction model, a performance prediction model, and a resource prediction model; the alarm compression models may include a single-domain alarm compression model, a cross-domain alarm compression model, and a root cause alarm analysis model; and the anomaly detection models may include a KPI anomaly detection model and a service degradation anomaly detection model, for a total of eight operation and maintenance models. The network data of the eight services corresponds one-to-one to the eight operation and maintenance models, and the eight models differ from one another.

The server inputs the network data of the corresponding service into each of the eight operation and maintenance models to obtain the information output by the eight models. For example, the server inputs the network data of the hardware failure prediction service into the hardware failure prediction model and obtains the fault information output by that model. As another example, the server inputs the network data of the performance prediction service into the performance prediction model and obtains the fault information output by that model.

Step 2022: The server determines the n pieces of fault information as the n pieces of first fault information.

If the information output by every operation and maintenance model is fault information, the server obtains m pieces of first fault information.
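Steps 2021 and 2022 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the application: the `ModelOutput` type, the `run_models` function, the two toy models, and their thresholds are hypothetical names and values introduced here, and each operation and maintenance model is assumed to be a callable that returns either fault information or non-fault information.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class ModelOutput:
    service: str    # service whose network data was evaluated
    is_fault: bool  # True for fault information, False for non-fault information
    message: str    # human-readable description of the result


Model = Callable[[Dict[str, Any]], ModelOutput]


def run_models(models: Dict[str, Model],
               network_data: Dict[str, Dict[str, Any]]) -> List[ModelOutput]:
    """Step 2021: feed each service's data to its own model.
    Step 2022: keep only the fault outputs as first fault information."""
    if set(models) != set(network_data):
        raise ValueError("models and network data must correspond one-to-one")
    outputs = [models[service](network_data[service]) for service in models]
    return [out for out in outputs if out.is_fault]


# Toy stand-ins for two of the models; the thresholds are assumptions.
def kpi_anomaly_model(data: Dict[str, Any]) -> ModelOutput:
    abnormal = data["eth_link_errors"] > 100
    return ModelOutput("KPI anomaly detection", abnormal,
                       "ETH link connection abnormal" if abnormal else "normal")


def resource_prediction_model(data: Dict[str, Any]) -> ModelOutput:
    high = data["cpu_usage"] > 0.9
    return ModelOutput("resource prediction", high,
                       "high CPU usage" if high else "normal")


models = {
    "KPI anomaly detection": kpi_anomaly_model,
    "resource prediction": resource_prediction_model,
}
network_data = {
    "KPI anomaly detection": {"eth_link_errors": 250},
    "resource prediction": {"cpu_usage": 0.55},
}
first_faults = run_models(models, network_data)  # here m = 2 and n = 1
```

In this toy run only the KPI model reports a fault, so n is smaller than m; if both models had reported faults, n would equal m.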

Step 203: The server divides some or all of the n pieces of first fault information into k sets of fault information, where the network faults indicated by the first fault information in each set share the same superordinate fault. The superordinate fault of the network fault indicated by any piece of first fault information is the fault that causes that network fault, and 1 ≤ k ≤ n.

For example, when a piece of first fault information is "cell 231 service degradation", the superordinate fault of the network fault indicated by that first fault information may be a base station equipment fault, where the cells managed by the base station include cell 231.

The network data of the eight types of services and the eight operation and maintenance models from step 2021 are used again as an example. The server inputs the network data of the corresponding service into each of the eight operation and maintenance models. Assume that the information output by all eight models is fault information, so that the server obtains eight pieces of first fault information. Assume further that the server groups all eight pieces of first fault information, for example into two sets of fault information. The first set includes three pieces of first fault information, and the superordinate fault of the network faults they indicate is a base station equipment fault. The second set includes five pieces of first fault information, and the superordinate fault of the network faults they indicate is another transmission equipment fault.

FIG. 4 shows an example of one set of fault information and the superordinate fault corresponding to that set. The set includes three pieces of first fault information: "cell 231 service degradation", "Ethernet (ETH) link connection abnormal", and "high CPU usage". "Cell 231 service degradation" is the fault information output by the service degradation anomaly detection model after the server inputs the network data of the corresponding service into that model. "ETH link connection abnormal" is the fault information output by the KPI anomaly detection model after the server inputs the network data of the corresponding service into that model. "High CPU usage" is the fault information output by the resource prediction model after the server inputs the network data of the corresponding service into that model. The superordinate fault of the network faults indicated by these three pieces of first fault information is a base station equipment fault.
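The grouping in step 203 can be sketched as follows. This section does not specify how the server determines the superordinate fault of a given piece of first fault information, so the sketch assumes a hypothetical lookup table `SUPERORDINATE_OF`; the function name and the "link 7 packet loss" and "unknown alarm" entries are likewise illustrative only.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical cause table: first fault information -> superordinate fault.
SUPERORDINATE_OF = {
    "cell 231 service degradation": "base station equipment fault",
    "ETH link connection abnormal": "base station equipment fault",
    "high CPU usage": "base station equipment fault",
    "link 7 packet loss": "transmission equipment fault",
}


def group_by_superordinate(first_faults: List[str]) -> Dict[str, List[str]]:
    """Divide first fault information into sets that share a superordinate fault."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for fault in first_faults:
        cause = SUPERORDINATE_OF.get(fault)
        if cause is not None:
            groups[cause].append(fault)
        # Faults with no known cause stay ungrouped, which matches dividing
        # only "some or all" of the first fault information.
    return dict(groups)


first_faults = [
    "cell 231 service degradation",
    "ETH link connection abnormal",
    "high CPU usage",
    "link 7 packet loss",
    "unknown alarm",
]
groups = group_by_superordinate(first_faults)
k = len(groups)  # number of sets, each paired with exactly one superordinate fault
```

Using the superordinate fault as the dictionary key gives the one-to-one correspondence between sets and superordinate faults directly.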

Step 204: The server outputs the k sets of fault information and the k superordinate faults, where the k superordinate faults correspond one-to-one to the k sets of fault information.

The server outputs the k sets of fault information and the k superordinate faults so that staff can handle faults according to them. Further, staff can also identify potential faults in the network from the superordinate faults and first fault information output by the server, and handle those potential faults.

Optionally, the server may display the k sets of fault information and the k superordinate faults. For example, one set of fault information and its corresponding superordinate fault, as displayed by the server, may be as shown in FIG. 4.

Step 205: The server acquires, according to the k superordinate faults and the first fault information corresponding to each superordinate fault, associated network data related to each superordinate fault.

Network data is strongly correlated. For example, if a base station manages three cells, all of the cells managed by that base station may be affected when the base station fails. Therefore, once the server has obtained a superordinate fault and the first fault information, it can go on to determine potential faults in the network. To do so, the server may first acquire the associated network data related to the superordinate fault.

Assume that in step 202 the server determines eight pieces of first fault information from the network data of eight types of services, and that in step 203 the server divides these eight pieces of first fault information into two sets of fault information. The first set includes three pieces of first fault information, x1, x2, and x3, and the superordinate fault of the network faults they indicate is A11. The second set includes five pieces of first fault information, y1, y2, y3, y4, and y5, and the superordinate fault of the network faults they indicate is B11. The server then separately acquires the associated network data related to A11 and the associated network data related to B11.

For example, if the first fault information is "cell 231 service degradation" and the superordinate fault of the network fault it indicates is a base station equipment fault, the associated network data acquired by the server for that superordinate fault may be the KQI of cell 232. The cells managed by the base station include cell 232 and cell 231.

Step 206: The server predicts, according to the associated network data, second fault information related to each superordinate fault, where the second fault information is different from the first fault information.

The network fault indicated by the second fault information related to a superordinate fault is a network fault that the superordinate fault can cause.

This step is described using the superordinate faults A11 and B11 from step 205 as an example. The server acquires the associated network data p1 related to A11 and then predicts the second fault information related to A11 according to p1. Likewise, the server acquires the associated network data p2 related to B11 and then predicts the second fault information related to B11 according to p2.

For example, if the first fault information is "cell 231 service degradation", the superordinate fault of the network fault it indicates is a base station equipment fault, and the associated network data acquired by the server for that superordinate fault is the KQI of cell 232, then the second fault information that the server predicts from the associated network data may be "cell 232 service degradation". The cells managed by the base station include cell 232 and cell 231.

Optionally, the network data of the m types of services corresponds one-to-one to m operation and maintenance models, and the m operation and maintenance models are different from one another. Each operation and maintenance model is used to make a prediction on the network data of its corresponding service and to output fault information or non-fault information. Correspondingly, as shown in FIG. 5, step 206 may include:

Step 2061: The server inputs the associated network data into an associated operation and maintenance model to obtain the information output by the associated operation and maintenance model, where the associated operation and maintenance model is the one among the m operation and maintenance models that corresponds to the associated network data.

Step 2062: When the information output by the associated operation and maintenance model is fault information, the server determines that information as the second fault information related to the superordinate fault.

Assume that the eight types of services are a hardware failure prediction service, a performance prediction service, a resource prediction service, a single-domain alarm compression service, a cross-domain alarm compression service, a root-cause alarm analysis service, a KPI anomaly detection service, and a service degradation anomaly detection service. There may then be eight operation and maintenance models: a hardware failure prediction model, a performance prediction model, a resource prediction model, a single-domain alarm compression model, a cross-domain alarm compression model, a root-cause alarm analysis model, a KPI anomaly detection model, and a service degradation anomaly detection model.

The superordinate fault and first fault information shown in FIG. 4 are used as an example. The superordinate fault is a base station equipment fault, and the three pieces of first fault information are "cell 231 service degradation", "ETH link connection abnormal", and "high CPU usage". The associated network data acquired by the server for this superordinate fault may be the KQI of cell 232. The server then inputs this associated network data into the corresponding service degradation anomaly detection model and obtains the fault information output by that model, "cell 232 service degradation". The server then determines "cell 232 service degradation" as the second fault information.
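Steps 205, 2061, and 2062 can be sketched for this example as follows. This is an illustration under assumptions rather than the method of the application: the topology table, the KQI values, the 0.5 threshold, and the function names are all hypothetical, and the service degradation anomaly detection model is replaced by a simple threshold check.

```python
from typing import Dict, List, Optional

# Hypothetical topology: which cells each base station manages.
CELLS_OF_BASE_STATION: Dict[str, List[str]] = {
    "base station 1": ["cell 231", "cell 232"],
}

# Hypothetical KQI readings per cell (higher is better).
KQI: Dict[str, float] = {"cell 231": 0.35, "cell 232": 0.42}


def degradation_model(cell: str, kqi: float) -> Optional[str]:
    """Toy stand-in for the service degradation anomaly detection model.
    Returns fault information, or None for non-fault information."""
    return f"{cell} service degradation" if kqi < 0.5 else None


def predict_second_faults(base_station: str, first_faults: List[str]) -> List[str]:
    """Step 205: collect associated data (KQI of the cells under the faulty station).
    Step 2061: feed it to the associated model.
    Step 2062: keep fault outputs that differ from the first fault information."""
    second_faults = []
    for cell in CELLS_OF_BASE_STATION[base_station]:
        info = degradation_model(cell, KQI[cell])
        if info is not None and info not in first_faults:
            second_faults.append(info)
    return second_faults


first_faults = [
    "cell 231 service degradation",
    "ETH link connection abnormal",
    "high CPU usage",
]
second_faults = predict_second_faults("base station 1", first_faults)
```

The membership check against `first_faults` reflects the requirement that second fault information differ from first fault information; cell 231 is skipped because its degradation has already been reported.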

Step 207: The server outputs the k superordinate faults, the k sets of fault information, and all of the predicted second fault information.

The server outputs the k superordinate faults, the k sets of fault information, and all of the predicted second fault information so that staff can handle faults according to them.

In this embodiment of the present invention, the server can predict, according to the superordinate fault and the first fault information, the other network faults that the superordinate fault may cause. This superordinate-fault diffusion approach to selecting items for annotation enables staff to handle both existing faults and potential faults in the network according to the superordinate fault, the first fault information, and the second fault information, which improves network stability and keeps the network running normally.

In summary, in the network operation and maintenance method provided by this embodiment of the present invention, the server can determine n (1 ≤ n ≤ m) pieces of first fault information from the network data of m (m ≥ 2) types of services, and then divide some or all of the n pieces of first fault information into k (1 ≤ k ≤ n) sets of fault information, where the network faults indicated by the first fault information in each set share the same superordinate fault. The server then outputs the k sets of fault information and the k superordinate faults, which correspond one-to-one to the k sets, so that staff can handle faults and potential faults in the network in a timely manner. With this method, multiple services can be processed together.

In a second implementation, as shown in FIG. 6, the network operation and maintenance method provided by this embodiment of the present invention may include:

Step 601: The server acquires network data of m types of services, where m ≥ 2.

For step 601, refer to step 201.

Step 602: The server determines n pieces of first fault information according to the network data of the m types of services.

Each piece of first fault information indicates that the corresponding service has encountered a network fault, where 1 ≤ n ≤ m.

Optionally, the network data of the m types of services corresponds one-to-one to m operation and maintenance models, and the m operation and maintenance models are different from one another. Correspondingly, step 602 may include: the server inputs the network data of each service into the corresponding one of the m operation and maintenance models to obtain the information output by the m operation and maintenance models, where the information output by each operation and maintenance model is fault information or non-fault information and the information output by the m operation and maintenance models includes n pieces of fault information; and the server determines the n pieces of fault information as the n pieces of first fault information.

For step 602, refer to step 202.

Step 603: The server divides some or all of the n pieces of first fault information into k sets of fault information.

The network faults indicated by the first fault information in each set of fault information share the same superordinate fault. The superordinate fault of the network fault indicated by any piece of first fault information is the fault that causes that network fault, and 1 ≤ k ≤ n.

For step 603, refer to step 203.

Step 604: The server outputs the k sets of fault information and the k superordinate faults.

The k superordinate faults correspond one-to-one to the k sets of fault information.

The server outputs the k sets of fault information and the k superordinate faults so that staff can handle faults according to them. Further, staff can also identify potential faults in the network from the superordinate faults and first fault information output by the server, and handle those potential faults.

Step 605: The server receives a first annotation instruction, where the first annotation instruction indicates the correctly predicted first fault information among the k sets of fault information and the correctly predicted superordinate faults among the k superordinate faults.

For example, after the server displays the k sets of fault information and the k superordinate faults, staff can annotate the displayed first fault information and superordinate faults according to the actual fault conditions in the network, marking which first fault information and superordinate faults the server predicted correctly. For instance, the server may issue a prompt asking staff to use a first annotation symbol to mark the first fault information and superordinate faults that the server predicted correctly, and a second annotation symbol to mark those that it predicted incorrectly. Staff then mark the correctly predicted first fault information and superordinate faults with the first annotation symbol, and the incorrectly predicted ones with the second annotation symbol. The first annotation symbol and the second annotation symbol are different. For example, the first annotation symbol may be a check mark "√" and the second annotation symbol may be a cross "×".

Taking the superordinate fault and the set of fault information shown in FIG. 4 as an example, assume that staff determine that the server's predictions of the base station equipment fault, the abnormal ETH link connection, and the high CPU usage are correct, while its prediction of service degradation in cell 231 is wrong. Staff can then mark the three prediction results "base station equipment fault", "ETH link connection abnormal", and "high CPU usage" with "√" and mark the prediction result "cell 231 service degradation" with "×". The annotation result is shown in FIG. 7.

Assume instead that staff determine that the server's prediction of the abnormal ETH link connection is correct and that the other three predictions are wrong. Staff can then mark the prediction result "ETH link connection abnormal" with "√" and mark the other three prediction results with "×". The annotation result is shown in FIG. 8.

Step 606: The server obtains a first sample set based on the first annotation instruction, where the first sample set includes the information indicated by the first annotation instruction.

The server obtains the first sample set based on the first annotation instruction from step 605. The first sample set includes the correctly predicted first fault information among the k sets of fault information and the correctly predicted superordinate faults among the k superordinate faults.

For example, let k equal 2. The first set of fault information includes three pieces of first fault information, x1, x2, and x3, and the superordinate fault of the network faults they indicate is A11. The second set of fault information includes five pieces of first fault information, y1, y2, y3, y4, and y5, and the superordinate fault of the network faults they indicate is B11. If the first annotation instruction indicates that the predictions of x1 and x2 in the first set, y4 and y5 in the second set, and the superordinate fault A11 are correct, then the first sample set includes x1, x2, y4, y5, and A11.
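The construction of the first sample set in this example can be sketched as follows. The encoding of the first annotation instruction as a dictionary from item to `True` ("√") or `False` ("×") and the function name `build_first_sample_set` are assumptions made for illustration; the application does not specify a data format.

```python
from typing import Dict, List

# The k = 2 sets from the example, keyed by superordinate fault.
groups: Dict[str, List[str]] = {
    "A11": ["x1", "x2", "x3"],
    "B11": ["y1", "y2", "y3", "y4", "y5"],
}

# Hypothetical encoding of the first annotation instruction:
# True corresponds to a check mark, False to a cross.
first_annotation: Dict[str, bool] = {
    "A11": True, "B11": False,
    "x1": True, "x2": True, "x3": False,
    "y1": False, "y2": False, "y3": False, "y4": True, "y5": True,
}


def build_first_sample_set(groups: Dict[str, List[str]],
                           annotation: Dict[str, bool]) -> List[str]:
    """Keep the first fault information and superordinate faults marked correct."""
    confirmed_faults = [fault
                        for faults in groups.values()
                        for fault in faults
                        if annotation.get(fault, False)]
    confirmed_causes = [cause for cause in groups if annotation.get(cause, False)]
    return confirmed_faults + confirmed_causes


first_sample_set = build_first_sample_set(groups, first_annotation)
```

Items missing from the annotation are treated as not confirmed, which is a conservative choice made for this sketch.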

Step 607: The server acquires, according to the first sample set, associated network data related to each superordinate fault in the first sample set.

Assume that the first sample set in step 605 includes x1, x2, y4, y5, and A11. The server can acquire, according to this first sample set, the associated network data related to the superordinate fault A11. For example, if A11 is a base station equipment fault, the associated network data related to A11 may be the KQI of cell 232, where the cells managed by the base station include cell 232.

Step 608: The server predicts, according to the associated network data, second fault information related to each superordinate fault, where the second fault information is different from the first fault information.

The network fault indicated by the second fault information related to a superordinate fault is a network fault that the superordinate fault can cause. In this embodiment of the present invention, because the server derives the second fault information from the correctly predicted first fault information among the k sets of fault information and the correctly predicted superordinate faults among the k superordinate faults, the second fault information is more accurate.

Optionally, the network data of the m types of services corresponds one-to-one to m operation and maintenance models, and the m operation and maintenance models are different from one another. Each operation and maintenance model is used to make a prediction on the network data of its corresponding service and to output fault information or non-fault information. Correspondingly, step 608 may include: the server inputs the associated network data into an associated operation and maintenance model to obtain the information output by the associated operation and maintenance model, where the associated operation and maintenance model is the one among the m operation and maintenance models that corresponds to the associated network data; and when the information output by the associated operation and maintenance model is fault information, the server determines that information as the second fault information related to the superordinate fault.

For step 608, refer to step 206.

Step 609: The server outputs the first sample set and all of the predicted second fault information.

The first sample set includes the correctly predicted first fault information among the k sets of fault information and the correctly predicted superordinate faults among the k superordinate faults. The server outputs the first sample set and all of the predicted second fault information so that staff can handle faults according to the correctly predicted first fault information, the correctly predicted superordinate faults, and all of the predicted second fault information.

In this embodiment of the present invention, through steps 605 to 609, the server can predict, according to the staff's annotation instruction, the network faults that the correctly predicted superordinate faults may cause. Staff can then handle faults and potential faults in the network in a timely manner according to the correctly predicted first fault information, the correctly predicted superordinate faults, and all of the predicted second fault information. Because the second fault information is more accurate, fault handling is also more efficient.

Step 610: The server determines all of the predicted second fault information as a sample set to be annotated.

Optionally, the network data of the m types of services corresponds one-to-one to m operation and maintenance models, and the m operation and maintenance models are different from one another. Each operation and maintenance model is used to make a prediction on the network data of its corresponding service and to output fault information or non-fault information. In this embodiment of the present invention, to update any operation and maintenance model whose evaluation index does not meet service requirements, and thereby further improve the accuracy of fault prediction, the server may determine all of the second fault information predicted by the operation and maintenance models in step 608 as the sample set to be annotated, so that staff can annotate that sample set and identify the correctly predicted second fault information.

Step 611: The server receives a second annotation instruction, where the second annotation instruction indicates the correctly predicted second fault information in the sample set to be annotated.

For example, after the server displays all of the predicted second fault information, staff can annotate the displayed second fault information according to the actual fault conditions in the network, marking which second fault information the server predicted correctly. For the annotation method, refer to FIG. 7 and FIG. 8 in step 605.

Step 612: The server obtains a second sample set based on the second annotation instruction, where the second sample set includes the information indicated by the second annotation instruction.

The server obtains the second sample set based on the second annotation instruction from step 611. The second sample set includes the correctly predicted second fault information in the sample set to be annotated.

For example, all of the second fault information predicted in step 608 includes z1, z2, z3, and z4. If the second annotation instruction indicates that the predictions of z1 and z2 are correct, then the second sample set includes z1 and z2.

Step 613: The server determines the first sample set and the second sample set as a target sample set.

The first sample set includes the correctly predicted first fault information among the k sets of fault information and the correctly predicted superordinate faults among the k superordinate faults, and the second sample set includes the correctly predicted second fault information in the sample set to be annotated. The server determines the first sample set and the second sample set together as the target sample set, which is used to update any operation and maintenance model whose evaluation index does not meet service requirements.
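Steps 610 to 613 can be sketched with the x, y, and z examples used in this description. As before, representing an annotation instruction as a dictionary of booleans is an assumption made for illustration, and the helper name `confirmed` is hypothetical.

```python
from typing import Dict, List


def confirmed(items: List[str], annotation: Dict[str, bool]) -> List[str]:
    """Return the items that the annotation instruction marks as correctly predicted."""
    return [item for item in items if annotation.get(item, False)]


# First sample set, as obtained from the first annotation instruction.
first_sample_set = ["x1", "x2", "y4", "y5", "A11"]

# Step 610: all predicted second fault information becomes the set to be annotated.
to_be_annotated = ["z1", "z2", "z3", "z4"]

# Step 611: hypothetical encoding of the second annotation instruction.
second_annotation = {"z1": True, "z2": True, "z3": False, "z4": False}

# Step 612: the second sample set holds the confirmed second fault information.
second_sample_set = confirmed(to_be_annotated, second_annotation)

# Step 613: the target sample set combines both sample sets.
target_sample_set = first_sample_set + second_sample_set
```

Because second fault information is defined to differ from first fault information, plain concatenation does not introduce duplicates here.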

Step 614: The server determines an evaluation index of a first operation and maintenance model according to the target sample set, where the first operation and maintenance model is any one of the m operation and maintenance models.

The server determines the evaluation index of the first operation and maintenance model according to the correctly predicted first fault information, the correctly predicted superordinate faults, and the correctly predicted second fault information.

Optionally, the evaluation index of the first operation and maintenance model may be the accuracy of the first operation and maintenance model. The accuracy of a model is the ratio of the number of results the model predicted correctly to the total number of results it predicted. The higher the accuracy, the better the model's predictions.

Step 615: When the evaluation index of the first operation and maintenance model does not fall within a specified evaluation index range, the server updates the first operation and maintenance model using the target sample set.

When the evaluation index of the first operation and maintenance model is its accuracy, the corresponding specified evaluation index range may be [f, 1]. For example, f may equal 0.4, in which case the server updates the first operation and maintenance model using the target sample set when the model's evaluation index is less than 0.4. For instance, a supervised learning algorithm from machine learning may be used to train the first operation and maintenance model. For the model training process, refer to the related art; details are not repeated here.
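The accuracy check in steps 614 and 615 can be sketched as follows. The function names and the sample predictions are hypothetical; the sketch assumes that a prediction counts as correct when it appears in the target sample set, and it uses f = 0.4 from the example above.

```python
from typing import Iterable, List


def accuracy(predictions: List[str], target_sample_set: Iterable[str]) -> float:
    """Ratio of correctly predicted results to all results the model predicted."""
    if not predictions:
        raise ValueError("the model produced no predictions to evaluate")
    confirmed = set(target_sample_set)
    correct = sum(1 for p in predictions if p in confirmed)
    return correct / len(predictions)


def needs_update(metric: float, lower: float = 0.4, upper: float = 1.0) -> bool:
    """True when the metric falls outside the specified range [lower, upper]."""
    return not (lower <= metric <= upper)


target_sample_set = ["x1", "x2", "y4", "y5", "A11", "z1", "z2"]

# Hypothetical predictions made by one model; only z1 was confirmed by staff.
model_predictions = ["x3", "z1", "z3", "z4"]

metric = accuracy(model_predictions, target_sample_set)  # 1 correct out of 4
update_required = needs_update(metric)
```

With one of four predictions confirmed, the accuracy is 0.25, which lies below 0.4, so this model would be retrained on the target sample set.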

Optionally, the evaluation index of the first operation and maintenance model may also be its precision; the higher the precision of a model, the better its prediction performance. The evaluation index of the first operation and maintenance model may also be the false discovery rate; the lower the false discovery rate of a model, the better its prediction performance. The evaluation index of the first operation and maintenance model may also be the false omission rate or the like. The embodiments of the present invention do not limit the evaluation index of the first operation and maintenance model, and the specified evaluation index range may be determined according to the evaluation index that is chosen.
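The text names these alternative indices without spelling out formulas. The sketch below assumes the usual confusion-matrix definitions: precision = TP / (TP + FP), false discovery rate = FP / (TP + FP), and false omission rate = FN / (FN + TN). The `Counts` class and the function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    tp: int  # fault outputs that were confirmed as real faults
    fp: int  # fault outputs that were not confirmed
    fn: int  # non-fault outputs where a fault actually existed
    tn: int  # non-fault outputs where no fault existed


def _ratio(numerator: int, denominator: int) -> float:
    if denominator == 0:
        raise ValueError("metric is undefined: denominator is zero")
    return numerator / denominator


def precision(c: Counts) -> float:
    return _ratio(c.tp, c.tp + c.fp)


def false_discovery_rate(c: Counts) -> float:
    return _ratio(c.fp, c.tp + c.fp)


def false_omission_rate(c: Counts) -> float:
    return _ratio(c.fn, c.fn + c.tn)


counts = Counts(tp=6, fp=2, fn=1, tn=3)
print(precision(counts), false_discovery_rate(counts), false_omission_rate(counts))
```

Under these definitions precision and false discovery rate sum to 1, so a lower bound on one is equivalent to an upper bound on the other when choosing the specified range.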

Optionally, each of the m operation and maintenance models is managed by a paired application unit and model trainer. The application unit is configured to determine the evaluation index of the first operation and maintenance model according to the target sample set and, when that evaluation index does not fall within the specified evaluation index range, to send a model update request to the model trainer. The model trainer is configured to update the first operation and maintenance model by using the target sample set according to the model update request sent by the application unit.
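One possible shape for the application unit and model trainer pairing is sketched below. All class and method names are hypothetical, the target sample set is assumed to be a list of `(result_id, labeled_correct)` pairs, accuracy is used as the evaluation index, and the retraining step is a stub that only counts requests.

```python
from typing import List, Tuple

Sample = Tuple[str, bool]  # (hypothetical result identifier, labeled as correct?)


class ModelTrainer:
    """Handles model update requests for one operation and maintenance model."""

    def __init__(self) -> None:
        self.update_count = 0

    def handle_update_request(self, target_samples: List[Sample]) -> None:
        # Placeholder for supervised retraining on the target sample set.
        self.update_count += 1


class ApplicationUnit:
    """Evaluates one model and asks its paired trainer for an update when needed."""

    def __init__(self, trainer: ModelTrainer, lower: float = 0.4, upper: float = 1.0) -> None:
        self.trainer = trainer
        self.lower = lower
        self.upper = upper

    def evaluate(self, target_samples: List[Sample]) -> float:
        if not target_samples:
            raise ValueError("cannot evaluate an empty sample set")
        correct = sum(1 for _, ok in target_samples if ok)
        return correct / len(target_samples)

    def check(self, target_samples: List[Sample]) -> bool:
        """Return True if a model update request was sent to the trainer."""
        value = self.evaluate(target_samples)
        if not (self.lower <= value <= self.upper):
            self.trainer.handle_update_request(target_samples)
            return True
        return False


trainer = ModelTrainer()
unit = ApplicationUnit(trainer)
print(unit.check([("r1", True), ("r2", False), ("r3", False)]))  # accuracy 1/3, below 0.4
```

Keeping one pair per model means an update to one service's model does not touch the other m - 1 models.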

In the embodiments of the present invention, through steps 610 to 615, the server can obtain the correctly predicted second fault information according to the labeling instruction of the staff, and then, according to the correctly predicted first fault information, the correctly predicted superordinate faults, and the correctly predicted second fault information, update any operation and maintenance model whose evaluation index does not meet service requirements. This improves the accuracy of fault prediction and, in turn, the efficiency of fault handling.
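The way the target sample set is assembled in steps 610 to 613 can be sketched as follows. This assumes that samples are plain string identifiers and that the second labeling instruction arrives as the set of identifiers the staff marked as correct; both are illustrative simplifications.

```python
from typing import List, Set


def build_target_sample_set(first_sample_set: List[str],
                            to_be_labeled: List[str],
                            labeled_correct: Set[str]) -> List[str]:
    """Combine the first sample set with the correctly predicted second fault information."""
    # Second sample set: predicted second fault information that staff marked as correct.
    second_sample_set = [s for s in to_be_labeled if s in labeled_correct]
    # Target sample set: the first sample set followed by the second sample set.
    return list(first_sample_set) + second_sample_set


target = build_target_sample_set(
    first_sample_set=["first_fault_1", "superordinate_1"],
    to_be_labeled=["second_1", "second_2", "second_3"],
    labeled_correct={"second_1", "second_3"},
)
print(target)
```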

The embodiments of the present invention draw on the staff's operation and maintenance experience to effectively predict faults and potential faults in the network. In the embodiments of the present invention, the server can update the operation and maintenance models in a timely manner, which achieves timely and accurate prediction, reduces labor costs, and improves the efficiency of fault handling. With the network operation and maintenance method provided by the embodiments of the present invention, which combines proactive prevention with reactive handling, the staff can quickly learn the running state of the network and promptly handle faults and potential faults in the network, which improves the stability of the network and ensures its normal operation.

In summary, in the network operation and maintenance method provided by the embodiments of the present invention, the server can determine n (1 ≤ n ≤ m) pieces of first fault information according to the network data of m (m ≥ 2) types of services, and then divide part or all of the n pieces of first fault information into k (1 ≤ k ≤ n) sets of fault information, where the network faults indicated by the first fault information in each set have the same superordinate fault. The server then outputs the k sets of fault information and the k superordinate faults, which correspond one-to-one to the k sets, so that the staff can promptly handle faults and potential faults in the network. With this method, multiple services can be processed comprehensively, and an operation and maintenance model whose evaluation index does not meet service requirements can be updated automatically, which improves the accuracy of fault prediction and the efficiency of fault handling.
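The grouping step summarized above can be illustrated with a short sketch. It assumes a lookup table from each piece of first fault information to its superordinate fault is already available; how the server derives that mapping is not shown here, and the fault names are invented for the example.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_superordinate(first_faults: List[str],
                           superordinate_of: Dict[str, str]) -> Dict[str, List[str]]:
    """Divide first fault information into sets that share a superordinate fault."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for fault in first_faults:
        parent = superordinate_of.get(fault)
        # "Part or all": a fault with no known superordinate fault is left ungrouped.
        if parent is not None:
            groups[parent].append(fault)
    return dict(groups)


first_faults = ["iptv_stall", "voip_drop", "web_timeout"]        # n = 3
superordinate_of = {"iptv_stall": "port_down", "voip_drop": "port_down"}
groups = group_by_superordinate(first_faults, superordinate_of)  # k = 1
print(groups)
```

Each dictionary key is one superordinate fault and each value is its set of fault information, which gives the one-to-one correspondence between the k superordinate faults and the k sets.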

It should be noted that the order of the steps of the network operation and maintenance method provided by the embodiments of the present invention may be adjusted appropriately, and steps may be added or removed as circumstances require. Any variation readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application, and details are therefore not described again.

An embodiment of the present invention provides a network operation and maintenance device, which may be used in the server shown in FIG. 1. As shown in FIG. 9, the network operation and maintenance device 900 includes:

a first obtaining module 910, configured to perform step 201 or step 601 in the foregoing embodiments;

a first determining module 920, configured to perform step 202 or step 602 in the foregoing embodiments;

a dividing module 930, configured to perform step 203 or step 603 in the foregoing embodiments; and

a first output module 940, configured to perform step 204 or step 604 in the foregoing embodiments.

Optionally, the network data of the m types of services is in one-to-one correspondence with m operation and maintenance models, the m operation and maintenance models are different from each other, and the first determining module 920 is configured to perform step 2021 or step 2022 in the foregoing embodiments.

Further, as shown in FIG. 10, the network operation and maintenance device 900 may further include:

a second obtaining module 950, configured to perform step 205 in the foregoing embodiments;

a first prediction module 960, configured to perform step 206 in the foregoing embodiments; and

a second output module 970, configured to perform step 207 in the foregoing embodiments.

For the meaning of the other reference numerals in FIG. 10, refer to FIG. 9.

Further, as shown in FIG. 11, the network operation and maintenance device 900 may further include:

a first receiving module 980, configured to perform step 605 in the foregoing embodiments;

a third obtaining module 990, configured to perform step 606 in the foregoing embodiments;

a fourth obtaining module 991, configured to perform step 607 in the foregoing embodiments;

a second prediction module 992, configured to perform step 608 in the foregoing embodiments; and

a third output module 993, configured to perform step 609 in the foregoing embodiments.

Optionally, the network data of the m types of services is in one-to-one correspondence with m operation and maintenance models, the m operation and maintenance models are different from each other, and each operation and maintenance model is used to perform prediction on the network data of the corresponding service and to output fault information or non-fault information. Further, as shown in FIG. 11, the network operation and maintenance device 900 may further include:

a second determining module 994, configured to perform step 610 in the foregoing embodiments;

a second receiving module 995, configured to perform step 611 in the foregoing embodiments;

a fifth obtaining module 996, configured to perform step 612 in the foregoing embodiments;

a third determining module 997, configured to perform step 613 in the foregoing embodiments;

a fourth determining module 998, configured to perform step 614 in the foregoing embodiments; and

an update module 999, configured to perform step 615 in the foregoing embodiments.

For the meaning of the other reference numerals in FIG. 11, refer to FIG. 9.

Optionally, the network data of the m types of services is in one-to-one correspondence with m operation and maintenance models, the m operation and maintenance models are different from each other, and each operation and maintenance model is used to perform prediction on the network data of the corresponding service and to output fault information or non-fault information. The first prediction module 960 in FIG. 10 or the second prediction module 992 in FIG. 11 is configured to perform step 2061 and step 2062 in the foregoing embodiments, including:

inputting the associated network data into an associated operation and maintenance model to obtain the information output by the associated operation and maintenance model, where the associated operation and maintenance model is the one among the m operation and maintenance models that corresponds to the associated network data; and

when the information output by the associated operation and maintenance model is fault information, determining that information as the second fault information related to each superordinate fault.
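The two prediction steps above can be sketched as a lookup followed by a filter. In this sketch each operation and maintenance model is assumed to be a callable that returns a fault description or `None` for non-fault information; the packet-loss models and the 0.05 threshold are invented purely for illustration.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical model interface: a fault description, or None for non-fault information.
Model = Callable[[dict], Optional[str]]


def predict_second_faults(associated_data: Dict[str, dict],
                          models: Dict[str, Model]) -> List[str]:
    """Feed each service's associated network data to its own model and keep fault outputs."""
    second_faults: List[str] = []
    for service, data in associated_data.items():
        output = models[service](data)  # the model that corresponds to this data
        if output is not None:          # only fault information is kept
            second_faults.append(output)
    return second_faults


def make_loss_model(service: str) -> Model:
    def model(data: dict) -> Optional[str]:
        return f"{service}: packet loss too high" if data["loss"] > 0.05 else None
    return model


models = {"iptv": make_loss_model("iptv"), "voip": make_loss_model("voip")}
associated = {"iptv": {"loss": 0.08}, "voip": {"loss": 0.01}}
print(predict_second_faults(associated, models))
```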

In summary, with the network operation and maintenance device provided by the embodiments of the present invention, the server can determine n (1 ≤ n ≤ m) pieces of first fault information according to the network data of m (m ≥ 2) types of services, and then divide part or all of the n pieces of first fault information into k (1 ≤ k ≤ n) sets of fault information, where the network faults indicated by the first fault information in each set have the same superordinate fault. The server then outputs the k sets of fault information and the k superordinate faults, which correspond one-to-one to the k sets, so that the staff can promptly handle faults and potential faults in the network. With this device, multiple services can be processed comprehensively, and an operation and maintenance model whose evaluation index does not meet service requirements can be updated, which improves the accuracy of fault prediction and the efficiency of fault handling.

FIG. 12 is a schematic structural diagram of a network operation and maintenance device provided by an embodiment of the present invention. The device may be used in the server shown in FIG. 1. As shown in FIG. 12, the device includes a processor 1201 (such as a CPU), a memory 1202, a network interface 1203, and a bus 1204. The bus 1204 connects the processor 1201, the memory 1202, and the network interface 1203. The memory 1202 may include a random access memory (RAM), and may also include a non-volatile memory, for example at least one disk memory. The communication connection between the server and a communication device is implemented through the network interface 1203, which may be wired or wireless. The memory 1202 stores a program 12021 that is used to implement various application functions, and the processor 1201 executes the program 12021 stored in the memory 1202 to implement the network operation and maintenance method shown in FIG. 2 or FIG. 6.

A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the devices and modules described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not described herein again.

The foregoing embodiments may be implemented entirely or partially by software, hardware, firmware, or any combination thereof. When software is used, the embodiments may be implemented entirely or partially in the form of a computer program product that includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are generated in whole or in part. The computer may be a general-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium, a semiconductor medium (for example, a solid-state drive), or the like.

In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other manners. For example, the device embodiments described above are merely illustrative. The division into modules is merely a division by logical function, and other divisions are possible in actual implementation: multiple modules or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

A person of ordinary skill in the art can understand that all or part of the steps of the foregoing embodiments may be implemented by hardware, or by a program instructing related hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.

The foregoing descriptions are merely optional embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within the protection scope of this application.

Claims

A method for network operation and maintenance, characterized in that the method comprises:

obtaining network data of m types of services, where m ≥ 2;

determining n pieces of first fault information according to the network data of the m types of services, where each piece of first fault information indicates that a corresponding service has encountered a network fault, and 1 ≤ n ≤ m;

dividing part or all of the n pieces of first fault information into k sets of fault information, where the network faults indicated by the first fault information in each set of fault information have the same superordinate fault, the superordinate fault of the network fault indicated by any piece of first fault information is a fault that causes the network fault indicated by that piece of first fault information, and 1 ≤ k ≤ n; and

outputting the k sets of fault information and k superordinate faults, where the k superordinate faults are in one-to-one correspondence with the k sets of fault information.

The method according to claim 1, wherein after the outputting the k sets of fault information and the k superordinate faults, the method further comprises:

obtaining associated network data related to each superordinate fault according to the k superordinate faults and the first fault information corresponding to each superordinate fault;

predicting, according to the associated network data, second fault information related to each superordinate fault, where the second fault information is different from the first fault information; and

outputting the k superordinate faults, the k sets of fault information, and all predicted second fault information.

Receiving a first labeling instruction, where the first labeling instruction indicates the correctly predicted first fault information in the k sets of fault information and the correctly predicted superordinate faults among the k superordinate faults;

acquiring a first sample set based on the first labeling instruction, where the first sample set includes the information indicated by the first labeling instruction;

acquiring, according to the first sample set, associated network data related to each superordinate fault in the first sample set;

outputting the first sample set and all predicted second fault information.

The method according to claim 3, wherein the network data of the m types of services is in one-to-one correspondence with m operation and maintenance models, the m operation and maintenance models are different from each other, and each operation and maintenance model is used to perform prediction on the network data of the corresponding service and to output fault information or non-fault information;

after the outputting the first sample set and all predicted second fault information, the method further comprises:

determining all predicted second fault information as a sample set to be labeled;

receiving a second labeling instruction, where the second labeling instruction indicates the correctly predicted second fault information in the sample set to be labeled;

acquiring a second sample set based on the second labeling instruction, where the second sample set includes the information indicated by the second labeling instruction;

determining the first sample set and the second sample set as a target sample set;

determining, according to the target sample set, an evaluation index of a first operation and maintenance model, where the first operation and maintenance model is any one of the m operation and maintenance models; and

when the evaluation index of the first operation and maintenance model does not fall within a specified evaluation index range, updating the first operation and maintenance model by using the target sample set.

The method according to claim 2 or 3, wherein the network data of the m types of services is in one-to-one correspondence with m operation and maintenance models, the m operation and maintenance models are different from each other, and each operation and maintenance model is used to perform prediction on the network data of the corresponding service and to output fault information or non-fault information;

the predicting, according to the associated network data, second fault information related to each superordinate fault comprises:

inputting the associated network data into an associated operation and maintenance model to obtain the information output by the associated operation and maintenance model, where the associated operation and maintenance model is the one among the m operation and maintenance models that corresponds to the associated network data; and

when the information output by the associated operation and maintenance model is fault information, determining the information output by the associated operation and maintenance model as the second fault information related to each superordinate fault.

The method according to claim 1, wherein the network data of the m types of services is in one-to-one correspondence with m operation and maintenance models, and the m operation and maintenance models are different from each other;

the determining n pieces of first fault information according to the network data of the m types of services comprises:

inputting the network data of the corresponding service into each of the m operation and maintenance models to obtain the information output by the m operation and maintenance models, where the information output by each operation and maintenance model is fault information or non-fault information, and the information output by the m operation and maintenance models includes n pieces of fault information; and

determining the n pieces of fault information as the n pieces of first fault information.

A device for network operation and maintenance, characterized in that the device comprises:

a first acquiring module, configured to acquire network data of m types of services, where m ≥ 2;

a first determining module, configured to determine n pieces of first fault information according to the network data of the m types of services, where each piece of first fault information indicates that a corresponding service has encountered a network fault, and 1 ≤ n ≤ m;

a dividing module, configured to divide part or all of the n pieces of first fault information into k sets of fault information, where the network faults indicated by the first fault information in each set of fault information have the same superordinate fault, the superordinate fault of the network fault indicated by any piece of first fault information is a fault that causes the network fault indicated by that piece of first fault information, and 1 ≤ k ≤ n; and

a first output module, configured to output the k sets of fault information and k superordinate faults, where the k superordinate faults are in one-to-one correspondence with the k sets of fault information.

The device according to claim 7, wherein the device further comprises:

a second acquiring module, configured to acquire, according to the k superordinate faults and the first fault information corresponding to each superordinate fault, associated network data related to each superordinate fault;

a first prediction module, configured to predict, according to the associated network data, second fault information related to each superordinate fault, where the second fault information is different from the first fault information; and

a second output module, configured to output the k superordinate faults, the k sets of fault information, and all predicted second fault information.

The device according to claim 7, wherein the device further comprises:

a first receiving module, configured to receive a first labeling instruction, where the first labeling instruction indicates the correctly predicted first fault information in the k sets of fault information and the correctly predicted superordinate faults among the k superordinate faults;

a third acquiring module, configured to acquire a first sample set based on the first labeling instruction, where the first sample set includes the information indicated by the first labeling instruction;

a fourth acquiring module, configured to acquire, according to the first sample set, associated network data related to each superordinate fault in the first sample set;

a second prediction module, configured to predict, according to the associated network data, second fault information related to each superordinate fault, where the second fault information is different from the first fault information; and

a third output module, configured to output the first sample set and all predicted second fault information.

The device according to claim 9, wherein the network data of the m types of services is in one-to-one correspondence with m operation and maintenance models, the m operation and maintenance models are different from each other, and each operation and maintenance model is used to perform prediction on the network data of the corresponding service and to output fault information or non-fault information,

the device further comprises:

a second determining module, configured to determine all predicted second fault information as a sample set to be labeled;

a second receiving module, configured to receive a second labeling instruction, where the second labeling instruction indicates the correctly predicted second fault information in the sample set to be labeled;

a fifth acquiring module, configured to acquire a second sample set based on the second labeling instruction, where the second sample set includes the information indicated by the second labeling instruction;

a third determining module, configured to determine the first sample set and the second sample set as a target sample set;

a fourth determining module, configured to determine, according to the target sample set, an evaluation index of a first operation and maintenance model, where the first operation and maintenance model is any one of the m operation and maintenance models; and

an update module, configured to update the first operation and maintenance model by using the target sample set when the evaluation index of the first operation and maintenance model does not fall within a specified evaluation index range.

The device according to claim 8 or 9, wherein the network data of the m types of services is in one-to-one correspondence with m operation and maintenance models, the m operation and maintenance models are different from each other, and each operation and maintenance model is used to perform prediction on the network data of the corresponding service and to output fault information or non-fault information,

The first prediction module or the second prediction module is configured to:

The device according to claim 7, wherein the network data of the m types of services is in one-to-one correspondence with m operation and maintenance models, and the m operation and maintenance models are different from each other,

The first determining module is configured to:

A computer-readable storage medium, wherein the computer-readable storage medium stores instructions that, when the computer-readable storage medium is run on a computer, cause the computer to perform the network operation and maintenance method according to any one of claims 1 to 6.

A device for network operation and maintenance, characterized in that the device comprises: a processor, a memory, a network interface, and a bus,

the bus is configured to connect the processor, the memory, and the network interface, and the processor is configured to execute a program stored in the memory to implement the network operation and maintenance method according to any one of claims 1 to 6.