CN116257404A

CN116257404A - Log analysis method and computing device

Info

Publication number: CN116257404A
Application number: CN202310080852.5A
Authority: CN
Inventors: 鲍国顺
Original assignee: XFusion Digital Technologies Co Ltd
Current assignee: XFusion Digital Technologies Co Ltd
Priority date: 2023-01-31
Filing date: 2023-01-31
Publication date: 2023-06-13

Abstract

The application provides a log analysis method and a computing device, relates to the technical field of computing devices, and can effectively improve analysis efficiency of massive log files in a data center. The method comprises the following steps: determining a log mode set matched with the classification attribute of the log packet to be analyzed from a log mode library; the log mode set comprises one or more log modes, the log package to be analyzed comprises a plurality of log files, and the log modes are used for matching the content of the log files; screening one or more target log files from the log packet to be analyzed; one or more target log files are parsed based on the set of log patterns. The method and the device can be used in the maintenance process of the data center.

Description

Log analysis method and computing device

Technical Field

The present disclosure relates to the field of computing devices, and in particular, to a log parsing method and a computing device.

Background

A log is a file that is generated while a computing device runs and that records the device's operating conditions and faults. Logs are commonly used to locate faults in the computing devices of a data center. As computer technology develops, data center services become increasingly complex and generate massive numbers of log files.

At present, logs are generally parsed by reading each log file in a log package in sequence. This approach is inefficient and, when faced with the massive log files of a data center, cannot locate faults in time.

Disclosure of Invention

The present application provides a log parsing method and a computing device that can effectively improve the efficiency of parsing massive log files in a data center.

In a first aspect, the present application provides a log parsing method, including: screening a log mode set matched with the classification attribute of the log packet to be analyzed from a log mode library; the log mode set comprises one or more log modes, the log package to be analyzed comprises a plurality of log files, and the log modes are used for matching the content of the log files; screening one or more target log files from the log packet to be analyzed; one or more target log files are parsed based on the set of log patterns.

In this log parsing method, a log pattern set is first screened from the log pattern library according to the classification attribute of the log package to be parsed. One or more target log files are then screened from the log package to be parsed, and the target log files are parsed based on the log pattern set. This narrows both the range of log patterns and the range of log files that take part in the parsing computation, which avoids a large amount of invalid computation. Compared with the conventional log parsing scheme, the method provided by the present application can greatly improve log parsing efficiency, suits scenarios with massive logs such as data centers, does not require high computing cost, and has broad application prospects.
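The first screening step can be illustrated with a minimal sketch. It assumes that classification attributes are stored as simple key-value labels and that a pattern is kept when every one of its labels agrees with the package; the names `LogPattern` and `select_pattern_set` and the sample labels are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass, field


@dataclass
class LogPattern:
    name: str
    # Classification labels the pattern applies to (hypothetical keys and values).
    labels: dict = field(default_factory=dict)


def select_pattern_set(library, package_attrs):
    """Keep patterns whose every label agrees with the package's attributes."""
    return [
        p for p in library
        if all(package_attrs.get(k) == v for k, v in p.labels.items())
    ]


library = [
    LogPattern("bmc-fan-fault", {"device": "server", "component": "out-of-band"}),
    LogPattern("raid-degraded", {"device": "storage"}),
    # Assumption: a pattern without labels is treated as applicable to any package.
    LogPattern("generic-oom"),
]
package_attrs = {"device": "server", "component": "out-of-band", "version": "1.2"}
selected = select_pattern_set(library, package_attrs)
print([p.name for p in selected])  # ['bmc-fan-fault', 'generic-oom']
```

Only the selected patterns would then take part in parsing, which is the reduction in computation range that the paragraph above describes.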

In one possible implementation, screening one or more target log files from the log package to be parsed includes: screening one or more target log files from the log package to be parsed according to the applicable labels of one or more log patterns in the log pattern set, where an applicable label indicates the directory information and type information of the log files to which a log pattern applies. Not all log files in the log package to be parsed need to take part in the computation, so screening out one or more target log files further improves log parsing efficiency.

In another possible implementation, the applicable labels of each log pattern include a directory label and a file type label; the directory label indicates the directory of the log files to which the log pattern applies, and the file type label indicates the type of the log files to which the log pattern applies. Screening one or more target log files from the log package to be parsed according to the applicable labels of one or more log patterns in the log pattern set includes: screening, from the log package to be parsed, the log files in one or more directories consistent with the directory indicated by the directory label; and screening, from the log files in the one or more directories, one or more target log files consistent with the type indicated by the file type label.
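The two-stage screening by directory label and file type label might look like the following sketch. The paths, the helper `file_type`, and the rule of deriving a file type from the file suffix are illustrative assumptions only; the application does not specify how a file type is determined.

```python
from pathlib import PurePosixPath


def file_type(path):
    """Hypothetical type rule: the file suffix, or 'plain' if there is none."""
    return PurePosixPath(path).suffix.lstrip(".") or "plain"


def screen_target_files(package_files, directory_labels, type_labels):
    # Stage 1: keep files whose parent directory matches a directory label.
    in_dirs = [
        f for f in package_files
        if str(PurePosixPath(f).parent) in directory_labels
    ]
    # Stage 2: of those, keep files whose type matches a file type label.
    return [f for f in in_dirs if file_type(f) in type_labels]


package_files = [
    "bmc/log/sel.txt",
    "bmc/log/fan.csv",
    "os/log/messages.txt",
    "bmc/dump/core.bin",
]
targets = screen_target_files(package_files, {"bmc/log"}, {"txt"})
print(targets)  # ['bmc/log/sel.txt']
```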

In yet another possible implementation, parsing one or more target log files based on the log pattern set includes: for each of the one or more target log files, determining, in the log pattern set, one or more log patterns that match the file type of the target log file; and parsing the target log file using the one or more log patterns. These steps group the log patterns by target log file and establish a correspondence between each target log file and one or more log patterns, so that a target log file can be matched against all applicable log patterns in a single read, which avoids wasting computing resources.
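A minimal sketch of this grouping follows. It assumes that log patterns are regular expressions tagged with a file type and that file content is already available as a list of lines; the function names, file names, and patterns are hypothetical.

```python
import re
from collections import defaultdict


def group_patterns_by_file(target_files, patterns):
    """Map each target file to every pattern whose file type label matches it."""
    groups = defaultdict(list)
    for path, ftype in target_files.items():
        for name, (pattern_type, regex) in patterns.items():
            if pattern_type == ftype:
                groups[path].append((name, re.compile(regex)))
    return groups


def parse_once(lines, grouped_patterns):
    """Single pass over the lines, trying every applicable pattern on each line."""
    hits = []
    for line in lines:
        for name, regex in grouped_patterns:
            if regex.search(line):
                hits.append((name, line))
    return hits


target_files = {"sel.txt": "text", "fan.csv": "csv"}
patterns = {
    "fan-fault": ("text", r"fan\d+ failed"),
    "overtemp": ("text", r"temperature \S+ exceeded"),
    "rpm-zero": ("csv", r",0$"),
}
groups = group_patterns_by_file(target_files, patterns)
hits = parse_once(
    ["fan3 failed", "boot ok", "temperature 95C exceeded"], groups["sel.txt"]
)
print(hits)
```

Because the patterns are grouped per file first, each file is traversed once regardless of how many patterns apply to it.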

In yet another possible implementation, parsing the target log file using one or more log patterns includes: determining a splitting mode for the target log file according to the type of the target log file; splitting the target log file into one or more log fragments according to the splitting mode; and parsing the one or more log fragments of the target log file using the one or more log patterns. Splitting the log file makes log pattern matching easier and further improves log parsing efficiency.

In another possible implementation manner, the screening the log mode set matched with the classification attribute of the log packet to be parsed from the log mode library includes: extracting key information from a log packet to be analyzed; the key information is used for indicating the classification attribute of the log packet to be analyzed; and determining a log mode set matched with the classification attribute of the log packet to be analyzed from the log mode library according to the key information.
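As an illustration of extracting key information, the sketch below assumes that the log package contains a small summary text with "Model:" and "Version:" lines. That file layout, the field names, and `extract_key_info` are hypothetical; the application does not specify where the key information is stored.

```python
import re

# Hypothetical extraction rules, one per key information field.
KEY_FIELDS = {
    "model": re.compile(r"^Model:\s*(\S+)", re.MULTILINE),
    "version": re.compile(r"^Version:\s*(\S+)", re.MULTILINE),
}


def extract_key_info(summary_text):
    """Return the classification attributes found in the summary text."""
    info = {}
    for field_name, rule in KEY_FIELDS.items():
        match = rule.search(summary_text)
        if match:
            info[field_name] = match.group(1)
    return info


key_info = extract_key_info("Model: X100\nVersion: 3.2.1\n")
print(key_info)  # {'model': 'X100', 'version': '3.2.1'}
```

The resulting dictionary could then serve as the classification attribute used to screen the log pattern library.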

In yet another possible implementation manner, before screening the log mode set matched with the classification attribute of the log packet to be parsed from the log mode library, the method further includes: determining one or more log packages to be parsed from a log library based on fault problems of the computing device; the classification attribute of the fault problem is consistent with the classification attribute of each log packet to be analyzed.

In yet another possible implementation, the classification attribute of the log package is determined according to classification attributes of one or more log patterns, the one or more log patterns are screened from a log pattern library based on current classification attributes of the log package, and the one or more log patterns support parsing the log package.

In yet another possible implementation, the classification attribute and the applicable label of the log schema are determined according to classification attribute, directory information, and file type information of one or more log files; the one or more log files are one or more log files in a plurality of log packages that are screened from the log library based on the current classification attribute of the log schema, and the log schema supports parsing the one or more log files.

In yet another possible implementation manner, the classification attribute of the log packet to be parsed includes at least one of the following: the information of the equipment to which the log packet to be analyzed belongs, the information of the part to which the log packet to be analyzed belongs, the model information of the part and the version information of the part.

In a second aspect, the present application provides a log parsing apparatus, the apparatus comprising a screening module and a parsing module. The screening module is configured to screen, from the log pattern library, a log pattern set matching the classification attribute of the log package to be parsed; the log pattern set comprises one or more log patterns, the log package to be parsed comprises a plurality of log files, and the log patterns are used to match the content of the log files. The screening module is further configured to screen one or more target log files from the log package to be parsed. The parsing module is configured to parse the one or more target log files based on the log pattern set.

In one possible implementation manner, the screening module is specifically configured to screen one or more target log files from the log packet to be parsed according to the applicable labels of one or more log modes in the log mode set; the applicable tag is used to indicate directory information and type information of a log file to which the log mode is applicable.

In another possible implementation, the applicable labels of each log pattern include a directory label and a file type label; the directory label indicates the directory of the log files to which the log pattern applies, and the file type label indicates the type of the log files to which the log pattern applies. The screening module is specifically configured to screen, from the log package to be parsed, the log files in one or more directories consistent with the directory indicated by the directory label, and to screen, from the log files in the one or more directories, one or more target log files consistent with the type indicated by the file type label.

In yet another possible implementation, the parsing module is specifically configured to determine, for each of the one or more target log files, one or more log patterns in the log pattern set that match the file type of the target log file, and to parse the target log file using the one or more log patterns.

In another possible implementation, the parsing module is specifically configured to determine a splitting mode for the target log file according to the type of the target log file; split the target log file into one or more log fragments according to the splitting mode; and parse the one or more log fragments of the target log file using the one or more log patterns.

In another possible implementation manner, the screening module is specifically configured to extract key information from the log packet to be parsed; the key information is used for indicating the classification attribute of the log packet to be analyzed; and screening the log mode set matched with the classification attribute of the log packet to be analyzed from the log mode library according to the key information.

In yet another possible implementation manner, the screening module is further configured to screen one or more log packets to be parsed from the log library based on a fault problem of the computing device; the classification attribute of the fault problem is consistent with the classification attribute of each log packet to be analyzed.

In a third aspect, the present application provides a computing device comprising: a processor and a memory; the memory stores instructions executable by the processor; the processor is configured to execute the instructions to cause the computing device to implement the method of the first aspect described above.

In a fourth aspect, the present application provides a computer-readable storage medium comprising: computer software instructions; the computer software instructions, when executed in a computing device, cause the computing device to implement the method of the first aspect described above.

In a fifth aspect, the present application provides a computer program product which, when run on a computer, causes the computer to perform the method of the first aspect described above.

For the advantageous effects of the second to fifth aspects, refer to the corresponding descriptions of the first aspect; they are not repeated here.

Drawings

Fig. 1 is an application environment schematic diagram of a log parsing method provided in the present application;

Fig. 2 is a schematic architecture diagram of a log parsing device provided in the present application;

Fig. 3 is a flow chart of a log parsing method provided in the present application;

Fig. 4 is a flow chart of another log parsing method provided in the present application;

Fig. 5 is a flow chart of another log parsing method provided in the present application;

Fig. 6 is a flowchart of a method for screening target log files provided in the present application;

Fig. 7 is a schematic flow chart of a splitting method provided in the present application;

Fig. 8 is a schematic flow chart of log parsing provided in the present application;

Fig. 9 is a schematic flow chart of a problem area investigation provided in the present application;

Fig. 10 is a schematic diagram of a log parsing device provided in the present application;

Fig. 11 is a schematic diagram of a composition of a computing device provided herein.

Detailed Description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the present disclosure without creative effort fall within the scope of the present disclosure.

It should be noted that, in the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.

In order to clearly describe the technical solutions of the embodiments of the present application, in the embodiments of the present application, the terms "first", "second", and the like are used to distinguish the same item or similar items having substantially the same function and effect, and those skilled in the art will understand that the terms "first", "second", and the like are not limited in number and execution order.

In order to facilitate understanding of the technical solutions of the present application, the terms referred to in the present application are first described in the following.

1. Log file: a log file is a time-ordered collection of records of certain operations on objects specified by a computing device, together with the results of those operations. Each log file consists of log records, and each log record describes a separate computing device event. A log file records necessary and valuable information about the relevant activities of the computing device. Specifically, a log file can serve the following functions: monitoring computing device resources, auditing user behavior, determining the scope of a fault, supporting recovery of the computing device, and the like.

2. Log pattern: a log pattern is used to match the content of a log file. Traditionally, log parsing relies on an experienced technician manually reviewing a log file to find the target content in it. This approach is inefficient and costly in human resources. Log patterns were therefore developed to replace the manual approach and to query and match target content in log files automatically. A log pattern can take various forms, such as a keyword or a combination of keywords, a matching template written as a regular expression, or a logic flow implemented in code. The embodiments of the present application do not limit the specific form of a log pattern.
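The three forms of log pattern mentioned above can be illustrated behind one common interface, here a function that takes log text and returns whether it matches. This is a sketch only; the helper names and the sample log text are hypothetical.

```python
import re


def keyword_pattern(*keywords):
    """Keyword form: match when every keyword occurs in the text."""
    return lambda text: all(k in text for k in keywords)


def regex_pattern(expression):
    """Regular expression form: match when the expression is found in the text."""
    compiled = re.compile(expression)
    return lambda text: compiled.search(text) is not None


def repeated_error_pattern(text):
    """Code-implemented form: more than one line contains 'ERROR'."""
    return sum("ERROR" in line for line in text.splitlines()) > 1


patterns = {
    "disk-keyword": keyword_pattern("disk", "offline"),
    "fan-regex": regex_pattern(r"fan\d+ speed \d+ rpm"),
    "error-burst": repeated_error_pattern,
}
log_text = "ERROR disk 4 offline\nfan2 speed 0 rpm\nERROR link down"
matched = [name for name, match in patterns.items() if match(log_text)]
print(matched)  # ['disk-keyword', 'fan-regex', 'error-burst']
```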

As described in the background, log-based parsing and diagnosis is the most common way to locate faults in the computing devices of a data center. The current log parsing scheme reads each log file in turn and, based on a number of preset log patterns, searches each log file line by line from beginning to end to determine whether the computing device exhibits a given fault feature. Under this scheme, the volume of logs largely determines parsing efficiency. Faced with the massive logs of a data center, the scheme is too inefficient to meet the requirements of a service-level agreement (SLA).

In summary, how to improve log parsing efficiency is a problem to be solved.

Based on this, an embodiment of the present application provides a log parsing method that, based on the classification attribute of a log package to be parsed, screens from a log pattern library a log pattern set matching that classification attribute and uses it to parse the log package to be parsed. The method limits the range of log patterns that take part in the computation, avoids invalid computation, and effectively improves log parsing efficiency.

The log parsing method provided by the present application can be applied to the application environment shown in fig. 1. As shown in fig. 1, the application environment may include a log parsing apparatus 101 and a user-side device 102, which are connected to each other.

The log parsing apparatus 101 may be applied to a computing device. The computing device may be a cluster formed by a plurality of computing devices, a single computing device, or a computer. The log parsing apparatus 101 may specifically be a processor, a processing chip, or the like in the computing device. The embodiments of the present application do not limit the specific form of the computing device. In fig. 1, the log parsing apparatus 101 is shown, as an example, applied to a single computing device. In particular, the computing device may be a server.

The user-side device 102 may be an electronic device such as a mobile phone, a computer, or a tablet computer. The embodiments of the present application do not limit the specific form of the user-side device 102; in fig. 1, the user-side device 102 is shown as a computer. The user-side device 102 may generate log files during operation, export a log package to be parsed, and send the log package to the log parsing apparatus 101.

In some embodiments, the user-side device 102 continually generates logs during operation. When log parsing is needed (for example, when a fault occurs), the user may operate the user-side device 102 to export a log package to be parsed by configuring the time period or range to be parsed, and upload the log package to the log parsing apparatus 101 through the user-side device 102. The log parsing apparatus 101 may then extract, from the log package to be parsed, key information indicating its classification attribute, determine from the log pattern library a log pattern set matching the key information, and parse the log package to be parsed with that set. This narrows the range of log patterns that take part in the computation and improves log parsing efficiency.

The above embodiment takes as an example the case where the device that generates the logs and the device that sends the logs are the same device (the user-side device 102). In other embodiments, they may be different devices, which is not specifically limited in the embodiments of the present application.

Fig. 2 is a schematic architecture diagram of a log parsing apparatus according to an embodiment of the present application. As shown in fig. 2, the log parsing apparatus includes a log management module, a log pattern library management module, and a log calculation module.

The log management module manages log packages and the logs in them. It includes: directory management unit 1, file management unit 1, a log classification management unit, and a log label management unit.

Directory management unit 1 manages the directories of the log files in a log package and can preset, in advance, the directory structure by which log files of a specific type are identified. File management unit 1 manages the specific log files in a log package and can preset log files of a specific type, under a specific directory, and with a specific naming rule. The log classification management unit manages the tree-structured classification of log packages. The log label management unit labels log packages, with labels including a device label, a component label, a version label, a model label, and the like.

The log pattern library management module manages the log patterns in the log pattern library. It includes: directory management unit 2, file management unit 2, a search rule management unit, and a pattern label management unit.

Directory management unit 2 manages the directories that a specific log pattern needs to match; for example, a log pattern may be valid only under a certain directory. File management unit 2 manages the files that a specific log pattern needs to match; for example, a log pattern may be valid only for a certain file or a certain class of files. The search rule management unit manages the specific log patterns; a log pattern specifies that, under certain conditions, it matches specific content, and the matched content is stored as key-value pairs. A log pattern may be a regular expression or a domain-specific language (DSL) syntax derived from regular expressions. The pattern label management unit labels log patterns, with labels including a device label, a component label, a version label, a model label, a directory label, a file label, and the like.
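One way to obtain matched content as key-value pairs from a regular expression is to use named groups, as in the sketch below. The rule `FAN_RULE`, its group names, and the sample log line are hypothetical; the application does not describe the concrete rule syntax.

```python
import re

# Hypothetical search rule: the named groups become the stored keys.
FAN_RULE = re.compile(r"fan(?P<fan_id>\d+) speed (?P<rpm>\d+) rpm")


def apply_rule(rule, line):
    """Return the matched content as a dict of key-value pairs, or None."""
    match = rule.search(line)
    return match.groupdict() if match else None


result = apply_rule(FAN_RULE, "2023-01-31 10:00:01 fan2 speed 0 rpm")
print(result)  # {'fan_id': '2', 'rpm': '0'}
```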

The log calculation module parses log packages. It includes: a classification calculation unit, a directory and file calculation unit, a key information calculation unit, a log splitting calculation unit, and a log pattern calculation unit.

During use, the classification calculation unit filters by label during log computation to reduce the computation range. During training, it classifies logs accurately and applies new classification labels.

During use, the directory and file calculation unit filters log directories and log file types according to the directory and file type labels of the log patterns during log parsing, to reduce the computation range. During training, it updates the log patterns with accurate directory and file type label information.

During use, the key information calculation unit filters by the key information labels (such as model numbers and version numbers) of the log patterns during log parsing, to reduce the computation range. During training, it extracts key information for log pattern analysis and adds key information labels to log packages.

The log splitting calculation unit splits log files according to file type during log parsing, either by block or by line; choosing the most suitable splitting mode greatly speeds up file matching.
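The two splitting modes can be sketched as follows. The sketch assumes that a "block" is a run of lines delimited by blank lines and that the splitting mode is passed in directly as the file type; both are illustrative assumptions, since the application does not define block boundaries or the type-to-mode mapping.

```python
def split_log(text, file_type):
    """Split log text into fragments; the type-to-mode mapping is hypothetical."""
    if file_type == "block":
        # Block mode: fragments are separated by one or more blank lines.
        fragments, current = [], []
        for line in text.splitlines():
            if line.strip():
                current.append(line)
            elif current:
                fragments.append("\n".join(current))
                current = []
        if current:
            fragments.append("\n".join(current))
        return fragments
    # Line mode: every non-empty line is its own fragment.
    return [line for line in text.splitlines() if line.strip()]


sample = "event A\ndetail of A\n\nevent B\n"
print(split_log(sample, "block"))  # ['event A\ndetail of A', 'event B']
print(split_log(sample, "line"))   # ['event A', 'detail of A', 'event B']
```

Block mode keeps a multi-line record together so that a pattern spanning several lines can match within one fragment, while line mode suits logs in which each record occupies a single line.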

The log pattern calculation unit matches information in the log fragments during log parsing and stores it according to preset keys and formats.

The log parsing apparatus shown in fig. 2 includes a log pattern library (not shown) for storing log patterns. The log pattern library also supports label-based management of log patterns: different labels are added to the log patterns at different stages of log computation, so that their labels become progressively more accurate. During log parsing, the labels are used to screen the applicable log patterns out of a large number of log patterns, which reduces the computation range and further improves log parsing efficiency. The specific log parsing process is described in the embodiment corresponding to fig. 3 below and is not detailed here.

In addition, the log parsing apparatus shown in fig. 2 further includes a log library for storing log packages, each of which may include one or more log files. The log library also supports label-based management of log packages: different labels are added to a log package at different stages of log parsing, so that its labels become progressively more accurate. When investigating the scope of impact of a problem, the labels can be used to screen, from a large number of historical log packages, those that should take part in the computation. This reduces the computation range, further improves log parsing efficiency, and allows the scope of impact to be determined quickly. The investigation process is described in the description corresponding to fig. 5 below and is not detailed here.

The following describes in detail, with reference to the functional modules shown in fig. 2, how log patterns are trained and how log packages are labeled.

There are various types of log patterns, and different log patterns apply to different devices. For example, some log patterns apply to parsing log files in a log package from a running server, and others apply to parsing log files in a log package from a storage server. Classification attributes (or classification labels) for different device types can therefore be preset, and each log pattern can be labeled with the label of its corresponding device.

The classification labels can be refined further, because different log patterns also apply to different components of a computing device. For example, a log pattern carrying the running server classification label can additionally be labeled with the component it applies to, such as an in-band controller or an out-of-band controller. Further still, model and version classification labels can be configured from the model information and version information of that component, making the label classification of the log pattern more accurate. This log pattern labeling process may be handled by the search rule management unit and the pattern label management unit in fig. 2: the search rule management unit manages the specific log patterns, and the pattern label management unit adds classification labels to newly added log patterns.

In addition, some log patterns apply only under a specific directory or to a specific log file. A directory label and a file type label can therefore also be configured for a log pattern to classify it more finely. Directory management unit 2 and file management unit 2 in fig. 2 manage the directories and files that a specific log pattern needs to match, and the pattern label management unit also adds directory or file type labels to log patterns. As an example, a log pattern related to system security applies to log files in the directory "/var/log/secure", and a log pattern related to timed tasks applies to log files in the directory "/var/log/cron".

Because a log package is a collection of one or more log files exported from a device, log packages are also of various types, and different log packages may belong to different devices, components, component models, component versions, and so on. Classification labels such as device, component, model, and version can therefore be applied to a log package in turn, refining its classification labels layer by layer so that they become more accurate. Labeling of log packages may be handled by the log classification management unit and the log label management unit in fig. 2. The log classification management unit manages the tree-structured classification of log packages, in which the leaf nodes are one or more historical log packages carrying classification labels. The log label management unit adds classification labels to a log package when it is newly added to the log library.
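The layer-by-layer classification can be pictured as a tree keyed by device, component, model, and version. The sketch below uses nested dictionaries; the level order, the function names, and the choice to attach a package at the deepest level for which it has a label are assumptions made for illustration.

```python
# Hypothetical order in which labels refine the classification.
LEVELS = ("device", "component", "model", "version")


def insert_package(tree, package_id, labels):
    """Walk the label levels in order and stop at the first missing label."""
    node = tree
    for level in LEVELS:
        value = labels.get(level)
        if value is None:
            break
        node = node.setdefault("children", {}).setdefault(value, {})
    node.setdefault("packages", []).append(package_id)


def packages_under(node):
    """Collect every package stored at or below a node."""
    found = list(node.get("packages", []))
    for child in node.get("children", {}).values():
        found.extend(packages_under(child))
    return found


tree = {}
insert_package(tree, "pkg-1", {"device": "server", "component": "out-of-band",
                               "model": "m1", "version": "1.0"})
insert_package(tree, "pkg-2", {"device": "server", "component": "out-of-band"})
insert_package(tree, "pkg-3", {"device": "storage"})

server_packages = sorted(packages_under(tree["children"]["server"]))
print(server_packages)  # ['pkg-1', 'pkg-2']
```

Selecting a subtree in this way returns only the historical log packages that share the chosen classification labels.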

In addition, a log package includes a plurality of directories, each of which includes one or more log files. Directory management unit 1 and file management unit 1 in the log management module of fig. 2 cooperate to manage the log package.

When classification labels are configured for log patterns as described above, some log patterns may initially be labeled by manual configuration. As the log parsing apparatus continues to operate, log patterns can be labeled automatically through the following flow, based on the ideas of supervised learning and unsupervised learning.

Wherein, for the log mode, the classification attribute (or classification label) and the applicable label of the log mode are determined according to the classification attribute, directory information and file type information of one or more log files; the one or more log files are one or more log files in a plurality of log packages that are screened from the log library based on the current classification attribute of the log schema, and the log schema supports parsing the one or more log files.

For example, suppose a log pattern is newly added to the log pattern library and its current classification attribute is running server. The log parsing apparatus can screen a batch of historical log packages from the log library based on the running server classification attribute, use the newly added log pattern to match, in turn, the log files in historical log packages of different device types, and configure the classification attributes of the log files that yield a parsing result as the classification attributes of the log pattern. In this way, classification labels such as component, model, and version are further configured for the log pattern. As matching is refined, it can also be determined which directories and types of log files the log pattern applies to, and directory labels and file labels can then be configured for the log pattern.
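The labeling flow in this example can be sketched as follows, assuming that the log pattern is a regular expression, that historical log packages are held in memory as dictionaries, and that the refined labels are simply the set of components and directories in which the pattern produced a match. The data layout and the name `refine_pattern_labels` are hypothetical.

```python
import re
from pathlib import PurePosixPath


def refine_pattern_labels(regex, history_packages, current_labels):
    """Propose refined labels from the packages the pattern actually matches."""
    compiled = re.compile(regex)
    components, directories = set(), set()
    for pkg in history_packages:
        # Only try packages that agree with the pattern's current labels.
        if any(pkg["labels"].get(k) != v for k, v in current_labels.items()):
            continue
        for path, text in pkg["files"].items():
            if compiled.search(text):
                component = pkg["labels"].get("component")
                if component is not None:
                    components.add(component)
                directories.add(str(PurePosixPath(path).parent))
    return {
        **current_labels,
        "component": sorted(components),
        "directory": sorted(directories),
    }


history = [
    {"labels": {"device": "server", "component": "out-of-band"},
     "files": {"bmc/log/sel.txt": "fan3 failed"}},
    {"labels": {"device": "server", "component": "in-band"},
     "files": {"os/log/messages.txt": "boot ok"}},
    {"labels": {"device": "storage", "component": "out-of-band"},
     "files": {"bmc/log/sel.txt": "fan1 failed"}},
]
refined = refine_pattern_labels(r"fan\d+ failed", history, {"device": "server"})
print(refined)
```

Running the same refinement again after more historical log packages accumulate would correspond to the iterative labeling the description mentions.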

Similarly, the process of configuring the classification labels for the log packets can also be based on the ideas of supervised learning and unsupervised learning in the log analysis process, so as to realize automatic labeling of the log packets.

Wherein, for a log package, classification attributes of the log package are determined from classification attributes of one or more log patterns that are screened from the log pattern library based on current classification attributes of the log package, and the one or more log patterns support parsing the log package.

For example, a log package is added to the log library, and the current classification attribute of the log package can be determined to be the storage server and out-of-band controller type. The log parsing device may screen a batch of log patterns from the log pattern library based on the current classification attribute of the log package, parse the log files in the log package with this batch of log patterns, and configure the classification attribute of each log pattern that yields a parsing result as a classification attribute of the log package. Model labels of the component, version labels of the component, and the like can further be added to the log package, giving the log package a tree-structured classification and enabling precise classification management of log packages.
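The mirror-image flow for labeling a log package might look like the sketch below. The dictionary layout of a pattern entry (`device`, `component`, `model`, `version`, `regex`) and the function name `label_package` are hypothetical and chosen only for illustration.

```python
import re


def label_package(file_contents, current_labels, pattern_library):
    """Return the (model, version) labels of every pattern that parses the package."""
    refined = set()
    for entry in pattern_library:
        # Screen patterns by the package's current classification attribute.
        if (entry["device"], entry["component"]) != current_labels:
            continue
        pattern = re.compile(entry["regex"])
        # A pattern that yields a result on any file passes its labels on.
        if any(pattern.search(text) for text in file_contents):
            refined.add((entry["model"], entry["version"]))
    return refined
```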

It can be understood that the log pattern labeling and log package labeling processes above can be performed iteratively, and the labels become more accurate as the number of iterations grows. In subsequent log parsing, these labels allow precise screening, which greatly narrows the range of files participating in the calculation and improves log parsing efficiency.

The log parsing method provided in the present application is described in detail below with reference to the accompanying drawings and specific embodiments.

Fig. 3 is a flow chart of a log parsing method according to an embodiment of the present application. The log parsing method provided by the application can be applied to an application environment shown in fig. 1 or an architecture shown in fig. 2.

As shown in fig. 3, the log parsing method provided in the present application may specifically include the following steps:

S301, a log pattern set matched with the classification attribute of the log package to be parsed is screened from a log pattern library.

The log pattern set includes one or more log patterns. The log package to be parsed includes a plurality of log files, and a log pattern is used for matching the content of the log files.

As described above, because different log patterns apply to different devices, components, and the like, the log patterns in the log pattern library are managed by labels, which makes the log patterns easy to screen. Because the classification attribute (or classification label) of a log pattern determines which log packages the log pattern applies to, the matching log pattern set can be screened from the log pattern library based on the classification attribute of the log package to be parsed. This reduces the log patterns participating in the calculation from a large number to a small set and improves log parsing efficiency.

It should be noted that some current online log parsing platforms need to establish a connection with the user's computing device and continuously obtain log files from it for online real-time analysis: once a log file is generated, it is analyzed and calculated through a data flow mechanism. An offline log, however, is exported at a single point in time, which makes it difficult to determine the order in which the log files were generated, so the online real-time analysis scheme is not suitable for parsing offline logs. Moreover, for data-sensitive users such as those in the banking industry, obtaining logs online in real time poses a higher data security risk.

The log parsing method provided by the embodiments of the present application is suitable for offline log package parsing. On the computing device that needs fault location, the user can selectively export a log package to be parsed that poses no data security risk, and send or upload it to the log parsing device through user-side equipment for log parsing, so as to determine possible problems of the computing device. Thus, in one possible implementation, as shown in FIG. 4, the above S301 includes the following S301a-S301b.

S301a, extracting key information from the log packet to be analyzed.

The key information is used for indicating classification attributes of the log packet to be analyzed. The classification attribute includes at least one of: the information of the equipment to which the log packet to be analyzed belongs, the information of the part to which the log packet to be analyzed belongs, the model information of the part and the version information of the part.

An offline log is a collection of log files derived at a certain time. Therefore, when the user needs to analyze the offline log, the log package to be analyzed can be exported and uploaded to the log analysis device. Further, the log parsing device may extract information of the device and information of the component to which the log packet to be parsed belongs from the packet name of the log packet to be parsed. Then, after decompressing the log packet to be parsed, model information of the component, version information of the component and the like can be further obtained from the file obtained by decompression.

For example, according to the naming rule of the package name, it may be determined whether the log package to be parsed is an operation server package, a storage server package, a patrol server package, or the like. Alternatively, the IP information in the package name may be extracted, and the device to which the log package belongs may be determined from the IP based on a preset relationship. Further, the component information can be refined according to keywords in the package name. For example, if the package name carries a keyword such as "OS" or "windows", the log package can be determined to belong to the in-band controller; if the package name carries "sn", it can be determined to belong to the out-of-band controller.
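A minimal sketch of this key-information extraction is given below. The keyword table follows the examples in the text, while the package-name layout, the `ip_to_device` mapping, and the function name `extract_key_info` are assumptions made for illustration.

```python
import re

# Hypothetical keyword table built from the examples in the text.
COMPONENT_KEYWORDS = {
    "in-band controller": {"os", "windows"},
    "out-of-band controller": {"sn"},
}
# An IPv4-looking substring that is not adjacent to other digits.
IP_RE = re.compile(r"(?<!\d)(?:\d{1,3}\.){3}\d{1,3}(?!\d)")


def extract_key_info(package_name, ip_to_device):
    """Infer device and component from a package name (assumed naming rule)."""
    info = {"device": None, "component": None}
    ip_match = IP_RE.search(package_name)
    if ip_match:
        # The preset IP-to-device relationship is passed in as a plain dict.
        info["device"] = ip_to_device.get(ip_match.group())
    # Compare whole tokens so that "sn" does not match inside longer words.
    tokens = set(re.split(r"[^a-z0-9]+", package_name.lower()))
    for component, keywords in COMPONENT_KEYWORDS.items():
        if tokens & keywords:
            info["component"] = component
            break
    return info
```

Matching on whole tokens rather than substrings is a design choice of this sketch; a real naming rule may require a different tokenization.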

And S301b, screening a log mode set matched with the classification attribute of the log packet to be analyzed from a log mode library according to the key information.

In some embodiments, after extracting the key information of the log package to be parsed, the log parsing device may use the key information to query and screen the corresponding log pattern set from the log pattern library, thereby determining the range of log patterns participating in the calculation. As described above, the classification attribute of a log pattern in the log pattern library indicates the information of the device to which the log pattern belongs, the information of the component to which it belongs, the model information of the component, and the version information of the component. Screening can therefore be performed layer by layer, from major class to minor class, which improves screening efficiency. After determining the log pattern set, as shown in fig. 4, the log parsing device may continue to perform S302-S303 below.

For example, suppose the key information of the log package to be parsed indicates that it is an operation server package, belongs to the in-band controller, has model B, and covers version 1 and version 2. The log parsing device first determines, from the log pattern library, the log patterns carrying the operation server label and the in-band controller label; it then screens again to determine the log patterns carrying the model B classification label; finally, it determines the one or more log patterns carrying the version 1 and version 2 classification labels, which form the log pattern set participating in the calculation for the log package to be parsed.
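The layer-by-layer screening in this example can be sketched as follows. The `LogPattern` class and `screen_pattern_set` function are hypothetical, and the sketch assumes that a pattern qualifies when its version label is any one of the versions indicated by the key information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogPattern:
    name: str
    device: str
    component: str
    model: str
    version: str


def screen_pattern_set(library, device, component, model, versions):
    """Narrow the library layer by layer, from major class to minor class."""
    by_device = [p for p in library if p.device == device]
    by_component = [p for p in by_device if p.component == component]
    by_model = [p for p in by_component if p.model == model]
    # Assumption: matching any one of the requested versions is sufficient.
    return [p for p in by_model if p.version in versions]
```

Each layer only examines the survivors of the previous layer, so the finer-grained comparisons run on a progressively smaller candidate list.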

It should be understood that the above hierarchy of classifications is only one example, and is illustrated with four layers of classifications of devices, parts, models, and versions. In a specific scenario, the above classification level may be further refined or simplified, which is not specifically limited in the embodiments of the present application. Any form of variation in the number of classification levels and descriptions of the layers is intended to be within the scope of the present application.

It can be seen that, compared with the online log parsing scheme, users do not need to worry about the risk of privacy disclosure when the log is continuously uploaded to the log parsing platform. The user can selectively export the offline log package without hidden danger of privacy disclosure according to the self requirement, and upload the offline log package to the log analysis device for analysis, so as to meet the log analysis requirement of the user.

In other scenarios, the user may need to investigate the scope of a problem. Problem-scope investigation is generally performed in data center maintenance scenarios, where faults of computing devices are often batch-related and correlated, such as problems caused by defects in a certain batch of computing devices. In the operation and maintenance of a large data center, quickly determining which computing devices have a given problem and rectifying them in batches is a problem to be solved. Based on this, in another possible implementation, as shown in fig. 5, before S301, the log parsing method provided in the present application further includes the following S300.

S300, screening one or more log packets to be analyzed from a log library based on the fault problem of the computing equipment.

The classification attribute of the fault problem is consistent with the classification attribute of each log packet to be analyzed.

The fault problem, also called the problem to be investigated, corresponds to one or more log patterns. In some embodiments, the log parsing device may screen one or more log packages to be parsed from the log library based on the classification labels of the log patterns corresponding to the fault problem of the computing device.

Optionally, the log parsing device may parse the one or more screened log packages to be parsed with a multithreading scheme. An independent thread is allocated to each log package to be parsed, so that the log packages are calculated in parallel and log parsing efficiency is greatly improved. Alternatively, cluster computing can be adopted to raise the available computing power and further improve log parsing efficiency.

It should be noted that, for each log packet to be parsed in the one or more log packets to be parsed, as shown in fig. 5, the log parsing apparatus may execute S301 to S303 to perform log parsing, so as to determine which devices corresponding to the log packets have the above fault problem. In this scenario, the log parsing apparatus may directly execute S301 to screen the log pattern set from the log pattern library, or may directly screen the log pattern set matched with the fault problem from the log pattern library based on the fault problem, which is not specifically limited in the embodiment of the present application.
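A sketch of problem-scope investigation, combining S300 with per-package parallel parsing, is shown below. The dictionary layout of a log package, the function names, and the reduction of S301-S303 to a single regular-expression check are simplifying assumptions. In standard CPython, threads mainly help with I/O-bound work such as reading files; a process pool or cluster may be preferable for CPU-bound matching.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def screen_packages(log_library, fault_labels):
    """S300: keep packages whose attributes agree with the fault's labels."""
    return [
        pkg for pkg in log_library
        if all(pkg["attributes"].get(k) == v for k, v in fault_labels.items())
    ]


def has_fault(pkg, fault_regex):
    """Stand-in for S301-S303: report whether any file matches the pattern."""
    pattern = re.compile(fault_regex)
    return any(pattern.search(text) for text in pkg["files"].values())


def investigate(log_library, fault_labels, fault_regex, max_workers=4):
    """Return the names of the screened packages that exhibit the fault."""
    selected = screen_packages(log_library, fault_labels)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per package; map() preserves the input order.
        hits = list(pool.map(lambda pkg: has_fault(pkg, fault_regex), selected))
    return [pkg["name"] for pkg, hit in zip(selected, hits) if hit]
```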

S302, screening one or more target log files from the log package to be analyzed.

As described above, the log package to be parsed includes a plurality of log files. In the process of log analysis, not all log files in the log package to be analyzed need to participate in calculation, and a part of log files may be irrelevant to the log analysis. Therefore, the log parsing device can further screen one or more target log files from the log package to be parsed, and efficiency is further improved.

Specifically, the log parsing device may screen one or more target log files from the log package to be parsed according to the applicable labels of the log pattern set screened in S301. Each log pattern has an applicable label, which indicates directory information and type information of the log files to which the log pattern applies. The applicable label includes a directory label, which indicates the directory of the log files to which the log pattern applies, and a file type label, which indicates the type of the log files to which the log pattern applies. As shown in fig. 6, the above process of screening the target log files from the log package to be parsed specifically includes the following steps S302a-S302b.

S302a, screening out, from the log package to be parsed, the log files of one or more directories consistent with the directory indicated by the directory label.

In some embodiments, the log parsing device may obtain the directory label of each log pattern in the log pattern set, and screen out the log files under the one or more directories matching the directory labels from the plurality of directories obtained by decompressing the log package to be parsed. This process may be carried out by the directory management unit 2 and the directory and file calculation unit in fig. 2.

For example, the log pattern set includes log pattern 1 to log pattern 4. The directory labels of log pattern 1 and log pattern 2 are directory A and directory B, and the directory labels of log pattern 3 and log pattern 4 are directory B and directory C. The directory labels of the log pattern set obtained by the log parsing device are therefore "directory A, directory B, and directory C". The log package to be parsed includes five directories, namely directory A to directory E. The log parsing device may filter out directories D and E and determine the log files under directories A-C as the log files participating in the calculation.

S302b, screening, from the log files of the one or more directories, one or more target log files consistent with the type indicated by the file type label.

In some embodiments, the log parsing device may obtain the file type label of each log pattern in the log pattern set, and further screen out one or more target log files matching the file type labels from the log files of the one or more directories obtained in S302a. This process may be carried out by the file management unit 2 and the directory and file calculation unit in fig. 2.

Continuing with the above example, the file type labels of log pattern 1 and log pattern 2 are "file A and file B", and the file type labels of log pattern 3 and log pattern 4 are "file C and file D". The file type labels of the log pattern set obtained by the log parsing device are therefore "file A, file B, file C, and file D". For each of the aforementioned directories A-C, the one or more target log files whose file types satisfy the file type labels are further screened. For example, directory A includes log files 1-10. The file type of log files 1-5 is file A, the file type of log files 6-8 is file C, and the file type of log files 9-10 is file E. The log parsing device may determine that log files 1-8 in directory A meet the criteria and serve as target log files that may participate in the calculation.
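The two-stage screening of S302a and S302b can be sketched as below, using the directory A example. The nested-dictionary representation of a decompressed log package and the function names are assumptions made for illustration.

```python
def union_labels(labels_by_pattern):
    """Merge the labels of all patterns in the set into one set."""
    wanted = set()
    for labels in labels_by_pattern.values():
        wanted |= set(labels)
    return wanted


def screen_target_files(package, dir_labels, type_labels):
    """package maps directory -> {file name: file type} (assumed layout)."""
    wanted_dirs = union_labels(dir_labels)      # S302a: directory screening
    wanted_types = union_labels(type_labels)    # S302b: file type screening
    targets = []
    for directory, files in package.items():
        if directory not in wanted_dirs:
            continue
        for name, file_type in files.items():
            if file_type in wanted_types:
                targets.append((directory, name))
    return targets
```

Screening directories first means the file type check never touches files under directories that no log pattern applies to.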

S303, analyzing one or more target log files based on the log mode set.

In some embodiments, after the log schema set is filtered and the log package to be parsed is filtered, the log parsing apparatus may parse one or more target log files based on the log schema.

The specific parsing process may be carried out by the log pattern calculation unit in fig. 2: the log pattern is used to match the content of the log file, and the parsing result obtained by matching is saved. For example, the result can be stored in a database in the form of key-value pairs, where the key is a preset query parameter and the value is the parsing result obtained by matching the log pattern against the log file.

For example, a log pattern for matching user logins may be expressed as "user </log >", where "</log >" is a wildcard that matches the varying parameters. Suppose a log file contains the following: "23:00 user a login success, 23:01 user b login fault, 23:02 user c login success". The parsing result obtained by matching with this log pattern may be expressed as "a success, b fault, c success". Saved under preset keys (e.g., 1, 2, and 3), the result may be expressed as "1: a success, 2: b fault, 3: c success", and the user can quickly query the parsing result by the preset key to analyze user login activity.
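The login example can be reproduced with the short sketch below. The regular expression is an assumed stand-in for the wildcard-style log pattern, and a plain dictionary stands in for the key-value database; neither is prescribed by the application.

```python
import re

# Assumed regex form of the login pattern: the two capture groups play the
# role of the wildcards that match the varying parameters.
LOGIN_PATTERN = re.compile(r"user (\w+) login (\w+)")


def parse_logins(content):
    """Return {key: "<user> <result>"} with keys 1, 2, 3, ... in match order."""
    matches = LOGIN_PATTERN.findall(content)
    return {
        key: f"{user} {result}"
        for key, (user, result) in enumerate(matches, start=1)
    }
```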

Optionally, the log parsing device may also parse the one or more screened target log files with a multithreading scheme. An independent thread is allocated to each target log file, so that the target log files are calculated in parallel and log parsing efficiency is greatly improved. Alternatively, cluster computing can be adopted to raise the available computing power and further improve log parsing efficiency.

For each of the one or more target log files, as shown in FIG. 7, the parsing process specifically includes S303a-S303b as follows.

S303a, determining one or more log modes matched with the file types of the target log files in the log mode set.

S303b, analyzing the target log file by using one or more log modes.

In some embodiments, the log parsing device may determine, from the log pattern set, the one or more log patterns whose file type label matches the file type of the target log file, and parse the target log file using the one or more log patterns.

During log parsing, the computing device needs to read the log file from the disk and load it into memory for parsing and calculation, so reading log files consumes the computing resources of the computing device. Because one log file may correspond to a plurality of log patterns, the same log file may be read repeatedly, which wastes a large amount of computing resources. The above steps therefore group the log patterns by target log file and establish a correspondence between each target log file and one or more log patterns, so that a target log file can be matched against all applicable log patterns in a single read, avoiding the waste of computing resources.
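The grouping idea can be sketched as follows: log patterns are indexed by file type once, and each target log file is then read a single time and matched against every applicable pattern. The data layout, the regular-expression representation of a pattern, and the injected `read_file` callable are assumptions made for illustration.

```python
import re
from collections import defaultdict


def group_by_file_type(patterns):
    """Map file type -> [(pattern name, compiled regex)]."""
    groups = defaultdict(list)
    for name, (regex, file_types) in patterns.items():
        for file_type in file_types:
            groups[file_type].append((name, re.compile(regex)))
    return groups


def parse_once(target_files, patterns, read_file):
    """Read each target file a single time and apply all applicable patterns."""
    groups = group_by_file_type(patterns)
    results = {}
    for path, file_type in target_files.items():
        content = read_file(path)  # the only read of this file
        for name, compiled in groups.get(file_type, []):
            results[(path, name)] = compiled.findall(content)
    return results
```

Passing the reader in as a parameter keeps the sketch independent of where the files live and makes it easy to count how often each file is read.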

In one possible implementation manner, the above S303b may be specifically implemented as the following steps a-c.

Step a, determining a splitting mode of the target log file according to the type of the target log file.

Step b, splitting the target log file into one or more log fragments according to the splitting mode.

Step c, parsing the one or more log fragments of the target log file by using the one or more log patterns.

It should be understood that log files of different file types have different content formats. For example, the content of a configuration-type log file typically includes a plurality of configuration parameters, and a set of configuration parameters typically exists in the form of a paragraph. The content of an operation-type log file generally records operation information in time order, and the operation information at a given moment generally exists in the form of a line. According to the embodiments of the present application, log files of different types are split by line or by paragraph, which makes matching by the log patterns easier and further improves log parsing efficiency. This process may be carried out by the log split calculation unit and the log pattern calculation unit of fig. 2.
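Steps a and b can be sketched as below. The type names "configuration" and "operation" and the use of a blank line as the paragraph separator are assumptions made for this example; the application does not specify the separator.

```python
import re


def split_log(content, file_type):
    """Split a log file into fragments according to its type."""
    if file_type == "configuration":
        # Assumption: paragraphs are separated by one or more blank lines.
        segments = re.split(r"\n\s*\n", content)
        return [seg.strip() for seg in segments if seg.strip()]
    # Operation-type logs: one fragment per non-empty line.
    return [line for line in content.splitlines() if line.strip()]
```

Each returned fragment can then be handed to the applicable log patterns, so that a pattern is matched against one paragraph or one line at a time.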

The technical solution provided by the embodiments brings at least the following beneficial effects. The log parsing method provided by the embodiments of the present application first screens a log pattern set from the log pattern library according to the classification attribute of the log package to be parsed. It then screens one or more target log files from the log package to be parsed, and parses the target log files based on the log pattern set. Compared with the traditional log parsing scheme, this scheme effectively narrows both the range of log patterns and the range of log files involved in the calculation, saves a large amount of invalid calculation, and greatly improves log parsing efficiency. It is suitable for scenarios with massive logs, such as data centers, does not require high computing cost, and has broad application prospects.

Furthermore, the method is also suitable for offline scenarios and for scenarios in which users are highly sensitive to data security, and therefore has high universality. In addition, the method can analyze and investigate the influence range of a specific problem across massive log packages, quickly locating the scope of the problem and ensuring the operation and maintenance efficiency of the data center.

Fig. 8 is a schematic flow chart of log parsing provided in the present application. As shown in fig. 8, the log package is first classified, and the classification of the log package is determined based on the key information in the package name. Key information is then further extracted to obtain refined information such as the version and model corresponding to the log package. Next, the log pattern range is calculated according to the key information; that is, the log patterns that need to be matched against the current log package to be parsed are filtered from the log pattern library by key information such as version and model. The log file range of the log package to be parsed is then calculated; that is, the log package to be parsed is screened according to the applicable directories and applicable files of the log patterns. The log patterns are grouped, and a correspondence between each log file and one or more log patterns is established. Different log files are then read by multiple threads, with an independent thread allocated to each log file for calculation. In addition, each log file can be split into one or more segments, by line or by paragraph as determined by its file type. Finally, log matching calculation is completed by parsing the split segments with the log patterns.

Fig. 9 is a schematic flow chart of problem influence range investigation provided in the present application. As shown in fig. 9, the log packages participating in the calculation are first determined according to the problem to be investigated (the fault problem); that is, a plurality of log packages participating in the calculation are screened out from the log library according to information such as the model and version of the problem to be investigated. The plurality of log packages are then parsed by multiple threads, with an independent thread allocated to each log package for calculation. The log file range is calculated for each log package; that is, the log package is screened according to the applicable directories and applicable files of the log patterns corresponding to the problem to be investigated. The log patterns are grouped, and a correspondence between each log file and one or more log patterns is established. Different log files are then read by multiple threads, with an independent thread allocated to each log file for calculation. In addition, each log file can be split into one or more segments, by line or by paragraph as determined by its file type. Finally, log matching calculation is completed by parsing the split segments with the log patterns.

It can be seen that the foregoing description of the solution provided by the embodiments of the present application has been presented mainly from a method perspective. To achieve the above-mentioned functions, embodiments of the present application provide corresponding hardware structures and/or software modules that perform the respective functions. Those of skill in the art will readily appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In an exemplary embodiment, the present application further provides a log parsing apparatus. The log parsing apparatus may include one or more functional modules for implementing the log parsing method of the above method embodiment.

For example, fig. 10 is a schematic diagram of a log parsing device according to an embodiment of the present application. As shown in fig. 10, the log parsing device includes a screening module 1001 and a parsing module 1002. The screening module 1001 and the parsing module 1002 are connected to each other.

The screening module 1001 is configured to screen a log mode set that matches a classification attribute of a log packet to be parsed from a log mode library; the log mode set comprises one or more log modes, the log package to be analyzed comprises a plurality of log files, and the log modes are used for matching the content of the log files.

The screening module 1001 is further configured to screen one or more target log files from the log package to be parsed.

The parsing module 1002 is configured to parse one or more target log files based on the log schema set.

In some embodiments, the screening module 1001 is specifically configured to screen one or more target log files from the log package to be parsed according to the applicable labels of one or more log patterns in the log pattern set; the applicable tag is used to indicate directory information and type information of a log file to which the log mode is applicable.

In some embodiments, the applicable labels for each log schema include a directory label and a file type label; the directory label is used for indicating the directory of the log file applicable to the log mode, and the file type label is used for indicating the type of the log file applicable to the log mode; the screening module 1001 is specifically configured to screen, from the log packet to be parsed, log files of one or more directories that are consistent with the directory indicated by the directory label; one or more target log files consistent with the type indicated by the file type label are screened from the log files of one or more catalogues.

In some embodiments, parsing module 1002 is specifically configured to, for each of one or more target log files, determine one or more log patterns in the set of log patterns that match a file type of the target log file; the target log file is parsed using one or more log schemas.

In some embodiments, the parsing module 1002 is specifically configured to determine a splitting manner of the target log file according to a type of the target log file; splitting the target log file into one or more log fragments according to a splitting mode; one or more log segments of the target log file are parsed using one or more log schemas.

In some embodiments, the screening module 1001 is specifically configured to extract key information from the log package to be parsed, where the key information is used for indicating the classification attribute of the log package to be parsed; and to screen the log pattern set matched with the classification attribute of the log package to be parsed from the log pattern library according to the key information.

In some embodiments, the screening module 1001 is further configured to screen one or more log packets to be parsed from the log library based on a fault problem of the computing device; the classification attribute of the fault problem is consistent with the classification attribute of each log packet to be analyzed.

In some embodiments, the classification attributes of a log package are determined from classification attributes of one or more log patterns that are screened from the log pattern library based on current classification attributes of the log package, and the one or more log patterns support parsing the log package.

In some embodiments, the classification attributes and applicable labels of the log schema are determined from classification attributes, directory information, and file type information of one or more log files; the one or more log files are one or more log files in a plurality of log packages screened from the log library based on a current classification attribute of the log schema, and the log schema supports parsing the one or more log files.

In some embodiments, the classification attribute of the log package to be parsed includes at least one of: the information of the equipment to which the log packet to be analyzed belongs, the information of the part to which the log packet to be analyzed belongs, the model information of the part and the version information of the part.

In the case of implementing the functions of the integrated modules in the form of hardware, the embodiment of the application provides a schematic structural diagram of a computing device, where the computing device may be the log parsing apparatus. As shown in fig. 11, the computing device 1100 includes: a processor 1102, a communication interface 1103, a bus 1104. Optionally, the computing device may also include a memory 1101.

The processor 1102 may be a processor that implements or performs the various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein. The processor 1102 may be a central processing unit, a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit, a field programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The processor 1102 may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.

A communication interface 1103 for connecting with other devices via a communication network. The communication network may be an ethernet, a radio access network, a wireless local area network (wireless local area networks, WLAN), etc.

The memory 1101 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (random access memory, RAM) or other type of dynamic storage device that can store information and instructions, or an electrically erasable programmable read-only memory (electrically erasable programmable read-only memory, EEPROM), magnetic disk storage or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.

As a possible implementation, the memory 1101 may exist separately from the processor 1102 and be connected to the processor 1102 through the bus 1104. The memory 1101 is used for storing instructions or program code. When the processor 1102 calls and executes the instructions or program code stored in the memory 1101, it can implement the log parsing method provided in the embodiments of the present application.

In another possible implementation, the memory 1101 may also be integrated with the processor 1102.

The bus 1104 may be an extended industry standard architecture (EISA) bus or the like. The bus 1104 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 11, but this does not mean that there is only one bus or only one type of bus.

Those skilled in the art will clearly understand from the above description that, for convenience and brevity, only the above division into functional modules is used as an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the log parsing apparatus may be divided into different functional modules to perform all or part of the functions described above.

Embodiments of the present application also provide a computer-readable storage medium. All or part of the flow in the above method embodiments may be completed by computer instructions instructing related hardware; the program may be stored in the computer-readable storage medium and, when executed, may include the flow of the above method embodiments. The computer-readable storage medium may be an internal storage unit of the log parsing apparatus of any of the foregoing embodiments, for example a memory. The computer-readable storage medium may also be an external storage device of the log parsing apparatus, for example a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the log parsing apparatus. Further, the computer-readable storage medium may include both the internal storage unit and the external storage device of the log parsing apparatus. The computer-readable storage medium is used to store the computer program and other programs and data required by the log parsing apparatus, and may also be used to temporarily store data that has been output or is to be output.

The present application also provides a computer program product comprising a computer program which, when run on a computer, causes the computer to perform any one of the log parsing methods provided in the above embodiments.

Although the present application has been described herein in connection with various embodiments, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed application, from a review of the figures, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

Although the present application has been described in connection with specific features and embodiments thereof, it will be apparent to those skilled in the art that various modifications, variations, and combinations can be made without departing from the spirit and scope of the application. Accordingly, the specification and drawings are merely exemplary illustrations of the present application as defined by the appended claims, and are considered to cover any and all modifications, variations, combinations, or equivalents that fall within the scope of the present application. Thus, if such modifications and variations fall within the scope of the claims of the present application and their equivalents, the present application is intended to cover them.

The foregoing is merely a specific embodiment of the present application, but the protection scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A method of log parsing, the method comprising:

screening, from a log pattern library, a log pattern set that matches the classification attribute of a log package to be parsed; wherein the log pattern set comprises one or more log patterns, the log package to be parsed comprises a plurality of log files, and the log patterns are used for matching the content of the log files;

screening one or more target log files from the log package to be parsed;

and parsing the one or more target log files based on the log pattern set.
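
For illustration only, the three steps of claim 1 can be sketched in Python as follows. The sketch is not part of the claims and makes several assumptions that do not come from the application: a log pattern is modeled as a regular expression plus a set of classification attributes, a log package is modeled as an in-memory mapping from file path to text, and the names `LogPattern`, `select_pattern_set`, `select_target_files`, `parse_files`, and `parse_package` are hypothetical. The screening rule of the second step is passed in as a function because claim 1 does not fix it.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LogPattern:
    """Hypothetical log pattern: a regex plus the attributes it is meant for."""
    name: str
    regex: str
    attributes: frozenset  # classification attributes, e.g. {"component:bmc"}


def select_pattern_set(library, package_attributes):
    """Step 1: keep patterns whose attributes all appear on the package."""
    return [p for p in library if p.attributes <= package_attributes]


def select_target_files(package, is_target):
    """Step 2: keep only the files accepted by the screening rule."""
    return {path: text for path, text in package.items() if is_target(path)}


def parse_files(target_files, pattern_set):
    """Step 3: match every selected pattern against every target file."""
    hits = []
    for path, text in target_files.items():
        for pattern in pattern_set:
            for match in re.finditer(pattern.regex, text, flags=re.MULTILINE):
                hits.append((path, pattern.name, match.group(0)))
    return hits


def parse_package(library, package, package_attributes, is_target):
    """Run the three steps in order and return the collected matches."""
    pattern_set = select_pattern_set(library, package_attributes)
    target_files = select_target_files(package, is_target)
    return parse_files(target_files, pattern_set)
```

Under these assumptions, files rejected in the second step are never scanned, and patterns rejected in the first step are never evaluated. This reflects the saving over reading every log file in the package in sequence.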

2. The method of claim 1, wherein the screening one or more target log files from the log package to be parsed comprises:

screening the one or more target log files from the log package to be parsed according to the applicable tags of one or more log patterns in the log pattern set; wherein the applicable tag is used for indicating directory information and type information of the log files to which the log pattern is applicable.

3. The method of claim 2, wherein the applicable tags of each log pattern include a directory tag and a file type tag; the directory tag is used for indicating a directory of the log files to which the log pattern is applicable, and the file type tag is used for indicating a type of the log files to which the log pattern is applicable;

the screening the one or more target log files from the log package to be parsed according to the applicable tags of one or more log patterns in the log pattern set comprises:

screening, from the log package to be parsed, the log files of one or more directories consistent with the directories indicated by the directory tags;

screening, from the log files of the one or more directories, one or more target log files consistent with the types indicated by the file type tags.
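
For illustration only, the two-stage screening of claim 3 can be sketched in Python as follows. The sketch is not part of the claims. It assumes that a directory tag is compared with the exact parent directory of a file, that a file type tag is compared with the file name suffix, and that the applicable tags are given as `(directory, file_type)` pairs; these choices and all function names are hypothetical.

```python
from pathlib import PurePosixPath


def filter_by_directory(paths, directory_tags):
    """Stage 1: keep files whose parent directory equals a directory tag."""
    return [p for p in paths if str(PurePosixPath(p).parent) in directory_tags]


def filter_by_file_type(paths, file_type_tags):
    """Stage 2: keep files whose suffix equals a file type tag."""
    return [p for p in paths if PurePosixPath(p).suffix in file_type_tags]


def select_target_files(paths, applicable_tags):
    """Apply the directory stage first, then the file type stage."""
    directory_tags = {directory for directory, _ in applicable_tags}
    file_type_tags = {file_type for _, file_type in applicable_tags}
    in_directories = filter_by_directory(paths, directory_tags)
    return filter_by_file_type(in_directories, file_type_tags)
```

Running the directory stage first means the file type check is applied only to files in the tagged directories. Note that this sketch pools the directory tags and the file type tags of all patterns, so a file may pass on the directory of one pattern and the file type of another; a stricter variant could compare each pair jointly.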

4. A method according to any of claims 1-3, wherein said parsing the one or more target log files based on the set of log patterns comprises:

for each of the one or more target log files, determining, in the set of log patterns, one or more log patterns that match a file type of the target log file;

and parsing the target log file using the one or more log patterns.

5. The method of claim 4, wherein parsing the target log file using the one or more log patterns comprises:

determining a splitting mode of the target log file according to the type of the target log file;

splitting the target log file into one or more log fragments according to the splitting mode;

and parsing the one or more log fragments of the target log file using the one or more log patterns.
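
For illustration only, the splitting and fragment parsing of claim 5 can be sketched in Python as follows. The sketch is not part of the claims. The splitting rules in `SPLITTERS` (one fragment per line for `.log` files, one fragment per blank-line-separated block for `.txt` files, the whole file otherwise) are assumptions chosen for the example, and a log pattern is modeled as a plain regular expression.

```python
import re

# Hypothetical splitting rules keyed by file type; not taken from the application.
SPLITTERS = {
    ".log": lambda text: text.splitlines(),  # one fragment per line
    ".txt": lambda text: [b for b in text.split("\n\n") if b.strip()],  # per block
}


def split_log(file_type, text):
    """Choose the splitting mode from the file type and split the content."""
    splitter = SPLITTERS.get(file_type, lambda t: [t])  # fallback: whole file
    return splitter(text)


def parse_fragments(fragments, regexes):
    """Return (fragment_index, regex, matched_text) for each first match."""
    results = []
    for index, fragment in enumerate(fragments):
        for regex in regexes:
            match = re.search(regex, fragment)
            if match:
                results.append((index, regex, match.group(0)))
    return results
```

A possible usage: `parse_fragments(split_log(".log", text), [r"fan \d+ failed"])` reports which line-level fragments of `text` contain a fan failure message.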

6. The method according to any one of claims 1-5, wherein the screening, from the log pattern library, the log pattern set that matches the classification attribute of the log package to be parsed comprises:

extracting key information from the log package to be parsed; wherein the key information is used for indicating the classification attribute of the log package to be parsed;

and screening, from the log pattern library according to the key information, the log pattern set that matches the classification attribute of the log package to be parsed.
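
For illustration only, the key information extraction of claim 6 can be sketched in Python as follows. The sketch is not part of the claims. It assumes that the key information appears in the log package as `Key: value` text lines, and the field names `Device`, `Component`, `Model`, and `Version` are hypothetical stand-ins for the attribute kinds listed in claim 10. The log pattern library is modeled as a mapping from a pattern name to the attribute values that the pattern requires.

```python
import re

# Hypothetical "Key: value" fields; real log packages may store this differently.
FIELD_TO_ATTRIBUTE = {
    "Device": "device",
    "Component": "component",
    "Model": "model",
    "Version": "version",
}
LINE = re.compile(r"^\s*([A-Za-z ]+?)\s*:\s*(.+?)\s*$")


def extract_key_info(text):
    """Collect classification attributes from 'Key: value' lines."""
    attributes = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match and match.group(1) in FIELD_TO_ATTRIBUTE:
            attributes[FIELD_TO_ATTRIBUTE[match.group(1)]] = match.group(2)
    return attributes


def select_pattern_set(library, key_info):
    """Keep patterns whose required attributes all agree with the key info."""
    return [
        name
        for name, required in library.items()
        if all(key_info.get(k) == v for k, v in required.items())
    ]
```

In this sketch a pattern that requires fewer attributes matches more packages, so a pattern tagged only with a component applies to every version of that component.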

7. The method of any of claims 1-5, wherein before the screening, from the log pattern library, the log pattern set that matches the classification attribute of the log package to be parsed, the method further comprises:

screening one or more log packages to be parsed from a log library based on a fault of a computing device; wherein the classification attribute of the fault is consistent with the classification attribute of each log package to be parsed.

8. The method of any of claims 1-7, wherein classification attributes of a log package are determined from classification attributes of one or more log patterns that are screened from the log pattern library based on current classification attributes of the log package, and wherein the one or more log patterns support parsing the log package.

9. The method of any of claims 1-7, wherein the classification attributes and applicable tags of a log pattern are determined based on classification attributes, directory information, and file type information of one or more log files; the one or more log files are log files in a plurality of log packages screened from the log library based on a current classification attribute of the log pattern, and the log pattern supports parsing the one or more log files.

10. The method according to any one of claims 1-7, wherein the classification attribute of the log package to be parsed comprises at least one of: information on the device to which the log package to be parsed belongs, information on the component to which the log package to be parsed belongs, model information of the component, and version information of the component.

11. A computing device, the computing device comprising a processor and a memory; the processor is coupled with the memory; the memory is for storing computer instructions that are loaded and executed by the processor to cause the computing device to implement the log parsing method of any one of claims 1 to 10.

12. A log parsing apparatus, the apparatus comprising: a screening module and a parsing module;

the screening module is used for screening, from a log pattern library, a log pattern set that matches the classification attribute of a log package to be parsed; wherein the log pattern set comprises one or more log patterns, the log package to be parsed comprises a plurality of log files, and the log patterns are used for matching the content of the log files;

the screening module is further used for screening one or more target log files from the log package to be parsed;

the parsing module is used for parsing the one or more target log files based on the log pattern set.