CN110321410A - Log extraction method, apparatus, storage medium, and electronic device - Google Patents
- Publication number
- CN110321410A (application CN201910544248.7A)
- Authority
- CN
- China
- Prior art keywords
- log
- event
- template
- target log
- extracted
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
Abstract
The present disclosure relates to a log extraction method, apparatus, storage medium, and electronic device in the technical field of data processing. The method comprises: determining a sample log from a log to be extracted, the sample log including multiple log events; extracting target log events from the multiple log events; determining, in a preset log template set that includes at least one log template, a target log template that matches the target log events; and performing content extraction on the log to be extracted according to the target log template. Because no dedicated extraction rules need to be designed for the complex structures of different logs, the workload of developers is reduced, content can be extracted automatically from logs of various structures, extraction complexity is reduced, and extraction efficiency and the scope of application are improved.
Description
Technical field
The present disclosure relates to the technical field of data processing, and in particular to a log extraction method, apparatus, storage medium, and electronic device.
Background
With the continuous development of information technology, more and more services are implemented by information-based means. While running and being maintained on multiple platforms (or systems), these services generate logs containing different log events. As business systems keep growing, the scale and format of the corresponding logs have changed considerably, log structures have become increasingly complex, and scenarios in which several kinds of logs are mixed have appeared.
In the prior art, logs whose lines share a consistent text structure require a dedicated, preset extraction rule, which offers poor flexibility and a narrow scope of application. Logs whose lines have inconsistent text structures require dedicated judgment logic and processing logic (for example, regular expressions, rules, or shell scripts), which is complex to implement, must be written by developers, and yields low efficiency and accuracy. Logs whose text structure is overly complex or unknown cannot be extracted effectively at all.
Summary of the invention
The purpose of the present disclosure is to provide a log extraction method, apparatus, storage medium, and electronic device, so as to solve the prior-art problem that logs with complex text structures are difficult to extract.
To achieve the above purpose, according to a first aspect of the embodiments of the present disclosure, a log extraction method is provided, the method comprising:
Determining a sample log from a log to be extracted, the sample log including multiple log events;
Extracting target log events from the multiple log events;
Determining, in a preset log template set, a target log template that matches the target log events, the log template set including at least one log template;
Performing content extraction on the log to be extracted according to the target log template.
Optionally, extracting the target log events from the multiple log events comprises:
For each log event, determining the difference degree between that log event and every other log event among the multiple log events, and determining the difference characteristic value of that log event from those difference degrees;
Determining the event extraction parameter corresponding to the sample log according to the difference characteristic value of each log event;
Extracting the target log events from the multiple log events according to the event extraction parameter.
Optionally, the difference degree includes at least one of a content difference degree, a length difference degree, and a format difference degree.
Optionally, extracting the target log events from the multiple log events according to the event extraction parameter comprises:
Using the event extraction parameter as the random coefficient of a random selection algorithm, and extracting the target log events from the multiple log events by means of the random selection algorithm.
Optionally, determining, in the preset log template set, the target log template that matches the target log events comprises:
For each target log event, determining the matching degree between that target log event and each log template in the log template set;
Taking the log template with the highest matching degree as the target log template that matches that target log event.
Optionally, there are multiple target log templates, and performing content extraction on the log to be extracted according to the target log templates comprises:
For each log event included in the log to be extracted, determining the matching degree between that log event and each of the multiple target log templates, and performing content extraction on that log event according to the target log template with the highest matching degree.
Optionally, there are multiple target log templates, and performing content extraction on the log to be extracted according to the target log templates comprises:
For each log event included in the log to be extracted, determining the matching degree between that log event and each of the multiple target log templates, and, if the maximum matching degree is greater than or equal to a matching degree threshold, performing content extraction on that log event according to the target log template with the highest matching degree.
According to a second aspect of the embodiments of the present disclosure, a log extraction apparatus is provided, the apparatus comprising:
A sample determining module, configured to determine a sample log from a log to be extracted, the sample log including multiple log events;
An abstraction module, configured to extract target log events from the multiple log events;
A template determining module, configured to determine, in a preset log template set, a target log template that matches the target log events, the log template set including at least one log template;
An extraction module, configured to perform content extraction on the log to be extracted according to the target log template.
Optionally, the abstraction module includes:
A determining submodule, configured to determine, for each log event, the difference degree between that log event and every other log event among the multiple log events, and to determine the difference characteristic value of that log event from those difference degrees;
The determining submodule is further configured to determine the event extraction parameter corresponding to the sample log according to the difference characteristic value of each log event;
An extraction submodule, configured to extract the target log events from the multiple log events according to the event extraction parameter.
Optionally, the difference degree includes at least one of a content difference degree, a length difference degree, and a format difference degree.
Optionally, the extraction submodule is configured to:
Use the event extraction parameter as the random coefficient of a random selection algorithm, and extract the target log events from the multiple log events by means of the random selection algorithm.
Optionally, the template determining module is configured to:
For each target log event, determine the matching degree between that target log event and each log template in the log template set;
Take the log template with the highest matching degree as the target log template that matches that target log event.
Optionally, there are multiple target log templates, and the extraction module is configured to:
For each log event included in the log to be extracted, determine the matching degree between that log event and each of the multiple target log templates, and perform content extraction on that log event according to the target log template with the highest matching degree.
Optionally, there are multiple target log templates, and the extraction module is configured to:
For each log event included in the log to be extracted, determine the matching degree between that log event and each of the multiple target log templates, and, if the maximum matching degree is greater than or equal to a matching degree threshold, perform content extraction on that log event according to the target log template with the highest matching degree.
According to a third aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the program implements the steps of the log extraction method provided in the first aspect.
According to a fourth aspect of the embodiments of the present disclosure, an electronic device is provided, comprising:
A memory on which a computer program is stored;
A processor, configured to execute the computer program in the memory so as to implement the steps of the log extraction method provided in the first aspect.
With the above technical solution, the present disclosure first determines, from the log to be extracted, a sample log that includes multiple log events, then extracts target log events from the multiple log events, then determines, in a preset log template set that includes at least one log template, the target log template that matches the target log events, and finally performs content extraction on the log to be extracted according to the target log template. No dedicated extraction rules need to be designed for the complex structures of different logs, content can be extracted automatically from logs of various structures, extraction complexity is reduced, and extraction efficiency and the scope of application are improved.
Other features and advantages of the present disclosure are described in detail in the detailed description that follows.
Brief description of the drawings
The accompanying drawings provide a further understanding of the present disclosure and constitute a part of the specification. Together with the detailed description below, they serve to explain the present disclosure but do not limit it. In the drawings:
Fig. 1 is a flowchart of a log extraction method according to an exemplary embodiment;
Fig. 2 is a flowchart of another log extraction method according to an exemplary embodiment;
Fig. 3 is a block diagram of a log extraction apparatus according to an exemplary embodiment;
Fig. 4 is a block diagram of another log extraction apparatus according to an exemplary embodiment;
Fig. 5 is a block diagram of an electronic device according to an exemplary embodiment.
Detailed description
Exemplary embodiments are described in detail here, and examples of them are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
Before introducing the log extraction method, apparatus, storage medium, and electronic device provided by the present disclosure, the application scenario involved in the embodiments is introduced first. The application scenario is content extraction from logs. A log may come from many sources: it may be generated by various platforms while they execute multiple services and be output through a unified port. A log contains multiple log events of various types. A log event can be understood as one record in the log, that is, an individual entry that independently describes one operation or one execution result, and it is the basic unit of content extraction. Log events may include, for example, firewall log events, switch log events, system execution log events, business log events, user operation log events, and database log events. A log template set can be provided in advance for the various types of log events; it includes a log template corresponding to each type of log event, used to extract the content of log events of that type. Because business systems keep growing, there are very many types of log events (several hundred), so the corresponding log template set also contains a large number of log templates (for example, JSON templates, XML templates, date templates, KV templates, and CSV templates). If every log event in the log to be extracted were matched in turn against every log template in the log template set, both the amount of computation and the complexity would be very high, and content extraction would be too inefficient for practical use.
Fig. 1 is a flowchart of a log extraction method according to an exemplary embodiment. As shown in Fig. 1, the method includes:
Step 101: determine a sample log from the log to be extracted, the sample log including multiple log events.
For example, the log to be extracted contains a large number of log events, and it is difficult to determine in advance which types of log events they include. Consequently, it is also impossible to select, from the large number of log templates in the log template set, the log templates suited to the log to be extracted. Therefore, a sample log that includes multiple log events can first be determined from the log to be extracted. The sample log contains far fewer log events than the log to be extracted, which effectively reduces the amount of computation and the complexity of subsequent processing.
In this embodiment, the sample log can be determined according to a preset rule, for example by selecting a time range or by random selection. Taking a log to be extracted that spans one month as an example, the log events within a 24-hour period can be chosen to form the sample log, or a preset percentage (for example, 10%) of the log events in the log to be extracted can be randomly selected to form the sample log.
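The two sampling rules mentioned above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the `(timestamp, text)` event representation, and the `seed` parameter are assumptions introduced here.

```python
import random
from datetime import datetime, timedelta


def sample_by_time_window(events, start, hours=24):
    """Keep the events whose timestamp falls in [start, start + hours).

    `events` is assumed to be a list of (timestamp, text) tuples.
    """
    end = start + timedelta(hours=hours)
    return [event for event in events if start <= event[0] < end]


def sample_by_percentage(events, percentage=0.10, seed=None):
    """Randomly keep roughly `percentage` of the events (at least one)."""
    if not events:
        return []
    rng = random.Random(seed)  # a seed makes the sample reproducible
    count = min(len(events), max(1, round(len(events) * percentage)))
    return rng.sample(events, count)
```

Either function returns a much smaller list than the input, which is the only property the later steps rely on.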
Step 102: extract target log events from the multiple log events.
For example, because the sample log still contains a fairly large number of log events, target log events that reflect the log to be extracted can be extracted from the sample log. This further reduces the number of log events that must be processed and lowers the computation and complexity of subsequent processing. First, the difference characteristic value of each log event in the sample log is analyzed. The difference characteristic value can be understood as the uniqueness of the log event within the sample log, that is, how much the log event differs from the other log events in the sample log. Next, the event extraction parameter of the sample log is determined from the difference characteristic values of the log events. The event extraction parameter can be understood as how well the sample log reflects the log to be extracted, or as an indication of how many types of log events the sample log contains. Finally, the target log events are extracted from the multiple log events according to the event extraction parameter of the sample log; with high probability, the target log events represent the log events in the log to be extracted. There may be one or more target log events, and their number is much smaller than the number of log events in the sample log.
Step 103: determine, in a preset log template set, the target log template that matches the target log events, the log template set including at least one log template.
Further, after the target log events are determined, the target log template that matches each target log event is selected from the at least one log template included in the preset log template set. The target log template is one that can correctly extract the content of the target log event. Each target log event corresponds to one target log template, and several target log events may correspond to the same log template, so the number of target log templates is less than or equal to the number of target log events. Because the target log events represent, with high probability, the log events in the log to be extracted, the corresponding target log templates are, with high probability, suitable for the log to be extracted.
Step 104: perform content extraction on the log to be extracted according to the target log template.
Finally, content extraction is performed in turn on each log event in the log to be extracted according to the target log template determined in step 103. If there is only one target log template (in which case the log to be extracted can be understood as a log with a consistent line text structure, in which every log event is of the same type), every log event in the log to be extracted is extracted with that target log template. If there are multiple target log templates, then for each log event in the log to be extracted, the matching degree between the log event and each target log template is determined in turn, and content extraction is performed on the log event according to the target log template with the highest matching degree.
In summary, the present disclosure first determines, from the log to be extracted, a sample log that includes multiple log events, then extracts target log events from the multiple log events, then determines, in a preset log template set that includes at least one log template, the target log template that matches the target log events, and finally performs content extraction on the log to be extracted according to the target log template. No dedicated extraction rules need to be designed for the complex structures of different logs, which reduces the workload of developers, allows content to be extracted automatically from logs of various structures, reduces extraction complexity, and improves extraction efficiency and the scope of application.
Fig. 2 is a flowchart of another log extraction method according to an exemplary embodiment. As shown in Fig. 2, step 102 can be implemented by the following steps:
Step 1021: for each log event, determine the difference degree between that log event and every other log event among the multiple log events, and determine the difference characteristic value of that log event from those difference degrees.
For example, the difference characteristic value, which reflects the uniqueness of each log event in the sample log, can be determined first. It can be determined from the difference degrees between the log event and every other log event among the multiple log events. For example, the difference degrees between the log event and each of the other log events can be summed, and the sum used as the difference characteristic value of the log event. The difference degree of two log events may include at least one of a content difference degree C_S, a length difference degree C_L, and a format difference degree C_M. The difference degree between the log event and each of the other log events can therefore be the sum of several difference degrees, for example C_S + C_L + C_M.
C_S reflects how much two log events differ in text content. It can be obtained by comparing the characters of the two log events in turn with a preset string matching algorithm. That is, the text contents of the two log events are matched in a preset order (for example, from left to right) to determine a matching value that characterizes how similar the two log events are: the larger the matching value, the lower the corresponding C_S, and the smaller the matching value, the higher the corresponding C_S. The string matching algorithm may be, for example, the KMP (Knuth-Morris-Pratt) algorithm, the BF (Brute Force) algorithm, or the Horspool algorithm.
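As a rough illustration of the inverse relation between the matching value and C_S, the sketch below compares two events position by position from left to right. It is a deliberately simple stand-in, not KMP, BF, or Horspool, and the definition C_S = 1 − matching value is an assumption made here; the text only states that C_S falls as the matching value rises.

```python
def content_difference(a: str, b: str) -> float:
    """Content difference C_S in [0, 1] from a left-to-right comparison.

    The matching value is the share of positions holding the same character,
    measured against the longer of the two events (an assumed definition).
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty events are identical
    matches = sum(1 for x, y in zip(a, b) if x == y)
    matching_value = matches / longest
    return 1.0 - matching_value
```

Identical events yield 0.0 and events with no character in common at any position yield 1.0.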
C_L reflects how much two log events differ in text length. In this embodiment, C_L may consist of two parts. The first part is the difference C_L1 in the number of characters contained in the text of the two log events. Illustratively, C_L1 can be obtained as the absolute value of the difference between the character counts of the two log events; for example, if the two log events contain 25 and 30 characters respectively, |25 − 30| = 5 can be used as the C_L1 of the two log events. The second part is the difference C_L2 in the number of characters contained in the character strings obtained after each log event is divided into multiple character strings according to a special character set (for example, space, ",", "-", "_", "/", and tab). Illustratively, C_L2 can be obtained in the following way: the two log events are divided according to the special character set, yielding N character strings and M character strings; the first of the N character strings is compared with the first of the M character strings, and the comparison result is recorded as 0 if they contain the same number of characters and as 1 if they contain different numbers of characters; the remaining character strings of the N and the M are compared pairwise in the same way. The sum of the comparison results is then taken as the value of C_L2. Note that if N and M are not equal, each surplus character string can be compared with a character string of length 0, and the comparison result is 1.
For example, log event A is "account: 11642205800/ trade date: 2017-06-19" and log event B is "IP:209.160.24.63/ date: 2018-05-21" (the character counts below are those of the original, untranslated strings). The two log events can first be divided at "/". Log event A is then divided into two character strings: "account: 11642205800", which contains 14 characters, and "trade date: 2017-06-19", which contains 15 characters. Log event B is also divided into two character strings: "IP:209.160.24.63", which contains 16 characters, and "date: 2018-05-21", which contains 13 characters. The character strings of log event A and log event B are then compared in turn: "account: 11642205800" and "IP:209.160.24.63" contain different numbers of characters, so the comparison result is 1; "trade date: 2017-06-19" and "date: 2018-05-21" contain different numbers of characters, so the comparison result is 1. The C_L2 of log event A and log event B is the sum of the two comparison results, namely 2.
After C_L1 and C_L2 are obtained in this way, their sum is taken as C_L.
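The computation of C_L = C_L1 + C_L2 can be sketched as follows. The function name and the default delimiter set are assumptions for illustration, and the sample strings in the usage note are hypothetical ASCII stand-ins rather than the events quoted in the text.

```python
import re


def length_difference(a: str, b: str, delimiters: str = "/") -> int:
    """Length difference C_L = C_L1 + C_L2 for two log events."""
    # C_L1: absolute difference of the total character counts.
    c_l1 = abs(len(a) - len(b))

    # Split both events on any character of the special character set.
    pattern = "[" + re.escape(delimiters) + "]"
    parts_a = re.split(pattern, a)
    parts_b = re.split(pattern, b)

    # C_L2: 1 for every pair of segments whose lengths differ ...
    c_l2 = sum(1 for x, y in zip(parts_a, parts_b) if len(x) != len(y))
    # ... plus 1 for every surplus segment when N and M are not equal.
    c_l2 += abs(len(parts_a) - len(parts_b))

    return c_l1 + c_l2
```

For instance, `length_difference("acct:116422058/date:2017-06-19", "IP:209.160.24.63/day:2018-05-21")` gives C_L1 = |30 − 31| = 1 and C_L2 = 2 (segment lengths 14 vs 16 and 15 vs 14), so the result is 3.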
C_M reflects how much two log events differ in text format. For example, each of the two log events can first be matched against preset formats (a time format, a "[]" data format, a "()" data format, a JSON object, a JSON array, an XML format, and so on) to obtain a matching value that characterizes the degree of match, and the C_M of the two log events is then determined from the matching values of the two log events against the preset formats. For example, if the matching values of the two log events against the JSON object format are 80% and 60% respectively, the absolute value of the difference between the two matching values, namely 20%, can be used as the value of C_M.
Further, when the difference degree of two log events is calculated, the values of C_S, C_L, and C_M can be calculated separately. Because the different difference degrees are calculated in different ways, the resulting values may fall in different ranges. Therefore, after C_S, C_L, and C_M are determined, each of them can first be weighted and normalized, and the three difference degrees are then summed to obtain the difference degree of the two log events.
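A minimal sketch of the format difference and of the weighted, normalized sum is given below. The text does not specify the normalization, so dividing each component by a caller-supplied scale (for example, the largest value observed for that component) is an assumption, as are the function and parameter names.

```python
def format_difference(match_a: float, match_b: float) -> float:
    """Format difference C_M: gap between two format matching values."""
    return abs(match_a - match_b)


def combined_difference(c_s, c_l, c_m,
                        scales=(1.0, 1.0, 1.0),
                        weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three difference degrees after normalization.

    Each component is divided by its scale so that all three fall in a
    comparable range before the weights are applied (assumed scheme).
    """
    total = 0.0
    for value, scale, weight in zip((c_s, c_l, c_m), scales, weights):
        normalized = value / scale if scale > 0 else 0.0
        total += weight * normalized
    return total
```

With matching values of 0.8 and 0.6, `format_difference` returns approximately 0.2, matching the 20% example in the text.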
Note that the difference characteristic value is a relative concept: it describes how unique a log event is compared with the other log events in the sample log. As an illustration, suppose the sample log contains 20 log events, of which log event a is of type A and the remaining 19 log events are of type B. The difference degree between log event a and each of the 19 log events is large, so the difference characteristic value of log event a is also high. Log event b, one of the 19 log events, is very similar to the other 18 log events and differs substantially only from log event a, so the difference characteristic value of log event b is small.
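The 20-event illustration can be reproduced with a short sketch in which the difference characteristic value of an event is the sum of its difference degrees to all other events. The toy `type_difference` function (0 for the same type, 1 otherwise) is a hypothetical placeholder for the real difference degree.

```python
def difference_characteristic_values(events, difference):
    """Sum, for each event, its difference degree to every other event."""
    values = []
    for i, event in enumerate(events):
        total = sum(difference(event, other)
                    for j, other in enumerate(events) if j != i)
        values.append(total)
    return values


def type_difference(a, b):
    """Toy difference degree: 0 if the event types are equal, else 1."""
    return 0 if a == b else 1


# One event of type A among 19 events of type B.
sample = ["A"] + ["B"] * 19
values = difference_characteristic_values(sample, type_difference)
```

Here the single type-A event scores 19, while each type-B event scores 1, which reflects the contrast described above.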
Step 1022: determine the event extraction parameter corresponding to the sample log according to the difference characteristic value of each log event.
Step 1023: extract the target log events from the multiple log events according to the event extraction parameter.
Illustratively, the difference characteristic values of all log events can be summed and the sum normalized, and the normalized sum used as the event extraction parameter of the sample log. The larger the event extraction parameter, the more types of log events the sample log contains, and correspondingly the more target log events need to be selected.
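One possible reading of "sum, then normalize" is sketched below. The text does not say how the sum is normalized; dividing by the largest possible sum, n × (n − 1) × the maximum pairwise difference, is an assumption chosen here so that the result lies in [0, 1].

```python
def event_extraction_parameter(characteristic_values, max_pair_difference=1.0):
    """Normalized sum of the difference characteristic values.

    With n events there are n * (n - 1) ordered pairs, so the sum can be at
    most n * (n - 1) * max_pair_difference (assumed normalization).
    """
    n = len(characteristic_values)
    if n < 2:
        return 0.0
    max_total = n * (n - 1) * max_pair_difference
    return sum(characteristic_values) / max_total
```

For a sample of one outlier and 19 similar events with characteristic values 19 and 1 respectively, the parameter is 38 / 380, about 0.1; a fully homogeneous sample gives 0.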
In step 1023, the target log events can be extracted as follows: the event extraction parameter is used as the random coefficient (which can be understood as an operator) of a random selection algorithm, and the target log events are extracted from the multiple log events by the random selection algorithm. For example, if the sample log includes 100 log events and the event extraction parameter is 7, one log event out of every 7 in the sample log can be extracted as a target log event; alternatively, 7 can be used as the operator of a pseudo-random number generation algorithm to generate a pseudo-random sequence, and the target log events are extracted from the sample log according to that pseudo-random sequence.
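Both selection variants can be sketched as follows. Treating the "operator" as the seed of Python's `random.Random`, and taking the last event of each group of k, are interpretations made for this illustration; the `count` parameter is likewise hypothetical.

```python
import random


def extract_every_kth(events, k):
    """Take one event out of every k (here: the k-th of each group)."""
    step = max(1, int(k))
    return events[step - 1::step]


def extract_pseudo_random(events, seed, count):
    """Pick `count` events at positions drawn from a seeded generator."""
    rng = random.Random(seed)  # same seed, same pseudo-random sequence
    count = min(count, len(events))
    indices = sorted(rng.sample(range(len(events)), count))
    return [events[i] for i in indices]
```

With 100 events and k = 7, `extract_every_kth` returns 14 events (the 7th, 14th, ..., 98th), and `extract_pseudo_random` returns the same selection whenever it is called with the same seed.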
Optionally, step 103 can be implemented as follows:
Step a) For each target log event, determine the matching degree between that target log event and each log template in the log template set.
Step b) Take the log template with the highest matching degree as the target log template that matches that target log event.
For example, for each target log event, the matching degree between the target log event and each log template in the log template set can be calculated in turn. The log template with the highest matching degree is then taken as the target log template that matches the target log event; it can be understood as the log template best suited to extracting the content of that target log event.
The matching degree between a target log event and a log template can be calculated by performing content extraction on the target log event with that log template and determining the matching degree from the number of characters successfully extracted. The matching degree is proportional to that number: the more characters are successfully extracted, the higher the matching degree, and the fewer characters are successfully extracted, the lower the matching degree.
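The idea of scoring a template by the number of characters it extracts can be sketched with regular expressions standing in for log templates. The two patterns, their names, and the use of named groups as "extracted content" are hypothetical; the text does not define a template format.

```python
import re

# Hypothetical templates: each named group is a field the template extracts.
TEMPLATES = {
    "kv": re.compile(r"(?P<key>\w+)=(?P<value>\S+)"),
    "date": re.compile(r"(?P<date>\d{4}-\d{2}-\d{2})"),
}


def matching_degree(template, event):
    """Number of characters the template successfully extracts."""
    match = template.search(event)
    if match is None:
        return 0
    return sum(len(v) for v in match.groupdict().values() if v is not None)


def best_template(templates, event):
    """Name of the template with the highest matching degree."""
    return max(templates, key=lambda name: matching_degree(templates[name], event))
```

For the event `"user=alice"` the `kv` template extracts 9 characters and the `date` template none, so `kv` is selected; for `"2019-06-21 started"` the `date` template wins with 10 characters.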
Note that the log template corresponding to each target log event can be determined from the matching degree, and that the number of target log events is greater than or equal to the number of target log templates. If every target log event corresponds to a different target log template, the number of target log events equals the number of target log templates; if at least two target log events correspond to the same target log template, the number of target log events is greater than the number of target log templates.
In one implementation scenario, there may be multiple target log templates, and the content extraction performed on the log to be extracted in step 104 can then be implemented in two ways:
First implementation: for each log event included in the log to be extracted, determine the matching degree between that log event and each of the multiple target log templates, and perform content extraction on that log event according to the target log template with the highest matching degree.
Second implementation: for each log event included in the log to be extracted, determine the matching degree between that log event and each of the multiple target log templates, and, if the maximum matching degree is greater than or equal to the matching degree threshold, perform content extraction on that log event according to the target log template with the highest matching degree.
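The second implementation, together with the fallback to the full log template set that the description mentions for events below the threshold, can be sketched as follows. Regular expressions again stand in for log templates, and all names, patterns, and the threshold value are hypothetical.

```python
import re

KV = re.compile(r"(?P<key>\w+)=(?P<value>\S+)")
DATE = re.compile(r"(?P<date>\d{4}-\d{2}-\d{2})")


def extracted_chars(template, event):
    """Matching degree: number of characters the template extracts."""
    match = template.search(event)
    if match is None:
        return 0
    return sum(len(v) for v in match.groupdict().values() if v is not None)


def extract_event(event, target_templates, all_templates, threshold):
    """Extract with the best target template, or fall back to the full set."""
    def score(item):
        return extracted_chars(item[1], event)

    name, template = max(target_templates.items(), key=score)
    if extracted_chars(template, event) < threshold:
        # No target template matches well enough: search every template.
        name, template = max(all_templates.items(), key=score)
    match = template.search(event)
    return name, (match.groupdict() if match else {})
```

Calling `extract_event("user=alice", {"date": DATE}, {"date": DATE, "kv": KV}, threshold=5)` falls back to the full set and returns `("kv", {"key": "user", "value": "alice"})`, whereas an event such as `"2019-06-21 boot"` is handled directly by the `date` target template.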
Take as an example the case where 10 target log templates are determined in step 103 and the first log event is any log event in the log to be extracted. The matching degrees between the first log event and the 10 target log templates are determined in turn, yielding 10 corresponding matching degrees. To calculate the matching degree between the first log event and a given target log template, content extraction may be performed on the first log event according to that template, and the matching degree determined from the number of characters successfully extracted: the more characters are successfully extracted, the higher the matching degree between that target log template and the first log event; the fewer characters are successfully extracted, the lower the matching degree.
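As a rough sketch of a character-count matching degree, the Python below applies a template's fields from left to right and counts the characters of every field that is extracted before the first failure. The template representation (a list of field-name and regex pairs), the single-space separator, and the sample event are all assumptions made for illustration, not details given in the disclosure.

```python
import re

def matching_degree(template, event):
    """Count characters extracted by applying a template's fields left to right.

    `template` is a hypothetical list of (field_name, regex) pairs; fields are
    assumed to be separated by single spaces. Extraction stops at the first
    field that fails to match.
    """
    pos, extracted_chars, fields = 0, 0, {}
    for name, pattern in template:
        m = re.compile(pattern).match(event, pos)
        if m is None:
            break
        fields[name] = m.group(0)
        extracted_chars += len(m.group(0))
        pos = m.end()
        if pos < len(event) and event[pos] == " ":
            pos += 1  # skip the assumed single-space separator
    return extracted_chars, fields

template_a = [("clientip", r"\d+\.\d+\.\d+\.\d+"), ("response", r"\d{3}"), ("bytes", r"\d+")]
template_b = [("timestamp", r"\d{4}-\d{2}-\d{2}"), ("userid", r"\w+")]

event = "10.0.0.7 200 3878"
degree_a, fields_a = matching_degree(template_a, event)
degree_b, _ = matching_degree(template_b, event)
```

With this measure, `template_a` extracts all three fields from the sample event while `template_b` extracts nothing, so `template_a` has the higher matching degree.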
In the first implementation, content extraction is performed on the first log event directly according to the target log template with the highest matching degree. In the second implementation, the maximum matching degree may first be compared with the matching degree threshold. If the maximum matching degree is less than the threshold, none of the 10 target log templates matches the first log event well, and extracting content from the first log event with a target log template would have low accuracy. In that case, the matching degree between the first log event and each log template in the log template set can be determined again, and content extraction is performed on the first log event according to the log template in the set with the highest matching degree, which further ensures the accuracy of content extraction. If the maximum matching degree is greater than or equal to the threshold, the first log event has found its best-matching template among the 10 target log templates, and content extraction is performed on the log event according to the target log template with the highest matching degree.
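The two implementations, including the fallback to the full log template set, might be sketched as follows. The function names, the regex templates, and the character-count `degree` helper are hypothetical and serve only to make the control flow concrete.

```python
import re

def degree(pattern, event):
    """Assumed matching degree: characters captured by named groups (0 if no match)."""
    m = re.fullmatch(pattern, event)
    return sum(len(v) for v in m.groupdict().values() if v) if m else 0

def pick(templates, event):
    """Return (name, degree) of the best-matching template in `templates`."""
    name = max(templates, key=lambda n: degree(templates[n], event))
    return name, degree(templates[name], event)

def choose_template(event, target_templates, template_set, threshold=None):
    """First implementation when threshold is None; second implementation otherwise."""
    name, best = pick(target_templates, event)
    if threshold is None or best >= threshold:
        return name
    # Below the threshold: fall back to the full log template set.
    return pick(template_set, event)[0]

template_set = {
    "kv": r"(?P<key>\w+)=(?P<value>\w+)",
    "level_msg": r"(?P<level>[A-Z]+): (?P<message>.+)",
    "ip_code": r"(?P<ip>\d+\.\d+\.\d+\.\d+) (?P<code>\d{3})",
}
# Hypothetical subset selected as target templates in step 103.
target_templates = {k: template_set[k] for k in ("kv", "level_msg")}

chosen_fallback = choose_template("10.0.0.7 404", target_templates, template_set, threshold=5)
chosen_direct = choose_template("user=alice", target_templates, template_set, threshold=5)
```

In this sketch the first event matches no target template well enough, so the choice falls back to the full set; the second event clears the threshold and is handled by a target template directly.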
The matching degree threshold may be determined empirically, or it may be determined according to the target log templates. For example, suppose 10 target log templates were determined from 15 target log events in step 103. The matching degree between each of the 15 target log events and each of the 10 target log templates can additionally be recorded. In step 104, the matching degrees between the first log event and the 10 target log templates are determined first, yielding 10 corresponding matching degrees, and the target log template with the highest matching degree is taken as the first target log template. The minimum of the matching degrees between the 15 target log events and the first target log template is then used as the matching degree threshold. In other words, the matching degree threshold can be understood as the minimum matching degree between the multiple target log events and the first target log template.
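A minimal sketch of this threshold rule is shown below. The recorded matching degrees are made-up illustrative numbers, and the dictionary layout and function name are assumptions rather than part of the disclosure.

```python
def matching_threshold(recorded, first_target_template):
    """Minimum recorded matching degree of any target log event with the given template.

    `recorded` maps each target log event to {template_name: matching_degree},
    as assumed to have been stored in step 103.
    """
    return min(degrees[first_target_template] for degrees in recorded.values())

# Illustrative values only: three target log events and two target log templates.
recorded = {
    "event_1": {"X": 120, "Y": 15},
    "event_2": {"X": 95, "Y": 40},
    "event_3": {"X": 20, "Y": 88},
}

# Matching degrees of the first log event in step 104 (also illustrative).
first_event_degrees = {"X": 110, "Y": 30}
first_target_template = max(first_event_degrees, key=first_event_degrees.get)
threshold = matching_threshold(recorded, first_target_template)
use_target_template = first_event_degrees[first_target_template] >= threshold
```

Note that the threshold depends on which template scores highest for the event being processed, so it is recomputed (or looked up) per log event rather than fixed once.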
For example, suppose content extraction is performed on a first log event in the log to be extracted, where the first log event is:
[195.160.24.63 - - 05/Jan/2015:18:22:16 -0800] "GET /product.screen?productId=WC-SH-A02&JSESSIONID=SD0SL6FF7ADFF4953 HTTP 1.1" 200 3878 "http://www.google.com" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5" 349
This log event contains multiple fields. Correspondingly, the target log templates include template X (agent field, auth field, ident field, referrer field, bytes field, response field, clientip field, rawrequest field, timestamp field) and template Y (timestamp field, @version field, clientip field, user ID field, transaction ID field). The matching degrees of template X and template Y with the first log event are then determined respectively; for example, the formats of the fields included in template X and template Y can be compared with the first log event in turn. If the matching degree of template X is determined to be the highest, content extraction is performed on the first log event according to template X, yielding the following content:
agent: "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5"
auth: -
ident: -
referrer: "http://www.google.com"
bytes: 3878
response: 200
clientip: 195.160.24.63
rawrequest: "GET /product.screen?productId=WC-SH-A02&JSESSIONID=SD0SL6FF7ADFF4953 HTTP 1.1"
timestamp: 05/Jan/2015:18:22:16 -0800
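To make the example concrete, the following Python sketch expresses template X as a single regular expression with named groups. The regex itself is a hypothetical rendering, and the sample line is written in the conventional access-log layout with brackets around the timestamp only, which is an assumption about the intended format of the event shown above.

```python
import re

# Hypothetical regex rendering of template X; field names follow the text above.
TEMPLATE_X = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<rawrequest>[^"]*)" (?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumed conventional layout of the example log event.
line = (
    '195.160.24.63 - - [05/Jan/2015:18:22:16 -0800] '
    '"GET /product.screen?productId=WC-SH-A02&JSESSIONID=SD0SL6FF7ADFF4953 HTTP 1.1" '
    '200 3878 "http://www.google.com" '
    '"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.5 (KHTML, like Gecko) '
    'Chrome/19.0.1084.46 Safari/536.5" 349'
)

# match() anchors at the start only, so the trailing value is left unextracted.
match = TEMPLATE_X.match(line)
fields = match.groupdict() if match else {}
extracted_chars = sum(len(v) for v in fields.values())
```

The trailing value `349` has no corresponding field in template X, so it is simply not extracted; `extracted_chars` could then serve as the character-count matching degree for this template.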
In conclusion disclosure determination first from log to be extracted includes the sample log of multiple log events, it
Afterwards from multiple log events extract target journaling event, then it is preset include the log mould of at least one log template
Plate is concentrated, the determining target journaling template with target journaling event matches, finally according to target journaling template to log to be extracted
Carry out contents extraction.Without the decimation rule of the labyrinth design specialized for different logs, reduce the work of developer
Measure, can the log to various structures carry out contents extraction automatically, reduce extraction complexity, improve extraction efficiency and suitable
Use range.
Fig. 3 is a block diagram of a log extraction apparatus according to an exemplary embodiment. As shown in Fig. 3, the apparatus 200 includes:
A sample determining module 201, configured to determine a sample log from the log to be extracted, the sample log including multiple log events.
A sampling module 202, configured to extract a target log event from the multiple log events.
A template determining module 203, configured to determine, in a preset log template set, the target log template matching the target log event, the log template set including at least one log template.
An extraction module 204, configured to perform content extraction on the log to be extracted according to the target log template.
Fig. 4 is a block diagram of another log extraction apparatus according to an exemplary embodiment. The sampling module 202 includes:
A determining submodule 2021, configured to, for each log event, determine the degree of difference between the log event and each other log event among the multiple log events, and determine a difference feature value of the log event according to those degrees of difference.
The degree of difference includes at least one of a content difference degree, a length difference degree, and a format difference degree.
The determining submodule 2021 is further configured to determine an event extraction parameter corresponding to the sample log according to the difference feature value of each log event.
A sampling submodule 2022, configured to extract the target log event from the multiple log events according to the event extraction parameter.
Optionally, the sampling submodule 2022 is configured to:
use the event extraction parameter as the random coefficient of a random selection algorithm, and extract the target log event from the multiple log events by means of the random selection algorithm.
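The sampling flow of submodules 2021 and 2022 might look like the Python sketch below. The passage above does not fix any formulas, so everything concrete here is an assumption: the degree of difference is taken as the mean of a content term (based on `difflib.SequenceMatcher`) and a length term, the difference feature value is the average difference to all other events, and the event extraction parameter is the mean feature value, used as a sampling fraction for a seeded random selection.

```python
import difflib
import math
import random
from statistics import mean

def difference(a, b):
    """Assumed degree of difference: mean of a content term and a length term, in [0, 1]."""
    content = 1.0 - difflib.SequenceMatcher(None, a, b).ratio()
    length = abs(len(a) - len(b)) / max(len(a), len(b), 1)
    return (content + length) / 2

def difference_feature_values(events):
    """For each event, average its difference to every other event (needs >= 2 events)."""
    return [
        mean(difference(e, other) for j, other in enumerate(events) if j != i)
        for i, e in enumerate(events)
    ]

def sample_target_events(events, seed=0):
    """Use the mean feature value as a (hypothetical) sampling fraction."""
    extraction_parameter = mean(difference_feature_values(events))
    k = max(1, math.ceil(extraction_parameter * len(events)))
    return random.Random(seed).sample(events, k), extraction_parameter

sample_log = [
    "ERROR: disk full",
    "ERROR: disk almost full",
    "user=alice action=login",
    "10.0.0.7 200 3878",
]
target_events, parameter = sample_target_events(sample_log)
```

Under these assumptions, a sample log whose events differ more from one another yields a larger extraction parameter and therefore more target log events, while a highly uniform sample log yields only a few.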
In another embodiment, the template determining module 203 is configured to perform the following steps:
Step a) for each target log event, determine the matching degree between the target log event and each log template in the log template set.
Step b) take the log template with the highest matching degree as the target log template matching the target log event.
For the scenario in which there are multiple target log templates, the extraction module 204 is configured to perform the following steps:
For each log event included in the log to be extracted, determine the matching degree between the log event and each of the multiple target log templates, and perform content extraction on the log event according to the target log template with the highest matching degree.
Alternatively, the extraction module 204 is configured to perform the following steps:
For each log event included in the log to be extracted, determine the matching degree between the log event and each of the multiple target log templates, and, if the maximum matching degree is greater than or equal to the matching degree threshold, perform content extraction on the log event according to the target log template with the highest matching degree.
With regard to the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method, and will not be elaborated here.
In conclusion disclosure determination first from log to be extracted includes the sample log of multiple log events, it
Afterwards from multiple log events extract target journaling event, then it is preset include the log mould of at least one log template
Plate is concentrated, the determining target journaling template with target journaling event matches, finally according to target journaling template to log to be extracted
Carry out contents extraction.Without the decimation rule of the labyrinth design specialized for different logs, reduce the work of developer
Measure, can the log to various structures carry out contents extraction automatically, reduce extraction complexity, improve extraction efficiency and suitable
Use range.
Fig. 5 is a block diagram of an electronic device 300 according to an exemplary embodiment. As shown in Fig. 5, the electronic device 300 may include a processor 301 and a memory 302. The electronic device 300 may further include one or more of a multimedia component 303, an input/output (I/O) interface 304, and a communication component 305.
The processor 301 controls the overall operation of the electronic device 300 so as to complete all or part of the steps of the log extraction method described above. The memory 302 stores various types of data to support operation of the electronic device 300. Such data may include, for example, instructions for any application or method operating on the electronic device 300, as well as application-related data such as contact data, sent and received messages, pictures, audio, and video. The memory 302 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc. The multimedia component 303 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signals may be further stored in the memory 302 or sent through the communication component 305. The audio component further includes at least one speaker for outputting audio signals. The I/O interface 304 provides an interface between the processor 301 and other interface modules, which may be a keyboard, a mouse, buttons, and the like. The buttons may be virtual buttons or physical buttons. The communication component 305 is used for wired or wireless communication between the electronic device 300 and other devices. Wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them; accordingly, the communication component 305 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In an exemplary embodiment, the electronic device 300 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the log extraction method described above.
In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided. When executed by a processor, the program instructions implement the steps of the log extraction method described above. For example, the computer-readable storage medium may be the above memory 302 including program instructions, and these program instructions can be executed by the processor 301 of the electronic device 300 to complete the log extraction method described above.
In conclusion disclosure determination first from log to be extracted includes the sample log of multiple log events, it
Afterwards from multiple log events extract target journaling event, then it is preset include the log mould of at least one log template
Plate is concentrated, the determining target journaling template with target journaling event matches, finally according to target journaling template to log to be extracted
Carry out contents extraction.Without the decimation rule of the labyrinth design specialized for different logs, reduce the work of developer
Measure, can the log to various structures carry out contents extraction automatically, reduce extraction complexity, improve extraction efficiency and suitable
Use range.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details of the above embodiments. Within the scope of the technical concept of the present disclosure, various simple variations may be made to its technical solutions, and these simple variations all fall within the protection scope of the present disclosure.
It should further be noted that the specific technical features described in the above specific embodiments may be combined in any suitable manner, provided there is no contradiction. To avoid unnecessary repetition, the present disclosure does not separately describe the various possible combinations.
In addition, the various embodiments of the present disclosure may also be combined arbitrarily. As long as such a combination does not depart from the idea of the present disclosure, it should likewise be regarded as content disclosed by the present disclosure.
Claims (10)
1. A log extraction method, characterized in that the method comprises:
determining a sample log from a log to be extracted, the sample log comprising multiple log events;
extracting a target log event from the multiple log events;
determining, in a preset log template set, a target log template matching the target log event, the log template set comprising at least one log template;
performing content extraction on the log to be extracted according to the target log template.
2. The method according to claim 1, characterized in that the extracting a target log event from the multiple log events comprises:
for each log event, determining a degree of difference between the log event and each log event other than the log event among the multiple log events, and determining a difference feature value of the log event according to the degree of difference between the log event and each log event other than the log event among the multiple log events;
determining, according to the difference feature value of each log event, an event extraction parameter corresponding to the sample log;
extracting the target log event from the multiple log events according to the event extraction parameter.
3. The method according to claim 2, characterized in that the degree of difference comprises at least one of a content difference degree, a length difference degree, and a format difference degree.
4. The method according to claim 2, characterized in that the extracting the target log event from the multiple log events according to the event extraction parameter comprises:
using the event extraction parameter as a random coefficient of a random selection algorithm, and extracting the target log event from the multiple log events by means of the random selection algorithm.
5. The method according to claim 1, characterized in that the determining, in a preset log template set, a target log template matching the target log event comprises:
for each target log event, determining a matching degree between the target log event and each log template in the log template set;
taking the log template with the highest matching degree as the target log template matching the target log event.
6. The method according to claim 1, characterized in that there are multiple target log templates, and the performing content extraction on the log to be extracted according to the target log template comprises:
for each log event included in the log to be extracted, determining a matching degree between the log event and each of the multiple target log templates, and performing content extraction on the log event according to the target log template with the highest matching degree.
7. The method according to claim 1, characterized in that there are multiple target log templates, and the performing content extraction on the log to be extracted according to the target log template comprises:
for each log event included in the log to be extracted, determining a matching degree between the log event and each of the multiple target log templates, and, if the maximum matching degree is greater than or equal to a matching degree threshold, performing content extraction on the log event according to the target log template with the highest matching degree.
8. A log extraction apparatus, characterized in that the apparatus comprises:
a sample determining module, configured to determine a sample log from a log to be extracted, the sample log comprising multiple log events;
a sampling module, configured to extract a target log event from the multiple log events;
a template determining module, configured to determine, in a preset log template set, a target log template matching the target log event, the log template set comprising at least one log template;
an extraction module, configured to perform content extraction on the log to be extracted according to the target log template.
9. A computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1-7.
10. An electronic device, characterized by comprising:
a memory having a computer program stored thereon;
a processor, configured to execute the computer program in the memory to implement the steps of the method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910544248.7A CN110321410B (en) | 2019-06-21 | 2019-06-21 | Log extraction method and device, storage medium and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110321410A true CN110321410A (en) | 2019-10-11 |
CN110321410B CN110321410B (en) | 2021-08-06 |
Family
ID=68120028
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910544248.7A Active CN110321410B (en) | 2019-06-21 | 2019-06-21 | Log extraction method and device, storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110321410B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111046012A (en) * | 2019-12-02 | 2020-04-21 | 东软集团股份有限公司 | Inspection log extraction method and device, storage medium and electronic equipment |
CN111813849A (en) * | 2020-09-14 | 2020-10-23 | 杭州数梦工场科技有限公司 | Data extraction method, device and equipment and storage medium |
CN112463772A (en) * | 2021-02-02 | 2021-03-09 | 北京信安世纪科技股份有限公司 | Log processing method and device, log server and storage medium |
CN112882900A (en) * | 2021-02-26 | 2021-06-01 | 山东浪潮通软信息科技有限公司 | Method and device for recording service data change log |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040236984A1 (en) * | 2003-05-20 | 2004-11-25 | Yasuo Yamasaki | Data backup method in a network storage system |
US20070239799A1 (en) * | 2006-03-29 | 2007-10-11 | Anirudh Modi | Analyzing log files |
CN101625703A (en) * | 2009-08-21 | 2010-01-13 | 华中科技大学 | Method and system for merging logs of memory database |
CN102984161A (en) * | 2012-12-05 | 2013-03-20 | 北京奇虎科技有限公司 | Identification method and device for reliable website |
CN103414758A (en) * | 2013-07-19 | 2013-11-27 | 北京奇虎科技有限公司 | Method and device for processing logs |
CN105049287A (en) * | 2015-07-28 | 2015-11-11 | 小米科技有限责任公司 | Log processing method and log processing devices |
CN109510721A (en) * | 2018-11-01 | 2019-03-22 | 郑州云海信息技术有限公司 | A kind of network log management method and system based on Syslog |
Non-Patent Citations (3)
Title |
---|
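DIEGO CALVANESE et al.: "Ontology-driven extraction of event logs from relational databases", 《BUSINESS PROCESS MANAGEMENT》 * |
崔元 et al.: "Research on template extraction based on large-scale network logs", 《计算机科学》 (Computer Science) * |
顾兆军 et al.: "Multi-source log aggregation analysis method", 《计算机工程与设计》 (Computer Engineering and Design) * |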
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111046012A (en) * | 2019-12-02 | 2020-04-21 | 东软集团股份有限公司 | Inspection log extraction method and device, storage medium and electronic equipment |
CN111046012B (en) * | 2019-12-02 | 2023-09-26 | 东软集团股份有限公司 | Method and device for extracting inspection log, storage medium and electronic equipment |
CN111813849A (en) * | 2020-09-14 | 2020-10-23 | 杭州数梦工场科技有限公司 | Data extraction method, device and equipment and storage medium |
CN112463772A (en) * | 2021-02-02 | 2021-03-09 | 北京信安世纪科技股份有限公司 | Log processing method and device, log server and storage medium |
CN112463772B (en) * | 2021-02-02 | 2022-05-27 | 北京信安世纪科技股份有限公司 | Log processing method and device, log server and storage medium |
CN112882900A (en) * | 2021-02-26 | 2021-06-01 | 山东浪潮通软信息科技有限公司 | Method and device for recording service data change log |
CN112882900B (en) * | 2021-02-26 | 2022-11-29 | 浪潮通用软件有限公司 | Method and device for recording service data change log |
Also Published As
Publication number | Publication date |
---|---|
CN110321410B (en) | 2021-08-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||