
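WO2023039973A1 - Method and device for processing abnormal false alarms, storage medium, and terminal - Google Patents

Method and device for processing abnormal false alarms, storage medium, and terminal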

Info

Publication number
WO2023039973A1
Authority
WO
WIPO (PCT)
Prior art keywords
demand
classification
information
theme
keywords
Prior art date
Application number
PCT/CN2021/124046
Other languages
English (en)
French (fr)
Inventor
殷钱安
梁淑云
余贤喆
王启凡
陶景龙
刘胜
马影
周晓勇
魏国富
夏玉明
Original Assignee
上海观安信息技术股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海观安信息技术股份有限公司
Publication of WO2023039973A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing
    • G06F11/0781 Error filtering or prioritizing based on a policy defined by the user or on a policy defined by a hardware/software module, e.g. according to a severity level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation, the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0727 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation, the processing taking place on a specific hardware platform or in a specific software environment in a storage system, e.g. in a DASD or network based storage system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Definitions

  • The present invention relates to the technical field of the Internet, and in particular to a method and device for processing abnormal false alarms, a storage medium, and a terminal.
  • The present invention provides a method and device for processing abnormal false alarms, a storage medium, and a terminal, whose main purpose is to solve the existing problem that anomaly information cannot be processed effectively and accurately.
  • A method for processing abnormal false alarms, including:
  • The demand-operation theme mapping relationship represents the theme classification relationship between different demand theme classification keywords and different operation theme classifications.
  • The theme classification relationship is determined by matching, based on the combination relationships among different demand theme classifications, different demand theme classification keywords, different operation theme classifications, and different operation theme classification keywords.
  • If the first similarity between the operation information and the operation theme classification keywords of the operation theme classification is greater than a first similarity threshold, the alarm information generated based on the operation information is determined to be a false alarm event, and the alarm information is deleted.
  • the method further includes:
  • Theme classification keywords are parsed from the historical business demand information and the historical operation information respectively, and the demand-operation theme mapping relationship is constructed based on the parsed demand theme classification keywords and operation theme classification keywords.
  • Parsing theme classification keywords from the historical business demand information and the historical operation information respectively, and constructing the demand-operation theme mapping relationship based on the parsed demand theme classification keywords and operation theme classification keywords, includes:
  • Segmenting the historical business demand information and the historical operation information respectively to obtain a demand word segmentation sequence and an operation word segmentation sequence;
  • Performing theme classification and theme classification keyword extraction on the demand word segmentation sequence and the operation word segmentation sequence to obtain demand theme classification keywords and operation theme classification keywords, where the theme classifications include demand theme classifications and operation theme classifications, and the demand theme classification keywords and the operation theme classification keywords cover the same number of theme classifications;
  • Determining the operation theme classification corresponding to the demand theme classification keywords of the business demand information based on the demand-operation theme mapping relationship includes:
  • Searching for the operation theme classification corresponding to the demand theme classification based on the demand-operation theme mapping relationship.
  • the method further includes:
  • the method further includes:
  • The demand-operation theme mapping relationship is updated based on the business demand information.
  • the method further includes:
  • A device for processing abnormal false alarms, including:
  • An acquisition module configured to acquire business demand information and the operation information of an operation object to be alerted;
  • A determination module configured to determine, based on a demand-operation theme mapping relationship, the operation theme classification corresponding to the demand theme classification keywords of the business demand information, where the demand-operation theme mapping relationship represents the theme classification relationship between different demand theme classification keywords and different operation theme classifications, and the theme classification relationship is determined by matching, based on the combination relationships among different demand theme classifications, different demand theme classification keywords, different operation theme classifications, and different operation theme classification keywords;
  • A deletion module configured to determine that the alarm information generated based on the operation information is a false alarm event if the first similarity between the operation information and the operation theme classification keywords of the operation theme classification is greater than a first similarity threshold, and to delete the alarm information.
  • the device also includes:
  • A construction module configured to parse theme classification keywords from the historical business demand information and the historical operation information respectively, and to construct the demand-operation theme mapping relationship based on the parsed demand theme classification keywords and operation theme classification keywords.
  • The construction module includes:
  • A word segmentation unit configured to segment the historical business demand information and the historical operation information respectively to obtain a demand word segmentation sequence and an operation word segmentation sequence;
  • An extraction unit configured to perform theme classification and theme classification keyword extraction on the demand word segmentation sequence and the operation word segmentation sequence according to a theme classification model, to obtain demand theme classification keywords and operation theme classification keywords, where the theme classifications include demand theme classifications and operation theme classifications, and the demand theme classification keywords and the operation theme classification keywords cover the same number of theme classifications;
  • An establishment unit configured to establish, based on a demand-operation matching library, the demand-operation theme mapping relationship among the demand theme classifications, the demand theme classification keywords, the operation theme classifications, and the operation theme classification keywords, where the demand-operation matching library stores the combination relationships, received and updated at preset time intervals, between different demand theme classifications and their keywords and different operation theme classifications and their keywords.
  • the determination module includes:
  • A calculation unit configured to calculate the second similarity between the demand word segmentation sequence of the business demand information and each demand theme classification keyword in the demand-operation theme mapping relationship;
  • A selecting unit configured to select the demand theme classification matched by the demand theme classification keyword that has the largest second similarity;
  • A searching unit configured to search for the operation theme classification corresponding to the demand theme classification based on the demand-operation theme mapping relationship.
  • the device also includes:
  • A word segmentation module configured to segment the business demand information using a text segmentation algorithm and to segment the operation information based on a preset delimiter, to obtain a demand word segmentation sequence and an operation word segmentation sequence respectively;
  • A division module configured to divide the demand word segmentation sequence and the operation word segmentation sequence into themes using the theme classification model together with an information measurement index, to obtain demand theme classification keywords matching the demand theme classifications and operation theme classification keywords matching the operation theme classifications.
  • the device also includes:
  • An update module configured to update the demand operation topic mapping relationship based on the business demand information if the number of differences between the demand subject classification keywords in the business demand information and the historical business demand information exceeds a preset difference threshold.
  • the device also includes:
  • A statistics module configured to collect, for the alarm information determined to be a false alarm event, the identity information of the operation object to be alerted, the false alarm time, the number of false alarms, and the business demand information corresponding to the false alarm event, and to generate and output abnormal false alarm warning information.
  • a storage medium wherein at least one executable instruction is stored in the storage medium, and the executable instruction causes a processor to perform operations corresponding to the above-mentioned method for handling abnormal false alarms.
  • A terminal including a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with one another through the communication bus;
  • The memory is used to store at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the above-mentioned method for processing abnormal false alarms.
  • The present invention provides a method and device for processing abnormal false alarms, a storage medium, and a terminal.
  • The embodiment of the present invention acquires business demand information and the operation information of an operation object to be alerted; determines, based on a demand-operation theme mapping relationship, the operation theme classification corresponding to the demand theme classification keywords of the business demand information, where the mapping relationship represents the theme classification relationship between different demand theme classification keywords and different operation theme classifications; and, if the first similarity between the operation information and the operation theme classification keywords of the operation theme classification is greater than the first similarity threshold, determines that the alarm information generated based on the operation information is a false alarm event and deletes the alarm information. This greatly reduces labor costs, avoids the misjudgment of anomalies caused by human judgment, and greatly increases the accuracy with which abnormal false alarms are recognized, thereby improving the efficiency of processing abnormal false alarms.
  • FIG. 1 shows a flow chart of a method for processing abnormal false alarms provided by an embodiment of the present invention;
  • FIG. 2 shows a flow chart of another method for processing abnormal false alarms provided by an embodiment of the present invention;
  • FIG. 3 shows a flow chart of another method for processing abnormal false alarms provided by an embodiment of the present invention;
  • FIG. 4 shows a block diagram of a device for processing abnormal false alarms provided by an embodiment of the present invention;
  • FIG. 5 shows a schematic structural diagram of a terminal provided by an embodiment of the present invention.
  • An embodiment of the present invention provides a method for processing abnormal false alarms. As shown in FIG. 1, the method includes:
  • The business demand information is the demand content for network security detection, including but not limited to behaviors, file objects, and specified program code. It is obtained in the form of a business demand document, which is not specifically limited in the embodiment of the present invention.
  • The operation object to be alerted is an operating subject that has been determined to be abnormal, identified by, for example, an operator id, IP address, or device number, so the operation object to be alerted is the target of the abnormal alarm.
  • The operation information is the operation content for which the operation object to be alerted was determined to be the abnormal alarm target, including but not limited to operation behavior and operation content. It is obtained in the form of an operation log, which is not specifically limited in the embodiment of the present invention.
  • The business demand information in step 101 is recent business demand information, that is, business demand information that is closer to the current time than the historical business demand information. The information is compared, and business demand information that is the same is determined as the business demand information in step 101.
  • The demand theme classification keywords are the keywords that represent each item of demand content in the business demand information after classification. The business demand information can be classified by demand theme to obtain different demand theme classifications, including but not limited to a system file classification and a behavior classification, and each classification is represented by keywords.
  • The operation information can be classified by operation theme to obtain different operation theme classifications, including but not limited to an ip classification and an id classification, which are not specifically limited in this embodiment of the present invention. Since the demand-operation theme mapping relationship represents the theme classification relationship between different demand theme classification keywords and different operation theme classifications, the matching operation theme classification can be found from the mapping relationship according to the demand theme classification keywords.
  • a requirement theme category can correspond to multiple requirement theme category keywords
  • an operation theme category can correspond to multiple operation theme category keywords.
  • The demand-operation theme mapping relationship includes not only the correspondence between demand theme classification keywords and demand theme classifications, and between operation theme classification keywords and operation theme classifications, but also the correspondence between demand theme classifications and operation theme classifications, between demand theme classification keywords and operation theme classifications, and between operation theme classification keywords and demand theme classifications. Therefore, once the demand theme classification keywords are determined, the corresponding operation theme classification can be found from the mapping relationship, and a demand theme classification keyword can thus correspond to multiple operation theme classification keywords under one operation theme classification.
  • After the operation theme classification matching the demand theme classification keywords is determined, and since an operation theme classification contains at least one operation theme classification keyword, the first similarity between the operation information and the operation theme classification keywords is calculated and compared with the first similarity threshold to determine whether the alarm is a false alarm event.
  • Since the operation information exists in the form of an operation log, it can be word-segmented before the first similarity with the operation theme classification keywords is calculated. If the calculated first similarity with the multiple operation theme classification keywords is greater than the first similarity threshold, the operation information is similar to the operation theme classification keywords and the abnormal alarm generated for it is a false alarm. The alarm information generated from the operation information is therefore determined to be a false alarm event and is deleted.
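The false alarm decision described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the source does not fix the similarity measure at this step, so the sketch assumes the first similarity is the fraction of the theme's keywords that occur in the whitespace-segmented log, and the function names, the sample logs, and the threshold of 0.5 are all hypothetical.

```python
def first_similarity(operation_tokens, theme_keywords):
    """Assumed measure: fraction of the theme's keywords found in the log tokens."""
    keywords = set(theme_keywords)
    if not keywords:
        return 0.0
    return len(keywords & set(operation_tokens)) / len(keywords)


def is_false_alarm(operation_log, theme_keywords, threshold=0.5):
    # The operation log is word-segmented on whitespace before comparison.
    tokens = operation_log.split()
    return first_similarity(tokens, theme_keywords) > threshold


# Hypothetical alarms and operation theme classification keywords.
alerts = [
    {"id": 1, "log": "user42 read /etc/passwd backup job"},
    {"id": 2, "log": "user13 delete /var/db prod"},
]
keywords = ["read", "backup", "job"]

# Alarms judged to be false alarm events are deleted; the rest are kept.
remaining = [a for a in alerts if not is_false_alarm(a["log"], keywords)]
```

The comparison is strict, matching the "greater than a first similarity threshold" wording: a similarity exactly equal to the threshold does not count as a false alarm.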
  • In an embodiment of the present invention, before the business demand information and the operation information of the operation object to be alerted are acquired, the method further includes: parsing theme classification keywords from historical business demand information and historical operation information respectively, and constructing the demand-operation theme mapping relationship based on the parsed demand theme classification keywords and operation theme classification keywords.
  • Theme classifications, including demand theme classifications and operation theme classifications, are determined from the historical business demand information and the historical operation information, so as to obtain the theme classification keywords corresponding to each theme classification and construct the demand-operation theme mapping relationship.
  • The demand theme classification represents the theme classification of different demands in the business demand information, and the operation theme classification represents the theme classification of different operations summarized from the operation information.
  • Each theme classification can be determined by keywords, so the demand-operation theme mapping relationship is established based on the demand theme classification keywords, the demand theme classifications, the operation theme classification keywords, and the operation theme classifications.
  • Parsing theme classification keywords from the historical business demand information and the historical operation information respectively, and constructing the demand-operation theme mapping relationship based on the parsed demand theme classification keywords and operation theme classification keywords, includes: 201. Segmenting the historical business demand information and the historical operation information respectively to obtain a demand word segmentation sequence and an operation word segmentation sequence; 202. Performing theme classification and theme classification keyword extraction on the demand word segmentation sequence and the operation word segmentation sequence to obtain demand theme classification keywords and operation theme classification keywords; 203. Based on the demand-operation matching library, establishing the demand-operation theme mapping relationship among the demand theme classifications, the demand theme classification keywords, the operation theme classifications, and the operation theme classification keywords.
  • Word segmentation is performed on the extracted historical business demand information and historical operation information to obtain the demand word segmentation sequence and the operation word segmentation sequence respectively. Since the information in the business demand document is text content, while the information in the operation document consists of various character strings, the business demand information is segmented as text and the operation information is segmented on the space delimiter, yielding the demand word segmentation sequence list1 and the operation word segmentation sequence list2 of step 201, which is not specifically limited in the embodiment of the present invention.
  • Theme classification is performed on the demand word segmentation sequence and the operation word segmentation sequence based on the theme classification model, and the keywords of the different theme classifications are extracted.
  • The theme classifications include demand theme classifications and operation theme classifications, and the demand theme classification keywords and the operation theme classification keywords cover the same number of theme classifications.
  • The theme classification model is the unsupervised Bayesian model LDA (Latent Dirichlet Allocation).
  • The demand word segmentation sequence and the operation word segmentation sequence are divided into themes to obtain the theme classifications, each of which corresponds to keywords. Since the theme classifications include demand theme classifications and operation theme classifications, the demand theme classification keywords topword1 and the operation theme classification keywords topword2 are obtained respectively, which is not specifically limited in this embodiment of the present invention.
  • The demand-operation matching library stores the combination relationships, received and updated at preset time intervals, between different demand theme classifications and their keywords and different operation theme classifications and their keywords. Therefore, after the demand theme classification keywords, operation theme classification keywords, demand theme classifications, and operation theme classifications are determined, the demand-operation theme mapping relationship is established from the combination relationships in the matching library between the different demand theme classifications and keywords and the different operation theme classifications and keywords.
  • The combination relationships stored in the demand-operation matching library are entered or updated by technicians at the preset time interval, and they cover a far larger scope than the demand-operation theme mapping relationship established from the theme classifications and theme classification keywords extracted from the historical business demand information and historical operation information. Therefore, when the business demand information changes, the demand-operation theme mapping relationship can be rebuilt from the existing matching library without updating the library. The matching library is updated only after a technician enters a new combination relationship, which is not specifically limited in the embodiment of the present invention.
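One way to picture step 203 is as a filter over the matching library: the library holds many combination relationships, and the mapping relationship keeps only those whose theme classifications were actually extracted from the historical data. The sketch below assumes a simple list-of-dicts layout; the field names, theme names, and keywords are hypothetical and are not taken from the source.

```python
# Hypothetical matching library: each entry pairs a demand theme classification
# (with its keywords) with an operation theme classification (with its keywords).
MATCHING_LIBRARY = [
    {"demand_theme": "system_file", "demand_keywords": {"file", "backup"},
     "operation_theme": "id", "operation_keywords": {"read", "copy"}},
    {"demand_theme": "behavior", "demand_keywords": {"login", "audit"},
     "operation_theme": "ip", "operation_keywords": {"ssh", "connect"}},
    {"demand_theme": "network", "demand_keywords": {"scan"},
     "operation_theme": "port", "operation_keywords": {"nmap"}},
]


def build_mapping(library, demand_themes, operation_themes):
    """Keep only library entries whose themes were extracted from historical data."""
    mapping = {}
    for entry in library:
        if (entry["demand_theme"] in demand_themes
                and entry["operation_theme"] in operation_themes):
            mapping[entry["demand_theme"]] = entry
    return mapping


# Theme classifications assumed to have been extracted in step 202.
mapping = build_mapping(MATCHING_LIBRARY, {"system_file", "behavior"}, {"id", "ip"})
```

Because the library is a superset, a change in the business demand information only requires calling `build_mapping` again with the newly extracted themes; the library itself stays untouched.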
  • Determining the operation theme classification corresponding to the demand theme classification keywords of the business demand information based on the demand-operation theme mapping relationship includes: calculating the second similarity between the demand word segmentation sequence of the business demand information and each demand theme classification keyword in the demand-operation theme mapping relationship; selecting the demand theme classification matched by the demand theme classification keyword that has the largest second similarity; and searching for the operation theme classification corresponding to that demand theme classification based on the demand-operation theme mapping relationship.
  • The second similarity consists of multiple similarity values, one per keyword. The demand theme classification keyword with the largest similarity value is determined, and the demand theme classification matched by that keyword is selected.
  • Since the demand-operation theme mapping relationship includes the relationship between different demand theme classifications and different operation theme classifications, once the demand theme classification has been determined from the largest similarity value, the corresponding operation theme classification is looked up in the mapping relationship. The similarity between each operation theme classification keyword in that operation theme classification and the operation information is then calculated to determine whether the alarm is a false alarm event.
  • The similarity in the embodiment of the present invention is calculated on numerical data. Therefore, before the similarity between the demand word segmentation sequence and the demand theme classification keywords is calculated, the words need to be converted into word vectors, for example with a word2vec model, so that the similarity can be calculated on numerical values, which is not specifically limited in this embodiment of the present invention.
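The three sub-steps (second similarity, selecting the maximum, lookup) can be sketched as below. The source only says that words are converted to vectors and that similarity is computed numerically, so this sketch makes several assumptions that are not in the source: cosine similarity as the measure, the mean of the token vectors as the representation of the demand word segmentation sequence, and a toy table of two-dimensional vectors standing in for a trained word2vec model. All names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(tokens, vectors):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    return [sum(col) / len(known) for col in zip(*known)]


def find_operation_theme(demand_tokens, keyword_to_demand_theme, theme_mapping, vectors):
    query = mean_vector(demand_tokens, vectors)
    if query is None:
        return None
    # Second similarity: one value per demand theme classification keyword.
    scores = {kw: cosine(query, vectors[kw])
              for kw in keyword_to_demand_theme if kw in vectors}
    if not scores:
        return None
    best_keyword = max(scores, key=scores.get)
    demand_theme = keyword_to_demand_theme[best_keyword]
    # Look up the operation theme classification for the selected demand theme.
    return theme_mapping.get(demand_theme)


# Toy vectors standing in for word2vec embeddings (assumption).
VECTORS = {"backup": [1.0, 0.0], "file": [0.9, 0.1],
           "login": [0.0, 1.0], "audit": [0.1, 0.9]}
KEYWORD_TO_THEME = {"file": "system_file", "login": "behavior"}
THEME_MAPPING = {"system_file": "id", "behavior": "ip"}

result = find_operation_theme(["backup", "file"], KEYWORD_TO_THEME, THEME_MAPPING, VECTORS)
```

The returned operation theme classification is the one whose keywords would then be compared with the operation information in the first-similarity check.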
  • Before the operation theme classification corresponding to the demand theme classification keywords of the business demand information is determined based on the demand-operation theme mapping relationship, the method further includes: 301. Segmenting the business demand information using a text segmentation algorithm, and segmenting the operation information based on a preset delimiter, to obtain a demand word segmentation sequence and an operation word segmentation sequence respectively; 302. Dividing the demand word segmentation sequence and the operation word segmentation sequence into themes using the theme classification model together with the information measurement index, to obtain demand theme classification keywords matching the demand theme classifications and operation theme classification keywords matching the operation theme classifications.
  • Since the business demand information in the business demand document is text content, a text segmentation algorithm is used to segment it. The text segmentation algorithm can be, for example, the jieba (C++) tool from natural language processing, which yields the demand word segmentation sequence.
  • Since the operation information in the operation document is in the form of character strings, it is segmented on a preset delimiter to obtain the operation word segmentation sequence.
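The two segmentation paths of step 301 can be sketched as follows. To keep the example self-contained, a regular expression stands in for the text segmentation algorithm; it is not jieba and would not segment Chinese text correctly, so it is only a placeholder for whichever segmenter is actually used. The function names and sample strings are hypothetical.

```python
import re


def segment_requirement(text):
    """Placeholder text segmenter: lowercased runs of word characters."""
    return re.findall(r"\w+", text.lower())


def segment_operation(log_line, delimiter=" "):
    """Split an operation log line on the preset delimiter, dropping empty fields."""
    return [field for field in log_line.split(delimiter) if field]


# Demand word segmentation sequence and operation word segmentation sequence.
demand_tokens = segment_requirement("Audit all reads of system files.")
operation_tokens = segment_operation("user42  read /etc/passwd")
```

Splitting the log on a delimiter rather than running it through a language segmenter keeps tokens such as paths and identifiers intact, which is presumably why the source treats the two inputs differently.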
  • The optimal number of theme classifications is selected, yielding N demand theme classifications, M operation theme classifications, and the theme classification keywords for each theme classification, such as the demand theme classification keywords topword1 and the operation theme classification keywords topword2.
  • The information measurement index is determined by the perplexity method, which measures how well a probability distribution or probability model predicts a sample, that is, the classification quality of the theme classification model in the embodiment of the present invention. It is determined based on a set index, which is not specifically limited in this embodiment of the present invention.
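For reference, the usual definition of perplexity for a topic model is given below. The source does not state its formula, so this is the standard textbook form rather than the patent's own; here $D$ is a held-out set of documents, $\mathbf{w}_d$ is the word sequence of document $d$, $N_d$ is its length, and $p(\mathbf{w}_d)$ is its likelihood under the trained model.

```latex
\mathrm{perplexity}(D) = \exp\!\left( -\frac{\sum_{d \in D} \log p(\mathbf{w}_d)}{\sum_{d \in D} N_d} \right)
```

Lower perplexity indicates that the model predicts the held-out text better, so a common way to choose the number of theme classifications is to train the model for several candidate counts and keep the count with the lowest perplexity.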
  • Before the business demand information and the operation information of the operation object to be alerted are acquired, the method further includes: if the number of differences between the demand theme classification keywords in the business demand information and those in the historical business demand information exceeds a preset difference threshold, updating the demand-operation theme mapping relationship based on the business demand information. The manner of updating the mapping relationship is not specifically limited in this embodiment of the present invention.
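The update trigger can be sketched as a comparison of two keyword sets. The source does not say how the "number of differences" is counted, so this sketch assumes it is the size of the symmetric difference, that is, keywords present in one set but not the other. The function name and sample keywords are hypothetical.

```python
def needs_update(current_keywords, historical_keywords, diff_threshold):
    """True when the two keyword sets differ by more than diff_threshold entries."""
    differences = set(current_keywords) ^ set(historical_keywords)
    return len(differences) > diff_threshold


# Hypothetical demand theme classification keywords.
historical = {"file", "backup", "login"}
current = {"file", "backup", "scan", "port"}

# "login" was dropped and "scan", "port" were added: three differences.
update_required = needs_update(current, historical, diff_threshold=2)
```

When `update_required` is true, the mapping relationship would be rebuilt from the current business demand information before any alarms are evaluated.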
  • The method further includes: collecting, for the alarm information determined to be a false alarm event, the identity information of the operation object to be alerted, the false alarm time, the number of false alarms, and the business demand information corresponding to the false alarm event, and generating and outputting abnormal false alarm warning information.
  • The identity information is the identity, such as a name, identified from information such as the operator id, IP address, and device number.
  • The false alarm time is the time at which the alarm information was generated, as determined by the timer used for network security detection on the current execution terminal. The number of false alarms is counted over a preset time interval, such as one week or three days, which is not specifically limited in this embodiment of the present invention.
  • The abnormal false alarm warning information is generated and output in combination with the business demand information corresponding to the false alarm events.
  • The abnormal false alarm warning information represents the status of false alarms, so that technicians can assess and complete the network detection rules and thereby improve the efficiency of network security detection.
  • the embodiment of the present invention provides a method for processing abnormal false alarms.
  • compared with the prior art, the embodiment of the present invention acquires business requirement information and operation information of an operation object to be alerted;
  • determines, based on the demand operation topic mapping relationship, the operation topic classification corresponding to the demand topic classification keywords of the business requirement information, where the demand operation topic mapping relationship represents the topic classification relationship between different demand topic classification keywords and different operation topic classifications; and, if the first similarity between the operation information and the operation topic classification keywords of the operation topic classification is greater than the first similarity threshold, determines that the alarm information generated based on the operation information is a false alarm event and deletes the alarm information, which greatly reduces labor costs.
  • misjudgments caused by human judgment are avoided and the accuracy of identifying abnormal false alarms is greatly increased, thereby improving the efficiency of processing abnormal false alarms.
  • an embodiment of the present invention provides a device for processing abnormal false alarms.
  • the device includes:
  • An acquisition module 41 configured to acquire business requirement information and operation information of operation objects to be alerted
  • the determination module 42 is configured to determine, based on the demand operation topic mapping relationship, the operation topic classification corresponding to the demand topic classification keywords of the business requirement information, where the demand operation topic mapping relationship represents the topic classification relationship between different demand topic classification keywords and different operation topic classifications;
  • a deletion module 43 configured to determine, if the first similarity between the operation information and the operation topic classification keywords of the operation topic classification is greater than a first similarity threshold, that the alarm information generated based on the operation information is a false alarm event, and to delete the alarm information.
  • the device also includes:
  • the construction module is used to analyze the subject classification keywords of the historical business demand information and the historical operation information respectively, and construct the demand operation theme mapping relationship based on the parsed demand theme classification keywords and operation theme classification keywords.
  • the construction module includes:
  • the word segmentation unit is used to segment the historical business demand information and the historical operation information respectively to obtain the demand word segmentation sequence and the operation word segmentation sequence;
  • the extraction unit is used to perform theme classification and theme classification keyword extraction on the demand word segmentation sequence and the operation word segmentation sequence according to the theme classification model, to obtain demand theme classification keywords and operation theme classification keywords, and the theme classification includes demand Subject classification, operation theme classification, the number of theme classifications of the demand theme classification keywords and the operation theme classification keywords is the same;
  • an establishment unit configured to establish, based on the demand operation matching library, a demand operation topic mapping relationship between the demand topic classification and demand topic classification keywords and the operation topic classification and operation topic classification keywords, where the demand operation matching library stores the combination relationships, received and updated at preset time intervals, between different demand topic classifications and demand topic classification keywords and different operation topic classifications and operation topic classification keywords.
  • the determination module includes:
  • a calculation unit configured to calculate the second similarity between the demand word segmentation sequence of the business demand information and each demand topic classification keyword in the demand operation topic mapping relationship;
  • a selecting unit configured to select a demand subject classification matched by a demand subject classification keyword corresponding to the largest second similarity among the second similarities
  • a searching unit configured to search for an operation topic category corresponding to the requirement topic category based on the requirement operation topic mapping relationship.
  • the device also includes:
  • a word segmentation module configured to use a text segmentation algorithm to perform text segmentation on the business requirement information, and perform text segmentation on the operation information based on a preset delimiter, to obtain a demand segmentation sequence and an operation segmentation sequence respectively;
  • a division module configured to perform theme division on the demand word segmentation sequence and the operation word segmentation sequence in combination with the theme classification model and information measurement indicators, and obtain demand theme classification keywords matching the demand theme classification and operations matching the operation theme classification Subject classification keywords.
  • the device also includes:
  • An update module configured to update the demand operation topic mapping relationship based on the business demand information if the number of differences between the demand subject classification keywords in the business demand information and the historical business demand information exceeds a preset difference threshold.
  • the device also includes:
  • the statistical module is used for collecting statistics on the identity information of the operation object to be alerted that was determined to be a false alarm event, the false alarm time, the number of false alarms, and the business requirement information corresponding to the false alarm event, and for generating and outputting abnormal false alarm warning information.
  • the embodiment of the present invention provides a device for processing abnormal and false alarms.
  • compared with the prior art, the embodiment of the present invention acquires business requirement information and operation information of an operation object to be alerted;
  • determines, based on the demand operation topic mapping relationship, the operation topic classification corresponding to the demand topic classification keywords of the business requirement information, where the demand operation topic mapping relationship represents the topic classification relationship between different demand topic classification keywords and different operation topic classifications; and, if the first similarity between the operation information and the operation topic classification keywords of the operation topic classification is greater than the first similarity threshold, determines that the alarm information generated based on the operation information is a false alarm event and deletes the alarm information, which greatly reduces labor costs.
  • misjudgments caused by human judgment are avoided and the accuracy of identifying abnormal false alarms is greatly increased, thereby improving the efficiency of processing abnormal false alarms.
  • a storage medium stores at least one executable instruction, and the computer-executable instruction can execute the method for processing abnormal false alarms in any of the above method embodiments.
  • FIG. 5 shows a schematic structural diagram of a terminal provided according to an embodiment of the present invention.
  • the specific embodiment of the present invention does not limit the specific implementation of the terminal.
  • the terminal may include: a processor 502, a communication interface 504, a memory 506, and a communication bus 508.
  • the processor 502 , the communication interface 504 , and the memory 506 communicate with each other through the communication bus 508 .
  • the communication interface 504 is configured to communicate with network elements of other devices such as clients or other servers.
  • the processor 502 is configured to execute the program 510, and may specifically execute the relevant steps in the above embodiment of the method for processing an abnormality false alarm.
  • the program 510 may include program codes including computer operation instructions.
  • the processor 502 may be a central processing unit CPU, or an ASIC (Application Specific Integrated Circuit), or one or more integrated circuits configured to implement the embodiments of the present invention.
  • the one or more processors included in the terminal may be of the same type, such as one or more CPUs, or may be different types of processors, such as one or more CPUs and one or more ASICs.
  • the memory 506 is used for storing the program 510 .
  • the memory 506 may include a high-speed RAM memory, and may also include a non-volatile memory (non-volatile memory), such as at least one disk memory.
  • the program 510 can specifically be used to make the processor 502 perform the following operations:
  • acquiring business requirement information and operation information of an operation object to be alerted;
  • determining, based on the demand operation topic mapping relationship, the operation topic classification corresponding to the demand topic classification keywords of the business requirement information, where the demand operation topic mapping relationship represents the topic classification relationship between different demand topic classification keywords and different operation topic classifications, and the topic classification relationship is determined by matching demand topic classification keywords with operation topic classifications based on the combination relationships between different demand topic classifications and demand topic classification keywords and different operation topic classifications and operation topic classification keywords;
  • if the first similarity between the operation information and the operation topic classification keywords of the operation topic classification is greater than a first similarity threshold, determining that the alarm information generated based on the operation information is a false alarm event, and deleting the alarm information.
  • each module or step of the present invention described above can be realized by a general-purpose computing device; the modules or steps can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Alternatively, they may be implemented in program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; in some cases the steps shown or described may be carried out in an order different from that given here, or the modules or steps may be fabricated separately into individual integrated circuit modules, or several of them may be fabricated into a single integrated circuit module.
  • the present invention is not limited to any specific combination of hardware and software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

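A method for processing abnormal false alarms, comprising: acquiring business requirement information and operation information of an operation object to be alerted (101); determining, based on a demand operation topic mapping relationship, the operation topic classification corresponding to the demand topic classification keywords of the business requirement information (102), where the demand operation topic mapping relationship represents the topic classification relationship between different demand topic classification keywords and different operation topic classifications; and, if a first similarity between the operation information and the operation topic classification keywords of the operation topic classification is greater than a first similarity threshold, determining that the alarm information generated based on the operation information is a false alarm event and deleting the alarm information (103). An apparatus for processing abnormal false alarms, a storage medium, and a terminal are also disclosed, which solve the problem that the prior art cannot process abnormal information effectively and accurately.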

Description

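Method and apparatus for processing abnormal false alarms, storage medium, and terminal
Technical Field
The present invention relates to the field of Internet technology, and in particular to a method and apparatus for processing abnormal false alarms, a storage medium, and a terminal.
Background
With the rapid development of Internet technology, data security problems have become increasingly prominent. To prevent data leakage from systems or terminal devices and the resulting loss of user property, data security protection is becoming more and more important. In practice, a behavior security monitoring system is usually used to perform security detection on system or device behavior and thereby provide protection. In this process, when the work content changes, for example when a business change causes data processing personnel to perform a large number of operations outside working hours, behavior security detection may produce false alarms. To maintain the protective effect of the security monitoring system and reduce its false alarm rate, the abnormal information found by security detection usually has to be checked for false alarms.
At present, false alarm judgment for detected anomalies is usually performed manually or by whitelist rule matching. However, because the volume of abnormal information and the number of targets producing it are both large, business requirements are complex, and the types of data change work are varied, both whitelist matching and manual judgment require continuous manual maintenance that follows real-time changes in business type and requirement content. This consumes a large amount of labor, is prone to human error, and makes false alarm judgment slow and inaccurate, so abnormal false alarms cannot be processed effectively and accurately.
Summary of the Invention
In view of this, the present invention provides a method and apparatus for processing abnormal false alarms, a storage medium, and a terminal, whose main purpose is to solve the existing problem that abnormal information cannot be processed effectively and accurately.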
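According to one aspect of the present invention, a method for processing abnormal false alarms is provided, comprising:
acquiring business requirement information and operation information of an operation object to be alerted;
determining, based on a demand operation topic mapping relationship, the operation topic classification corresponding to the demand topic classification keywords of the business requirement information, where the demand operation topic mapping relationship represents the topic classification relationship between different demand topic classification keywords and different operation topic classifications, and the topic classification relationship is determined by matching demand topic classification keywords with operation topic classifications based on the combination relationships between different demand topic classifications and demand topic classification keywords and different operation topic classifications and operation topic classification keywords;
if a first similarity between the operation information and the operation topic classification keywords of the operation topic classification is greater than a first similarity threshold, determining that the alarm information generated based on the operation information is a false alarm event, and deleting the alarm information.
Further, before acquiring the business requirement information and the operation information of the operation object to be alerted, the method further comprises:
performing topic classification keyword parsing on historical business requirement information and historical operation information respectively, and constructing the demand operation topic mapping relationship based on the parsed demand topic classification keywords and operation topic classification keywords.
Further, performing topic classification keyword parsing on the historical business requirement information and the historical operation information respectively, and constructing the demand operation topic mapping relationship based on the parsed demand topic classification keywords and operation topic classification keywords, comprises:
segmenting the historical business requirement information and the historical operation information respectively to obtain a demand word segmentation sequence and an operation word segmentation sequence;
performing topic classification and topic classification keyword extraction on the demand word segmentation sequence and the operation word segmentation sequence according to a topic classification model to obtain demand topic classification keywords and operation topic classification keywords, where the topic classifications include demand topic classifications and operation topic classifications, and the demand topic classification keywords and the operation topic classification keywords have the same number of topic classifications;
establishing, based on a demand operation matching library, the demand operation topic mapping relationship between the demand topic classifications and demand topic classification keywords and the operation topic classifications and operation topic classification keywords, where the demand operation matching library stores combination relationships, received and updated at preset time intervals, between different demand topic classifications and demand topic classification keywords and different operation topic classifications and operation topic classification keywords.
Further, determining, based on the demand operation topic mapping relationship, the operation topic classification corresponding to the demand topic classification keywords of the business requirement information comprises:
calculating a second similarity between the demand word segmentation sequence of the business requirement information and each demand topic classification keyword in the demand operation topic mapping relationship;
selecting the demand topic classification matched by the demand topic classification keyword that corresponds to the largest of the second similarities;
looking up, based on the demand operation topic mapping relationship, the operation topic classification corresponding to the demand topic classification.
Further, before determining, based on the demand operation topic mapping relationship, the operation topic classification corresponding to the demand topic classification keywords of the business requirement information, the method further comprises:
performing text segmentation on the business requirement information using a text segmentation algorithm, and performing text segmentation on the operation information based on a preset delimiter, to obtain the demand word segmentation sequence and the operation word segmentation sequence respectively;
performing topic division on the demand word segmentation sequence and the operation word segmentation sequence in combination with the topic classification model and an information measurement index, to obtain demand topic classification keywords matching the demand topic classifications and operation topic classification keywords matching the operation topic classifications.
Further, before acquiring the business requirement information and the operation information of the operation object to be alerted, the method further comprises:
if the number of differing demand topic classification keywords between the business requirement information and the historical business requirement information exceeds a preset difference threshold, updating the demand operation topic mapping relationship based on the business requirement information.
Further, after determining that the alarm information generated based on the operation information is a false alarm event and deleting the alarm information, the method further comprises:
collecting statistics on the identity information of the operation object to be alerted that is determined to be a false alarm event, the false alarm time, the number of false alarms, and the business requirement information corresponding to the false alarm event, and generating and outputting abnormal false alarm warning information.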
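According to another aspect of the present invention, an apparatus for processing abnormal false alarms is provided, comprising:
an acquisition module, configured to acquire business requirement information and operation information of an operation object to be alerted;
a determination module, configured to determine, based on a demand operation topic mapping relationship, the operation topic classification corresponding to the demand topic classification keywords of the business requirement information, where the demand operation topic mapping relationship represents the topic classification relationship between different demand topic classification keywords and different operation topic classifications, and the topic classification relationship is determined by matching demand topic classification keywords with operation topic classifications based on the combination relationships between different demand topic classifications and demand topic classification keywords and different operation topic classifications and operation topic classification keywords;
a deletion module, configured to determine, if a first similarity between the operation information and the operation topic classification keywords of the operation topic classification is greater than a first similarity threshold, that the alarm information generated based on the operation information is a false alarm event, and to delete the alarm information.
Further, the apparatus further comprises:
a construction module, configured to perform topic classification keyword parsing on historical business requirement information and historical operation information respectively, and to construct the demand operation topic mapping relationship based on the parsed demand topic classification keywords and operation topic classification keywords.
Further, the construction module comprises:
a word segmentation unit, configured to segment the historical business requirement information and the historical operation information respectively to obtain a demand word segmentation sequence and an operation word segmentation sequence;
an extraction unit, configured to perform topic classification and topic classification keyword extraction on the demand word segmentation sequence and the operation word segmentation sequence according to a topic classification model to obtain demand topic classification keywords and operation topic classification keywords, where the topic classifications include demand topic classifications and operation topic classifications, and the demand topic classification keywords and the operation topic classification keywords have the same number of topic classifications;
an establishment unit, configured to establish, based on a demand operation matching library, the demand operation topic mapping relationship between the demand topic classifications and demand topic classification keywords and the operation topic classifications and operation topic classification keywords, where the demand operation matching library stores combination relationships, received and updated at preset time intervals, between different demand topic classifications and demand topic classification keywords and different operation topic classifications and operation topic classification keywords.
Further, the determination module comprises:
a calculation unit, configured to calculate a second similarity between the demand word segmentation sequence of the business requirement information and each demand topic classification keyword in the demand operation topic mapping relationship;
a selection unit, configured to select the demand topic classification matched by the demand topic classification keyword that corresponds to the largest of the second similarities;
a lookup unit, configured to look up, based on the demand operation topic mapping relationship, the operation topic classification corresponding to the demand topic classification.
Further, the apparatus further comprises:
a word segmentation module, configured to perform text segmentation on the business requirement information using a text segmentation algorithm, and to perform text segmentation on the operation information based on a preset delimiter, to obtain the demand word segmentation sequence and the operation word segmentation sequence respectively;
a division module, configured to perform topic division on the demand word segmentation sequence and the operation word segmentation sequence in combination with the topic classification model and an information measurement index, to obtain demand topic classification keywords matching the demand topic classifications and operation topic classification keywords matching the operation topic classifications.
Further, the apparatus further comprises:
an update module, configured to update the demand operation topic mapping relationship based on the business requirement information if the number of differing demand topic classification keywords between the business requirement information and the historical business requirement information exceeds a preset difference threshold.
Further, the apparatus further comprises:
a statistics module, configured to collect statistics on the identity information of the operation object to be alerted that is determined to be a false alarm event, the false alarm time, the number of false alarms, and the business requirement information corresponding to the false alarm event, and to generate and output abnormal false alarm warning information.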
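According to yet another aspect of the present invention, a storage medium is provided, in which at least one executable instruction is stored, the executable instruction causing a processor to perform the operations corresponding to the above method for processing abnormal false alarms.
According to a further aspect of the present invention, a terminal is provided, comprising a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with one another through the communication bus;
the memory is used to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the above method for processing abnormal false alarms.
With the above technical solutions, the technical solutions provided by the embodiments of the present invention have at least the following advantages:
The present invention provides a method and apparatus for processing abnormal false alarms, a storage medium, and a terminal. Compared with the prior art, the embodiment of the present invention acquires business requirement information and operation information of an operation object to be alerted; determines, based on a demand operation topic mapping relationship, the operation topic classification corresponding to the demand topic classification keywords of the business requirement information, where the demand operation topic mapping relationship represents the topic classification relationship between different demand topic classification keywords and different operation topic classifications; and, if a first similarity between the operation information and the operation topic classification keywords of the operation topic classification is greater than a first similarity threshold, determines that the alarm information generated based on the operation information is a false alarm event and deletes the alarm information. This greatly reduces labor costs, avoids misjudgments caused by human judgment, and greatly increases the accuracy of identifying abnormal false alarms, thereby improving the efficiency of processing abnormal false alarms.
The above is only an overview of the technical solution of the present invention. So that the technical means of the present invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the present invention are more apparent, specific embodiments of the present invention are set out below.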
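Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from the detailed description of the preferred embodiments below. The drawings serve only to illustrate the preferred embodiments and are not to be regarded as limiting the present invention. The same reference symbols denote the same parts throughout the drawings. In the drawings:
FIG. 1 shows a flowchart of a method for processing abnormal false alarms provided by an embodiment of the present invention;
FIG. 2 shows a flowchart of another method for processing abnormal false alarms provided by an embodiment of the present invention;
FIG. 3 shows a flowchart of yet another method for processing abnormal false alarms provided by an embodiment of the present invention;
FIG. 4 shows a block diagram of an apparatus for processing abnormal false alarms provided by an embodiment of the present invention;
FIG. 5 shows a schematic structural diagram of a terminal provided by an embodiment of the present invention.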
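Detailed Description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure can be understood more thoroughly and its scope fully conveyed to those skilled in the art.
An embodiment of the present invention provides a method for processing abnormal false alarms. As shown in FIG. 1, the method includes:
101. Acquire business requirement information and operation information of an operation object to be alerted.
In the embodiment of the present invention, for real-time detection of network security, alarm information is generated for information showing abnormal conditions according to different security detection methods, so that the abnormal conditions can be cleared. The business requirement information is the requirement content for network security detection, including but not limited to behaviors, file objects, and specified program code; it is acquired in the form of business requirement documents, which is not specifically limited in the embodiment of the present invention. The operation object to be alerted is an operation subject that has already been determined to be abnormal, including but not limited to an operator id, ip address, or device number, and serves as the target of the abnormal alarm. Accordingly, the operation information is the operation content for which the operation object to be alerted was determined to be an abnormal alarm target, including but not limited to operation behavior and operation content; it is acquired in the form of operation logs, which is not specifically limited in the embodiment of the present invention.
It should be noted that the business requirement information in step 101 is recent business requirement information, that is, business requirement information covering a shorter time interval than the historical business information and close to the current time. When step 101 is executed, the historical business information may be compared with the recent business information, and the business requirement information they have in common is used as the business requirement information in step 101.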
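102. Determine, based on the demand operation topic mapping relationship, the operation topic classification corresponding to the demand topic classification keywords of the business requirement information.
In the embodiment of the present invention, the demand topic classification keywords are the keywords of the categories into which the words expressing each requirement in the business requirement information are classified. The business requirement information can be classified by demand topic to obtain different demand topic classifications, including but not limited to a system file classification and a behavior classification, so that each classification is represented by keywords. Likewise, the operation information can be classified by operation topic to obtain different operation topic classifications, including but not limited to an ip classification and an id classification, which is not specifically limited in the embodiment of the present invention. Because the demand operation topic mapping relationship represents the topic classification relationship between different demand topic classification keywords and different operation topic classifications, the matching operation topic classification can be looked up in the demand operation topic mapping relationship according to the demand topic classification keywords.
It should be noted that a classification can be represented by several keywords, so one demand topic classification can correspond to several demand topic classification keywords, and one operation topic classification can correspond to several operation topic classification keywords. The demand operation topic mapping relationship contains not only the correspondences between demand topic classification keywords and demand topic classifications and between operation topic classification keywords and operation topic classifications, but also the correspondences between demand topic classifications and operation topic classifications, between demand topic classification keywords and operation topic classifications, and between operation topic classification keywords and demand topic classifications. Therefore, once the demand topic classification keywords are determined, the corresponding operation topic classification can be found from the demand operation topic mapping relationship, and a demand topic classification keyword can thus correspond to several operation topic classification keywords under one operation topic classification.
103. If the first similarity between the operation information and the operation topic classification keywords of the operation topic classification is greater than the first similarity threshold, determine that the alarm information generated based on the operation information is a false alarm event, and delete the alarm information.
In the embodiment of the present invention, after the operation topic classification matching the demand topic classification keywords is determined, because an operation topic classification contains at least one operation topic classification keyword, the first similarity between the operation information and the operation topic classification keywords is calculated and compared with the first similarity threshold to determine whether the event is a false alarm. Because the operation information exists in the form of operation logs, it can be segmented into words before the first similarity to the operation topic classification keywords is calculated. If the first similarity to the several operation topic classification keywords is greater than the first similarity threshold, the operation information is similar to the operation topic classification keywords and the abnormal alarm information generated for this operation information is a false alarm; the alarm information generated from this operation information is therefore determined to be a false alarm event and is deleted.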
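As a rough illustration of step 103, the sketch below drops an alarm when the similarity between its operation log and the operation topic classification keywords exceeds the threshold. The bag-of-words cosine measure, the function names, the alarm dictionary layout, and the threshold value 0.5 are all assumptions made for illustration; the embodiment does not prescribe them.

```python
from collections import Counter
from math import sqrt


def bow_cosine(tokens_a, tokens_b):
    """Cosine similarity between two token lists treated as bags of words."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def filter_false_alarms(alarms, operation_topic_keywords, first_threshold=0.5):
    """Return the alarms that remain after false alarm events are deleted.

    `alarms` is a hypothetical list of dicts with an "operation_log" string.
    """
    kept = []
    for alarm in alarms:
        tokens = alarm["operation_log"].split()  # segment the log on whitespace
        first_similarity = bow_cosine(tokens, operation_topic_keywords)
        if first_similarity > first_threshold:
            continue  # similar to the expected operations: false alarm, drop it
        kept.append(alarm)
    return kept


alarms = [
    {"id": 1, "operation_log": "user01 export table orders"},
    {"id": 2, "operation_log": "user02 delete audit log"},
]
remaining = filter_false_alarms(alarms, ["export", "table", "orders"])
```

In this toy run the first alarm matches the expected export operations and is removed, while the second alarm, which shares no tokens with the keywords, is kept for handling.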
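In the embodiment of the present invention, for further explanation and limitation, before acquiring the business requirement information and the operation information of the operation object to be alerted, the method further includes: performing topic classification keyword parsing on historical business requirement information and historical operation information respectively, and constructing the demand operation topic mapping relationship based on the parsed demand topic classification keywords and operation topic classification keywords.
To make false alarm judgment possible, topic classification keyword parsing is performed on the historical business requirements and historical operation information. That is, the topic classifications, including demand topic classifications and operation topic classifications, are determined from the historical business requirements and historical operation information, the topic classification keywords corresponding to each topic classification are obtained, and the demand operation topic mapping relationship is constructed. The demand topic classifications represent the topic classifications of the different requirements in the business requirement information, and the operation topic classifications represent the topic classifications of the different operations in the operation information. Each topic classification can be identified by keywords, so the demand operation topic mapping relationship is established from the demand topic classification keywords, demand topic classifications, operation topic classification keywords, and operation topic classifications.
In the embodiment of the present invention, for further explanation and limitation, as shown in FIG. 2, performing topic classification keyword parsing on the historical business requirement information and the historical operation information respectively, and constructing the demand operation topic mapping relationship based on the parsed demand topic classification keywords and operation topic classification keywords, includes: 201. segmenting the historical business requirement information and the historical operation information respectively to obtain a demand word segmentation sequence and an operation word segmentation sequence; 202. performing topic classification and topic classification keyword extraction on the demand word segmentation sequence and the operation word segmentation sequence according to a topic classification model to obtain demand topic classification keywords and operation topic classification keywords; 203. establishing, based on a demand operation matching library, the demand operation topic mapping relationship between the demand topic classifications and demand topic classification keywords and the operation topic classifications and operation topic classification keywords.
Specifically, the extracted historical business requirement information and historical operation information are segmented to obtain the demand word segmentation sequence and the operation word segmentation sequence respectively. Because the information in the business requirement documents is text and the information in the operation documents consists of various character strings, the business requirement information is segmented as text and the operation information is segmented on the space delimiter, yielding list1 as the demand word segmentation sequence and list2 as the operation word segmentation sequence in step 201, which is not specifically limited in the embodiment of the present invention. In step 202, to determine the topic classifications and the keywords of each topic classification accurately, the demand word segmentation sequence and the operation word segmentation sequence are classified by topic based on the topic classification model, and the keywords of the different topic classifications are extracted; the topic classifications include demand topic classifications and operation topic classifications, and the demand topic classification keywords and the operation topic classification keywords have the same number of topic classifications. The topic classification model is LDA (Latent Dirichlet Allocation), an unsupervised Bayesian model. The trained LDA model divides the demand word segmentation sequence and the operation word segmentation sequence into topics to obtain each topic classification and its corresponding keywords; because the topic classifications include demand topic classifications and operation topic classifications, this yields the demand topic classification keywords topword1 and the operation topic classification keywords topword2, which is not specifically limited in the embodiment of the present invention.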
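Steps 201 and 202 can be sketched roughly as follows. This is only an illustration under stated assumptions: a simple regular-expression tokenizer stands in for the text segmentation tool, and a hand-written topic-word probability table stands in for the output of a trained LDA model. The function names and all sample values are hypothetical.

```python
import re


def segment_requirement(text):
    """Stand-in text tokenizer (assumption): lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def segment_operation(log_line, delimiter=" "):
    """Split an operation log line on the delimiter, dropping empty fields."""
    return [token for token in log_line.split(delimiter) if token]


def top_keywords(topic_word_probs, k=2):
    """For each topic, return the k words with the highest probability."""
    return {
        topic: [word for word, _ in
                sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]]
        for topic, probs in topic_word_probs.items()
    }


# list1 / list2 follow the naming used in the description.
list1 = segment_requirement("Export the orders table for the quarterly audit")
list2 = segment_operation("user01 10.0.0.5  EXPORT orders")

# Hypothetical topic-word distribution, as a trained topic model might produce.
demand_topic_words = {
    "demand_topic_0": {"export": 0.4, "orders": 0.3, "audit": 0.1},
    "demand_topic_1": {"login": 0.5, "password": 0.2, "reset": 0.1},
}
topword1 = top_keywords(demand_topic_words, k=2)
```

The same `top_keywords` call applied to an operation-side distribution would give topword2.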
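It should be noted that, in the embodiment of the present invention, the demand operation matching library stores combination relationships, received and updated at preset time intervals, between different demand topic classifications and demand topic classification keywords and different operation topic classifications and operation topic classification keywords. Therefore, after the demand topic classification keywords, operation topic classification keywords, demand topic classifications, and operation topic classifications are determined, the demand operation topic mapping relationship is established based on these combination relationships in the demand operation matching library. The combination relationships stored in the demand operation matching library are entered or updated by technicians at preset time intervals, and the range of combination relationships covered by the library is far larger than the range of the mapping relationship established between the topic classifications and topic classification keywords extracted from the historical business requirement information and historical operation information. Therefore, when the business requirement information changes, the demand operation topic mapping relationship can simply be rebuilt from the existing demand operation matching library without updating the library; the library is updated only after technicians enter new combination relationships, which is not specifically limited in the embodiment of the present invention.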
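One possible way to picture step 203 is a lookup table built by intersecting the matching library with the topics actually extracted from history. The data layout below (pairs of topic names in the library, dictionaries of keyword lists for the extracted topics) and the function name are assumptions for illustration only.

```python
def build_mapping(match_library, demand_topics, operation_topics):
    """Build a demand-to-operation topic mapping from a matching library.

    match_library: iterable of (demand_topic, operation_topic) pairs.
    demand_topics / operation_topics: dict of topic name -> keyword list,
    as extracted from historical data.
    """
    mapping = {}
    for demand_topic, operation_topic in match_library:
        # The library may cover more combinations than the history contains;
        # only pairs whose topics were both extracted are kept.
        if demand_topic in demand_topics and operation_topic in operation_topics:
            mapping[demand_topic] = {
                "demand_keywords": demand_topics[demand_topic],
                "operation_topic": operation_topic,
                "operation_keywords": operation_topics[operation_topic],
            }
    return mapping


library = [("data_export", "bulk_query"), ("account", "auth_ops"), ("backup", "file_copy")]
demand_topics = {"data_export": ["export", "orders"], "account": ["login"]}
operation_topics = {"bulk_query": ["select", "dump"], "auth_ops": ["passwd"]}
mapping = build_mapping(library, demand_topics, operation_topics)
```

Because the library is wider than the extracted topics, rebuilding the mapping after a requirement change only means calling `build_mapping` again with the new topics; the library itself is left untouched.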
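In the embodiment of the present invention, for further explanation and limitation, determining, based on the demand operation topic mapping relationship, the operation topic classification corresponding to the demand topic classification keywords of the business requirement information includes: calculating a second similarity between the demand word segmentation sequence of the business requirement information and each demand topic classification keyword in the demand operation topic mapping relationship; selecting the demand topic classification matched by the demand topic classification keyword that corresponds to the largest of the second similarities; and looking up, based on the demand operation topic mapping relationship, the operation topic classification corresponding to the demand topic classification.
To improve the accuracy of identifying abnormal false alarms, the similarity between the demand word segmentation sequence of the business requirement information and each demand topic classification keyword in the demand operation topic mapping relationship, that is, the second similarity, is calculated first. Because it is calculated against several demand topic classification keywords, the second similarity consists of several similarity values; the demand topic classification keyword corresponding to the largest value is identified, and the demand topic classification matched by this keyword is selected. Because the demand operation topic mapping relationship includes the relationships between different demand topic classifications and different operation topic classifications, once the demand topic classification has been determined from the largest similarity value, the corresponding operation topic classification is looked up in the demand operation topic mapping relationship. The similarity between each operation topic classification keyword in that operation topic classification and the operation information is then calculated to determine whether the event is a false alarm.
It should be noted that all similarity calculations in the embodiment of the present invention operate on numerical data. Therefore, before the similarity between the demand word segmentation sequence and the demand topic classification keywords is calculated, the words need to be converted into word vectors, for example with a word2vec model, so that the similarity is calculated on a numerical basis, which is not specifically limited in the embodiment of the present invention.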
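The selection by largest second similarity can be sketched as below. The two-dimensional vectors are hypothetical stand-ins for word vectors from a trained embedding model, and averaging the token vectors of the demand sequence as well as using cosine similarity are assumptions; the description only requires that similarity be computed on numerical word vectors.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(tokens, embeddings):
    """Average the vectors of the tokens that have a known embedding."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        raise ValueError("no token has a known vector")
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def match_operation_topic(demand_tokens, mapping, embeddings):
    """Pick the demand topic whose keyword is most similar, then look up its operation topic."""
    query = mean_vector(demand_tokens, embeddings)
    best_topic, best_similarity = None, -1.0
    for demand_topic, entry in mapping.items():
        for keyword in entry["demand_keywords"]:
            if keyword not in embeddings:
                continue
            second_similarity = cosine(query, embeddings[keyword])
            if second_similarity > best_similarity:
                best_topic, best_similarity = demand_topic, second_similarity
    if best_topic is None:
        raise ValueError("no demand keyword has a known vector")
    return best_topic, mapping[best_topic]["operation_topic"]


# Hypothetical toy embeddings and mapping.
embeddings = {"export": (1.0, 0.0), "orders": (0.9, 0.1),
              "login": (0.0, 1.0), "password": (0.1, 0.9)}
mapping = {
    "data_export": {"demand_keywords": ["export"], "operation_topic": "bulk_query"},
    "account": {"demand_keywords": ["login", "password"], "operation_topic": "auth_ops"},
}
result = match_operation_topic(["export", "orders"], mapping, embeddings)
```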
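In the embodiment of the present invention, for further explanation and limitation, as shown in FIG. 3, before determining, based on the demand operation topic mapping relationship, the operation topic classification corresponding to the demand topic classification keywords of the business requirement information, the method further includes: 301. performing text segmentation on the business requirement information using a text segmentation algorithm, and performing text segmentation on the operation information based on a preset delimiter, to obtain the demand word segmentation sequence and the operation word segmentation sequence respectively; 302. performing topic division on the demand word segmentation sequence and the operation word segmentation sequence in combination with the topic classification model and an information measurement index, to obtain demand topic classification keywords matching the demand topic classifications and operation topic classification keywords matching the operation topic classifications.
To match efficiently against the demand operation topic mapping relationship, and because the business requirement information in the business requirement documents is text, the business requirement information is segmented with a text segmentation algorithm, which may be the jieba (C++) tool from natural language processing, to obtain the demand word segmentation sequence. Because the operation information in the operation documents is in character string form, it is segmented on a preset delimiter to obtain the operation word segmentation sequence. In addition, because the demand word segmentation sequence and the operation word segmentation sequence must be divided into topics after segmentation, the information measurement index is used, on top of the topic classification performed by the topic classification model, to select the optimal number of topic classifications. This yields N demand topic classifications, M operation topic classifications, and the topic classification keywords of each topic classification, such as the demand topic classification keywords topword1 and the operation topic classification keywords topword2. The information measurement index is the index determined by the perplexity method, which measures how well a probability distribution or probability model predicts a sample, that is, the classification quality of the topic classification model in the embodiment of the present invention; it is determined based on a preset index, which is not specifically limited in the embodiment of the present invention.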
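For reference, perplexity is commonly defined as the exponential of the negative average log-likelihood per token on held-out data, and a lower value indicates a better fit. The sketch below applies that definition to choose a topic count; the candidate counts, the log-likelihood values, and the rule of picking the minimum are illustrative assumptions, since the description only says the optimal number is selected using the index.

```python
from math import exp


def perplexity(total_log_likelihood, token_count):
    """exp(-(sum of log-likelihoods) / (number of tokens))."""
    return exp(-total_log_likelihood / token_count)


def best_topic_count(log_likelihood_by_count, token_count):
    """Return the candidate topic count with the lowest perplexity."""
    return min(
        log_likelihood_by_count,
        key=lambda count: perplexity(log_likelihood_by_count[count], token_count),
    )


# Hypothetical held-out log-likelihoods for models trained with 2, 4, and 8 topics.
candidates = {2: -7000.0, 4: -6500.0, 8: -6800.0}
chosen = best_topic_count(candidates, token_count=1000)
```

Running the selection separately on the requirement corpus and the operation corpus would give the N and M mentioned above.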
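In the embodiment of the present invention, for further explanation and limitation, before acquiring the business requirement information and the operation information of the operation object to be alerted, the method further includes: if the number of differing demand topic classification keywords between the business requirement information and the historical business requirement information exceeds a preset difference threshold, updating the demand operation topic mapping relationship based on the business requirement information.
To identify abnormal false alarms better and thereby improve identification accuracy, the number of differing demand topic classification keywords between the business requirement information and the historical business requirement information is counted; that is, the demand topic classification keywords in the business requirement information are compared one by one with those in the historical business requirement information. If the number of differences exceeds the preset difference threshold, the current business requirement information differs substantially from the historical business information, and the demand operation topic mapping relationship built from the historical business information is no longer suitable for identifying the abnormal false alarms corresponding to the current business requirement information; the demand operation topic mapping relationship is therefore updated based on the business requirement information. Specifically, the demand topic classifications and demand topic classification keywords are extracted from the business requirement information and, in combination with the combination relationships in the demand operation matching library, the demand operation topic mapping relationship with the operation topic classifications and operation topic classification keywords of the historical operation information is constructed, which is not specifically limited in the embodiment of the present invention.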
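The update trigger can be sketched as a set comparison. Counting a keyword as "differing" when it appears in the current requirements but not in the historical ones is an assumption made here; the description only says the keywords are compared one by one. The function names and sample keywords are likewise hypothetical.

```python
def keyword_difference_count(current_keywords, historical_keywords):
    """Number of current demand keywords that do not occur in the historical set."""
    return len(set(current_keywords) - set(historical_keywords))


def needs_mapping_update(current_keywords, historical_keywords, difference_threshold):
    """True when the difference count exceeds the preset threshold."""
    return keyword_difference_count(current_keywords, historical_keywords) > difference_threshold


current = ["export", "orders", "refund", "invoice"]
historical = ["export", "orders", "login"]
difference = keyword_difference_count(current, historical)
update_needed = needs_mapping_update(current, historical, difference_threshold=1)
```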
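In the embodiment of the present invention, for further explanation and limitation, after determining that the alarm information generated based on the operation information is a false alarm event and deleting the alarm information, the method further includes: collecting statistics on the identity information of the operation object to be alerted that is determined to be a false alarm event, the false alarm time, the number of false alarms, and the business requirement information corresponding to the false alarm event, and generating and outputting abnormal false alarm warning information.
To make false alarm handling more effective, after the alarm information generated based on the operation information is determined to be a false alarm event, statistics are collected on the identity information of the operation object to be alerted for which the false alarm event occurred, the false alarm time, the number of false alarms, and the business requirement information corresponding to the false alarm event, and abnormal false alarm warning information is generated and output. Because the operation object to be alerted is an operation subject already determined to be abnormal, the identity information is the identity information, such as a name, identified from the operator id, ip address, device number, and similar information. The false alarm time is the time at which the alarm information was generated, as determined by the timer used for network security identification in the current execution terminal. The number of false alarms is the number of false alarm events counted over a preset time interval, such as one week or three days, which is not specifically limited in the embodiment of the present invention. In addition, to present the abnormal false alarms, the abnormal false alarm warning information is generated in combination with the business requirement information corresponding to the false alarm events and is then output. The abnormal false alarm warning information represents the status of the false alarms that have occurred, so that technicians can review the network security detection rules and optimize the efficiency of network security detection.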
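A minimal sketch of the statistics step follows, assuming each false alarm event is a dictionary with `identity`, `time`, and `requirement` fields and that the counting window is seven days. The field names, the report layout, and the function name are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def build_false_alarm_warning(events, now, window=timedelta(days=7)):
    """Summarise false alarm events per identity within the counting window."""
    grouped = defaultdict(list)
    for event in events:
        if timedelta(0) <= now - event["time"] <= window:
            grouped[event["identity"]].append(event)
    report = []
    for identity, items in grouped.items():
        report.append({
            "identity": identity,
            "false_alarm_count": len(items),
            "false_alarm_times": sorted(e["time"] for e in items),
            "requirements": sorted({e["requirement"] for e in items}),
        })
    # Identities with the most false alarms come first.
    report.sort(key=lambda row: row["false_alarm_count"], reverse=True)
    return report


now = datetime(2023, 3, 22, 12, 0)
events = [
    {"identity": "alice", "time": datetime(2023, 3, 20, 10, 0), "requirement": "quarterly export"},
    {"identity": "alice", "time": datetime(2023, 3, 21, 9, 0), "requirement": "quarterly export"},
    {"identity": "bob", "time": datetime(2023, 3, 1, 8, 0), "requirement": "migration"},
]
report = build_false_alarm_warning(events, now)
```

The third event falls outside the seven-day window and is therefore left out of the report.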
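An embodiment of the present invention provides a method for processing abnormal false alarms. Compared with the prior art, the embodiment of the present invention acquires business requirement information and operation information of an operation object to be alerted; determines, based on a demand operation topic mapping relationship, the operation topic classification corresponding to the demand topic classification keywords of the business requirement information, where the demand operation topic mapping relationship represents the topic classification relationship between different demand topic classification keywords and different operation topic classifications; and, if a first similarity between the operation information and the operation topic classification keywords of the operation topic classification is greater than a first similarity threshold, determines that the alarm information generated based on the operation information is a false alarm event and deletes the alarm information. This greatly reduces labor costs, avoids misjudgments caused by human judgment, and greatly increases the accuracy of identifying abnormal false alarms, thereby improving the efficiency of processing abnormal false alarms.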
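Further, as an implementation of the method shown in FIG. 1, an embodiment of the present invention provides an apparatus for processing abnormal false alarms. As shown in FIG. 4, the apparatus includes:
an acquisition module 41, configured to acquire business requirement information and operation information of an operation object to be alerted;
a determination module 42, configured to determine, based on a demand operation topic mapping relationship, the operation topic classification corresponding to the demand topic classification keywords of the business requirement information, where the demand operation topic mapping relationship represents the topic classification relationship between different demand topic classification keywords and different operation topic classifications;
a deletion module 43, configured to determine, if a first similarity between the operation information and the operation topic classification keywords of the operation topic classification is greater than a first similarity threshold, that the alarm information generated based on the operation information is a false alarm event, and to delete the alarm information.
Further, the apparatus further includes:
a construction module, configured to perform topic classification keyword parsing on historical business requirement information and historical operation information respectively, and to construct the demand operation topic mapping relationship based on the parsed demand topic classification keywords and operation topic classification keywords.
Further, the construction module includes:
a word segmentation unit, configured to segment the historical business requirement information and the historical operation information respectively to obtain a demand word segmentation sequence and an operation word segmentation sequence;
an extraction unit, configured to perform topic classification and topic classification keyword extraction on the demand word segmentation sequence and the operation word segmentation sequence according to a topic classification model to obtain demand topic classification keywords and operation topic classification keywords, where the topic classifications include demand topic classifications and operation topic classifications, and the demand topic classification keywords and the operation topic classification keywords have the same number of topic classifications;
an establishment unit, configured to establish, based on a demand operation matching library, the demand operation topic mapping relationship between the demand topic classifications and demand topic classification keywords and the operation topic classifications and operation topic classification keywords, where the demand operation matching library stores combination relationships, received and updated at preset time intervals, between different demand topic classifications and demand topic classification keywords and different operation topic classifications and operation topic classification keywords.
Further, the determination module includes:
a calculation unit, configured to calculate a second similarity between the demand word segmentation sequence of the business requirement information and each demand topic classification keyword in the demand operation topic mapping relationship;
a selection unit, configured to select the demand topic classification matched by the demand topic classification keyword that corresponds to the largest of the second similarities;
a lookup unit, configured to look up, based on the demand operation topic mapping relationship, the operation topic classification corresponding to the demand topic classification.
Further, the apparatus further includes:
a word segmentation module, configured to perform text segmentation on the business requirement information using a text segmentation algorithm, and to perform text segmentation on the operation information based on a preset delimiter, to obtain the demand word segmentation sequence and the operation word segmentation sequence respectively;
a division module, configured to perform topic division on the demand word segmentation sequence and the operation word segmentation sequence in combination with the topic classification model and an information measurement index, to obtain demand topic classification keywords matching the demand topic classifications and operation topic classification keywords matching the operation topic classifications.
Further, the apparatus further includes:
an update module, configured to update the demand operation topic mapping relationship based on the business requirement information if the number of differing demand topic classification keywords between the business requirement information and the historical business requirement information exceeds a preset difference threshold.
Further, the apparatus further includes:
a statistics module, configured to collect statistics on the identity information of the operation object to be alerted that is determined to be a false alarm event, the false alarm time, the number of false alarms, and the business requirement information corresponding to the false alarm event, and to generate and output abnormal false alarm warning information.
An embodiment of the present invention provides an apparatus for processing abnormal false alarms. Compared with the prior art, the embodiment of the present invention acquires business requirement information and operation information of an operation object to be alerted; determines, based on a demand operation topic mapping relationship, the operation topic classification corresponding to the demand topic classification keywords of the business requirement information, where the demand operation topic mapping relationship represents the topic classification relationship between different demand topic classification keywords and different operation topic classifications; and, if a first similarity between the operation information and the operation topic classification keywords of the operation topic classification is greater than a first similarity threshold, determines that the alarm information generated based on the operation information is a false alarm event and deletes the alarm information. This greatly reduces labor costs, avoids misjudgments caused by human judgment, and greatly increases the accuracy of identifying abnormal false alarms, thereby improving the efficiency of processing abnormal false alarms.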
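According to an embodiment of the present invention, a storage medium is provided. The storage medium stores at least one executable instruction, and the computer-executable instruction can execute the method for processing abnormal false alarms in any of the above method embodiments.
FIG. 5 shows a schematic structural diagram of a terminal provided according to an embodiment of the present invention. The specific embodiments of the present invention do not limit the specific implementation of the terminal.
As shown in FIG. 5, the terminal may include: a processor 502, a communications interface 504, a memory 506, and a communication bus 508.
The processor 502, the communication interface 504, and the memory 506 communicate with one another through the communication bus 508.
The communication interface 504 is used to communicate with network elements of other devices, such as clients or other servers.
The processor 502 is used to execute the program 510, and may specifically execute the relevant steps in the above embodiments of the method for processing abnormal false alarms.
Specifically, the program 510 may include program code, and the program code includes computer operation instructions.
The processor 502 may be a central processing unit (CPU), an application specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. The one or more processors included in the terminal may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs and one or more ASICs.
The memory 506 is used to store the program 510. The memory 506 may include high-speed RAM, and may also include non-volatile memory, for example at least one disk memory.
The program 510 may specifically be used to cause the processor 502 to perform the following operations:
acquiring business requirement information and operation information of an operation object to be alerted;
determining, based on a demand operation topic mapping relationship, the operation topic classification corresponding to the demand topic classification keywords of the business requirement information, where the demand operation topic mapping relationship represents the topic classification relationship between different demand topic classification keywords and different operation topic classifications, and the topic classification relationship is determined by matching demand topic classification keywords with operation topic classifications based on the combination relationships between different demand topic classifications and demand topic classification keywords and different operation topic classifications and operation topic classification keywords;
if a first similarity between the operation information and the operation topic classification keywords of the operation topic classification is greater than a first similarity threshold, determining that the alarm information generated based on the operation information is a false alarm event, and deleting the alarm information.
Obviously, those skilled in the art should understand that each module or step of the present invention described above can be implemented by a general-purpose computing device. The modules or steps can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented in program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device. In some cases the steps shown or described may be executed in an order different from that given here, or the modules or steps may be fabricated separately into individual integrated circuit modules, or several of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.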

Claims (10)

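  1. A method for processing abnormal false alarms, characterized by comprising:
    acquiring business requirement information and operation information of an operation object to be alerted;
    determining, based on a demand operation topic mapping relationship, the operation topic classification corresponding to the demand topic classification keywords of the business requirement information, where the demand operation topic mapping relationship represents the topic classification relationship between different demand topic classification keywords and different operation topic classifications, and the topic classification relationship is determined by matching demand topic classification keywords with operation topic classifications based on the combination relationships between different demand topic classifications and demand topic classification keywords and different operation topic classifications and operation topic classification keywords;
    if a first similarity between the operation information and the operation topic classification keywords of the operation topic classification is greater than a first similarity threshold, determining that the alarm information generated based on the operation information is a false alarm event, and deleting the alarm information.
  2. The method according to claim 1, characterized in that, before acquiring the business requirement information and the operation information of the operation object to be alerted, the method further comprises:
    performing topic classification keyword parsing on historical business requirement information and historical operation information respectively, and constructing the demand operation topic mapping relationship based on the parsed demand topic classification keywords and operation topic classification keywords.
  3. The method according to claim 2, characterized in that performing topic classification keyword parsing on the historical business requirement information and the historical operation information respectively, and constructing the demand operation topic mapping relationship based on the parsed demand topic classification keywords and operation topic classification keywords, comprises:
    segmenting the historical business requirement information and the historical operation information respectively to obtain a demand word segmentation sequence and an operation word segmentation sequence;
    performing topic classification and topic classification keyword extraction on the demand word segmentation sequence and the operation word segmentation sequence according to a topic classification model to obtain demand topic classification keywords and operation topic classification keywords, where the topic classifications include demand topic classifications and operation topic classifications, and the demand topic classification keywords and the operation topic classification keywords have the same number of topic classifications;
    establishing, based on a demand operation matching library, the demand operation topic mapping relationship between the demand topic classifications and demand topic classification keywords and the operation topic classifications and operation topic classification keywords, where the demand operation matching library stores combination relationships, received and updated at preset time intervals, between different demand topic classifications and demand topic classification keywords and different operation topic classifications and operation topic classification keywords.
  4. The method according to claim 3, characterized in that determining, based on the demand operation topic mapping relationship, the operation topic classification corresponding to the demand topic classification keywords of the business requirement information comprises:
    calculating a second similarity between the demand word segmentation sequence of the business requirement information and each demand topic classification keyword in the demand operation topic mapping relationship;
    selecting the demand topic classification matched by the demand topic classification keyword that corresponds to the largest of the second similarities;
    looking up, based on the demand operation topic mapping relationship, the operation topic classification corresponding to the demand topic classification.
  5. The method according to claim 3, characterized in that, before determining, based on the demand operation topic mapping relationship, the operation topic classification corresponding to the demand topic classification keywords of the business requirement information, the method further comprises:
    performing text segmentation on the business requirement information using a text segmentation algorithm, and performing text segmentation on the operation information based on a preset delimiter, to obtain the demand word segmentation sequence and the operation word segmentation sequence respectively;
    performing topic division on the demand word segmentation sequence and the operation word segmentation sequence in combination with the topic classification model and an information measurement index, to obtain demand topic classification keywords matching the demand topic classifications and operation topic classification keywords matching the operation topic classifications.
  6. The method according to claim 1, characterized in that, before acquiring the business requirement information and the operation information of the operation object to be alerted, the method further comprises:
    if the number of differing demand topic classification keywords between the business requirement information and the historical business requirement information exceeds a preset difference threshold, updating the demand operation topic mapping relationship based on the business requirement information.
  7. The method according to any one of claims 1-6, characterized in that, after determining that the alarm information generated based on the operation information is a false alarm event and deleting the alarm information, the method further comprises:
    collecting statistics on the identity information of the operation object to be alerted that is determined to be a false alarm event, the false alarm time, the number of false alarms, and the business requirement information corresponding to the false alarm event, and generating and outputting abnormal false alarm warning information.
  8. An apparatus for processing abnormal false alarms, characterized by comprising:
    an acquisition module, configured to acquire business requirement information and operation information of an operation object to be alerted;
    a determination module, configured to determine, based on a demand operation topic mapping relationship, the operation topic classification corresponding to the demand topic classification keywords of the business requirement information, where the demand operation topic mapping relationship represents the topic classification relationship between different demand topic classification keywords and different operation topic classifications, and the topic classification relationship is determined by matching demand topic classification keywords with operation topic classifications based on the combination relationships between different demand topic classifications and demand topic classification keywords and different operation topic classifications and operation topic classification keywords;
    a deletion module, configured to determine, if a first similarity between the operation information and the operation topic classification keywords of the operation topic classification is greater than a first similarity threshold, that the alarm information generated based on the operation information is a false alarm event, and to delete the alarm information.
  9. A storage medium in which at least one executable instruction is stored, the executable instruction causing a processor to perform the operations corresponding to the method for processing abnormal false alarms according to any one of claims 1-7.
  10. A terminal, comprising a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with one another through the communication bus;
    the memory is used to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the method for processing abnormal false alarms according to any one of claims 1-7.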
PCT/CN2021/124046 2021-09-17 2021-10-15 异常误报的处理方法及装置、存储介质、终端 WO2023039973A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111089557.3A CN113535458B (zh) 2021-09-17 2021-09-17 异常误报的处理方法及装置、存储介质、终端
CN202111089557.3 2021-09-17

Publications (1)

Publication Number Publication Date
WO2023039973A1 true WO2023039973A1 (zh) 2023-03-23

Family

ID=78092747

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/124046 WO2023039973A1 (zh) 2021-09-17 2021-10-15 异常误报的处理方法及装置、存储介质、终端

Country Status (2)

Country Link
CN (1) CN113535458B (zh)
WO (1) WO2023039973A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114661515B (zh) * 2022-05-23 2022-09-20 武汉四通信息服务有限公司 告警信息收敛方法、装置、电子设备及存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109145609A (zh) * 2018-09-06 2019-01-04 平安科技(深圳)有限公司 一种数据处理方法和装置
CN110086767A (zh) * 2019-03-11 2019-08-02 中国电子科技集团公司电子科学研究院 一种混合入侵检测系统及方法
US20200304540A1 (en) * 2019-03-22 2020-09-24 Proofpoint, Inc. Identifying Legitimate Websites to Remove False Positives from Domain Discovery Analysis
CN112612844A (zh) * 2020-12-18 2021-04-06 深圳前海微众银行股份有限公司 数据处理方法、装置、设备和存储介质

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108053041B (zh) * 2017-12-13 2022-05-10 国网辽宁省电力有限公司电力科学研究院 变电站多关联场景下二次系统异常识别和防误系统及方法
CN109815697B (zh) * 2018-12-29 2021-04-27 360企业安全技术(珠海)有限公司 误报行为处理方法及装置
CN110362545A (zh) * 2019-05-27 2019-10-22 平安科技(深圳)有限公司 日志监控方法、装置、终端与计算机可读存储介质
CN110191124B (zh) * 2019-05-29 2022-02-22 安天科技集团股份有限公司 基于web前端开发数据的网站鉴别方法、装置及存储设备
CN110837874B (zh) * 2019-11-18 2023-05-26 上海新炬网络信息技术股份有限公司 基于时间序列分类的业务数据异常检测方法
CN111106959B (zh) * 2019-12-20 2022-10-14 贵州黔岸科技有限公司 用于运输管理系统的异常监控报警系统及方法
CN111459695B (zh) * 2020-03-12 2024-09-27 平安科技(深圳)有限公司 根因定位方法、装置、计算机设备和存储介质
CN111752811A (zh) * 2020-06-29 2020-10-09 平安普惠企业管理有限公司 异常告警信息处理方法、电子设备及存储介质
CN113076410A (zh) * 2021-04-22 2021-07-06 网银在线(北京)科技有限公司 异常信息处理方法、装置、设备及存储介质

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109145609A (zh) * 2018-09-06 2019-01-04 平安科技(深圳)有限公司 一种数据处理方法和装置
CN110086767A (zh) * 2019-03-11 2019-08-02 中国电子科技集团公司电子科学研究院 一种混合入侵检测系统及方法
US20200304540A1 (en) * 2019-03-22 2020-09-24 Proofpoint, Inc. Identifying Legitimate Websites to Remove False Positives from Domain Discovery Analysis
CN112612844A (zh) * 2020-12-18 2021-04-06 深圳前海微众银行股份有限公司 数据处理方法、装置、设备和存储介质

Also Published As

Publication number Publication date
CN113535458A (zh) 2021-10-22
CN113535458B (zh) 2021-12-28

Similar Documents

Publication Publication Date Title
CN111475804B (zh) 一种告警预测方法及系统
CN111158977B (zh) 一种异常事件根因定位方法及装置
CN113645232B (zh) 一种面向工业互联网的智能化流量监测方法、系统及存储介质
CN113254255B (zh) 一种云平台日志的分析方法、系统、设备及介质
CN111078513B (zh) 日志处理方法、装置、设备、存储介质及日志告警系统
CN113760891A (zh) 一种数据表的生成方法、装置、设备和存储介质
CN112487208A (zh) 一种网络安全数据关联分析方法、装置、设备及存储介质
CN117221087A (zh) 告警根因定位方法、装置及介质
CN111078512A (zh) 告警记录生成方法、装置、告警设备及存储介质
CN107111609A (zh) 用于神经语言行为识别系统的词法分析器
CN116841779A (zh) 异常日志检测方法、装置、电子设备和可读存储介质
WO2023039973A1 (zh) 异常误报的处理方法及装置、存储介质、终端
CN116132263A (zh) 告警解决方案推荐方法、装置、电子设备及存储介质
CN115955355A (zh) 一种攻击事件知识图谱的输出方法及装置
CN115913710A (zh) 异常检测方法、装置、设备及存储介质
CN113282920B (zh) 日志异常检测方法、装置、计算机设备和存储介质
CN113128213A (zh) 日志模板提取方法及装置
CN114528909A (zh) 一种基于流量日志特征提取的无监督异常检测方法
CN112613176A (zh) 慢sql语句预测方法及系统
CN113986657A (zh) 告警事件的处理方法及处理装置
CN117827952A (zh) 一种数据关联分析方法、装置、设备及介质
CN116362534A (zh) 铁路领域在线客服内容违规和风险的应急管理方法及系统
CN115062144A (zh) 一种基于知识库和集成学习的日志异常检测方法与系统
US11243937B2 (en) Log analysis apparatus, log analysis method, and log analysis program
CN118487872B (zh) 一种面向核电行业的网络异常行为检测分析方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21957240

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21957240

Country of ref document: EP

Kind code of ref document: A1