CN111800404A

CN111800404A - Method and device for identifying malicious domain name and storage medium

Info

Publication number: CN111800404A
Application number: CN202010608070.0A
Authority: CN
Inventors: 张增锐; 孟翔
Original assignee: Sangfor Technologies Co Ltd
Current assignee: Sangfor Technologies Co Ltd
Priority date: 2020-06-29
Filing date: 2020-06-29
Publication date: 2020-10-20
Anticipated expiration: 2040-06-29
Also published as: CN111800404B

Abstract

The application provides a method for identifying a malicious domain name. The method comprises: obtaining a domain name to be detected and performing virus flow matching on it to obtain a first matching result; if the first matching result represents that the virus flow matching fails, matching the domain name to be detected in a matching mode of at least one dimension to obtain a second matching result, wherein the at least one dimension comprises an attribute dimension, a rule dimension or a third-party information dimension; and, when the second matching result represents that the matching succeeds, marking the domain name to be detected based on the second matching result to obtain a final identification result. Whether the domain name to be detected is a malicious domain name is judged according to the attributes of the domain name itself, which avoids the need to provide evidentiary information when determining that the domain name belongs to a certain virus family. Because the domain name is identified using multiple dimensions and multiple identification modes, the final identification result is more reliable, the identification process is simplified, and the identification efficiency is improved.

Description

Method and device for identifying malicious domain name and storage medium

Technical Field

The present application relates to the field of network security technologies, and in particular, to a method and an apparatus for identifying a malicious domain name, and a storage medium.

Background

When an engine judges a network domain name to be malicious, it may lack the provenance information needed to confirm which virus family uses that domain name for communication. Threat reports sometimes need such provenance information to make a detection more persuasive, so a simple verdict on whether a domain name is malicious cannot meet users' requirements. At present, the virus family of a domain name is mainly confirmed through information about the virus samples that communicate with it, with an antivirus engine identifying characteristics in the sample. However, most vendors rely on techniques that analyze the traffic of a single sample associated with the domain name in order to identify the family information.

Summary of the Application

The embodiments of the application aim to provide a method, a device and a storage medium for identifying a malicious domain name, which avoid the need to provide evidentiary information when determining that the domain name to be detected belongs to a certain virus family, and which identify the domain name to be detected using multiple dimensions and multiple identification modes, so that the final identification result is more reliable, the identification process is simplified, and the identification efficiency is improved.

The technical scheme of the application is realized as follows:

a method of identifying malicious domain names, comprising:

acquiring a domain name to be detected, and performing virus flow matching on the domain name to be detected to obtain a first matching result;

if the first matching result represents that the virus flow matching fails, matching the domain name to be detected in a matching mode of at least one dimension to obtain a second matching result; the at least one dimension comprises an attribute dimension, a rule dimension, or a third party information dimension;

and when the second matching result represents that the matching is successful, marking the domain name to be detected based on the second matching result to obtain a final identification result.

In the above scheme, the attribute dimension includes a type dimension, the rule dimension includes a feature dimension, and the third-party information dimension includes external threat information dimensions from different sources.

In the above scheme, when the at least one dimension is the attribute dimension, matching the domain name to be detected in a matching mode of the at least one dimension to obtain a second matching result includes:

determining the current domain name type of the domain name to be detected;

if the current domain name type is a non-dynamic domain name, judging whether a parent domain name corresponding to the domain name to be detected is an identified malicious domain name or not, and obtaining the attribute matching result;

and if the attribute matching result is the identified malicious domain name, taking the identified malicious domain name as the second matching result.

In the above scheme, when the at least one dimension is the rule dimension, matching the domain name to be detected in a matching mode of the at least one dimension to obtain a second matching result includes:

acquiring the current characteristics of the domain name to be detected; wherein the current features include: a current domain name characteristic and a current address information characteristic;

judging whether a preset malicious feature library hits the current features or not to obtain the feature matching result;

and if the feature matching result indicates that either the current domain name feature or the current address information feature is hit, taking the feature matching result as the second matching result.

In the above scheme, when the at least one dimension is the third-party information dimension, matching the domain name to be detected in a matching mode of the at least one dimension to obtain a second matching result includes:

acquiring a plurality of pieces of external threat information, from different sources, for the domain name to be detected; each piece of external threat information represents classification information of the domain name to be detected;

counting the plurality of external threat information to obtain the intelligence matching result of the domain name to be detected;

and using the intelligence matching result as a second matching result.

In the above scheme, the obtaining of the multiple pieces of external threat information from different sources for the domain name to be detected includes:

acquiring a plurality of sample information of different sources aiming at the domain name to be detected;

and performing classification information matching on the plurality of sample information to obtain the plurality of external threat information.

In the above scheme, the counting of the plurality of external threat information to obtain an intelligence matching result of the domain name to be detected includes:

disassembling the external threat information to obtain a plurality of disassembled classification information;

de-aliasing the plurality of disassembled classification information to obtain a plurality of family names;

counting the multiple family names to obtain a target family name with the maximum statistical probability;

and taking the target family name as an intelligence matching result of the domain name to be detected.
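
The disassembling, de-aliasing and counting steps above can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the alias table, the list of generic tokens, the minimum token length and the example classification strings are all assumptions introduced for illustration, and "maximum statistical probability" is approximated here by a simple majority count in which each source votes at most once per family.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional

# Hypothetical alias table: vendor-specific names mapped to one family name.
ALIASES = {"kido": "conficker", "downadup": "conficker", "ngrbot": "dorkbot"}

# Hypothetical generic tokens that describe a category, not a family.
GENERIC_TOKENS = {"trojan", "worm", "backdoor", "win32", "malware", "generic"}


def family_candidates(label: str) -> List[str]:
    """Disassemble one classification string and de-alias its tokens."""
    tokens = [t for t in re.split(r"[^a-z0-9]+", label.lower()) if t]
    candidates = []
    for token in tokens:
        # Skip category words and very short variant suffixes such as "b".
        if token in GENERIC_TOKENS or len(token) < 3:
            continue
        candidates.append(ALIASES.get(token, token))
    return candidates


def vote_family(labels: Iterable[str]) -> Optional[str]:
    """Return the family name reported by the most sources, or None."""
    counts: Counter = Counter()
    for label in labels:
        # A set ensures each source contributes at most one vote per family.
        counts.update(set(family_candidates(label)))
    if not counts:
        return None
    family, _ = counts.most_common(1)[0]
    return family
```

Called with the made-up labels `["Worm:Win32/Conficker.B", "Trojan.Kido", "Worm.Win32.Downadup", "Trojan.Dorkbot"]`, the sketch folds the first three into one family and returns it as the target family name; an empty list yields `None`, which corresponds to a matching failure.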

In the above scheme, when the at least one dimension includes the attribute dimension and the rule dimension, or the attribute dimension and the third-party information dimension, matching the domain name to be detected in a matching mode of the at least one dimension to obtain a second matching result includes:

when type-dimension matching is performed on the domain name to be detected in the attribute-dimension matching mode and the resulting attribute matching result represents that the matching fails, either performing feature-dimension matching on the domain name to be detected in the rule-dimension matching mode to obtain a feature matching result and taking the feature matching result as the second matching result, or performing matching against external threat information from different sources in the third-party information dimension matching mode to obtain an intelligence matching result and taking the intelligence matching result as the second matching result.

In the above scheme, when the at least one dimension includes the rule dimension and the third-party information dimension, matching the domain name to be detected in a matching mode of the at least one dimension to obtain a second matching result includes:

when feature-dimension matching is performed on the domain name to be detected in the rule-dimension matching mode and the resulting feature matching result represents that the matching fails, performing matching against external threat information from different sources in the third-party information dimension matching mode to obtain an intelligence matching result, and taking the intelligence matching result as the second matching result.

In the above scheme, when the at least one dimension includes the attribute dimension, the rule dimension and the third-party information dimension, matching the domain name to be detected in a matching mode of the at least one dimension to obtain a second matching result includes:

when the attribute matching result obtained by type-dimension matching in the attribute-dimension matching mode represents that the matching fails, and the feature matching result obtained by feature-dimension matching in the rule-dimension matching mode also represents that the matching fails, performing matching against external threat information from different sources in the third-party information dimension matching mode to obtain an intelligence matching result, and taking the intelligence matching result as the second matching result; or,

when the attribute matching result obtained by type-dimension matching in the attribute-dimension matching mode represents that the matching fails, and the intelligence matching result obtained by matching against external threat information from different sources in the third-party information dimension matching mode also represents that the matching fails, performing feature-dimension matching in the rule-dimension matching mode to obtain a feature matching result, and taking the feature matching result as the second matching result; or,

when the feature matching result obtained by feature-dimension matching in the rule-dimension matching mode represents that the matching fails, and the intelligence matching result obtained by matching against external threat information from different sources in the third-party information dimension matching mode also represents that the matching fails, performing domain name type-dimension matching in the attribute-dimension matching mode to obtain an attribute matching result, and taking the attribute matching result as the second matching result.

The embodiment of the present application further provides an apparatus for identifying a malicious domain name, where the apparatus includes:

an acquisition unit, configured to acquire a domain name to be detected and perform virus flow matching on the domain name to be detected to obtain a first matching result;

a matching unit, configured to match the domain name to be detected in a matching mode of at least one dimension to obtain a second matching result if the first matching result represents that the virus flow matching fails; the at least one dimension comprises an attribute dimension, a rule dimension, or a third-party information dimension;

and a marking unit, configured to mark the domain name to be detected based on the second matching result when the second matching result represents that the matching is successful, so as to obtain a final identification result.

The embodiment of the application also provides a device for identifying the malicious domain name, which comprises a processor, a memory and a communication bus;

the communication bus is used for realizing communication connection between the processor and the memory;

the processor is used for executing a program for identifying malicious domain names that is stored in the memory, so as to implement the method for identifying a malicious domain name provided by any one of the above.

The embodiment of the application also provides a storage medium on which a program for identifying malicious domain names is stored; when executed by a processor, the program implements the method for identifying a malicious domain name provided by any one of the above.

The method for identifying a malicious domain name provided by this embodiment comprises the following steps: acquiring a domain name to be detected and performing virus flow matching on it to obtain a first matching result; if the first matching result represents that the virus flow matching fails, matching the domain name to be detected in a matching mode of at least one dimension to obtain a second matching result, the at least one dimension comprising an attribute dimension, a rule dimension, or a third-party information dimension; and, when the second matching result represents that the matching is successful, marking the domain name to be detected based on the second matching result to obtain a final identification result. Whether the domain name to be detected is a malicious domain name is judged according to the attributes of the domain name itself, which avoids the need to provide evidentiary information when determining that the domain name belongs to a certain virus family. Because the domain name is identified using multiple dimensions and multiple identification modes, the final identification result is more reliable, the identification process is simplified, and the identification efficiency is improved.

Drawings

Fig. 1 is a first schematic flowchart of a method for identifying a malicious domain name according to an embodiment of the present application;

Fig. 2 is a second schematic flowchart of a method for identifying a malicious domain name according to an embodiment of the present application;

Fig. 3 is a third schematic flowchart of a method for identifying a malicious domain name according to an embodiment of the present application;

Fig. 4 is a fourth schematic flowchart of a method for identifying a malicious domain name according to an embodiment of the present application;

Fig. 5 is a fifth schematic flowchart of a method for identifying a malicious domain name according to an embodiment of the present application;

Fig. 6 is a sixth schematic flowchart of a method for identifying a malicious domain name according to an embodiment of the present application;

Fig. 7 is a seventh schematic flowchart of a method for identifying a malicious domain name according to an embodiment of the present application;

Fig. 8 is an eighth schematic flowchart of a method for identifying a malicious domain name according to an embodiment of the present application;

Fig. 9 is a ninth schematic flowchart of a method for identifying a malicious domain name according to an embodiment of the present application;

Fig. 10 is a schematic structural diagram of an apparatus for identifying a malicious domain name according to an embodiment of the present application;

Fig. 11 is a schematic structural diagram of another apparatus for identifying a malicious domain name according to an embodiment of the present application.

Detailed Description

The technical solution in this embodiment will be clearly and completely described below with reference to the drawings in this embodiment.

The terminology used in the examples of this application:

Domain name: the name of a computer or a group of computers on the Internet, also called a network domain. It consists of a string of names separated by dots, is used to identify and locate the computer during data transmission (sometimes it also indicates a geographical location), and has a mapping relation with an IP address.

Virus family: technicians extract various characteristics from a plurality of virus samples, classify the samples that share the same characteristics into the same virus family, and use certain fields as the name of that virus family.

As shown in fig. 1, the present embodiment provides a method for identifying a malicious domain name, including:

S101, acquiring a domain name to be detected, and performing virus flow matching on the domain name to be detected to obtain a first matching result.

The method for identifying a malicious domain name is applied to an electronic device or to an apparatus for identifying malicious domain names (referred to below as the identification device). The device is able to access a network, for example through a tool such as a browser, and the domain name to be detected is a domain name that the device has accessed or is accessing. The identification device obtains the domain name to be detected, simulates access to it in a sandbox, and captures from that access the virus traffic, which contains malicious characteristics. It then matches these malicious characteristics against the characteristics of already identified malicious domain names. If the matching succeeds, the domain name to be detected is marked with the same label as the matched, identified malicious domain name, and the first matching result represents a successful match; otherwise, a first matching result representing a matching failure is obtained.

Virus flow matching means that the device simulates the execution of a file in a file sandbox, simulates access to the domain name to be detected, and matches the communication traffic characteristics produced during the simulation against the malicious characteristics of identified malicious domain names.

S102, if the first matching result represents that the virus flow matching fails, matching the domain name to be detected in a matching mode of at least one dimension to obtain a second matching result; the at least one dimension includes an attribute dimension, a rule dimension, or a third party information dimension.

In the embodiment of the application, if the first matching result represents a successful match, the identification device has successfully identified the domain name to be detected and has obtained a label or mark indicating what type of malicious domain name it belongs to. If the first matching result represents that the virus flow matching fails, the malicious type of the domain name to be detected could not be identified, so the domain name is further matched in a matching mode of at least one dimension. That is, the domain name to be detected is identified through dimensions such as the attribute dimension, the rule dimension and the third-party information dimension, according to, respectively, its attribute information, its feature information, or the identification information provided by external platforms.

It should be noted that the second matching result also includes two possibilities, that is, successfully identifying the domain name to be detected, obtaining a label or a tag representing what type of malicious domain name the domain name to be detected belongs to, or failing to match.

S103, when the second matching result represents that the matching is successful, marking the domain name to be detected based on the second matching result to obtain a final identification result.

In the embodiment of the application, when the second matching result represents that the matching is successful, it indicates that the identification device successfully obtains the malicious type of the domain name to be detected, and then the same marking is performed on the domain name to be detected according to the label or the mark of the malicious domain name matched by the second matching result.

If the second matching result still represents that the matching fails, the domain name to be detected may not be a malicious domain name, or may belong to a malicious type that has never appeared before. In that case a default label is applied to the domain name to be detected, and it is kept for later re-identification or manual identification.

It should be noted that, in this embodiment of the present application, the malicious type of the domain name to be detected may already be successfully identified in the virus flow matching of step S101. In that case, the method jumps directly to step S103, and either the first matching result is used as the second matching result, or the domain name to be detected is marked directly according to the first matching result.

In the embodiment of the application, the domain name to be detected is obtained and virus flow matching is performed on it to obtain a first matching result. If the first matching result represents that the virus flow matching fails, the domain name to be detected is matched in a matching mode of at least one dimension to obtain a second matching result, where the at least one dimension comprises an attribute dimension, a rule dimension or a third-party information dimension. When the second matching result represents that the matching is successful, the domain name to be detected is marked based on the second matching result to obtain a final identification result. Whether the domain name is a malicious domain name is judged according to the attributes of the domain name itself, which avoids the need to provide evidentiary information when determining that the domain name belongs to a certain virus family. Because the domain name is identified using multiple dimensions and multiple identification modes, the final identification result is more reliable, the identification process is simplified, and the identification efficiency is improved.
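
The control flow of steps S101 to S103 can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: each matcher is modeled as a function that returns a label string on success and `None` on failure, the dimension matchers are tried in whatever order the caller supplies, and the `DEFAULT_LABEL` value and all function names are hypothetical.

```python
from typing import Callable, Optional, Sequence

# A matcher returns a malicious-type label on success, or None on failure.
Matcher = Callable[[str], Optional[str]]

# Hypothetical default label for domain names that no stage could identify.
DEFAULT_LABEL = "unidentified"


def identify(domain: str,
             traffic_match: Matcher,
             dimension_matchers: Sequence[Matcher]) -> str:
    """Return the final identification result for one domain name."""
    # S101: virus flow matching produces the first matching result.
    first = traffic_match(domain)
    if first is not None:
        # Already identified: mark directly according to the first result.
        return first

    # S102: fall back to the dimension matchers, in the supplied order,
    # stopping at the first one that succeeds.
    for match in dimension_matchers:
        second = match(domain)
        if second is not None:
            # S103: mark the domain name based on the second result.
            return second

    # No stage matched: keep the domain name for later re-identification.
    return DEFAULT_LABEL
```

Passing the dimension matchers as an ordered sequence lets one function cover each of the orderings described in the scheme, for example attribute, then rule, then third-party information.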

In some embodiments of the present application, the attribute dimensions include a type dimension, the rule dimensions include a feature dimension, and the third party information dimensions include external threat information dimensions of different origins.

In the embodiment of the application, the attribute dimension is that the identification device performs domain name type dimension matching on a domain name to be detected according to the attribute information of the domain name to be detected to obtain an attribute matching result; the rule dimension is that the identification device matches the feature dimension of the domain name to be detected according to the feature information existing in the domain name to be detected to obtain a feature matching result; and the third-party information dimension is that the identification device performs matching of external threat information dimensions of different sources on the domain name to be detected according to the external platform to obtain an information matching result.

Regarding the attribute information of the domain name to be detected: some types of domain names have a characteristic similar to inheritance in a programming language. Certain attributes of the parent domain name or child domain name of the domain name to be detected carry over, so the domain name to be detected has the same property when it is the child domain name or the parent domain name, and this property can be used to perform attribute-dimension matching. The rule dimension means that the domain name to be detected may follow the same rules as a certain type of identified malicious domain name, for example when some of its character features resemble the rule features of identified malicious domain names, so the domain name to be detected can be matched in the rule dimension. For the third-party information dimension, external platforms provide external threat information about malicious domain names, and the malicious type of the domain name to be detected can be identified by matching it against that external threat information.

As shown in fig. 2, in some embodiments of the present application, step S102 includes:

S102a1, determining the current domain name type of the domain name to be detected;

S102a2, if the current domain name type is a non-dynamic domain name, judging whether a parent domain name corresponding to the domain name to be detected is an identified malicious domain name, and obtaining an attribute matching result;

S102a3, if the attribute matching result is the identified malicious domain name, taking the attribute matching result as a second matching result.

In the embodiment of the application, when the identification device matches the domain name to be detected in the attribute dimension, it needs to determine whether the domain name to be detected has a subordinate child domain name or a parent domain name. If a parent domain name belongs to a certain family, its subordinate child domain names also belong to that family. Therefore, when the parent domain name or child domain name of the domain name to be detected has been identified as a malicious domain name, the attribute matching result is obtained from the label or mark of that parent or child domain name and is used as the second matching result.

However, because of the particular nature of dynamic domain names, the parent domain name of a dynamic domain name does not have this property. Before judging whether the domain name to be detected has a child domain name or a parent domain name, the device therefore needs to judge whether the current domain name type is a dynamic domain name; if it is, the second matching result is directly determined to be a matching failure.

If the current domain name type is a non-dynamic domain name, the device judges whether the domain name to be detected has a child domain name or a parent domain name. If it does, and that child or parent domain name has been identified and marked with a malicious label, the attribute matching result is obtained from the label or mark of the parent or child domain name.

In the embodiment of the application, attribute-dimension matching is performed on the domain name to be detected by judging whether it has a parent domain name or a child domain name. When such a parent or child domain name exists, has been identified as a malicious domain name, and carries a corresponding label or mark, the domain name to be detected is identified and marked on the basis that the label or mark of a non-dynamic malicious domain name can be carried over. The malicious domain name can thus be identified quickly and efficiently, the process is simplified, and the need to provide evidentiary information when determining that the domain name to be detected belongs to a certain virus family is avoided.
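
A minimal sketch of the attribute-dimension matching in steps S102a1 to S102a3 follows. It covers only the parent-domain direction, and several parts are assumptions made for illustration: the dynamic-DNS suffix list, the store of identified malicious domain names and their labels, and the rule that a parent is obtained by stripping the leftmost label (a real system would more likely consult a public-suffix list). The `.example` names are placeholders.

```python
from typing import Dict, Iterator, Optional

# Hypothetical suffixes of dynamic-DNS providers.
DYNAMIC_DNS_SUFFIXES = ("ddns.example", "dyndns.example")

# Hypothetical store of identified malicious domain names and their labels.
KNOWN_MALICIOUS: Dict[str, str] = {"malicious.example": "conficker"}


def is_dynamic(domain: str) -> bool:
    """S102a1: decide whether the domain name is a dynamic domain name."""
    return any(domain == s or domain.endswith("." + s)
               for s in DYNAMIC_DNS_SUFFIXES)


def parent_domains(domain: str) -> Iterator[str]:
    """Yield each parent, nearest first, never yielding a bare top-level label."""
    labels = domain.split(".")
    for i in range(1, len(labels) - 1):
        yield ".".join(labels[i:])


def attribute_match(domain: str) -> Optional[str]:
    """Return the label inherited from a malicious parent, or None."""
    domain = domain.lower().rstrip(".")
    if is_dynamic(domain):
        # Dynamic domain names do not inherit from their parent: fail directly.
        return None
    # S102a2: check whether any parent is an identified malicious domain name.
    for parent in parent_domains(domain):
        label = KNOWN_MALICIOUS.get(parent)
        if label is not None:
            # S102a3: the parent's label becomes the second matching result.
            return label
    return None
```

With these placeholder tables, `attribute_match("a.b.malicious.example")` inherits the parent's label, while any name under a listed dynamic-DNS suffix returns `None` regardless of what the store contains.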

As shown in fig. 3, in some embodiments of the present application, step S102 further includes:

S102b1, acquiring the current features of the domain name to be detected; wherein the current features include: a current domain name feature and a current address information feature;

S102b2, judging whether the preset malicious feature library hits the current features, and obtaining a feature matching result;

S102b3, if the feature matching result represents that either the current domain name feature or the current address information feature is hit, taking the feature matching result as a second matching result.

The rule dimension in the embodiment of the present application means that the domain name to be detected may follow rules possessed by a certain type of identified malicious domain name; for example, some character features of the domain name to be detected resemble the rule features of identified malicious domain names. The device therefore performs rule-dimension matching on the domain name to be detected.

The current domain name feature refers to the character features of the domain name. The domain names of some virus families have fixed characteristics: wapomi-type domain names, for example, follow fixed patterns containing numbers, and mining-pool domain names typically contain keywords such as "mine", "pool", "xmr" and "btc", as shown in Table 1. The current address information feature reflects the finding, obtained by the device through statistical analysis, that some common virus families connect to fixed addresses: as shown in Table 2, the small virus family connects to 23.20.239.12 or 117.20.41.86, and, as shown in Table 3, an ircbot-type virus family often connects to "eoahoehgegehr.nl". Based on these features, the identification device collects the domain name characteristics and IP characteristics of multiple virus families to form a preset malicious feature library and uses it to perform feature matching on the domain name to be detected. If a current feature of the domain name to be detected exists in the preset malicious feature library, a second matching result indicating which virus family the domain name belongs to is obtained.

Domain name pattern	Virus family
^up\d{0,1}\.nba1001\.com$	wapomi
^[a-z]{2}[0-9]{4}\.info$	wapomi
^\d{1,3}\.nslook\d{3}\.com$	wapomi
^server-\d{1,2}.+	glupteba
^[a-z].+\.ws$	conficker
^xmr\..+pool.+	minepool
^pool\..+mine.+	minepool
^etc\..+pool.+	minepool

TABLE 1

IP address	Virus family
157.122.62.194	conficker
23.20.239.12	small
87.106.190.169	kryptik
117.20.41.86	small
199.2.137.29	dorkbot
216.218.135.114	teslacrypt

TABLE 2

Domain name	Virus family
oeboufanecoauegfe.es	ircbot
eoahegohaeohgeehr.nl	ircbot
facecommute.com	glupteba
server-23.samgames.org	glupteba
server-46.speakingworld.org	glupteba

TABLE 3

In the embodiment of the application, the current features of the domain name to be detected are obtained, and the preset malicious feature library is checked for a hit on those features to obtain the feature matching result. The domain name to be detected is thus judged according to the features of common malicious domain names, which solves the prior-art problem that a domain name without sample source data is difficult to mark, allows malicious domain names to be identified quickly and efficiently, and improves identification precision.
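
The rule-dimension matching of steps S102b1 to S102b3 can be sketched in Python using the entries of Tables 1 to 3 as a small preset malicious feature library. The table contents are taken from the text; everything else is an assumption for illustration, including the function name, the order in which the three tables are consulted, and the idea that the caller supplies the IP addresses the domain name resolves to.

```python
import re
from typing import Optional, Sequence

# Table 1: domain name patterns and their virus families.
DOMAIN_RULES = [
    (re.compile(r"^up\d{0,1}\.nba1001\.com$"), "wapomi"),
    (re.compile(r"^[a-z]{2}[0-9]{4}\.info$"), "wapomi"),
    (re.compile(r"^\d{1,3}\.nslook\d{3}\.com$"), "wapomi"),
    (re.compile(r"^server-\d{1,2}.+"), "glupteba"),
    (re.compile(r"^[a-z].+\.ws$"), "conficker"),
    (re.compile(r"^xmr\..+pool.+"), "minepool"),
    (re.compile(r"^pool\..+mine.+"), "minepool"),
    (re.compile(r"^etc\..+pool.+"), "minepool"),
]

# Table 2: fixed IP addresses and their virus families.
IP_RULES = {
    "157.122.62.194": "conficker", "23.20.239.12": "small",
    "87.106.190.169": "kryptik", "117.20.41.86": "small",
    "199.2.137.29": "dorkbot", "216.218.135.114": "teslacrypt",
}

# Table 3: fixed domain names and their virus families.
FIXED_DOMAINS = {
    "oeboufanecoauegfe.es": "ircbot", "eoahegohaeohgeehr.nl": "ircbot",
    "facecommute.com": "glupteba", "server-23.samgames.org": "glupteba",
    "server-46.speakingworld.org": "glupteba",
}


def rule_match(domain: str, resolved_ips: Sequence[str] = ()) -> Optional[str]:
    """Return the family hit by the domain name or address features, or None."""
    domain = domain.lower()
    # Current domain name feature: exact names first, then patterns.
    if domain in FIXED_DOMAINS:
        return FIXED_DOMAINS[domain]
    for pattern, family in DOMAIN_RULES:
        if pattern.search(domain):
            return family
    # Current address information feature: any resolved IP in the library.
    for ip in resolved_ips:
        if ip in IP_RULES:
            return IP_RULES[ip]
    return None
```

A hit on either feature is enough to produce a second matching result, which mirrors step S102b3; a return value of `None` corresponds to a matching failure in this dimension.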

As shown in fig. 4, in some embodiments of the present application, step S102 further includes:

S102c1, acquiring a plurality of pieces of external threat information from different sources for the domain name to be detected, where each piece of external threat information represents classification information of the domain name to be detected;

S102c2, counting the plurality of pieces of external threat information to obtain an intelligence matching result of the domain name to be detected;

S102c3, taking the intelligence matching result as the second matching result.

In the embodiment of the application, several relatively accurate threat intelligence platforms are used: external threat information is matched against the domain name to be detected to judge which type of malicious domain name it belongs to. The identification device collects the external threat information corresponding to the domain name to be detected through an API (application programming interface) provided by each platform or by means of a crawler. Such platforms include Microstep Online, the Qianxin threat intelligence platform, the Qimingxing threat intelligence platform, VirusTotal, and the like. These platforms can provide different external threat information, which makes it convenient for the device to match the domain name to be detected.

Different platforms may provide different external threat information for the domain name to be detected, and the classification information with which they identify the domain name may differ. The external threat information therefore needs to be counted: the intelligence matching result of the domain name to be detected is obtained from the external threat information with the highest matching frequency or identification frequency, and this intelligence matching result is used as the second matching result.

In some embodiments of the present application, step S102c1 includes:

acquiring a plurality of sample information of different sources aiming at a domain name to be detected;

and carrying out classification information matching on the plurality of sample information to obtain a plurality of external threat information.

In the embodiment of the application, each threat intelligence platform may provide one of two kinds of attribute information: it either directly provides family information for the domain name to be detected, or provides communication sample information about the domain name to be detected. If the platform provides communication sample information, that information must be queried first and then converted into external threat information by retrieving the internal file group information of the domain name or by matching against the existing feature library.

In the embodiment of the present application, the domain name to be detected is matched against a plurality of pieces of external threat information from different sources. This solves the problem in the prior art that the virus family cannot be correctly marked when a domain name is used by multiple virus samples, and avoids the user misunderstanding that easily arises when some identification methods display the virus family information of multiple samples at the same time. The identification accuracy is improved and false alarms are reduced.

In some embodiments of the present application, step S102c2 includes:

disassembling the external threat information to obtain a plurality of disassembled classification information; de-aliasing the plurality of disassembled classification information to obtain a plurality of family names; counting the multiple family names to obtain a target family name with the maximum statistical probability; and taking the target family name as an intelligence matching result of the domain name to be detected.

In this embodiment of the application, the external threat information acquired by the identification device consists of detection results from a plurality of vendors, each of which may be a virus name or a family name. These results need to be integrated to obtain a final family result, and the final identification result is output according to the detection result of each vendor. The processing comprises three parts. In the first part, virus names such as Trojan/Win32/utilization, Trojan:Win32:Vupa, and Trojan-Win32-Malcio are split into fields at the corresponding special characters; the keywords, such as trojan and win32, are listed by regular-expression matching; and all fields are converted to lowercase to facilitate matching and de-duplication. In the second part, the platform and classification fields are removed to identify whether the remaining part is the correct virus name: the virus names of the various vendors are extracted, aliases of some virus names are converted into the correct virus names through fuzzy matching, and the number of occurrences of each name is counted; the identified virus name with the most votes is taken as the virus family name, excluding the type and the platform. In the third part, the type and platform parts of the virus names are selected and processed: the classification and platform content reported by each vendor is extracted, partial aliases are converted into the correct names, and the classification and platform labels reported by the most vendors are selected and spliced with the family name into a complete virus name in the standard format, which is used as the intelligence matching result of the domain name to be detected.

If the threat intelligence platforms fail to provide external threat information related to the domain name to be detected, or a plurality of comparable virus family names appear in the voting result, an intelligence matching result representing a matching failure is obtained.
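As a non-limiting illustration, the disassembling, de-aliasing, and counting of virus names can be sketched as follows. The keyword sets, the list of known family names, the similarity cutoff, the sample virus names, and the treatment of an exact tie as "comparable" family names are all assumptions made for this sketch; the application does not specify them. Fuzzy matching is approximated here with `difflib.get_close_matches` from the Python standard library.

```python
import difflib
import re
from collections import Counter
from typing import Iterable, Optional

# Hypothetical keyword tables; a real deployment would maintain larger lists.
PLATFORM_WORDS = {"win32", "w32", "win64", "linux"}
TYPE_WORDS = {"trojan", "troj", "worm", "virus", "backdoor"}
KNOWN_FAMILIES = ["wapomi", "ramnit", "conficker", "glupteba"]


def family_of(virus_name: str) -> Optional[str]:
    """Split a vendor virus name into fields and return its family name."""
    # Split at the special characters and lowercase every field.
    fields = [f for f in re.split(r"[/:.\-_!\s]+", virus_name.lower()) if f]
    # Remove the platform and classification fields.
    rest = [f for f in fields if f not in PLATFORM_WORDS and f not in TYPE_WORDS]
    if not rest:
        return None
    # De-alias: map a near-miss spelling onto a known family name.
    close = difflib.get_close_matches(rest[0], KNOWN_FAMILIES, n=1, cutoff=0.8)
    return close[0] if close else rest[0]


def vote_family(virus_names: Iterable[str]) -> Optional[str]:
    """Return the family with the most votes, or None on matching failure."""
    counts = Counter(f for f in map(family_of, virus_names) if f)
    if not counts:
        return None  # no platform provided usable information
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # assumption: a tie counts as "comparable" family names
    return ranked[0][0]
```

For example, `vote_family(["Trojan/Win32/Ramnit", "Trojan:Win32:Ramnit", "Troj-W32-Wapomy"])` counts two votes for ramnit and one for wapomi, so ramnit is selected.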

It should be noted that, through matching in the third-party information dimension, the information in the threat intelligence platforms is extracted and integrated; once the specific family information of the domain name has been confirmed, the features of the domain name can be refined and added to the feature library for use in subsequent feature matching.

In the embodiment of the present application, because external threat information is drawn from multiple platforms, the virus family that best fits the domain name can be obtained even when the platforms disagree or the domain name is associated with multiple virus samples.

In some embodiments of the present application, step S102 further comprises:

when the attribute matching result, obtained by matching the domain name to be detected in the domain name type dimension using the attribute dimension matching mode, represents a matching failure, matching the domain name to be detected in the feature dimension using the rule dimension matching mode to obtain a feature matching result, and taking the feature matching result as the second matching result; or,

when the attribute matching result, obtained by matching the domain name to be detected in the domain name type dimension using the attribute dimension matching mode, represents a matching failure, matching the domain name to be detected against external threat information from different sources using the third-party information dimension matching mode to obtain an intelligence matching result, and taking the intelligence matching result as the second matching result.

FIG. 5 shows the combination of attribute dimension matching + rule dimension matching. In the embodiment of the application, the matching modes of different dimensions can be combined and ordered in various ways. The matching modes cover three dimensions: the attribute dimension, the rule dimension, and the third-party information dimension. When matching modes of two dimensions are used to match the domain name to be detected, there are 6 identification combinations in total. After virus traffic matching fails, the domain name to be detected may be matched in the domain name type dimension using the attribute dimension matching mode and, if the attribute matching result represents a matching failure, matched in the feature dimension using the rule dimension matching mode; the reverse order, rule dimension matching followed by attribute dimension matching, is also possible. Likewise, the domain name to be detected may be matched in the domain name type dimension using the attribute dimension matching mode and, if the attribute matching result represents a matching failure, matched against external threat information from different sources using the third-party information dimension matching mode; the reverse order, third-party information dimension matching followed by attribute dimension matching, is also possible. Finally, the domain name to be detected may be matched in the feature dimension using the rule dimension matching mode and, if the feature matching result represents a matching failure, matched against external threat information from different sources using the third-party information dimension matching mode; the reverse order, third-party information dimension matching followed by rule dimension matching, is also possible.

In the embodiment of the application, the domain name to be detected is identified flexibly through different combinations of multiple matching modes. Providing a variety of identification combinations improves the efficiency of identifying the domain name to be detected and avoids the risk that a domain name cannot be identified because only a single identification mode is used.
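The count of 6 two-dimension combinations follows from choosing an ordered pair out of three dimensions (3 × 2 = 6). As a small illustrative check, with dimension labels that are hypothetical names chosen for this sketch:

```python
from itertools import permutations

# Hypothetical labels for the three matching dimensions.
DIMENSIONS = ("attribute", "rule", "third_party")

# Every ordered pair (first matching mode, fallback matching mode).
two_stage_orders = list(permutations(DIMENSIONS, 2))

for first, fallback in two_stage_orders:
    print(f"{first} matching -> {fallback} matching")
```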

In some embodiments of the present application, step S102 further comprises:

when the feature matching result, obtained by matching the domain name to be detected in the feature dimension using the rule dimension matching mode, represents a matching failure, matching the domain name to be detected against external threat information from different sources using the third-party information dimension matching mode to obtain an intelligence matching result, and taking the intelligence matching result as the second matching result.

In some embodiments of the present application, step S102 further comprises:

when the attribute matching result obtained using the attribute dimension matching mode represents a matching failure, and the feature matching result obtained using the rule dimension matching mode also represents a matching failure, matching the domain name to be detected against external threat information from different sources using the third-party information dimension matching mode to obtain an intelligence matching result, and taking the intelligence matching result as the second matching result; or,

when the attribute matching result obtained using the attribute dimension matching mode represents a matching failure, and the intelligence matching result obtained using the third-party information dimension matching mode also represents a matching failure, matching the domain name to be detected in the feature dimension using the rule dimension matching mode to obtain a feature matching result, and taking the feature matching result as the second matching result; or,

when the feature matching result obtained using the rule dimension matching mode represents a matching failure, and the intelligence matching result obtained using the third-party information dimension matching mode also represents a matching failure, matching the domain name to be detected in the domain name type dimension using the attribute dimension matching mode to obtain an attribute matching result, and taking the attribute matching result as the second matching result.

In the embodiment of the application, the matching modes of different dimensions can be combined and ordered in various ways. The matching modes cover three dimensions: the attribute dimension, the rule dimension, and the third-party information dimension. When matching modes of all three dimensions are used, the combinations are formed in the same way as the two-dimension combinations described above, with a third matching mode appended. There are likewise 6 identification combinations in total:

attribute dimension matching + rule dimension matching + third-party information dimension matching;

attribute dimension matching + third-party information dimension matching + rule dimension matching;

rule dimension matching + attribute dimension matching + third-party information dimension matching;

rule dimension matching + third-party information dimension matching + attribute dimension matching;

third-party information dimension matching + attribute dimension matching + rule dimension matching;

third-party information dimension matching + rule dimension matching + attribute dimension matching.

As shown in FIG. 6, in the combination of attribute dimension matching + rule dimension matching + third-party information dimension matching, after virus traffic matching of the domain name to be detected fails, attribute dimension matching is performed on the domain name to be detected; if the attribute matching result represents a matching failure, rule dimension matching is performed; and if the feature matching result represents a matching failure, third-party information dimension matching is performed, so as to obtain the second matching result.
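As a non-limiting illustration, the fallback chain of FIG. 6 can be sketched as a loop over an ordered list of matchers, so that any of the 6 orders can be run by changing the list. The three matcher functions below are hypothetical stubs standing in for the real attribute, rule, and third-party information matching; only the example domain names ending in ns2275ab.com and the label win32.trojan.wapomi are taken from the application.

```python
from typing import Callable, Optional, Sequence

# A matcher returns a family label on success and None on matching failure.
Matcher = Callable[[str], Optional[str]]


def second_matching(domain: str, matchers: Sequence[Matcher]) -> Optional[str]:
    """Try each matching mode in order; stop at the first success."""
    for matcher in matchers:
        result = matcher(domain)
        if result is not None:
            return result
    return None  # every dimension failed


# --- Hypothetical stubs, for demonstration only ---
def attribute_match(domain: str) -> Optional[str]:
    return "win32.trojan.wapomi" if domain.endswith(".ns2275ab.com") else None


def rule_match(domain: str) -> Optional[str]:
    return "minepool" if "pool" in domain else None


def intel_match(domain: str) -> Optional[str]:
    return None  # pretend no platform knows the domain


FIG6_ORDER = [attribute_match, rule_match, intel_match]
```

Calling `second_matching("192.ns2275ab.com", FIG6_ORDER)` succeeds at the attribute dimension and never invokes the later matchers, while a domain that no stub recognizes yields `None`.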

It should be noted that, if the second matching result represents a matching failure, the identification device performs search engine identification on the domain name to be detected to obtain a second matching result. Some search engines (such as Google and Baidu) provide family information for domain names, and the device can determine the family information of the domain name to be detected by searching for the domain name itself or for its keywords. In addition, the identification device can confirm the family information of the domain name according to the security event type: for example, accessing a certain mining domain name may trigger a security event type such as a phishing link, and the family information of the domain name can also be confirmed by searching for such security event types.

If the second matching result obtained through search engine identification still represents a matching failure, the domain name to be detected either may not be a malicious domain name or belongs to a malicious type that has never appeared before. In that case a default label is marked on the domain name to be detected, and the domain name is reserved for the next round of identification or for manual identification.

As shown in fig. 7, an embodiment of the present application further provides a method for identifying a malicious domain name, including:

The first step is to acquire the domain name to be detected and match it against the virus traffic obtained in the sandbox. The file sandbox simulates the execution of a file, and the characteristics of the communication traffic it releases are matched against the virus traffic library. If the matching succeeds, the correct label is marked; otherwise, the second step is entered.

The second step is parent domain name matching. The principle is as follows: if a parent domain name belongs to a family, its subordinate child domain names also belong to that family (e.g., ns2275ab.com belongs to win32.trojan.wapomi, so 192.ns2275ab.com and 199.ns2275ab.com both belong to win32.trojan.wapomi). Before parent domain name matching, dynamic domain names are excluded, because for dynamic domain names this relationship between parent and child domain names does not hold. Specifically, the device collects and sorts a list of dynamic domain names (a matching library) and judges whether the domain name to be detected is a dynamic domain name, so as to filter out domain names to be detected of the dynamic type. If the parent domain name of the domain name to be detected is in the dynamic domain name list, the third step is entered directly. If it is not in the list, parent domain name matching is performed: whether the parent domain name has identified family information, that is, whether the domain name to be detected belongs to a labelled domain name in the matching library, is judged. If so, the child domain name is marked with the same label; if not, the third step is entered.

The third step is matching against the extracted feature library. The communication domain names of some virus families have fixed features. For example, the features of wapomi domain names are: com, nslokup; mining-type domain names typically contain the keywords mine, pool, xmr, btc, and so on. Based on these features, the domain name features of multiple (virus) families, as well as IP features and other features, are collected in the feature library. If the domain name to be detected hits a feature in the feature library, the corresponding family classification is marked; if not, the fourth step is entered.

The above methods all rely on internal threat intelligence. If the domain name cannot be identified from internal threat intelligence, the fourth step, external threat intelligence, is used. At present, many platforms also mark domain name information with corresponding family labels, and the device collects this information through the API (application programming interface) provided by each threat intelligence platform or by means of a crawler. Such platforms include Microstep Online, the Qianxin threat intelligence platform, the Qimingxing threat intelligence platform, VenusEye, VirusTotal, and the like. Each threat intelligence platform typically provides one of two types of attribute information: it either gives family information directly or provides communication sample information. After the family classification of a domain name on each platform has been obtained, the family information of the domain name is finally obtained by weighted voting. If a platform provides communication sample information, that information must first be converted into family information, after which the family information of the domain name is obtained by weighted voting.
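As a non-limiting illustration, the exclusion of dynamic domain names followed by parent domain name matching can be sketched as follows. The labelled parent ns2275ab.com and its family come from the example above; the dynamic domain name list entry, the function names, and the simplification of treating every multi-label suffix as a candidate parent (without consulting a public suffix list) are assumptions made for this sketch.

```python
from typing import List, Optional

# Hypothetical dynamic domain name list (matching library).
DYNAMIC_DOMAINS = {"dyndns.example"}

# Parent domain names already labelled with a family (example from the text).
LABELLED_PARENTS = {"ns2275ab.com": "win32.trojan.wapomi"}


def parent_domains(domain: str) -> List[str]:
    """Return the parent domain names of `domain`, longest first.

    The domain itself and the bare top-level label are excluded.
    """
    labels = domain.lower().rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(1, len(labels) - 1)]


def match_parent(domain: str) -> Optional[str]:
    """Return the inherited family label, or None to go on to the next step."""
    parents = parent_domains(domain)
    # Dynamic domain names are excluded: children do not inherit a family.
    if any(p in DYNAMIC_DOMAINS for p in parents):
        return None
    for parent in parents:
        if parent in LABELLED_PARENTS:
            return LABELLED_PARENTS[parent]  # child gets the same label
    return None
```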

FIG. 8 shows the process of obtaining the family information of the domain name by weighted voting over the family classifications from each platform. The specific steps are as follows:

the first part splits the virus name strings from all platforms into several parts. For example, for virus names such as Trojan/Win32/benefit, Trojan:Win32:Vupa, and Trojan-Win32-Malcio, the fields are split at the corresponding special characters, the keywords, such as trojan and win32, are listed by regular-expression matching, and all fields are converted to lowercase to facilitate matching and de-duplication;

the second part removes the platform and classification fields to identify whether the remaining part is the correct virus name. First, the virus name reported by each vendor is extracted; for example, from a name beginning with troj/W32. the platform and classification are removed to obtain "sale". Then aliases of some virus names are removed through fuzzy matching, that is, each alias is converted into the correct virus name; for example, the correct virus name for "salerop" is "sale", and the correct virus name for "saleode" is also "sale". Next, the resulting virus names are counted; for example, the correct virus name obtains 10 votes, the ramnit virus name obtains 5 votes, and the wap virus name obtains 1 vote. Finally, the identified virus name with the most votes is selected (in this example, "capacity"), giving the virus family name without the type and the platform;

the third part selects and processes the type and platform parts of the virus names. First, the classification and platform content reported by each platform is extracted and de-aliased, with partial aliases converted into the correct names; for example, the virus prefix troj is converted to trojan. Finally, the classification and platform labels reported by the most platforms are selected and spliced into a complete virus name in the standard format, giving win32.trojan.
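As a non-limiting illustration, weighted voting over already-parsed platform reports and the splicing of the winning labels into the platform.type.family format can be sketched as follows. The application does not state how the weights are chosen, so the per-source weights, the source names, the default weight of 1.0, and the sample reports are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# One parsed report: (source, platform label, type label, family label).
Report = Tuple[str, str, str, str]


def weighted_vote(reports: List[Report], weights: Dict[str, float]) -> Optional[str]:
    """Vote separately on platform, type, and family, then splice the winners."""
    if not reports:
        return None
    # One tally per name component: platform, type, family.
    tallies = [defaultdict(float) for _ in range(3)]
    for source, *parts in reports:
        weight = weights.get(source, 1.0)  # assumption: unknown sources weigh 1.0
        for tally, part in zip(tallies, parts):
            tally[part] += weight
    # max() keeps the first-seen label when two labels have equal weight.
    platform, kind, family = (max(t, key=t.get) for t in tallies)
    return f"{platform}.{kind}.{family}"


# Hypothetical reports from three hypothetical sources.
REPORTS: List[Report] = [
    ("vendor_a", "win32", "trojan", "ramnit"),
    ("vendor_b", "win32", "trojan", "ramnit"),
    ("vendor_c", "win32", "worm", "wapomi"),
]
```

With equal weights the two ramnit reports win; giving `vendor_c` a weight of 3.0 would instead let its single report outweigh the other two.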

If matching of the domain name to be detected still fails, the search engine identification process is entered. Some search engines (such as Google and Baidu) provide family information for domain names, and the device can determine the family information of the domain name to be detected by searching for the domain name itself or for its keywords, as shown in FIG. 9. In addition, the device may confirm the family information of the domain name based on the security event type.

If matching still fails after search engine identification, the domain name to be detected either may not be a malicious domain name or belongs to a malicious type that has never appeared before. In that case a default label is marked on the domain name to be detected, and the domain name is reserved for the next round of identification or for manual identification.

The identification method provided in the embodiment of the application avoids the problems of prior-art platforms, which analyze only a single sample, have a cumbersome processing flow and poor reusability, and cannot mark families for large volumes of data or for data without a sample source. It also avoids having to provide supporting evidence whenever a domain name is attributed to a certain virus family. Because the domain name to be detected is identified using multiple dimensions and multiple identification modes, the final identification result is more reliable, the identification flow is simplified, and the identification efficiency is improved.

As shown in fig. 10, an embodiment of the present application further provides a malicious domain name recognition apparatus 2, including:

an obtaining unit 201, configured to obtain a domain name to be detected, and perform virus traffic matching on the domain name to be detected to obtain a first matching result;

a matching unit 202, configured to, if the first matching result represents that the virus traffic matching fails, match the domain name to be detected using a matching mode of at least one dimension to obtain a second matching result, where the at least one dimension comprises an attribute dimension, a rule dimension, or a third-party information dimension;

and a marking unit 203, configured to mark the domain name to be detected based on the second matching result when the second matching result represents that the matching is successful, so as to obtain a final identification result.

In some embodiments of the present application, the attribute dimension includes matching a domain name type dimension of a domain name to be detected to obtain an attribute matching result, the rule dimension includes matching a feature dimension of the domain name to be detected to obtain a feature matching result, and the third-party information dimension includes matching external threat information dimensions of different sources of the domain name to be detected to obtain an intelligence matching result.

In some embodiments of the present application, the malicious domain name recognition apparatus 2 further includes:

the judging unit is used for determining the current domain name type of the domain name to be detected;

the judging unit is also used for judging whether a parent domain name corresponding to the domain name to be detected is an identified malicious domain name or not if the current domain name type is a non-dynamic domain name, and obtaining an attribute matching result;

the matching unit 202 is further configured to take the attribute matching result as a second matching result if the attribute matching result is the identified malicious domain name.

the obtaining unit 201 is further configured to obtain a current feature of the domain name to be detected; wherein the current features include: a current domain name characteristic and a current address information characteristic;

the judging unit is also used for judging whether the preset malicious feature library hits the current features or not to obtain a feature matching result;

the matching unit 202 is further configured to, if the feature matching result indicates that any one of the current domain name feature and the current address information feature is hit, take the feature matching result as a second matching result.

the obtaining unit 201 is further configured to obtain a plurality of external threat information for different sources of the domain name to be detected; each external threat information represents classification information of the domain name to be detected;

the matching unit 202 is further configured to count a plurality of external threat information to obtain an intelligence matching result of the domain name to be detected;

the matching unit 202 is further configured to use the intelligence matching result as a second matching result.

the obtaining unit 201 is further configured to obtain a plurality of sample information of different sources for the domain name to be detected;

the obtaining unit 201 is further configured to perform classification information matching on the plurality of sample information to obtain a plurality of external threat information.

the obtaining unit 201 is further configured to disassemble the multiple pieces of external threat information to obtain multiple pieces of disassembled classification information;

the obtaining unit 201 is further configured to de-alias the plurality of disassembled classification information to obtain a plurality of family names;

the obtaining unit 201 is further configured to count a plurality of family names to obtain a target family name with a maximum statistical probability;

the matching unit 202 is further configured to use the target family name as an intelligence matching result of the domain name to be detected.

the matching unit 202 is further configured to, when the attribute matching result obtained by matching the domain name to be detected in the domain name type dimension using the attribute dimension matching mode represents a matching failure, match the domain name to be detected in the feature dimension using the rule dimension matching mode to obtain a feature matching result, and use the feature matching result as the second matching result; or,

the matching unit 202 is further configured to, when the attribute matching result obtained by matching the domain name to be detected in the domain name type dimension using the attribute dimension matching mode represents a matching failure, match the domain name to be detected against external threat information from different sources using the third-party information dimension matching mode to obtain an intelligence matching result, and use the intelligence matching result as the second matching result; or,

the matching unit 202 is further configured to, when the feature matching result obtained by matching the domain name to be detected in the feature dimension using the rule dimension matching mode represents a matching failure, match the domain name to be detected against external threat information from different sources using the third-party information dimension matching mode to obtain an intelligence matching result, and use the intelligence matching result as the second matching result.

the matching unit 202 is further configured to, when the attribute matching result obtained using the attribute dimension matching mode represents a matching failure and the feature matching result obtained using the rule dimension matching mode also represents a matching failure, match the domain name to be detected against external threat information from different sources using the third-party information dimension matching mode to obtain an intelligence matching result, and use the intelligence matching result as the second matching result; or,

the matching unit 202 is further configured to, when the attribute matching result obtained using the attribute dimension matching mode represents a matching failure and the intelligence matching result obtained using the third-party information dimension matching mode also represents a matching failure, match the domain name to be detected in the feature dimension using the rule dimension matching mode to obtain a feature matching result, and use the feature matching result as the second matching result; or,

the matching unit 202 is further configured to, when the feature matching result obtained using the rule dimension matching mode represents a matching failure and the intelligence matching result obtained using the third-party information dimension matching mode also represents a matching failure, match the domain name to be detected in the domain name type dimension using the attribute dimension matching mode to obtain an attribute matching result, and use the attribute matching result as the second matching result.

As shown in fig. 11, an embodiment of the present application further provides a malicious domain name recognition apparatus 3, which includes a processor 301, a memory 302, and a communication bus 303;

the communication bus 303 is used for realizing communication connection between the processor 301 and the memory 302;

the processor 301 is configured to execute a control program for identifying a malicious domain name stored in the memory 302, so as to implement any one of the methods for identifying a malicious domain name provided above.

An embodiment of the present application further provides a storage medium on which an identification program for malicious domain names is stored; when the identification program is executed by a processor, any one of the methods for identifying a malicious domain name provided above is implemented.

One skilled in the art will appreciate that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification do not necessarily all refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present application, the magnitude of the sequence number of each process described above does not imply an execution order; the execution order of each process should be determined by its function and inherent logic, and the sequence numbers should not constitute any limitation on the implementation of the embodiments. The serial numbers of the above embodiments are merely for description and do not represent the relative merits of the embodiments.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of a unit is only one logical function division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, all functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.

Those of ordinary skill in the art will understand that all or part of the steps of the method embodiments may be carried out by hardware under the control of program instructions; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments; the aforementioned storage medium includes various media that can store program code, such as a removable memory device, a Read-Only Memory (ROM), a magnetic disk, or an optical disk.

Alternatively, if the integrated units described above are implemented in the form of software functional modules and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present embodiment, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the method of the embodiments of the present application. The aforementioned storage medium includes: a removable storage device, a ROM, a magnetic disk, an optical disk, or other media that can store program code.

The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A method for identifying a malicious domain name, the method comprising:

2. The method of claim 1, wherein the attribute dimensions include a type dimension, wherein the rule dimensions include a feature dimension, and wherein the third-party information dimensions are external threat information dimensions of different origins.

3. The method according to claim 2, wherein when the at least one dimension is the attribute dimension, the matching of the domain name to be detected by using the matching manner of the at least one dimension to obtain a second matching result comprises:

determining the current domain name type of the domain name to be detected;

and if the attribute matching result is the identified malicious domain name, taking the identified attribute matching result as the second matching result.

4. The method according to claim 2, wherein when the at least one dimension is the rule dimension, the matching of the domain name to be detected by using the matching manner of the at least one dimension to obtain a second matching result comprises:

5. The method according to claim 2, wherein when the at least one dimension is the third-party information dimension, the matching of the domain name to be detected by using the matching manner of the at least one dimension to obtain a second matching result comprises:

and using the intelligence matching result as a second matching result.

6. The method according to claim 5, wherein the obtaining a plurality of external threat information for different sources of the domain name to be detected comprises:

7. The method according to claim 5 or 6, wherein the counting the plurality of external threat information to obtain the intelligence matching result of the domain name to be detected comprises:

8. The method according to claim 2, wherein when the at least one dimension includes an attribute dimension and a rule dimension, or an attribute dimension and a third-party information dimension, the matching of the domain name to be detected by using a matching mode of the at least one dimension to obtain a second matching result includes:

9. The method according to claim 2, wherein when the at least one dimension includes a rule dimension and a third party information dimension, the matching of the domain name to be detected by using a matching manner of the at least one dimension to obtain a second matching result includes:

10. The method according to claim 2, wherein when the at least one dimension includes the attribute dimension, the rule dimension, and the third-party information dimension, the matching of the domain name to be detected by using the matching manner of the at least one dimension to obtain a second matching result comprises:

when type dimension matching of the domain name to be detected in the attribute dimension matching manner yields an attribute matching result representing a matching failure, and matching of the domain name to be detected against external threat information dimensions of different sources in the third-party information dimension matching manner yields an intelligence matching result representing a matching failure, performing feature dimension matching on the domain name to be detected in the rule dimension matching manner to obtain a feature matching result, and taking the feature matching result as the second matching result; or,

when feature dimension matching of the domain name to be detected in the rule dimension matching manner yields a feature matching result representing a matching failure, and matching of the domain name to be detected against external threat information dimensions of different sources in the third-party information dimension matching manner yields an intelligence matching result representing a matching failure, performing domain name type dimension matching on the domain name to be detected in the attribute dimension matching manner to obtain an attribute matching result, and taking the attribute matching result as the second matching result.

11. An apparatus for identifying a malicious domain name, the apparatus comprising:

12. An apparatus for identifying malicious domain names, the apparatus comprising a processor, a memory, and a communication bus;

the processor is configured to execute a control program stored in the memory for identifying malicious domain names, so as to implement the steps of any one of claims 1 to 10.

13. A storage medium having stored thereon an identification program for malicious domain names, which identification program, when executed by a processor, implements the method according to any one of claims 1 to 10.