Summary of the invention
In view of the problems described above, embodiments of the invention disclose a software identification method and device that improve the accuracy of identifying software by means of strings. The technical solution is as follows:
In a first aspect, an embodiment of the invention provides a software identification method, comprising:
Disassembling the executable file of the software to be identified;
Removing the code belonging to built-in functions from the disassembled executable file;
Extracting strings to be identified from the remaining code;
Determining, among the strings to be identified, a first quantity of strings that match strings stored in a first string library, and a second quantity of strings that match strings stored in a second string library; wherein the first string library contains strings extracted from the code of malware that remains after the code belonging to built-in functions is removed, and the second string library contains strings extracted from the code of normal software that remains after the code belonging to built-in functions is removed;
Determining the recognition result of the software to be identified according to the ratio of the first quantity to the second quantity.
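The five steps can be illustrated end to end with a minimal sketch. It assumes disassembly has already been done by some external tool and models each function block as a hypothetical `FunctionBlock` holding its instruction sequence and the strings found at the global addresses it references; `match_counts`, `builtin_codes`, `first_lib` and `second_lib` are illustrative names, not taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionBlock:
    # Hypothetical model of one function block of a disassembled executable:
    # `code` is its instruction sequence, `strings` the strings found at the
    # global addresses it references.
    code: tuple[str, ...]
    strings: tuple[str, ...] = ()


def match_counts(blocks, builtin_codes, first_lib, second_lib):
    """Return (first quantity, second quantity, ratio) for one executable."""
    # Drop blocks whose code is identical to a known built-in function block,
    # then collect the strings of the remaining blocks as strings to be identified.
    to_identify = [s for b in blocks if b.code not in builtin_codes for s in b.strings]
    first = sum(s in first_lib for s in to_identify)    # matches in the malware library
    second = sum(s in second_lib for s in to_identify)  # matches in the normal-software library
    if second:
        ratio = first / second
    else:
        ratio = float("inf") if first else 0.0  # assumption: guard against division by zero
    return first, second, ratio
```

Each extracted string is counted once per occurrence; whether duplicates should be collapsed first is left open by the method and is a design choice of this sketch.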
Preferably, extracting the strings to be identified from the remaining code comprises:
Determining the global addresses in the remaining code;
Extracting the strings to be identified from the determined global addresses.
Preferably, removing the code belonging to built-in functions from the disassembled executable file comprises:
Building the function blocks of the disassembled executable file and storing them in a third function block set;
Determining the function blocks in the third function block set that belong to built-in functions, and removing them;
Correspondingly, extracting the strings to be identified from the remaining code comprises:
Extracting the strings to be identified from the function blocks that remain in the third function block set after the blocks belonging to built-in functions are removed.
Preferably, the function blocks in the third function block set that belong to built-in functions are determined by segment-wise code matching.
Preferably, building the first string library and the second string library comprises:
Disassembling each of the first executable files, which correspond to a third quantity of malware samples, and disassembling each of the second executable files, which correspond to a fourth quantity of normal software samples;
Building the function blocks of the disassembled first executable files and storing them in a first function block set, and building the function blocks of the disassembled second executable files and storing them in a second function block set;
Determining the function blocks belonging to built-in functions in the first function block set and in the second function block set, and removing them;
Extracting strings from the function blocks remaining in the first function block set to build the first string library;
Extracting strings from the function blocks remaining in the second function block set to build the second string library.
Preferably, after the first string library and the second string library are built, the method further comprises:
Determining the strings that exist in both the first string library and the second string library;
Removing the determined common strings from both the first string library and the second string library.
Preferably, the software identification method further comprises:
Displaying the recognition result of the software to be identified.
Preferably, the recognition result is one of:
Normal software, malware, likely normal software or likely malware.
In a second aspect, an embodiment of the invention provides a software recognition device, comprising:
A disassembly module, configured to disassemble the executable file of the software to be identified;
A built-in function code removal module, configured to remove the code belonging to built-in functions from the disassembled executable file;
A string extraction module, configured to extract strings to be identified from the remaining code;
A quantity determination module, configured to determine, among the strings to be identified, a first quantity of strings matching strings stored in the first string library and a second quantity of strings matching strings stored in the second string library;
A result determination module, configured to determine the recognition result of the software to be identified according to the ratio of the first quantity to the second quantity;
A string library building module, configured to build the first string library and the second string library, wherein the first string library contains strings extracted from the code of malware that remains after the code belonging to built-in functions is removed, and the second string library contains strings extracted from the code of normal software that remains after the code belonging to built-in functions is removed.
Preferably, the string extraction module comprises:
A global address determination unit, configured to determine the global addresses in the remaining code;
A first string extraction unit, configured to extract the strings to be identified from the determined global addresses.
Preferably, the built-in function code removal module comprises:
A function block building unit, configured to build the function blocks of the disassembled executable file and store them in a third function block set;
A built-in function block determination unit, configured to determine the function blocks in the third function block set that belong to built-in functions;
A first built-in function block removal unit, configured to remove the determined function blocks belonging to built-in functions;
Correspondingly, the string extraction module comprises:
A second string extraction unit, configured to extract the strings to be identified from the function blocks that remain in the third function block set after the blocks belonging to built-in functions are removed.
Preferably, the built-in function block determination unit determines the function blocks in the third function block set that belong to built-in functions by segment-wise code matching.
Preferably, the string library building module comprises:
A disassembly unit, configured to disassemble each of the first executable files corresponding to a third quantity of malware samples, and each of the second executable files corresponding to a fourth quantity of normal software samples;
A second function block building unit, configured to build the function blocks of the disassembled first executable files and store them in a first function block set, and to build the function blocks of the disassembled second executable files and store them in a second function block set;
A second built-in function block removal unit, configured to determine the function blocks belonging to built-in functions in the first and second function block sets, and to remove them;
A first string library building unit, configured to extract strings from the function blocks remaining in the first function block set and build the first string library;
A second string library building unit, configured to extract strings from the function blocks remaining in the second function block set and build the second string library.
Preferably, the string library building module further comprises:
A common string determination unit, configured to determine the strings that exist in both the first and second string libraries;
A common string deletion unit, configured to remove the determined common strings from both the first and second string libraries.
Preferably, the software recognition device further comprises:
A recognition result display module, configured to display the recognition result of the software to be identified.
Compared with the prior art, this solution builds the first string library and the second string library in advance. The first string library contains strings extracted from the code of malware that remains after the code belonging to built-in functions is removed, and the second string library contains strings extracted in the same way from normal software. During identification, strings are extracted, as strings to be identified, from the code of the disassembled executable file that remains after the code belonging to built-in functions is removed; they are matched against the strings stored in the first and second string libraries, and the recognition result is determined from the match ratio. Because built-in function code, whose strings appear in malicious and normal programs alike, is excluded both when building the string libraries and during identification, the influence of non-malicious strings on the match ratio is reduced and the accuracy of identifying software by means of strings is improved.
Embodiment
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the invention.
To improve the accuracy of identifying software by means of strings, embodiments of the invention provide a software identification method and device.
The software identification method provided by an embodiment of the invention is introduced first.
Note that the software identification method provided by embodiments of the invention is applicable to electronic devices; in practice, such a device may be a mobile phone, a tablet computer, a notebook computer or the like.
As shown in Figure 1, a software identification method may comprise:
S101: disassembling the executable file of the software to be identified.
When software needs to be identified, the executable file of the software is obtained and disassembled for subsequent processing.
As those skilled in the art will appreciate, programs are written in high-level languages such as C or Pascal and then compiled into files that the operating system can execute directly, i.e. executable files; disassembly refers to translating such executable files back into assembly language (or, by decompilation, into a higher-level language).
Executable files are binary files; in practice they may be, for example, files in the exe, sys or com format.
S102: removing the code belonging to built-in functions from the disassembled executable file.
After the executable file is disassembled, the code belonging to built-in functions is removed from it, so that the remaining code of the software to be identified contains no code belonging to built-in functions.
Since built-in functions consist of a number of function blocks, their code can be removed at the granularity of function blocks. Specifically, removing the code belonging to built-in functions from the disassembled executable file may comprise:
Building the function blocks of the disassembled executable file and storing them in a third function block set;
Determining the function blocks in the third function block set that belong to built-in functions, and removing them.
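The text does not say how function block boundaries are recovered. Real disassemblers use call targets and symbol information; the minimal sketch below assumes a much simpler, hypothetical rule in which a block ends after each `ret`/`retn` instruction, only to show how a linear listing could become the third function block set.

```python
def build_function_blocks(listing: list[str]) -> list[tuple[str, ...]]:
    """Split a linear disassembly listing into function blocks.

    Hypothetical rule: a block is closed after every `ret`/`retn` instruction.
    """
    blocks: list[tuple[str, ...]] = []
    current: list[str] = []
    for ins in listing:
        current.append(ins)
        parts = ins.split()
        mnemonic = parts[0].lower() if parts else ""
        if mnemonic in ("ret", "retn"):
            blocks.append(tuple(current))  # close the current block
            current = []
    if current:  # trailing code without a final return
        blocks.append(tuple(current))
    return blocks
```

Blocks are returned as tuples so that they are hashable and can later be compared or looked up against known built-in function blocks.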
Determining the function blocks in the third function block set that belong to built-in functions may comprise: matching the current function block to be identified against each function block of the built-in functions. If it matches some built-in function block completely, the current block belongs to a built-in function; otherwise it does not. The judgement of the current block is then complete, and the next block is taken as the current function block to be identified.
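Whole-block matching can be sketched as follows, assuming (hypothetically) that a function block is a tuple of instruction strings and that the known built-in function blocks are available as a collection of such tuples. Putting them in a set gives the same yes/no answer as comparing the current block with each built-in block in turn, but avoids a linear scan.

```python
from typing import Iterable, Sequence

Block = tuple[str, ...]  # assumed model: a function block as an instruction sequence


def remove_builtin_blocks(third_set: Sequence[Block], builtin_blocks: Iterable[Block]) -> list[Block]:
    """Return the blocks of `third_set` that do not exactly match any built-in block."""
    known = set(builtin_blocks)  # exact (identical-code) matching via hashing
    remaining = []
    for block in third_set:      # each block in turn is the "current" block
        if block in known:       # identical to some built-in block: it is built-in code
            continue
        remaining.append(block)
    return remaining
```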
To improve processing speed, the function blocks in the third function block set that belong to built-in functions may instead be determined by segment-wise code matching, as follows. The preset first code segment of the current function block is compared with the corresponding first segment of each built-in function block. If no built-in function block matches, the current block does not belong to a built-in function and its judgement is complete. If the first segment matches that of some built-in function block, the second segments of the two blocks are compared; if they do not match, the current block is determined not to belong to a built-in function and matching ends; otherwise matching continues with the subsequent segments. Matching ends as soon as any segment fails to match, in which case the current block does not belong to a built-in function; only when all segments match does the current block belong to a built-in function.
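A minimal sketch of segment-wise matching follows. The segment size and the dictionary that indexes built-in blocks by their first segment are assumptions of this sketch, not details from the text. Because several built-in blocks may share a first segment, the sketch keeps every built-in block that still matches as a candidate and stops as soon as none remain, which is where the early exit saves time.

```python
from collections import defaultdict

Block = tuple[str, ...]  # assumed model: a function block as an instruction sequence


def build_index(builtin_blocks, size):
    """Group built-in blocks by their first segment (hypothetical speed-up)."""
    index = defaultdict(list)
    for b in builtin_blocks:
        index[b[:size]].append(b)
    return index


def is_builtin_segmented(block: Block, index, size: int) -> bool:
    # First segment: only built-in blocks with the same first segment stay candidates.
    candidates = index.get(block[:size], [])
    if not candidates:
        return False
    # Following segments: narrow the candidates, stopping at the first total mismatch.
    for start in range(size, len(block), size):
        seg = block[start:start + size]
        candidates = [c for c in candidates if c[start:start + size] == seg]
        if not candidates:
            return False
    # All segments matched; equal length rules out a mere prefix match.
    return any(len(c) == len(block) for c in candidates)
```

For example, with `size=2` a block is compared two instructions at a time, so a block that differs from every built-in block in its first two instructions is rejected after a single dictionary lookup.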
Here, code matching means that the code is identical. The above way of removing the code belonging to built-in functions from the disassembled executable file is only an example and does not limit the embodiments of the invention.
S103: extracting strings to be identified from the remaining code.
Extracting the strings to be identified from the remaining code may comprise: determining the global addresses in the remaining code, and extracting the strings to be identified from the determined global addresses.
Since the code belonging to built-in functions may be removed by building the function blocks of the disassembled executable file, storing them in the third function block set, and removing the blocks that belong to built-in functions, the strings to be identified can correspondingly be extracted from the function blocks that remain in the third function block set. In that case, the global addresses are determined within the remaining function blocks, and the strings to be identified are extracted from those addresses.
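Extraction from global addresses can be sketched under several stated assumptions: instructions are text in which global addresses appear as hexadecimal literals such as `0x403000`, the data section is available as raw bytes loaded at a known `base` address, and a string is a NUL-terminated run of printable ASCII of at least four characters (the same default minimum as the GNU `strings` utility). All function names here are hypothetical.

```python
import re

ADDR_RE = re.compile(r"0x[0-9a-fA-F]+")  # assumed operand syntax for global addresses


def referenced_addresses(block):
    """Collect hexadecimal operands that may be global addresses."""
    return [int(m, 16) for ins in block for m in ADDR_RE.findall(ins)]


def read_c_string(data: bytes, base: int, addr: int, min_len: int = 4):
    """Return the printable NUL-terminated ASCII string at `addr`, or None."""
    off = addr - base
    if not 0 <= off < len(data):  # address outside the data section
        return None
    end = data.find(b"\x00", off)
    if end == -1:
        end = len(data)
    raw = data[off:end]
    if len(raw) < min_len or not all(0x20 <= c < 0x7F for c in raw):
        return None
    return raw.decode("ascii")


def extract_strings(blocks, data: bytes, base: int, min_len: int = 4):
    """Strings to be identified, taken from the global addresses the blocks reference."""
    out = []
    for block in blocks:
        for addr in referenced_addresses(block):
            s = read_c_string(data, base, addr, min_len)
            if s is not None:
                out.append(s)
    return out
```

Immediates that happen to look like addresses but fall outside the data section, such as call targets in the code section, are simply ignored.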
S104: determining, among the strings to be identified, a first quantity of strings matching strings stored in the first string library, and a second quantity of strings matching strings stored in the second string library.
The first string library contains strings extracted from the code of malware that remains after the code belonging to built-in functions is removed, and the second string library contains strings extracted in the same way from normal software.
After the strings to be identified for the software have been determined, it is determined for each string whether it matches a string stored in the first string library, a string stored in the second string library, or neither. The first quantity (strings matching the first string library) and the second quantity (strings matching the second string library) are then counted for subsequent processing.
Specifically, as shown in Figure 2, building the first string library and the second string library may comprise:
S201: disassembling each of the first executable files corresponding to a third quantity of malware samples;
S202: disassembling each of the second executable files corresponding to a fourth quantity of normal software samples;
In practice, the third quantity and the fourth quantity may be equal or different; the larger they are, the more reliable the resulting first and second string libraries.
S203: building the function blocks of the disassembled first executable files and storing them in a first function block set;
S204: building the function blocks of the disassembled second executable files and storing them in a second function block set;
S205: determining the function blocks belonging to built-in functions in the first and second function block sets, and removing them;
The function blocks belonging to built-in functions in the first and second function block sets can be determined in the same way as those in the third function block set described above; details are not repeated here.
S206: extracting strings from the function blocks remaining in the first function block set to build the first string library;
S207: extracting strings from the function blocks remaining in the second function block set to build the second string library.
Strings can be extracted from the function blocks remaining in the first and second function block sets in the same way as the strings to be identified are extracted from the remaining function blocks above; details are not repeated here.
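Steps S201 to S207 can be sketched as follows, assuming each sample has already been disassembled into a list of `(instructions, strings)` pairs, one pair per function block, and that built-in blocks are recognized by exact code matching. The helper names are hypothetical; storing the libraries as sets is a choice of this sketch.

```python
def strings_without_builtins(samples, builtin_codes):
    """Union of the strings of all non-built-in blocks across the samples."""
    lib = set()
    for blocks in samples:                # one disassembled executable per sample
        for code, strings in blocks:
            if code in builtin_codes:     # S205: drop built-in function blocks
                continue
            lib.update(strings)           # S206/S207: keep strings of the remaining blocks
    return lib


def build_libraries(malware_samples, normal_samples, builtin_codes):
    """Return (first string library, second string library)."""
    first = strings_without_builtins(malware_samples, builtin_codes)
    second = strings_without_builtins(normal_samples, builtin_codes)
    return first, second
```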
Furthermore, some strings exist in both the first string library and the second string library. Such strings contribute little to identification but can still distort the match ratio and thus reduce the accuracy of the recognition result. To reduce their influence, after the first and second string libraries are built, the method further comprises:
Determining the strings that exist in both the first and second string libraries;
Removing the determined common strings from both the first and second string libraries.
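With the libraries held as sets (an assumption of this sketch), common-string removal reduces to a set intersection followed by two set differences. The hypothetical helper returns new sets rather than modifying its inputs, so the unfiltered libraries remain available if needed.

```python
def remove_common(first: set[str], second: set[str]) -> tuple[set[str], set[str]]:
    """Remove strings present in both libraries from each of them."""
    common = first & second
    return first - common, second - common
```

After this step a string to be identified can match at most one of the two libraries, so it contributes to at most one of the first and second quantities.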
The above process of building the first and second string libraries is only an example and does not limit the embodiments of the invention.
S105: determining the recognition result of the software to be identified according to the ratio of the first quantity to the second quantity.
Once the first and second quantities are determined, the recognition result of the software to be identified is determined from the ratio of the first quantity to the second quantity.
The recognition result may be normal software, malware, likely normal software or likely malware. Depending on the application scenario, the set of possible results may also be reduced, for example to only normal software and malware, or only likely normal software and likely malware.
Each recognition result may correspond to a ratio interval: when the determined ratio falls in the interval corresponding to normal software, the software to be identified is recognized as normal software; when it falls in the interval corresponding to malware, as malware; when it falls in the interval corresponding to likely normal software, as likely normal software; and when it falls in the interval corresponding to likely malware, as likely malware. The interval corresponding to each recognition result may differ between application scenarios.
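The interval lookup can be sketched with `bisect`. The boundaries 0.5, 1.0 and 2.0 are purely hypothetical, since the text leaves the intervals application-specific, and so is the treatment of a zero second quantity (infinite ratio when there are malware matches, 0 when there are no matches at all).

```python
import bisect
import math

# Hypothetical interval boundaries; intervals are [0, 0.5), [0.5, 1), [1, 2), [2, inf].
BOUNDS = [0.5, 1.0, 2.0]
LABELS = ["normal software", "likely normal software", "likely malware", "malware"]


def match_ratio(first: int, second: int) -> float:
    if second == 0:
        return math.inf if first > 0 else 0.0  # assumption for the zero-denominator case
    return first / second


def recognition_result(first: int, second: int, bounds=BOUNDS, labels=LABELS) -> str:
    """Map the ratio first/second to the label of the interval it falls in."""
    return labels[bisect.bisect_right(bounds, match_ratio(first, second))]
```

A two-result scenario only needs different tables, for example `bounds=[1.0]` with `labels=["normal software", "malware"]`.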
After the recognition result of the software to be identified is determined, the software identification method may further comprise displaying the recognition result.
Subsequent processing may also follow from the recognition result. For example, when the result is malware or likely malware, the user may be offered a link for deleting the software, or a warning about the problems the software may cause once installed; when the result is normal software or likely normal software, corresponding information may be shown to tell the user that the software can be used safely. The subsequent processing is not limited to these examples.
In this embodiment, the first and second string libraries are built in advance from malware and from normal software respectively, with the code belonging to built-in functions removed. During identification, strings are extracted from the disassembled executable file after the same removal, matched against both string libraries, and the recognition result is determined from the match ratio. Excluding built-in function code in both phases reduces the influence of non-malicious strings on the match ratio and improves the accuracy of identifying software by means of strings.
Note that "first", "second" and "third" in "first function block set", "second function block set" and "third function block set" serve only to distinguish different function block sets and carry no limiting meaning; likewise, "first", "second", "third" and "fourth" in "first quantity", "second quantity", "third quantity" and "fourth quantity" serve only to distinguish different quantities and carry no limiting meaning.
Corresponding to the above method embodiment, an embodiment of the invention further provides a software recognition device which, as shown in Figure 3, may comprise:
A disassembly module 310, configured to disassemble the executable file of the software to be identified;
A built-in function code removal module 320, configured to remove the code belonging to built-in functions from the disassembled executable file;
A string extraction module 330, configured to extract strings to be identified from the remaining code;
A quantity determination module 340, configured to determine, among the strings to be identified, a first quantity of strings matching strings stored in the first string library and a second quantity of strings matching strings stored in the second string library;
A result determination module 350, configured to determine the recognition result of the software to be identified according to the ratio of the first quantity to the second quantity;
A string library building module 360, configured to build the first string library and the second string library, wherein the first string library contains strings extracted from the code of malware that remains after the code belonging to built-in functions is removed, and the second string library contains strings extracted from the code of normal software that remains after the code belonging to built-in functions is removed.
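One possible way to compose modules 310 to 360 is sketched below as a single class with one method per module. Several assumptions apply: the disassembler is injected as a callable standing in for a real tool, a disassembled block is an `(instructions, strings)` pair, built-in blocks are found by exact code matching, the single `threshold` in `decide` is hypothetical, and the common-string units are folded into `build_libraries`.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed model: (instructions, strings at the block's referenced global addresses)
Block = tuple[tuple[str, ...], tuple[str, ...]]


@dataclass
class SoftwareRecognitionDevice:
    disassembler: Callable[[bytes], list[Block]]  # module 310: stand-in for a real disassembler
    builtin_codes: frozenset                      # known built-in function blocks (module 320)
    first_lib: frozenset = frozenset()            # malware strings, filled by module 360
    second_lib: frozenset = frozenset()           # normal-software strings, filled by module 360

    def remove_builtin(self, blocks):             # module 320
        return [b for b in blocks if b[0] not in self.builtin_codes]

    def extract_strings(self, blocks):            # module 330
        return [s for _, strings in blocks for s in strings]

    def count_matches(self, strings):             # module 340
        first = sum(s in self.first_lib for s in strings)
        second = sum(s in self.second_lib for s in strings)
        return first, second

    def decide(self, first, second, threshold=1.0):  # module 350; threshold is hypothetical
        if second:
            ratio = first / second
        else:
            ratio = float("inf") if first else 0.0
        return "malware" if ratio >= threshold else "normal software"

    def _strings_of(self, exes):
        return frozenset(s for exe in exes
                         for s in self.extract_strings(self.remove_builtin(self.disassembler(exe))))

    def build_libraries(self, malware_exes, normal_exes):  # module 360
        first, second = self._strings_of(malware_exes), self._strings_of(normal_exes)
        common = first & second  # common string determination and deletion units
        self.first_lib, self.second_lib = first - common, second - common

    def recognize(self, exe: bytes) -> str:
        blocks = self.remove_builtin(self.disassembler(exe))
        return self.decide(*self.count_matches(self.extract_strings(blocks)))
```

Injecting the disassembler keeps the matching logic independent of any particular disassembly tool, which also makes the modules easy to exercise with a fake disassembler.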
As with the method embodiment, because the device excludes the code belonging to built-in functions both when building the string libraries and during identification, it reduces the influence of non-malicious strings on the match ratio and improves the accuracy of identifying software by means of strings.
The string extraction module 330 may comprise:
A global address determination unit, configured to determine the global addresses in the remaining code;
A first string extraction unit, configured to extract the strings to be identified from the determined global addresses.
The built-in function code removal module 320 may comprise:
A function block building unit, configured to build the function blocks of the disassembled executable file and store them in a third function block set;
A built-in function block determination unit, configured to determine the function blocks in the third function block set that belong to built-in functions;
A first built-in function block removal unit, configured to remove the determined function blocks belonging to built-in functions;
Correspondingly, the string extraction module 330 may comprise:
A second string extraction unit, configured to extract the strings to be identified from the function blocks that remain in the third function block set after the blocks belonging to built-in functions are removed.
The built-in function block determination unit may determine the function blocks in the third function block set that belong to built-in functions by segment-wise code matching.
The string library building module 360 may comprise:
A disassembly unit, configured to disassemble each of the first executable files corresponding to a third quantity of malware samples, and each of the second executable files corresponding to a fourth quantity of normal software samples;
A second function block building unit, configured to build the function blocks of the disassembled first executable files and store them in a first function block set, and to build the function blocks of the disassembled second executable files and store them in a second function block set;
A second built-in function block removal unit, configured to determine the function blocks belonging to built-in functions in the first and second function block sets, and to remove them;
A first string library building unit, configured to extract strings from the function blocks remaining in the first function block set and build the first string library;
A second string library building unit, configured to extract strings from the function blocks remaining in the second function block set and build the second string library.
The string library building module 360 may further comprise:
A common string determination unit, configured to determine the strings that exist in both the first and second string libraries;
A common string deletion unit, configured to remove the determined common strings from both the first and second string libraries.
Further, the software recognition device may also comprise:
A recognition result display module, configured to display the recognition result of the software to be identified.
Since the device embodiment is substantially similar to the method embodiment, it is described briefly; for the relevant details, refer to the description of the method embodiment.
It should be noted that relational terms such as first and second are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that comprises a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising ..." does not exclude the presence of additional identical elements in the process, method, article or device that comprises the element.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be implemented by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk or an optical disc.
The foregoing are merely preferred embodiments of the invention and are not intended to limit its protection scope. Any modification, equivalent replacement, improvement or the like made within the spirit and principles of the invention shall fall within the protection scope of the invention.