
CN114443041A - Method for parsing abstract syntax tree and computer program product - Google Patents


Publication number
CN114443041A
Authority
CN
China
Prior art keywords
script
grammar
different
compiling
same
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111442982.6A
Other languages
Chinese (zh)
Inventor
杨健雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Cloud Computing Ltd
Original Assignee
Alibaba Cloud Computing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Cloud Computing Ltd
Priority to CN202111442982.6A
Publication of CN114443041A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/42 Syntactic analysis
    • G06F8/427 Parsing

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

Embodiments of the present disclosure provide an abstract syntax tree parsing method and a computer program product. The method comprises: acquiring a script to be parsed and a compilation type of the script; selecting a target compilation component based on the compilation type; and parsing the script with the target compilation component and outputting an abstract syntax tree corresponding to the script, wherein the abstract syntax trees output by the target compilation components corresponding to different compilation types have the same structure type. The technical solution normalizes scripts written in various languages into abstract syntax trees of the same type, which can improve the efficiency and accuracy of script adaptation.

Description

Method for parsing abstract syntax tree and computer program product
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method for parsing an abstract syntax tree and a computer program product.
Background
In the field of script detection, matching rules directly against a sample file tends to produce inaccurate results. The sample file is therefore usually converted into an abstract syntax tree first, and rule matching or other detection operations are then performed on that tree.
However, the inventor of the present disclosure has found that the prior art usually converts a script into an abstract syntax tree with an existing compiler. Although this approach is accurate, it has several problems:
For example: script files are written in many scripting languages, and each language needs its own native compiler; different versions of the same scripting language can be grammatically incompatible; a native compiler does not generate an abstract syntax tree for quite the same purpose as a detection tool does, so it often performs extra checks, such as checking whether library symbols exist, and these checks consume additional resources and degrade performance; the abstract syntax trees generated by different native compilers differ in structure and meaning, so each scripting language requires customized adaptation; and some scripting languages do not support generating an abstract syntax tree at all.
Given these problems, a universal cross-language parsing scheme is needed that parses scripts in different scripting languages into abstract syntax trees of the same structure type.
Disclosure of Invention
The embodiment of the disclosure provides an abstract syntax tree parsing method and a computer program product.
In a first aspect, an embodiment of the present disclosure provides a method for parsing an abstract syntax tree, where the method includes:
acquiring a script to be analyzed and a compiling type of the script;
selecting a target compilation component based on the compilation type;
analyzing the script based on the target compiling component, and outputting an abstract syntax tree corresponding to the script; and the structure types of the abstract syntax trees output by the target compiling components corresponding to different compiling types are the same.
Further, the target compilation components corresponding to different compilation types are different.
Further, the target compilation component comprises a lexical parser and a grammar parser; parsing the script based on the target compilation component and outputting the abstract syntax tree corresponding to the script comprises:
parsing the script into a word sequence using the lexical parser;
parsing the word sequence into the abstract syntax tree using the grammar parser.
Further, the method further comprises:
acquiring lexical parsing rules and/or grammar parsing rules written for different compilation types;
generating, with a generator tool, the compilation component based on the lexical parsing rules and/or the grammar parsing rules.
Furthermore, in the lexical parsing rules and the grammar parsing rules, the grammatical expressions of the different compilation types are classified based on the semantics and grammatical structure of the script, and each class of grammatical expression is parsed in its own way according to the classification result.
Further, the type of the grammatical expression includes at least one of:
grammatical expressions that have the same semantics and the same grammatical structure across different compilation types;
grammatical expressions that have the same semantics but different grammatical structures across different compilation types;
grammatical expressions that have the same semantics within one compilation type but different grammatical structures in different versions;
grammatical expressions that exist in one compilation type but not in the other compilation types.
Further, for grammatical expressions that have the same semantics and the same grammatical structure across different compilation types, the grammatical structure is divided into a plurality of component nodes according to the semantics it expresses, and the component nodes are defined in the grammar parsing rules; and/or,
for grammatical expressions that have the same semantics but different grammatical structures across different compilation types, every grammatical expression contained in the different grammatical structures is defined as a component node in the grammar parsing rules, so that the component nodes form a superset of all grammatical expressions in the different grammatical structures; when the abstract syntax tree is generated, the component nodes corresponding to grammatical expressions that the parsed compilation type does not support or does not have are left empty; and/or,
for grammatical expressions that have the same semantics within one compilation type but different grammatical structures in different versions, those that can be normalized at the lexical level are defined in the lexical parsing rules so that they are parsed into the same word sequence, and those that cannot be normalized at the lexical level are normalized in the grammar parsing rules; and/or,
for grammatical expressions that exist in one compilation type but not in the other compilation types, the component nodes corresponding to the grammar in those expressions are defined separately in the grammar parsing rules.
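Lexical-level normalization of version differences, as described in the third case above, can be sketched as follows. The example is an assumption chosen for illustration and does not come from the disclosure: Python 2 accepted `<>` as a spelling of `!=`, while Python 3 accepts only `!=`. The token names, the `TOKEN_RULES` table, and the `tokenize` helper are hypothetical.

```python
import re
from typing import List, Tuple

# Assumed lexical rules: both spellings of "not equal" map to one token kind.
TOKEN_RULES = [
    ("NOT_EQUAL", r"!=|<>"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("NUMBER", r"\d+"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES))

# Canonical text emitted for normalized token kinds (hypothetical choice).
CANONICAL_TEXT = {"NOT_EQUAL": "!="}


def tokenize(statement: str) -> List[Tuple[str, str]]:
    """Split a statement into (kind, text) pairs, normalizing version variants."""
    tokens = []
    pos = 0
    while pos < len(statement):
        match = MASTER.match(statement, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {statement[pos]!r} at {pos}")
        kind = match.lastgroup
        if kind != "SKIP":
            tokens.append((kind, CANONICAL_TEXT.get(kind, match.group())))
        pos = match.end()
    return tokens


old_style = tokenize("a <> 1")
new_style = tokenize("a != 1")
print(old_style == new_style)  # prints True
```

Because both spellings yield the same word sequence, the grammar parsing rules need no version-specific branch for this construct.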
In a second aspect, an embodiment of the present disclosure provides a script detection method, where the script detection method includes:
acquiring a script to be detected;
calling a pre-deployed script security detection interface to perform security detection on the script, wherein the script security detection interface selects a target compilation component based on the compilation type of the script, parses the script based on the target compilation component, outputs an abstract syntax tree corresponding to the script, and detects the script based on the abstract syntax tree, and wherein the abstract syntax trees output by the target compilation components corresponding to different compilation types have the same structure type;
and outputting the security detection result of the script.
In a third aspect, an embodiment of the present invention provides an apparatus for parsing an abstract syntax tree, including:
a first acquisition module configured to acquire a script to be parsed and a compilation type of the script;
a selection module configured to select a target compilation component based on the compilation type;
a parsing module configured to parse the script based on the target compilation component and output an abstract syntax tree corresponding to the script, wherein the abstract syntax trees output by the target compilation components corresponding to different compilation types have the same structure type.
In a fourth aspect, an embodiment of the present invention provides a script detecting apparatus, where the script detecting apparatus includes:
a third acquisition module configured to acquire a script to be detected;
a calling module configured to call a pre-deployed script security detection interface to perform security detection on the script, wherein the script security detection interface selects a target compilation component based on the compilation type of the script, parses the script based on the target compilation component, outputs an abstract syntax tree corresponding to the script, and detects the script based on the abstract syntax tree, and wherein the abstract syntax trees output by the target compilation components corresponding to different compilation types have the same structure type;
an output module configured to output a security detection result of the script.
These functions can be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above.
In one possible design, the apparatus includes a memory configured to store one or more computer instructions that enable the apparatus to perform the corresponding method, and a processor configured to execute the computer instructions stored in the memory. The apparatus may also include a communication interface for the apparatus to communicate with other devices or a communication network.
In a fifth aspect, the disclosed embodiments provide an electronic device comprising a memory, a processor, and a computer program stored on the memory, wherein the processor executes the computer program to implement the method of any of the above aspects.
In a sixth aspect, the disclosed embodiments provide a computer-readable storage medium for storing computer instructions for use by any of the above apparatuses, the computer instructions, when executed by a processor, being configured to implement the method of any of the above aspects.
In a seventh aspect, the disclosed embodiments provide a computer program product comprising computer instructions, which when executed by a processor, are configured to implement the method of any one of the above aspects.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
In the script parsing process, a different target compilation component is preset for each compilation type. After a script to be parsed and its compilation type are received, the corresponding target compilation component is selected based on the compilation type, the script is parsed with that component, and the abstract syntax tree corresponding to the script is output; the tree expresses the grammatical structure of each script statement in the script. Because the target compilation components for different compilation types all output abstract syntax trees of the same type, scripts written in various languages are normalized into abstract syntax trees of one type, which can improve the efficiency and accuracy of script adaptation. In addition, when the embodiment is applied to script security detection, it can improve detection efficiency and save detection resources.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
Other features, objects, and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments when taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 illustrates a flow diagram of a method of parsing an abstract syntax tree according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram illustrating a structure of an abstract syntax tree generated for an if statement in an embodiment of the present disclosure;
FIGS. 3(a) and 3(b) are schematic structural diagrams illustrating abstract syntax trees generated for the semantics of the functions in java and php according to an embodiment of the present disclosure;
FIGS. 4(a) and 4(b) are schematic structural diagrams of abstract syntax trees generated for a new grammar expression introduced in java12 and for the original java grammar expression, according to an embodiment of the present disclosure;
FIG. 5 is a diagram illustrating an abstract syntax tree for a unique syntax representation of the bash language according to an embodiment of the present disclosure;
FIG. 6 shows a flowchart of a script detection method according to an embodiment of the present disclosure;
FIG. 7 is a diagram illustrating an application scenario of a parsing method for an abstract syntax tree according to an embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of an electronic device suitable for implementing a parsing method and/or a script detection method of an abstract syntax tree according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. Also, for the sake of clarity, parts not relevant to the description of the exemplary embodiments are omitted in the drawings.
In the present disclosure, it is to be understood that terms such as "including" or "having," etc., are intended to indicate the presence of the disclosed features, numbers, steps, actions, components, parts, or combinations thereof, and do not preclude the possibility that one or more other features, numbers, steps, actions, components, parts, or combinations thereof are present or added.
It should be further noted that the embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
The details of the embodiments of the present disclosure are described in detail below with reference to specific embodiments.
Fig. 1 illustrates a flowchart of a parsing method of an abstract syntax tree according to an embodiment of the present disclosure. As shown in fig. 1, the parsing method of the abstract syntax tree includes the following steps:
in step S101, a script to be analyzed and a compiling type of the script are acquired;
in step S102, selecting a target compilation component based on the compilation type;
in step S103, parsing the script based on the target compiling component, and outputting an abstract syntax tree corresponding to the script; and the structure types of the abstract syntax trees output by the target compiling components corresponding to different compiling types are the same.
In this embodiment, a script is a program that can be interpreted and executed, and a script may include a plurality of script statements. The script statements may be executable statements written in a scripting language, which may include, but is not limited to, jsp, php, asp, bash, python, js, vb, c#, powershell, and the like.
The embodiment of the disclosure can be applied to the field of security detection of scripts or other application scenarios using abstract syntax trees of the scripts. The following describes the field of security detection of scripts as an example. In order to detect the script, the script can be compiled first, the compiling process includes lexical analysis and grammar analysis, and finally an abstract syntax tree corresponding to the script statement in the script is obtained. After the abstract syntax tree corresponding to the script is generated, the script detection can be performed on the basis of the abstract syntax tree, and the accuracy of the script detection can be improved. The abstract syntax tree is an abstract representation of the source code syntax structure of the script statement, which represents the syntax structure of the script statement in the form of a tree, and each constituent node on the abstract syntax tree represents one syntax representation in the script statement.
The script detection may be detection on whether malicious codes exist in the script or other types of detection, and may be specifically determined according to actual needs, which is not specifically limited herein. The principle of script detection is that whether a script statement realizing a predetermined function exists in a script is determined by matching a grammar expression mode of the script statement in the script, and if the script statement exists, a safety detection result can be output based on the script statement. Functions implemented by the script statement may be judged from the semantics of the script statement, and thus the embodiments of the present disclosure parse the syntax structure in the script statement into the form of an abstract syntax tree based on the semantics of the script statement, and further detect the script by matching whether a branch of a predetermined syntax structure exists in the abstract syntax tree.
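The matching step described above can be pictured as a walk over the tree. The following is a minimal sketch under stated assumptions: the `Node` class, the `CALL` node kind, and the rule that flags calls to `eval` are hypothetical choices made for illustration, not details taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class Node:
    """Hypothetical AST node: a kind tag, an optional value, and children."""
    kind: str
    value: str = ""
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def find_matches(root: Node, rule: Callable[[Node], bool]) -> List[Node]:
    """Return every node in the tree for which the rule holds."""
    return [n for n in walk(root) if rule(n)]


def is_eval_call(node: Node) -> bool:
    # Assumed detection rule: a CALL node whose callee is named "eval".
    return node.kind == "CALL" and node.value == "eval"


# Hand-built tree for something like: if (x) { eval(input) }
tree = Node("IF_STATEMENT", children=[
    Node("EXPR", "x"),
    Node("BLOCK", children=[Node("CALL", "eval", [Node("IDENTIFIER", "input")])]),
    Node("BLOCK"),
])

hits = find_matches(tree, is_eval_call)
print(len(hits))  # prints 1
```

Because the rule inspects only node kinds and values, the same rule applies to a tree produced from any source language, provided the trees share one structure type.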
A script to be detected may be written in any of several scripting languages. To make detection of scripts in different languages more efficient and less complex, the embodiments of the present disclosure provide an abstract syntax tree parsing method that parses script statements written in different languages into abstract syntax trees of a uniform type. Such trees can be interpreted in one uniform way, whereas the prior art requires customized adaptation of the abstract syntax tree for each language.
In this embodiment, after receiving a script to be analyzed and a compiling type of the script, a corresponding target compiling component is selected based on the compiling type. The compiling type is corresponding to the script language for compiling the script, and different script languages correspond to different compiling types. The target compilation component may include, but is not limited to, a lexical parser and a syntactic parser. The lexical parser is used for parsing script sentences in the script into word (token) sequences, and the grammar parser is used for generating an abstract grammar tree based on the word sequences.
Different compiling types correspond to different target compiling components, that is, scripts written in different languages need to select different target compiling components for compiling. For example, a script written in the jsp language employs a jsp-type target compilation component, while a script written in the python language employs a python-type target compilation component. It should be noted that the target compiling component mentioned here is not a native compiler corresponding to the scripting language, but is a compiling component that can compile scripts of different compiling types into an abstract syntax tree of the same type in the embodiment of the present disclosure.
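Selecting a component by compilation type amounts to a lookup in a registry. The sketch below assumes a component is simply a function from script text to a tree; the registry, the `register` decorator, and the placeholder parsers are hypothetical and stand in for real lexer and parser pairs.

```python
from typing import Callable, Dict

# Hypothetical component signature: script text in, AST (here a plain dict) out.
CompileComponent = Callable[[str], dict]

_COMPONENTS: Dict[str, CompileComponent] = {}


def register(compile_type: str):
    """Decorator that registers a component for one compilation type."""
    def decorator(fn: CompileComponent) -> CompileComponent:
        _COMPONENTS[compile_type] = fn
        return fn
    return decorator


@register("php")
def parse_php(script: str) -> dict:
    # Placeholder: a real component would run a lexical parser and a grammar parser.
    return {"kind": "SCRIPT", "source_language": "php", "children": []}


@register("python")
def parse_python(script: str) -> dict:
    # Placeholder with the same output structure as the php component.
    return {"kind": "SCRIPT", "source_language": "python", "children": []}


def parse_script(script: str, compile_type: str) -> dict:
    """Select the target component by compilation type, then parse."""
    try:
        component = _COMPONENTS[compile_type]
    except KeyError:
        raise ValueError(f"no compilation component for {compile_type!r}") from None
    return component(script)


ast_php = parse_script("<?php echo 1; ?>", "php")
ast_py = parse_script("print(1)", "python")
# Both results have the same structure (same keys), whatever the language.
print(ast_php.keys() == ast_py.keys())  # prints True
```

Keeping the dispatch in one table means that supporting a new scripting language only requires registering one more component.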
The script is input into the target compilation component for parsing, and the abstract syntax tree corresponding to the script is output. If the script contains multiple script statements, multiple abstract syntax trees may be output.
Although scripts of different compilation types, that is, scripts written in different scripting languages, are parsed by different compilation components, the abstract syntax trees finally output are of the same type. In other words, the trees output by the target compilation components for different compilation types can all be handled in one uniform way.
In the script parsing process of this embodiment, a different target compilation component is preset for each compilation type. After a script to be parsed and its compilation type are received, the corresponding target compilation component is selected based on the compilation type, the script is parsed with that component, and the abstract syntax tree corresponding to the script is output; the tree expresses the grammatical structure of each script statement in the script. Because the target compilation components for different compilation types all output abstract syntax trees of the same type, scripts written in various languages are normalized into abstract syntax trees of one type, which can improve the efficiency and accuracy of script adaptation.
In an optional implementation manner of this embodiment, the target compiling components corresponding to different compiling types are different.
In this optional implementation, different compilation types correspond to different scripting languages; that is, scripts written in different scripting languages have different compilation types. Because different scripting languages are compiled in different ways, the embodiment of the present disclosure presets a corresponding target compilation component for each compilation type. The abstract syntax trees finally output by the different target compilation components nevertheless have the same structure type. In other words, the embodiments of the present disclosure normalize the abstract syntax trees of multiple scripting languages so that the trees finally output are consistent in type and can be handled in one uniform way.
In an optional implementation manner of this embodiment, the target compiling component includes a lexical parser and a grammar parser; step S103, analyzing the script based on the target compiling component, and outputting an abstract syntax tree corresponding to the script, further including the following steps:
analyzing the script into a word sequence by using the lexical analyzer;
parsing the sequence of words into the abstract syntax tree using the syntax parser.
In this alternative implementation, the target compilation component includes a lexical parser and a grammar parser. When the script is analyzed by the target compiling component, the script statement in the script can be input into the lexical analyzer, and the lexical analyzer divides the script statement into word sequences.
For example, for the statement String a = "abc" written in the java language, the lexical parser can parse out the following word sequence:
1.IDENTIFIER[String]
2.IDENTIFIER[a]
3.ASSIGN[=]
4.STRING[abc]
The grammar parser then derives the grammatical structure of the script statement from the word sequence output by the lexical parser and presents that structure in the form of an abstract syntax tree.
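The word sequence listed above can be reproduced with a small regular-expression lexer. This is only a sketch, assuming the example statement reads `String a = "abc"`; the rule table and the `tokenize` function are hypothetical, whereas the disclosure generates its lexical parsers with generator tools.

```python
import re
from typing import List, Tuple

# Assumed token rules; order matters because earlier alternatives are tried first.
TOKEN_RULES = [
    ("STRING", r'"(?:[^"\\]|\\.)*"'),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("ASSIGN", r"="),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES))


def tokenize(statement: str) -> List[Tuple[str, str]]:
    """Split a statement into (kind, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(statement):
        match = MASTER.match(statement, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {statement[pos]!r} at {pos}")
        kind = match.lastgroup
        text = match.group()
        if kind == "STRING":
            text = text[1:-1]  # strip the surrounding quotes
        if kind != "SKIP":
            tokens.append((kind, text))
        pos = match.end()
    return tokens


tokens = tokenize('String a = "abc"')
for number, (kind, text) in enumerate(tokens, start=1):
    print(f"{number}.{kind}[{text}]")
```

The loop prints one line per word in the same `KIND[text]` notation used in the listing above.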
In an optional implementation manner of this embodiment, the method further includes the following steps:
acquiring lexical parsing rules and/or grammar parsing rules written for different compilation types;
generating, with a generator tool, the compilation component based on the lexical parsing rules and/or the grammar parsing rules.
In this optional implementation, so that scripts of different compilation types can ultimately be parsed into abstract syntax trees of the same type, the embodiment of the present disclosure has grammar parsing rules and lexical parsing rules written in advance for the scripting language of each compilation type. In some embodiments, the grammar parsing rules and lexical parsing rules may be written in the form of regular expressions.
In some embodiments, an existing generator tool may be used to generate a lexical parser from the lexical parsing rules written by the relevant personnel and to generate a grammar parser from the grammar parsing rules.
In some embodiments, the generator tool may be one of flex, bison, and re2c, or a combination of them.
In this embodiment, the relevant personnel only need to modify the lexical parsing rules and grammar parsing rules when a change is required. Maintenance is therefore relatively simple, which reduces maintenance cost and improves efficiency.
It can be understood that a lexical parser or a grammar parser is in fact a piece of executable code. At run time, the lexical parser takes a script statement as input and parses it into a word sequence according to the lexical parsing rules. The grammar parser takes the output of the lexical parser, that is, the word sequence corresponding to the script statement, as input, derives the grammatical structure of the script statement from the word sequence according to the grammar parsing rules, and presents that structure in the form of an abstract syntax tree.
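As an illustration of the second stage, the sketch below feeds a word sequence to a small hand-written recursive-descent parser. The disclosure generates its parsers with generator tools, so the hand-written parser is a stand-in, and the node names `VAR_DECL`, `TYPE`, and `NAME` are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Token = Tuple[str, str]  # (token kind, token text)


@dataclass
class Node:
    """Hypothetical AST node: a kind tag, an optional value, and children."""
    kind: str
    value: str = ""
    children: List["Node"] = field(default_factory=list)


class Parser:
    """Tiny parser for one rule: IDENTIFIER IDENTIFIER ASSIGN STRING."""

    def __init__(self, tokens: List[Token]):
        self.tokens = tokens
        self.pos = 0

    def expect(self, kind: str) -> str:
        """Consume the next token if it has the given kind, else fail."""
        if self.pos >= len(self.tokens) or self.tokens[self.pos][0] != kind:
            raise SyntaxError(f"expected {kind} at token {self.pos}")
        text = self.tokens[self.pos][1]
        self.pos += 1
        return text

    def parse_declaration(self) -> Node:
        type_name = self.expect("IDENTIFIER")
        var_name = self.expect("IDENTIFIER")
        self.expect("ASSIGN")
        literal = self.expect("STRING")
        return Node("VAR_DECL", children=[
            Node("TYPE", type_name),
            Node("NAME", var_name),
            Node("STRING", literal),
        ])


def dump(node: Node, depth: int = 0) -> str:
    """Render the tree as indented [KIND: value] lines."""
    line = "  " * depth + f"[{node.kind}: {node.value}]"
    return "\n".join([line] + [dump(child, depth + 1) for child in node.children])


tokens = [("IDENTIFIER", "String"), ("IDENTIFIER", "a"), ("ASSIGN", "="), ("STRING", "abc")]
tree = Parser(tokens).parse_declaration()
print(dump(tree))
```

The parser never sees the original text, only the word sequence, which is why differences that the lexical parser has already normalized do not affect the resulting tree.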
In an optional implementation of this embodiment, in the lexical parsing rules and the grammar parsing rules, the grammatical expressions of the different compilation types are classified based on the semantics and grammatical structure of the script, and each class of grammatical expression is parsed in its own way according to the classification result.
In this optional implementation, the grammar parsing rules and lexical parsing rules for the different compilation types classify the grammatical expressions of the different scripting languages. Expressions in different classes are parsed in different ways, and expressions in the same class are parsed in the same way, so that abstract syntax trees of the same structure type are finally obtained. Moreover, because expressions in the same class are parsed in the same way, statements with the same semantics written in different languages yield abstract syntax trees that are not only of the same type but also of the same structure.
For example, the if statement is expressed differently in different languages, as illustrated by the following listings:
[Figures BDA0003383982050000071 and BDA0003383982050000081: code listings of the if statement in four scripting languages; images not reproduced]
In these script statements, which are written in four languages and express the same semantics, the surface syntax differs but the semantics and the grammatical structure are the same. The grammatical structure of the if statement can therefore be divided semantically into three component nodes: the judgment expression, the statement block executed when the judgment succeeds, and the statement block executed when the judgment fails.
Thus, abstract syntax trees of the same type and structure, as shown in Fig. 2, can be generated for such semantically identical grammatical expressions.
Fig. 2 shows a schematic structural diagram of the abstract syntax tree generated for an if statement in an embodiment of the present disclosure. As shown in Fig. 2, the root node [IF_STATEMENT: ] represents the if statement and has three component nodes, which correspond respectively to the judgment expression, the statement block executed when the judgment succeeds, and the statement block executed when the judgment fails. In the illustrated statement, the block executed on failure is itself an if statement, so it is again parsed into an abstract syntax subtree with three component nodes in the same way.
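The three-node shape and its nesting can be sketched as a data structure. Only the root name `IF_STATEMENT` comes from the description; the `EXPR` and `BLOCK` node kinds, the `Node` class, and the `if_statement` helper are assumptions made for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical AST node: a kind tag, an optional value, and children."""
    kind: str
    value: str = ""
    children: List["Node"] = field(default_factory=list)


def if_statement(condition: Node, on_success: Node,
                 on_failure: Optional[Node] = None) -> Node:
    """Build an IF_STATEMENT that always has exactly three component nodes."""
    if on_failure is None:
        # A missing failure branch becomes an empty BLOCK so the shape stays fixed.
        on_failure = Node("BLOCK")
    return Node("IF_STATEMENT", children=[condition, on_success, on_failure])


# Models: if (a) { A } else if (b) { B } else { C }
inner = if_statement(Node("EXPR", "b"), Node("BLOCK", "B"), Node("BLOCK", "C"))
outer = if_statement(Node("EXPR", "a"), Node("BLOCK", "A"), inner)

print(len(outer.children))     # prints 3
print(outer.children[2].kind)  # prints IF_STATEMENT
```

The failure branch of `outer` is itself an `IF_STATEMENT` with three component nodes, which mirrors the nested subtree described for Fig. 2.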
In an optional implementation manner of this embodiment, the type of the syntax expression includes at least one of the following:
syntax expression manners that have the same semantics and the same syntax structure across different compiling types;
syntax expression manners that have the same semantics but different syntax structures across different compiling types;
syntax expression manners that, within the same compiling type, have the same semantics but different syntax structures because the versions differ;
syntax expression manners that exist in one compiling type but not in other compiling types.
In an optional implementation manner of this embodiment, for syntax expression manners that have the same semantics and the same syntax structure across different compiling types, the syntax structure is divided into a plurality of component nodes according to the semantics it expresses, and these component nodes are defined in the syntax parsing rule; and/or
for syntax expression manners that have the same semantics but different syntax structures across different compiling types, all syntax representations included in the different syntax structures are defined as component nodes in the syntax parsing rule, so that the component nodes form a superset of all syntax representations in the different syntax structures; when the abstract syntax tree is generated, the component nodes corresponding to syntax representations that the parsed compiling type does not support or does not contain are set to null; and/or
for syntax expression manners that, within the same compiling type, have the same semantics but different syntax structures because the versions differ, those that can be normalized at the lexical level are defined in the lexical parsing rule to be parsed into the same word sequence, and those that cannot be normalized at the lexical level are normalized in the syntax parsing rule;
and for syntax expression manners that exist in one compiling type but not in other compiling types, the component nodes corresponding to the syntax representations in those syntax expression manners are defined separately in the syntax parsing rule.
In this optional implementation manner, the lexical parsing rule and the syntax parsing rule divide the syntax expression manners of the script languages corresponding to different compiling types into at least one, or a combination, of the following four types:
1. Syntax expression manners that have the same semantics and the same syntax structure across different compiling types:
For this type, since both the semantics and the syntax structure are the same, the syntax structure can be divided into a plurality of component nodes according to the semantics it expresses, and a script statement using this syntax expression manner is parsed into an abstract syntax tree whose structure consists of these component nodes. The if semantics mentioned above belong to this type, and other syntax expression manners with the same semantics and the same syntax structure, such as for-loop semantics, while-loop semantics and foreach semantics, can be parsed in the same way. It is to be understood that this parsing processing is defined in the syntax parsing rule.
2. Syntax expression manners that have the same semantics but different syntax structures across different compiling types:
For this type, the semantics are the same but the syntax structures differ. For a given semantics that the various languages express with different syntax structures, the syntax structure is divided into a plurality of component nodes that cover all syntax representations in all syntax structures of that semantics; that is, the component nodes of the corresponding abstract syntax tree may include any syntax representation used for that semantics in any of the languages. All component node types are defined in the syntax parsing rule, and during parsing the component nodes that do not exist in the syntax structure of the current language are simply set to null.
The following examples illustrate:
//java
public static void func(int a,char b)
{
    return a+b;
}
//php
function func($a,$b){
    return $a+$b;
}
Among the syntax expression manners of function semantics in these two languages, the java language supports syntax representations such as annotation, function qualifier, function return value type, function name, function parameter list and function execution body, whereas the php function above contains only a function name, a function parameter list and a function execution body.
To normalize syntax expression manners that have the same semantics but different syntax structures, a superset of syntax representations can be defined to express the syntax structure. The superset includes every syntax representation that any of the languages uses for that semantics. The syntax representations in the superset are then defined in the syntax parsing rule as the component nodes for that semantics, and the component nodes corresponding to absent or unsupported syntax representations are set to null during parsing.
For example, the fields shown in table 1 below may be defined to represent the constituent nodes of the function definition part:
TABLE 1
[Table 1: fields representing the component nodes of the function definition part; image not reproduced]
It should be noted that, in practical applications, function definition semantics require more fields than those listed in the table above, for example to support annotations, throws semantics, default semantics and so on; the table is for illustration only and does not list them one by one.
If a language does not support a field, for example php does not support a function return value type, the field may be set to null.
Fig. 3(a) and 3(b) show structural diagrams of the abstract syntax trees generated for the function semantics in java and php according to an embodiment of the present disclosure. Fig. 3(a) shows the structure of the abstract syntax tree that the target compiling component generates for the function definition statement written in java above. The syntax tree of the function definition statement includes two component nodes, [FUNCTION_HEADER:] (function definition header) and [BLOCK_STATEMENT:] (function body). The function definition header includes four component nodes: [decoding_MODIFIER_LIST:], [SPECIFIER_VOID:], [IDENTIFIER: func] and [PARAMETER_COMMA_LIST:], which represent respectively a descriptor list (e.g., public, static, final), a function return value type (e.g., void, int, String), a function name and a parameter list. The component nodes contained in the descriptor list and the parameter list are shown in Fig. 3(a) and are not described one by one here.
Fig. 3(b) shows the structure of the abstract syntax tree that the target compiling component generates for the function definition statement written in php. The root node of the syntax tree likewise includes two component nodes, [FUNCTION_HEADER:] (function definition header) and [BLOCK_STATEMENT:] (function body). The function definition header includes four component nodes: [NULL], [NULL], [IDENTIFIER: func] and [PARAMETER_COMMA_LIST:]. The first two correspond to the descriptor list and the function return value type in this type of syntax structure, but are set to null in the abstract syntax tree because php does not support these two syntax representations; the last two, as in java, are the function name and the parameter list. The component nodes contained in the parameter list are shown in Fig. 3(b) and are not described one by one here.
As can be seen from fig. 3(a) and 3(b), the method proposed based on the embodiment of the present disclosure can parse script statements written in two different languages, which have the same semantics and different syntax structures, into abstract syntax trees with the same type.
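The superset approach for function headers can be illustrated with a short sketch. The slot names in `HEADER_SLOTS` and the `function_header` helper are hypothetical and are not the fields of Table 1; the sketch only assumes that unsupported slots are kept in place and set to null, as described for Fig. 3(b).

```python
from typing import List, Optional, Tuple

# Superset of header slots in a fixed order (hypothetical slot names).
HEADER_SLOTS = ("modifiers", "return_type", "name", "parameters")


def function_header(name: str,
                    parameters: List[str],
                    modifiers: Optional[List[str]] = None,
                    return_type: Optional[str] = None) -> List[Tuple[str, object]]:
    """Return one (slot, value) pair for every slot of the superset."""
    values = {
        "modifiers": modifiers,
        "return_type": return_type,
        "name": name,
        "parameters": parameters,
    }
    # Slots the source language does not support stay None (the null node),
    # so every header has the same number and order of component nodes.
    return [(slot, values[slot]) for slot in HEADER_SLOTS]


def shape(header: List[Tuple[str, object]]) -> List[str]:
    """List only the slot names, ignoring the values."""
    return [slot for slot, _ in header]


java_header = function_header("func", ["a", "b"],
                              modifiers=["public", "static"], return_type="void")
php_header = function_header("func", ["$a", "$b"])
```

Both headers have the same shape, so detection logic written against the four slots applies to either language; only the values differ, with the first two php slots left as null.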
3. Syntax expression manners that, within the same compiling type, have the same semantics but different syntax structures because the versions differ:
This type mainly covers the case where the compiling type is the same, that is, the script is written in the same language, but the language versions differ, so that the two versions express the same semantics with different syntax expression manners. Examples follow.
For example, java15 and later versions support the text block (TextBlocks) syntax, so the following two syntax expressions have the same meaning:
//java
String a="abc";
//java
String a="""abc""";
In some embodiments, to obtain abstract syntax trees with the same structure for the two syntax expression manners, the embodiment of the present disclosure performs normalization at the lexical parsing level; that is, the lexical parsing rule is made compatible with both syntax expression manners. In the above example the source text differs, but the lexical parser is required to output the same word sequence for both, as follows:
1.IDENTIFIER[String]
2.IDENTIFIER[a]
3.ASSIGN[=]
4.STRING[abc]
Once the same word sequence is output, the subsequent syntax parsing is not affected, and neither is the generation of the subsequent abstract syntax tree structure.
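Normalization at the lexical level can be sketched with a small hand-written tokenizer. This is an illustrative assumption only: the rule table and the `tokenize` function are hypothetical, the text block handling covers just the simplified single-line form used in the example above, and the trailing semicolon is emitted as an extra SEMICOLON token that the word sequence above does not list.

```python
import re

# Lexical rules tried in order; names follow the word sequence shown above.
TOKEN_RULES = [
    # Triple-quoted form first, then the ordinary literal; both yield STRING.
    ("STRING", re.compile(r'"""(.*?)"""|"([^"\n]*)"', re.DOTALL)),
    ("IDENTIFIER", re.compile(r"[A-Za-z_]\w*")),
    ("ASSIGN", re.compile(r"=")),
    ("SEMICOLON", re.compile(r";")),
    ("SKIP", re.compile(r"\s+")),
]


def tokenize(text):
    """Split a statement into (kind, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        for kind, pattern in TOKEN_RULES:
            match = pattern.match(text, pos)
            if match is None:
                continue
            if kind == "STRING":
                # Group 1 is the text block body, group 2 the ordinary body;
                # only the body is kept, so both spellings look identical.
                body = match.group(1) if match.group(1) is not None else match.group(2)
                tokens.append((kind, body))
            elif kind != "SKIP":
                tokens.append((kind, match.group(0)))
            pos = match.end()
            break
        else:
            raise ValueError(f"unexpected character at offset {pos}")
    return tokens


plain = tokenize('String a="abc";')
block = tokenize('String a="""abc""";')
```

Since `plain` and `block` are the same word sequence, a syntax parser consuming them cannot tell the two spellings apart and needs no version-specific rule.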
In some embodiments, normalization processing may also be performed at a syntax parsing level when the lexical parsing level cannot be compatible, as illustrated below.
For example, simplified switch syntax expressions are supported starting from java12 as follows:
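[Figure: the simplified switch syntax expression supported from java12 and the original switch syntax expression; image not reproduced]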
In this case, normalization can be performed at the syntax parsing level. At the lexical parsing level, the two syntax expression manners yield two different word sequences:
// word sequence of the new java12 expression
1.SWITCH[switch]
2.LP[(]
3.IDENTIFIER[str]
4.RP[)]
5.LC[{]
6.CASE[case]
7.STRING[1]
8.POINT_TO[->]
9.IDENTIFIER[a]
10.ASSIGN[=]
11.NUMBER[2]
12.SEMICOLON[;]
13.DEFAULT[default]
14.POINT_TO[->]
15.IDENTIFIER[a]
16.ASSIGN[=]
17.NUMBER[1]
18.SEMICOLON[;]
19.RC[}]
// java original expression word sequence
1.SWITCH[switch]
2.LP[(]
3.IDENTIFIER[str]
4.RP[)]
5.LC[{]
6.CASE[case]
7.STRING[1]
8.COLON[:]
9.IDENTIFIER[a]
10.ASSIGN[=]
11.NUMBER[2]
12.SEMICOLON[;]
13.DEFAULT[default]
14.COLON[:]
15.IDENTIFIER[a]
16.ASSIGN[=]
17.NUMBER[1]
18.SEMICOLON[;]
19.RC[}]
For these different word sequences, the recognition rules and the rules for building the abstract syntax tree structure can be defined at the syntax parsing level, that is, in the syntax parsing rule, so that abstract syntax tree structures of the same type are generated for the two word sequences.
Note that the above are only a few examples. In practice, versions of the same language differ in many other ways, including different expression sequences and added or removed language features, and these can be recognized and processed in a similar manner.
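Normalization at the syntax parsing level can be sketched as a small recursive-descent parser that accepts either separator token. The functions `parse_switch` and `tokens_with` and the tuple layout of the tree are hypothetical; the token kinds are those of the two word sequences listed above. The sketch handles only the assignment branches of the example and deliberately ignores differences such as fall-through between the two switch forms.

```python
def parse_switch(tokens):
    """Parse a switch word sequence into a nested-tuple syntax tree."""
    pos = 0

    def expect(*kinds):
        # Consume one token whose kind is among `kinds` and return its text.
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] not in kinds:
            raise SyntaxError(f"expected one of {kinds} at token {pos}")
        text = tokens[pos][1]
        pos += 1
        return text

    expect("SWITCH")
    expect("LP")
    subject = expect("IDENTIFIER")
    expect("RP")
    expect("LC")
    groups = []
    while pos < len(tokens) and tokens[pos][0] != "RC":
        if tokens[pos][0] == "CASE":
            expect("CASE")
            label = expect("STRING", "NUMBER")
        else:
            expect("DEFAULT")
            label = None               # the default branch has no label
        # Both the original ":" and the java12 "->" reduce to the same node.
        expect("COLON", "POINT_TO")
        target = expect("IDENTIFIER")
        expect("ASSIGN")
        value = expect("NUMBER")
        expect("SEMICOLON")
        groups.append(("SWITCH_STATEMENT_GROUP", label, ("ASSIGN", target, value)))
    expect("RC")
    return ("SWITCH_STATEMENT", subject, groups)


def tokens_with(sep_kind, sep_text):
    """Build the word sequence listed above with the given separator token."""
    sep = (sep_kind, sep_text)
    return [("SWITCH", "switch"), ("LP", "("), ("IDENTIFIER", "str"), ("RP", ")"),
            ("LC", "{"), ("CASE", "case"), ("STRING", "1"), sep,
            ("IDENTIFIER", "a"), ("ASSIGN", "="), ("NUMBER", "2"), ("SEMICOLON", ";"),
            ("DEFAULT", "default"), sep,
            ("IDENTIFIER", "a"), ("ASSIGN", "="), ("NUMBER", "1"), ("SEMICOLON", ";"),
            ("RC", "}")]


new_tree = parse_switch(tokens_with("POINT_TO", "->"))
old_tree = parse_switch(tokens_with("COLON", ":"))
```

The two word sequences differ in tokens 8 and 14, yet `new_tree` and `old_tree` are equal, because the separator is consumed by the rule and never stored in the tree.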
Fig. 4(a) and 4(b) are schematic structural diagrams of the abstract syntax trees generated for the syntax expression manner newly added in java12 and for the original syntax expression manner according to an embodiment of the present disclosure. As shown in Fig. 4(a), the abstract syntax tree for the switch semantics of the newly added java12 syntax expression includes two component nodes, [IDENTIFIER: num] and [SWITCH_BLOCK:] [SWITCH_STATEMENT_GROUP_LIST:], which correspond respectively to the identifier and type of the parameter and to the execution block of the switch semantics. The execution block includes two component nodes, [SWITCH_STATEMENT_GROUP:] and [SWITCH_STATEMENT_GROUP:], which correspond to the two switch branch execution blocks; each branch execution block in turn includes two component nodes, as shown in Fig. 4(a), which are not described one by one here.
As shown in Fig. 4(b), the abstract syntax tree structure for the switch semantics of the original java syntax expression is basically the same as that for the newly added java12 syntax expression, and its description is not repeated.
4. Syntax expression manners that exist in one compiling type but not in other compiling types:
Since this type of syntax expression manner does not exist in other compiling types, a parsing definition for it can be added to the syntax parsing rule. An example follows.
For example, the bash language has syntax expressions such as: cmd arg1 "arg2"
This syntax expression has no counterpart in other languages, so a node type can be added to the abstract syntax tree, for example the node types shown in Table 2 below:
TABLE 2
[Table 2: node types added for the bash command syntax expression; image not reproduced]
Fig. 5 is a schematic structural diagram of the abstract syntax tree of a syntax expression unique to the bash language according to an embodiment of the present disclosure. As shown in Fig. 5, the root node [COMMAND:] of the abstract syntax tree includes two component nodes, [IDENTIFIER: cmd] and [ARGUMENT_COMMA_LIST:], which correspond respectively to the command name and the argument list. The argument list includes two component nodes, [IDENTIFIER: arg1] and [STRING: arg2], which give the types of the two arguments in the argument list.
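A language-specific node type of this kind can be sketched as follows. The `parse_command` function and the tuple layout are hypothetical; the node labels are those of Fig. 5, and the sketch assumes that arguments are separated by whitespace and that a double-quoted argument is a STRING while a bare word is an IDENTIFIER.

```python
import re

# A double-quoted argument (group 1) or a bare word (group 2).
ARG_PATTERN = re.compile(r'"([^"]*)"|(\S+)')


def parse_command(line):
    """Parse a simple bash command line into a COMMAND node."""
    parts = []
    for match in ARG_PATTERN.finditer(line):
        if match.group(1) is not None:
            parts.append(("STRING", match.group(1)))      # quotes are dropped
        else:
            parts.append(("IDENTIFIER", match.group(2)))
    if not parts or parts[0][0] != "IDENTIFIER":
        raise SyntaxError("a command must start with a command name")
    name, arguments = parts[0], parts[1:]
    # COMMAND exists only for bash; other languages never produce this node.
    return ("COMMAND", [name, ("ARGUMENT_COMMA_LIST", arguments)])


tree = parse_command('cmd arg1 "arg2"')
```

The resulting tree has the command name and the argument list as its two component nodes, matching the structure described for Fig. 5.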
Fig. 6 shows a flowchart of a script detection method according to an embodiment of the present disclosure. As shown in fig. 6, the script detection method includes the following steps:
in step S601, a script to be detected is acquired;
in step S602, a pre-deployed script security detection interface is called to perform security detection on the script; the script safety detection interface selects a target compiling component based on the compiling type of the script, analyzes the script based on the target compiling component, outputs an abstract syntax tree corresponding to the script, and detects the script based on the abstract syntax tree; the structure types of the abstract syntax trees output by the target compiling component corresponding to different compiling types are the same;
in step S603, a security detection result of the script is output.
In this embodiment, a script is a program that can be interpreted and executed, and a script may include a plurality of script statements. The script statements may be executable statements written in a scripting language, which may include, but is not limited to, jsp, php, asp, bash, python, js, vb, c#, powershell, and the like.
The method may be performed in the cloud. The script security detection interface can be deployed in the cloud in advance and may be a SaaS (Software-as-a-Service) interface. A demander can obtain the right to use the script security detection interface in advance and, when needed, call it to detect a script to be detected; the interface implements the script detection method provided by the embodiment of the present disclosure.
In this embodiment, the demander can upload one or more scripts to be detected to the cloud. The script security detection interface deployed in the cloud detects the one or more scripts and outputs a security detection result for each script, which can be returned to the demander. In the embodiment of the disclosure, in order to detect a script, the script security detection interface first compiles it. The compiling process includes lexical parsing and syntax parsing and finally yields the abstract syntax tree corresponding to a script statement in the script; when a script includes a plurality of script statements, a plurality of abstract syntax trees may be generated, each corresponding to one complete script statement. Once the abstract syntax trees corresponding to the script are generated, the script can be detected on the basis of them, which improves detection accuracy. Detecting the script text directly may cause false detections, for example when the script contains comments; moreover, direct detection cannot accurately determine the type of a given text segment, for example whether it is part of a character string. The abstract syntax tree is an abstract representation of the syntax structure of the source code of a script statement: it represents that syntax structure as a tree, and each component node of the tree represents one syntax representation in the script statement. When the script is parsed into abstract syntax trees, it can be determined directly by rule matching which functions the script calls, whether the character strings it uses are sensitive character strings, and so on.
The principle of script detection is to determine, by matching the syntax expression manners of the script statements in a script, whether the script contains a script statement that implements a predetermined function; if it does, a security detection result can be output based on that script statement. The function implemented by a script statement can be judged from its semantics, so the embodiments of the present disclosure parse the syntax structure of a script statement into an abstract syntax tree based on the semantics of the statement, and then detect the script by checking whether the abstract syntax tree contains a branch with a predetermined syntax structure.
Taking the detection of malicious code as an example, after the script is parsed into the abstract syntax tree by the parsing method of the abstract syntax tree mentioned above, traversal may be performed from the root node of the abstract syntax tree to see whether a predetermined statement can be matched in the traversal process, where the predetermined statement corresponds to the malicious code, such as a statement for modifying a system command, a statement for creating a certain function, and the like. After the predetermined statements are matched, the script can be considered to have malicious codes, and security detection results, such as malicious code identifications and the positions of the malicious codes in the script, can be output.
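The traversal described above can be sketched as follows. The `Node` class, the `FUNCTION_CALL` label, the `SUSPICIOUS_CALLS` rule set and the `detect` function are all hypothetical names chosen for illustration; the actual detection rules are not specified in this disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One component node of the abstract syntax tree (hypothetical class)."""
    kind: str
    value: str = ""
    line: int = 0
    children: List["Node"] = field(default_factory=list)


# Hypothetical rule set: names of calls treated as predetermined statements.
SUSPICIOUS_CALLS = {"eval", "system"}


def detect(root: Node) -> List[Tuple[str, int]]:
    """Traverse from the root and report (call name, line) for every match."""
    findings = []
    stack = [root]
    while stack:
        node = stack.pop()
        # Only real call nodes match; a string that merely contains the
        # same text is a STRING node and is skipped.
        if node.kind == "FUNCTION_CALL" and node.value in SUSPICIOUS_CALLS:
            findings.append((node.value, node.line))
        stack.extend(reversed(node.children))   # keep source order
    return findings


tree = Node("SCRIPT", children=[
    Node("ASSIGN", line=1, children=[
        Node("IDENTIFIER", "name", 1),
        Node("STRING", "eval", 1),          # text only, not a call
    ]),
    Node("FUNCTION_CALL", "eval", 3, children=[Node("IDENTIFIER", "name", 3)]),
])
findings = detect(tree)
```

Only the call on line 3 is reported; the string literal on line 1 with the same text is ignored, which is the kind of false detection that matching on raw script text would produce.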
The parsing process of the abstract syntax tree can be referred to the above description, and is not described herein again.
In the embodiment of the disclosure, for the script to be detected, the script written in any language is analyzed into the abstract syntax tree of a uniform type by using the above-mentioned analytic method of the abstract syntax tree; therefore, in the process of the script security detection, different security detection methods do not need to be customized for different languages, but a uniform security detection method is adopted to traverse the generated abstract syntax tree, and then a security detection result is obtained based on the traversal result. Through the embodiment of the disclosure, the safety detection efficiency of the script can be improved, and the safety detection resources of the script are saved.
In an optional implementation manner of this embodiment, the security detection result includes whether the script includes webshell and/or malicious code.
In this optional implementation manner, the script security detection interface may perform webshell detection and/or malicious code detection on the script to be detected. A webshell is a command execution environment that exists in script files such as asp, jsp and php files, and is also called a web page backdoor. After intruding into a website server, an attacker generally places the webshell backdoor file among the normal script files in the web directory of the website server and then accesses the backdoor file through a browser to obtain the webshell command execution environment, thereby gaining control of the website server.
For the script to be detected, whether a webshell exists in the script can be determined by rule matching. In the embodiment of the disclosure, after the script to be detected is parsed into an abstract syntax tree, rule matching is applied to the structure of the abstract syntax tree to determine whether a webshell exists in the script.
In addition, whether a predetermined function is called in the script to be detected or whether a sensitive character string is included or the like can be determined based on the abstract syntax tree structure, and the predetermined function can be a function including malicious code.
In an optional implementation manner of this embodiment, the target compiling components corresponding to different compiling types are different.
In this optional implementation, different compiling types may correspond to different scripting languages; that is, scripts written in different scripting languages have different compiling types. Because different scripting languages are compiled in different ways, the embodiment of the disclosure presets a corresponding target compiling component for each compiling type, so that different compiling types correspond to different target compiling components. The structure type of the abstract syntax tree finally output by the different target compiling components is nevertheless the same. That is, the embodiments of the present disclosure normalize the abstract syntax trees of multiple scripting languages so that the finally output abstract syntax trees are consistent in type, and the abstract syntax trees output by the target compiling components of different compiling types can all be handled in one unified manner.
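Selecting a target compiling component by compiling type can be sketched as a lookup table of parsing functions. The names `COMPONENTS`, `parse_script`, `parse_java_assign` and `parse_php_assign` are hypothetical, and each component here handles only a single integer assignment so that the example stays short.

```python
import re


def parse_java_assign(source):
    """Hypothetical java component: parses e.g. 'int a = 1;'."""
    match = re.fullmatch(r"\s*(?:\w+\s+)?(\w+)\s*=\s*(\d+)\s*;\s*", source)
    if match is None:
        raise SyntaxError("unsupported java statement")
    return ("ASSIGN", ("IDENTIFIER", match.group(1)), ("NUMBER", match.group(2)))


def parse_php_assign(source):
    """Hypothetical php component: parses e.g. '$a = 1;'."""
    match = re.fullmatch(r"\s*\$(\w+)\s*=\s*(\d+)\s*;\s*", source)
    if match is None:
        raise SyntaxError("unsupported php statement")
    return ("ASSIGN", ("IDENTIFIER", match.group(1)), ("NUMBER", match.group(2)))


# One target compiling component per compiling type.
COMPONENTS = {"java": parse_java_assign, "php": parse_php_assign}


def parse_script(compile_type, source):
    """Pick the component for the compiling type and return its syntax tree."""
    component = COMPONENTS.get(compile_type)
    if component is None:
        raise ValueError(f"no compiling component for type {compile_type!r}")
    return component(source)


java_tree = parse_script("java", "int a = 1;")
php_tree = parse_script("php", "$a = 1;")
```

The two components use different rules, but both return the same tree layout, so code that consumes the result does not need to know which component produced it.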
In an optional implementation manner of this embodiment, the target compiling component includes a lexical parser and a grammar parser; analyzing the script based on the target compiling component, and outputting an abstract syntax tree corresponding to the script, wherein the method further comprises the following steps:
analyzing the script into a word sequence by using the lexical analyzer;
parsing the sequence of words into the abstract syntax tree using the syntax parser.
In this alternative implementation, the target compilation component includes a lexical parser and a grammar parser. When the script is analyzed by the target compiling component, the script statement in the script can be input into the lexical analyzer, and the lexical analyzer divides the script statement into word sequences.
The grammar parser can parse the grammar structure of the script sentence according to the word sequence output by the lexical parser, and the grammar structure is displayed in the form of an abstract grammar tree.
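The two stages can be sketched end to end for one very small statement form. The functions `lex` and `parse`, the token names beyond those already used above, and the supported grammar (an assignment whose right-hand side is a sum of identifiers and numbers) are assumptions made for this illustration; in the disclosure the parsers are generated from rules rather than written by hand.

```python
import re

# Lexical rule: optional whitespace followed by exactly one named token.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<IDENTIFIER>[A-Za-z_]\w*)|(?P<NUMBER>\d+)"
    r"|(?P<ASSIGN>=)|(?P<PLUS>\+)|(?P<SEMICOLON>;))"
)


def lex(statement):
    """Lexical parser: split a script statement into a word sequence."""
    text = statement.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


def parse(tokens):
    """Syntax parser: IDENTIFIER '=' operand ('+' operand)* ';'."""
    pos = 0

    def take(*kinds):
        # Consume and return one token whose kind is among `kinds`.
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] not in kinds:
            raise SyntaxError(f"expected one of {kinds} at token {pos}")
        token = tokens[pos]
        pos += 1
        return token

    target = take("IDENTIFIER")
    take("ASSIGN")
    expr = take("IDENTIFIER", "NUMBER")
    while pos < len(tokens) and tokens[pos][0] == "PLUS":
        take("PLUS")
        expr = ("ADD", expr, take("IDENTIFIER", "NUMBER"))
    take("SEMICOLON")
    return ("ASSIGN", target, expr)


words = lex("a = b + 1;")
tree = parse(words)
```

The parser never sees the original text, only the word sequence, which is why normalizing the word sequence is enough to normalize the tree.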
In an optional implementation manner of this embodiment, the method further includes the following steps:
the script safety detection interface acquires lexical analysis rules and/or grammar analysis rules written aiming at different compiling types, and generates the compiling component based on the lexical analysis rules and/or the grammar analysis rules by utilizing a generator tool.
In this optional implementation manner, in order to implement that scripts corresponding to different compilation types can be finally parsed into an abstract syntax tree of the same type, in the embodiment of the present disclosure, corresponding syntax parsing rules and lexical parsing rules are written in advance for script languages of different compilation types. In some embodiments, the grammar parsing rules and lexical parsing rules may be written in the form of regular expressions.
In some embodiments, a lexical parser may be generated for lexical parsing rules written by the relevant person using existing generator tools, while a syntactic parser is generated based on syntactic parsing rules.
In some embodiments, the generator tool may be one or a combination of flex, bison and re2c.
In the embodiment, related personnel only need to modify the lexical analysis rule and the grammatical analysis rule when needed, the maintenance is relatively simple, the maintenance cost can be reduced, and the efficiency is improved.
It can be understood that the lexical parser or the syntactic parser is actually a piece of executable code, and during the execution process, for example, the lexical parser takes a script sentence as input and parses the script sentence into word sequences according to the lexical parsing rules, and the syntactic parser takes the output of the lexical parser as input, that is, takes the word sequence corresponding to the script sentence as input, parses the syntactic structure of the script sentence from the word sequence according to the syntactic parsing rules, and then displays the syntactic structure in the form of an abstract syntactic tree.
In an optional implementation manner of this embodiment, in the lexical analysis rule and the syntactic analysis rule, for different compiling types, the syntactic expression manners in the different compiling types are classified based on the semantics and the syntactic structures of the script, and different analysis processing is performed for different types of syntactic expression manners based on the classification result.
In this optional implementation manner, the syntax parsing rules and lexical parsing rules corresponding to different compiling types classify the syntax expression manners of the different script languages. Syntax expression manners of different types are parsed in different ways, and syntax expression manners of the same type are parsed in the same way, so that abstract syntax trees of the same structure type are finally obtained. Moreover, because syntax expression manners of the same type are parsed in the same way, the abstract syntax trees parsed from statements that are written in different languages but have the same semantics are the same not only in type but also in structure.
In an optional implementation manner of this embodiment, the type of the syntax expression includes at least one of the following:
syntax expression manners that have the same semantics and the same syntax structure across different compiling types;
syntax expression manners that have the same semantics but different syntax structures across different compiling types;
syntax expression manners that, within the same compiling type, have the same semantics but different syntax structures because the versions differ;
syntax expression manners that exist in one compiling type but not in other compiling types.
In an optional implementation manner of this embodiment, for syntax expression manners that have the same semantics and the same syntax structure across different compiling types, the syntax structure is divided into a plurality of component nodes according to the semantics it expresses, and these component nodes are defined in the syntax parsing rule; and/or
for syntax expression manners that have the same semantics but different syntax structures across different compiling types, all syntax representations included in the different syntax structures are defined as component nodes in the syntax parsing rule, so that the component nodes form a superset of all syntax representations in the different syntax structures; when the abstract syntax tree is generated, the component nodes corresponding to syntax representations that the parsed compiling type does not support or does not contain are set to null; and/or
for syntax expression manners that, within the same compiling type, have the same semantics but different syntax structures because the versions differ, those that can be normalized at the lexical level are defined in the lexical parsing rule to be parsed into the same word sequence, and those that cannot be normalized at the lexical level are normalized in the syntax parsing rule;
and for syntax expression manners that exist in one compiling type but not in other compiling types, the component nodes corresponding to the syntax representations in those syntax expression manners are defined separately in the syntax parsing rule.
In this optional implementation manner, the lexical parsing rule and the syntax parsing rule divide the syntax expression manners of the script languages corresponding to different compiling types into at least one, or a combination, of the following four types:
1. Syntax expression manners that have the same semantics and the same syntax structure across different compiling types:
For this type, since both the semantics and the syntax structure are the same, the syntax structure can be divided into a plurality of component nodes according to the semantics it expresses, and a script statement using this syntax expression manner is parsed into an abstract syntax tree whose structure consists of these component nodes. The if semantics mentioned above belong to this type, and other syntax expression manners with the same semantics and the same syntax structure, such as for-loop semantics, while-loop semantics and foreach semantics, can be parsed in the same way. It is to be understood that this parsing processing is defined in the syntax parsing rule.
2. Syntax expression manners that have the same semantics but different syntax structures across different compiling types:
For this type, the semantics are the same but the syntax structures differ. For a given semantics that the various languages express with different syntax structures, the syntax structure is divided into a plurality of component nodes that cover all syntax representations in all syntax structures of that semantics; that is, the component nodes of the corresponding abstract syntax tree may include any syntax representation used for that semantics in any of the languages. All component node types are defined in the syntax parsing rule, and during parsing the component nodes that do not exist in the syntax structure of the current language are simply set to null.
3. Grammar expression modes of the same compiling type whose semantics are the same but whose grammar structures differ between language versions:
This type mainly covers the case where the compiling type, that is, the language used, is the same but the language versions differ, so that two versions have different grammar expression modes for the same semantics.
4. A grammatical expression that is present in one compilation type and not present in the other compilation type.
Since this type of syntax expression does not exist in other compilation types, a parsing definition for this type of syntax expression may be added to the syntax parsing rule.
Fig. 7 is a schematic diagram illustrating an application scenario of a parsing method of an abstract syntax tree according to an embodiment of the present disclosure. As shown in fig. 7, a compilation component list library is deployed in the cloud server, and the compilation component list library includes a plurality of target compilation components corresponding to a plurality of languages, which are established in the manner described above. The cloud server can receive the script to be detected uploaded by the user and the compiling type of the script to be detected, and the cloud server calls the corresponding target compiling component to compile the script to be detected based on the compiling type of the script to be detected, so that the corresponding abstract syntax tree is generated.
The cloud server then calls malicious script detection code to process the abstract syntax tree. If a script statement matching a preset structure exists in the abstract syntax tree, the script to be detected is considered to contain malicious code, and the safety detection result is returned to the user.
In addition, authorized personnel can upload an extended target compiling component to the remote server. After the cloud server receives the target compiling component and confirms that the current compilation component list library contains no target compiling component for the corresponding compiling type, the target compiling component can be added to the compilation component list library, thereby achieving extension.
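The extension step described above can be sketched as a small registry. This is a minimal illustration only: the class name `ComponentRegistry`, its methods, and the dictionary-shaped tree are hypothetical and do not appear in the disclosure.

```python
from typing import Callable, Dict, List

# Hypothetical type: a compilation component maps script text to a list of AST roots.
Component = Callable[[str], List[dict]]


class ComponentRegistry:
    """Illustrative stand-in for the compilation component list library."""

    def __init__(self) -> None:
        self._components: Dict[str, Component] = {}

    def register(self, compile_type: str, component: Component) -> bool:
        # Add the component only if none exists yet for this compiling type.
        if compile_type in self._components:
            return False
        self._components[compile_type] = component
        return True

    def select(self, compile_type: str) -> Component:
        # Raises KeyError when no component is registered for the type.
        return self._components[compile_type]


registry = ComponentRegistry()
registry.register("php", lambda script: [{"type": "Script", "lang": "php"}])
```

A second upload for an already registered compiling type is rejected here, which mirrors the check that no component for that compiling type exists before it is added.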
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods.
According to the parsing apparatus of the abstract syntax tree of an embodiment of the present disclosure, the apparatus may be implemented as part or all of an electronic device by software, hardware, or a combination of both. The parsing apparatus of the abstract syntax tree comprises:
a first acquisition module configured to acquire a script to be analyzed and a compiling type of the script;
a selection module configured to select a target compilation component based on the compilation type;
the analysis module is configured to analyze the script based on the target compiling component and output an abstract syntax tree corresponding to the script; and the structure types of the abstract syntax trees output by the target compiling components corresponding to different compiling types are the same.
In this embodiment, a script is a program that can be interpreted and executed, and a script may include a plurality of script statements. The script statements may be executable statements written in a scripting language that may include, but is not limited to, jsp, php, asp, bash, python, js, vb, C#, powershell, and the like.
In order to detect the script, the script may be compiled first, where the compiling process includes lexical analysis and syntax analysis, and an abstract syntax tree corresponding to a script statement in the script is finally obtained, where when one script includes multiple script statements, multiple abstract syntax trees may be generated, and each abstract syntax tree corresponds to one complete script statement. After the abstract syntax tree corresponding to the script is generated, the script detection can be performed on the basis of the abstract syntax tree, and the accuracy of the script detection can be improved. The abstract syntax tree is an abstract representation of the source code syntax structure of the script statement, which represents the syntax structure of the script statement in the form of a tree, and each constituent node on the abstract syntax tree represents one syntax representation in the script statement.
The script detection may be detection on whether malicious codes exist in the script or other types of detection, and may be specifically determined according to actual needs, which is not specifically limited herein. The principle of script detection is that whether a script statement realizing a predetermined function exists in a script is determined by matching a grammar expression mode of the script statement in the script, and if the script statement exists, a safety detection result can be output based on the script statement. Functions implemented by the script statement may be judged from the semantics of the script statement, and thus the embodiments of the present disclosure parse the syntax structure in the script statement into the form of an abstract syntax tree based on the semantics of the script statement, and further detect the script by matching whether a branch of a predetermined syntax structure exists in the abstract syntax tree.
Considering that a script to be detected may be written in any of various scripting languages, and in order to improve the efficiency and reduce the complexity of detecting scripts written in different languages, an embodiment of the present disclosure provides an abstract syntax tree parsing apparatus. The apparatus parses script statements written in different languages into abstract syntax trees of a uniform type, so that the generated abstract syntax trees can be interpreted in a uniform adaptation manner, without the customized adaptation of abstract syntax trees for each language that the prior art requires.
In this embodiment, after receiving a script to be analyzed and a compiling type of the script, a corresponding target compiling component is selected based on the compiling type. The compiling type is corresponding to the script language for compiling the script, and different script languages correspond to different compiling types. The target compilation component may include, but is not limited to, lexical and grammatical parsers. The lexical parser is used for parsing script sentences in the script into word (token) sequences, and the grammar parser is used for generating an abstract grammar tree based on the word sequences.
Different compiling types correspond to different target compiling components, that is, scripts written in different languages need to select different target compiling components for compiling. For example, a script written in the jsp language employs a jsp-type target compilation component, while a script written in the python language employs a python-type target compilation component. It should be noted that the target compiling component mentioned here is not a native compiler corresponding to the scripting language, but is a compiling component that can compile scripts of different compiling types into an abstract syntax tree of the same type in the embodiment of the present disclosure.
The script is input into the target compiling component for parsing, and the abstract syntax tree corresponding to the script is finally output. In the case where a plurality of script statements are included in the script, a plurality of abstract syntax trees may be output.
Although scripts written in different compiling types, that is, different scripting languages, are analyzed by using different compiling components, types of the finally output abstract syntax trees are the same, that is, structures of the abstract syntax trees output by the target compiling components corresponding to the different compiling types can be adapted in a uniform adaptation mode.
In the script parsing process, different target compiling components are preset for different compiling types, after a script to be parsed and the compiling type of the script are received, the corresponding target compiling component is selected based on the compiling type, then the script is parsed by the target compiling component, and an abstract syntax tree corresponding to the script is output, wherein the abstract syntax tree is used for expressing the syntax structure of each script statement in the script; in the embodiment of the disclosure, the types of the abstract syntax trees output by the target compiling components corresponding to different compiling types are the same, so that scripts written in various languages can be normalized into the abstract syntax trees of the same type through the embodiment of the disclosure, and the adaptation efficiency and accuracy of the scripts can be improved.
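As a rough illustration of this normalization, the sketch below selects a component by compiling type and shows two toy components emitting the same node type. The `Assign` node, the two regular-expression parsers, and the `COMPONENTS` table are assumptions made for the example; they handle only a single simple assignment and are not the components of the disclosure.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Assign:
    """Shared node type emitted by every component in this sketch."""
    target: str
    value: str


def parse_python_like(stmt: str) -> Assign:
    # Matches statements such as: x = 1
    m = re.fullmatch(r"\s*(\w+)\s*=\s*(\w+)\s*", stmt)
    if m is None:
        raise ValueError(f"unsupported statement: {stmt!r}")
    return Assign(m.group(1), m.group(2))


def parse_php_like(stmt: str) -> Assign:
    # Matches statements such as: $x = 1;
    m = re.fullmatch(r"\s*\$(\w+)\s*=\s*(\w+)\s*;\s*", stmt)
    if m is None:
        raise ValueError(f"unsupported statement: {stmt!r}")
    return Assign(m.group(1), m.group(2))


# Hypothetical mapping from compiling type to target compiling component.
COMPONENTS: Dict[str, Callable[[str], Assign]] = {
    "python": parse_python_like,
    "php": parse_php_like,
}


def parse(script: str, compile_type: str) -> Assign:
    # Select the component for the compiling type, then parse with it.
    return COMPONENTS[compile_type](script)
```

Because both components return the same `Assign` structure, downstream code needs only one adaptation path regardless of the source language.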
In an alternative implementation manner of this embodiment, the target compiling components corresponding to different compiling types are different.
In this optional implementation, different compiling types may correspond to different scripting languages, that is, scripts written in different scripting languages correspond to different compiling types. Because different scripting languages are compiled in different ways, a corresponding target compiling component is preset for each compiling type, so that different compiling types correspond to different target compiling components, while the structure type of the abstract syntax tree finally output by the different target compiling components is the same. That is, the embodiments of the present disclosure normalize the abstract syntax trees of multiple scripting languages, so that the finally output abstract syntax trees are consistent in type, and the abstract syntax trees output by the target compiling components corresponding to different compiling types can be adapted in a unified adaptation manner.
In an optional implementation manner of this embodiment, the target compiling component includes a lexical parser and a grammar parser; the analysis module comprises:
a first parsing submodule configured to parse the script into a sequence of words using the lexical parser;
a second parsing submodule configured to parse the sequence of words into the abstract syntax tree using the grammar parser.
In this alternative implementation, the target compilation component includes a lexical parser and a grammar parser. When the script is analyzed by the target compiling component, the script statement in the script can be input into the lexical analyzer, and the lexical analyzer divides the script statement into word sequences.
In an optional implementation manner of this embodiment, the apparatus further includes:
the second acquisition module is configured to acquire lexical analysis rules and/or grammar analysis rules written aiming at different compiling types;
a generation module configured to generate the compilation component based on the lexical and/or grammatical parsing rules using a generator tool.
In this optional implementation manner, in order to implement that scripts corresponding to different compilation types can be finally parsed into an abstract syntax tree of the same type, in the embodiment of the present disclosure, corresponding syntax parsing rules and lexical parsing rules are written in advance for script languages of different compilation types. In some embodiments, the grammar parsing rules and lexical parsing rules may be written in the form of regular expressions.
In some embodiments, an existing generator tool may be used to generate the lexical parser from the lexical parsing rules written by the relevant personnel, and to generate the grammar parser from the syntax parsing rules.
In some embodiments, the generator tool may be a combination of one or more of flex, bison, and re2c.
It can be understood that the lexical parser or the syntactic parser is actually a piece of executable code, and during the execution process, for example, the lexical parser takes a script sentence as input and parses the script sentence into word sequences according to the lexical parsing rules, and the syntactic parser takes the output of the lexical parser as input, that is, takes the word sequence corresponding to the script sentence as input, parses the syntactic structure of the script sentence from the word sequence according to the syntactic parsing rules, and then displays the syntactic structure in the form of an abstract syntactic tree.
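The two-stage flow described above can be sketched by hand. The example below is not output from flex, bison, or re2c; the token rules, the tiny assignment grammar, and the dictionary-shaped tree are all assumptions chosen to keep the illustration short.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text)

# Hypothetical lexical parsing rules, written as regular expressions.
TOKEN_RULES = [
    ("NUMBER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{k}>{p})" for k, p in TOKEN_RULES))


def lex(statement: str) -> List[Token]:
    """Lexical parser: split a script statement into a word (token) sequence."""
    tokens, pos = [], 0
    while pos < len(statement):
        m = MASTER.match(statement, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def operand(tok: Token) -> dict:
    kind, text = tok
    if kind not in ("IDENTIFIER", "NUMBER"):
        raise SyntaxError(f"unexpected token {tok}")
    return {"type": kind.title(), "value": text}


def parse(tokens: List[Token]) -> dict:
    """Grammar parser for: IDENTIFIER ASSIGN operand (PLUS operand)*"""
    if len(tokens) < 3 or tokens[0][0] != "IDENTIFIER" or tokens[1][0] != "ASSIGN":
        raise SyntaxError("expected IDENTIFIER ASSIGN operand")
    rest = tokens[2:]
    expr = operand(rest[0])
    i = 1
    while i < len(rest):
        if rest[i][0] != "PLUS" or i + 1 >= len(rest):
            raise SyntaxError("expected PLUS operand")
        expr = {"type": "Add", "left": expr, "right": operand(rest[i + 1])}
        i += 2
    return {"type": "Assign", "target": tokens[0][1], "value": expr}
```

Calling `parse(lex("a = b + 1"))` feeds the word sequence produced by the lexical stage into the grammar stage, which returns the tree for that statement.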
In an optional implementation manner of this embodiment, in the lexical analysis rule and the syntactic analysis rule, for different compiling types, the syntactic expression manners in the different compiling types are classified based on the semantics and the syntactic structures of the script, and different analysis processing is performed for different types of syntactic expression manners based on the classification result.
In this optional implementation manner, in the syntax parsing rules and the lexical parsing rules corresponding to different compiling types used in the embodiment of the present disclosure, by classifying syntax expression manners in different script languages, different parsing processing manners are adopted for syntax expression manners divided into different types, and the same parsing processing is adopted for syntax expression manners divided into the same type, so that abstract syntax trees with the same structure type can be finally obtained. Meanwhile, because the same parsing processing is adopted in the syntax expression modes of the same type, the abstract syntax trees parsed by sentences written in different languages with the same semantics are not only of the same type, but also of the same structure.
In an optional implementation manner of this embodiment, the type of the syntax expression includes at least one of the following:
grammar expression modes whose semantics and grammar structures are both the same across different compiling types;
grammar expression modes whose semantics are the same but whose grammar structures differ across different compiling types;
grammar expression modes of the same compiling type whose semantics are the same but whose grammar structures differ between versions;
grammar expression modes that exist in one compiling type but not in other compiling types.
In an optional implementation manner of this embodiment, for grammar expression modes whose semantics and grammar structures are both the same across different compiling types, a plurality of component nodes are divided according to the semantics represented by the grammar structure, and the plurality of component nodes are defined in the syntax parsing rule; and/or,
for grammar expression modes whose semantics are the same but whose grammar structures differ across different compiling types, all grammar representations included in the different grammar structures are defined as component nodes in the syntax parsing rule, so that the component nodes form a superset of all grammar representations in the different grammar structures, and when the abstract syntax tree is generated, the component nodes corresponding to grammar representations that the compiling type being parsed does not support or does not have are left empty; and/or,
for grammar expression modes of the same compiling type whose semantics are the same but whose grammar structures differ between versions, those that can be normalized at the lexical level are defined in the lexical parsing rule so that grammar expression modes with the same semantics and different grammar structures are parsed into the same word sequence, and those that cannot be normalized at the lexical level are normalized in the syntax parsing rule;
for grammar expression modes that exist in one compiling type but not in other compiling types, the component nodes corresponding to the grammar representations in those grammar expression modes are separately defined in the syntax parsing rule.
In this optional implementation manner, in the lexical analysis rule and the grammar analysis rule, grammar expression manners in the script language corresponding to different compiling types are at least divided into one or more combinations of the following four manners:
1. Grammar expression modes whose semantics and grammar structure are both the same across different compiling types:
For this type, since both the semantics and the grammar structure are the same, the grammar structure can be divided into a plurality of component nodes according to the semantics it expresses, and a script statement using this grammar expression mode is parsed into an abstract syntax tree containing those component nodes. The if semantics mentioned above belong to this type, and other grammar expression modes with the same semantics and the same grammar structure, such as for loops, while loops, and foreach, can be parsed in the same way. It is understood that this parsing processing is defined in the syntax parsing rule.
2. Grammar expression modes whose semantics are the same but whose grammar structures differ across different compiling types:
For this type, the same semantics can be expressed by different grammar structures in different languages. The grammar structure is therefore divided into a plurality of component nodes that together cover every grammar representation appearing in any of those grammar structures; that is, the component nodes of the abstract syntax tree for this grammar expression mode may include any grammar representation used for it in the various languages, and all component node types are defined in the syntax parsing rule. During parsing, component nodes that do not exist in the grammar structure of the current language are simply left blank.
The following examples illustrate:
//java
public static int func(int a, char b)
{
    return a + b;
}
//php
function func($a, $b) {
    return $a + $b;
}
In the grammar expression modes of function semantics in these two languages, the Java language supports grammar representations such as annotations, function qualifiers, a function return value type, a function name, a function parameter list, and a function body, whereas the PHP function above is written with only a function name, a parameter list, and a function body.
In order to normalize grammar expression modes that have the same semantics but different grammar structures, a superset of grammar representations can be defined to express the grammar structure. The superset includes every grammar representation used by any of the languages for that semantics. The grammar representations in the superset are then defined in the syntax parsing rule as the component nodes for this grammar expression mode, and during parsing the component nodes corresponding to absent or unsupported grammar representations are left empty.
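A minimal sketch of such a superset node for function semantics follows. The class `FunctionNode` and its field names are hypothetical, `None` stands in for an emptied component node, and the Java return type is assumed to be `int` for the purpose of the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FunctionNode:
    """Hypothetical superset node covering function semantics in several languages."""
    name: str
    params: List[str]
    body: str
    # Component nodes that only some languages have; None means "left empty".
    annotations: Optional[List[str]] = None
    qualifiers: Optional[List[str]] = None
    return_type: Optional[str] = None


# Java fills the full superset (no annotations happen to be present here).
java_func = FunctionNode(
    name="func",
    params=["a", "b"],
    body="return a + b;",
    annotations=[],
    qualifiers=["public", "static"],
    return_type="int",
)

# PHP fills only the nodes it has; the remaining nodes stay empty.
php_func = FunctionNode(name="func", params=["a", "b"], body="return $a + $b;")
```

Both values are instances of the same node type, so a consumer can read `name`, `params`, and `body` uniformly and treat the optional fields as possibly empty.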
3. Grammar expression modes of the same compiling type whose semantics are the same but whose grammar structures differ between language versions:
This type mainly covers the case where the compiling type, that is, the language used, is the same but the language versions differ, so that two versions have different grammar expression modes for the same semantics, as exemplified below.
For example, Java 15 and later support the text block syntax, so the following two grammar expressions have the same meaning:
//java
String a = "abc";
//java (text block; the opening delimiter must be followed by a line break)
String a = """
abc""";
In some embodiments, in order to obtain abstract syntax trees with the same structure for the two grammar expression modes, the embodiment of the present disclosure performs normalization at the lexical parsing level, that is, both grammar expression modes are accommodated in the lexical parsing rule. For example, although the text of the two statements above differs, the lexical parser is required to output the same word sequence for both, as follows:
1.IDENTIFIER[String]
2.IDENTIFIER[a]
3.ASSIGN[=]
4.STRING[abc]
After the same word sequence is output, subsequent grammar analysis is not affected, and the structure of the abstract syntax tree generated afterwards is therefore not affected either.
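Normalization at the lexical level might look like the sketch below, where an ordinary string literal and a text block both yield a `STRING` token carrying only the content. The rule table is an assumption for illustration; it skips the `;` terminator so that the output matches the four-token listing above, and it does not model the indentation stripping that real Java text blocks perform.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text)

# Hypothetical lexical rules; the text block rule is tried before the plain string rule.
RULES = [
    ("STRING", re.compile(r'"""[ \t]*\r?\n?(.*?)"""', re.DOTALL)),  # text block
    ("STRING", re.compile(r'"([^"\n]*)"')),                         # ordinary string
    ("IDENTIFIER", re.compile(r"([A-Za-z_]\w*)")),
    ("ASSIGN", re.compile(r"(=)")),
    ("SKIP", re.compile(r"([\s;]+)")),                               # whitespace and ';'
]


def lex(src: str) -> List[Token]:
    tokens, pos = [], 0
    while pos < len(src):
        for kind, rx in RULES:
            m = rx.match(src, pos)
            if m:
                if kind != "SKIP":
                    # Keep only the captured content, so both literal forms look alike.
                    tokens.append((kind, m.group(1)))
                pos = m.end()
                break
        else:
            raise SyntaxError(f"unexpected character at offset {pos}")
    return tokens


plain = lex('String a="abc";')
block = lex('String a="""\nabc""";')
```

Since `plain` and `block` are the same word sequence, a grammar parser consuming them cannot tell which literal form the source used.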
In some embodiments, normalization processing may also be performed at a syntax parsing level when the lexical parsing level is not compatible.
4. Grammar expression modes that exist in one compiling type but not in other compiling types:
Since this type of syntax expression does not exist in other compilation types, a parsing definition for this type of syntax expression may be added to the syntax parsing rule.
According to an embodiment of the present disclosure, a script detection apparatus includes:
the third acquisition module is configured to acquire the script to be detected;
the calling module is configured to call a pre-deployed script safety detection interface to perform safety detection on the script; the script safety detection interface selects a target compiling component based on the compiling type of the script, analyzes the script based on the target compiling component, outputs an abstract syntax tree corresponding to the script, and detects the script based on the abstract syntax tree; the structure types of the abstract syntax trees output by the target compiling component corresponding to different compiling types are the same;
an output module configured to output a security detection result of the script.
In this embodiment, a script is a program that can be interpreted and executed, and a script may include a plurality of script statements. The script statements may be executable statements written in a scripting language that may include, but is not limited to, jsp, php, asp, bash, python, js, vb, C#, powershell, and the like.
The apparatus may run in the cloud. The script safety detection interface may be deployed in the cloud in advance and may be a SaaS (Software-as-a-Service) interface. A demander can obtain the right to use the script safety detection interface in advance and, when needed, have a script detected by calling the interface. The script safety detection interface is used to implement the script detection apparatus provided by the embodiments of the present disclosure.
In this embodiment, the demander can upload one or more scripts to be detected to the cloud, where the script safety detection interface deployed in the cloud detects the one or more scripts and outputs a safety detection result for each script; the safety detection result can be returned to the demander. In the embodiment of the disclosure, in order to detect a script, the script safety detection interface may first compile the script, where the compiling process includes lexical analysis and syntax analysis and finally yields an abstract syntax tree corresponding to a script statement in the script. When one script includes a plurality of script statements, a plurality of abstract syntax trees may be generated, each corresponding to one complete script statement. After the abstract syntax tree corresponding to the script is generated, script detection can be performed on the basis of the abstract syntax tree, which can improve the accuracy of the detection. This is because detecting directly on the script text may cause false detections, for example when the script contains comments; furthermore, direct detection on the script text cannot accurately determine the type of a given text segment, for example whether the segment is part of a character string. The abstract syntax tree is an abstract representation of the source code syntax structure of the script statement: it represents the syntax structure of the script statement in the form of a tree, and each component node on the abstract syntax tree represents one grammar representation in the script statement. Once the script is parsed into the form of an abstract syntax tree, the functions called in the script, whether the character strings used are sensitive character strings, and the like can be determined directly by rule matching.
The principle of script detection is that whether a script statement realizing a predetermined function exists in a script is determined by matching a grammar expression mode of the script statement in the script, and if the script statement exists, a safety detection result can be output based on the script statement. Functions implemented by the script statement can be judged from the semantics of the script statement, so the embodiments of the present disclosure parse the syntax structure in the script statement into the form of an abstract syntax tree based on the semantics of the script statement, and then detect the script by matching whether a branch of a predetermined syntax structure exists in the abstract syntax tree.
Taking the detection of malicious code as an example, after the script is parsed into the abstract syntax tree by the parsing device of the abstract syntax tree mentioned above, traversal may be performed from the root node of the abstract syntax tree to see whether a predetermined statement can be matched in the traversal process, where the predetermined statement corresponds to the malicious code, such as a statement for modifying a system command, a statement for creating a certain function, and the like. After the predetermined statements are matched, the script can be considered to have malicious codes, and security detection results, such as malicious code identifications and the positions of the malicious codes in the script, can be output.
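The traversal can be sketched as follows. The node layout (`type`, `name`, `line`, `children`), the rule name, and the set of suspicious function names are assumptions made for this illustration, not the detection rules of the disclosure.

```python
from typing import Iterator, List

# Hypothetical predetermined functions treated as suspicious.
SUSPICIOUS_CALLS = {"system", "eval"}


def walk(node: dict) -> Iterator[dict]:
    """Depth-first traversal starting from the given (root) node."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def detect(root: dict) -> List[dict]:
    """Report every Call node whose callee is in the suspicious set."""
    return [
        {"rule": "suspicious-call", "name": n["name"], "line": n.get("line")}
        for n in walk(root)
        if n.get("type") == "Call" and n.get("name") in SUSPICIOUS_CALLS
    ]


# Toy tree: line 1 merely contains the string "system"; line 2 actually calls it.
ast = {
    "type": "Script",
    "children": [
        {"type": "Assign", "line": 1,
         "children": [{"type": "String", "value": "system", "line": 1}]},
        {"type": "Call", "name": "system", "line": 2,
         "children": [{"type": "String", "value": "rm -rf /tmp/x", "line": 2}]},
    ],
}
```

Here `detect(ast)` reports only the call on line 2; the string literal on line 1 that merely contains the word "system" is not flagged, which is the kind of distinction that matching on raw script text cannot make.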
The parsing process of the abstract syntax tree can be referred to the above description, and is not described in detail herein.
In the embodiment of the disclosure, for the script to be detected, the script written in any language is analyzed into the abstract syntax tree of a uniform type by using the above-mentioned analytic device of the abstract syntax tree; therefore, in the process of the script safety detection, different safety detection devices do not need to be customized for different languages, but a uniform safety detection device is adopted to traverse the generated abstract syntax tree, and then the safety detection result is obtained based on the traversal result. Through the embodiment of the disclosure, the safety detection efficiency of the script can be improved, and the safety detection resources of the script are saved.
Fig. 8 is a schematic structural diagram of an electronic device suitable for implementing a parsing method and/or a script detection method of an abstract syntax tree according to an embodiment of the present disclosure.
As shown in fig. 8, electronic device 800 includes a processing unit 801, which may be implemented as a CPU, GPU, FPGA, NPU, or like processing unit. The processing unit 801 may execute various processes in the embodiment of any one of the above-described methods of the present disclosure according to a program stored in a Read Only Memory (ROM)802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM803, various programs and data necessary for the operation of the electronic apparatus 800 are also stored. The processing unit 801, ROM802, and RAM803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.
The following components are connected to the I/O interface 805: an input portion 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage portion 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as necessary, so that a computer program read out therefrom is installed into the storage section 808 as necessary.
In particular, according to embodiments of the present disclosure, any of the methods described above with reference to embodiments of the present disclosure may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing any of the methods of the embodiments of the present disclosure. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809 and/or installed from the removable medium 811.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure may be implemented by software or hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation of the units or modules themselves.
As another aspect, the present disclosure also provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus in the above-described embodiment; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the methods described in the present disclosure.
The foregoing describes only preferred embodiments of the present disclosure and illustrates the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the present disclosure is not limited to the specific combination of the above features, and also covers other embodiments formed by any combination of the above features or their equivalents without departing from the inventive concept. One example is a technical solution formed by interchanging the above features with features of similar function disclosed in (but not limited to) the present disclosure.

Claims (10)

1. A method for parsing an abstract syntax tree, comprising:
acquiring a script to be parsed and a compilation type of the script;
selecting a target compilation component based on the compilation type;
parsing the script based on the target compilation component, and outputting an abstract syntax tree corresponding to the script, wherein the abstract syntax trees output by the target compilation components corresponding to different compilation types have the same structure type.
2. The method of claim 1, wherein the target compilation components corresponding to different compilation types are different.
3. The method of claim 1 or 2, wherein the target compilation component comprises a lexical parser and a syntax parser; and parsing the script based on the target compilation component and outputting an abstract syntax tree corresponding to the script comprises:
parsing the script into a word sequence using the lexical parser;
parsing the word sequence into the abstract syntax tree using the syntax parser.
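As a non-limiting illustration of claims 1 to 3, the following minimal sketch selects a compilation component by compilation type, runs a lexical parser and then a syntax parser, and returns one shared node structure. The compilation types `type_a` and `type_b`, the toy assignment syntaxes (`=` and `:=`), and all names in the code are assumptions made for the example and do not appear in the disclosure.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Token = Tuple[str, str]  # (token kind, matched text)


@dataclass
class Node:
    """Uniform AST node shared by every compilation component (hypothetical)."""
    kind: str
    value: str = ""
    children: List["Node"] = field(default_factory=list)


def make_lexer(assign_op: str) -> Callable[[str], List[Token]]:
    """Build a lexical parser for a toy language whose assignment operator is assign_op."""
    pattern = re.compile(
        rf"\s*(?:(?P<NAME>[A-Za-z_]\w*)|(?P<NUM>\d+)|(?P<ASSIGN>{re.escape(assign_op)}))"
    )

    def lex(script: str) -> List[Token]:
        tokens: List[Token] = []
        script = script.rstrip()
        pos = 0
        while pos < len(script):
            match = pattern.match(script, pos)
            if match is None:
                raise SyntaxError(f"unexpected input at offset {pos}")
            kind = match.lastgroup
            tokens.append((kind, match.group(kind)))
            pos = match.end()
        return tokens

    return lex


def parse_assignment(tokens: List[Token]) -> Node:
    """Syntax parser: turn the word sequence NAME ASSIGN NUM into an 'assign' node."""
    kinds = [kind for kind, _ in tokens]
    if kinds != ["NAME", "ASSIGN", "NUM"]:
        raise SyntaxError(f"expected NAME ASSIGN NUM, got {kinds}")
    return Node("assign", children=[Node("name", tokens[0][1]),
                                    Node("number", tokens[2][1])])


@dataclass
class CompilationComponent:
    lex: Callable[[str], List[Token]]
    parse: Callable[[List[Token]], Node]


# One component per compilation type; the lexical rules differ, the output type does not.
COMPONENTS: Dict[str, CompilationComponent] = {
    "type_a": CompilationComponent(make_lexer("="), parse_assignment),
    "type_b": CompilationComponent(make_lexer(":="), parse_assignment),
}


def parse_script(script: str, compilation_type: str) -> Node:
    """Select the target component by compilation type, then lex and parse."""
    component = COMPONENTS[compilation_type]
    return component.parse(component.lex(script))
```

In this sketch, `parse_script("x = 1", "type_a")` and `parse_script("x := 1", "type_b")` produce equal `Node` trees, which is the property that lets downstream analysis ignore the source compilation type.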
4. The method according to claim 1 or 2, wherein the method further comprises:
acquiring lexical parsing rules and/or grammar parsing rules written for different compilation types;
generating, with a generator tool, the compilation component based on the lexical parsing rules and/or the grammar parsing rules.
5. The method according to claim 4, wherein the lexical parsing rules and the grammar parsing rules classify the grammatical expressions of the different compilation types based on the semantics and grammatical structure of scripts of the different compilation types, and apply different parsing processes to the different grammatical expressions based on the classification result.
6. The method of claim 5, wherein the types of grammatical expression comprise at least one of:
grammatical expressions whose semantics and grammatical structure are both the same across different compilation types;
grammatical expressions whose semantics are the same but whose grammatical structures differ across different compilation types;
grammatical expressions of the same compilation type whose semantics are the same but whose grammatical structures differ between versions;
grammatical expressions that exist in one compilation type but not in other compilation types.
7. The method according to claim 6, wherein, for grammatical expressions whose semantics and grammatical structure are both the same across different compilation types, the grammatical structure is divided into a plurality of component nodes according to the semantics it represents, and the component nodes are defined in the grammar parsing rules; and/or,
for grammatical expressions whose semantics are the same but whose grammatical structures differ across different compilation types, all grammatical expressions included in the different grammatical structures are defined as component nodes in the grammar parsing rules, the component nodes forming a superset of all grammatical expressions in the different grammatical structures, and, when the abstract syntax tree is generated, the component nodes corresponding to grammatical expressions that the compilation type being parsed does not support or does not have are set to empty; and/or,
for grammatical expressions of the same compilation type whose semantics are the same but whose grammatical structures differ between versions, grammatical expressions that can be normalized at the lexical level are defined in the lexical parsing rules so that expressions with the same semantics and different grammatical structures are parsed into the same word sequence, and grammatical expressions that cannot be normalized at the lexical level are normalized in the grammar parsing rules;
and, for grammatical expressions that exist in one compilation type but not in other compilation types, the grammatical expression is defined in the grammar parsing rules, and the grammar in that grammatical expression is represented by a corresponding component node.
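As a non-limiting illustration of two of the cases in claim 7, the following sketch shows a superset component node whose unsupported slots are set to empty, and a lexical-level normalization that maps two spellings to one token. The node `FunctionDecl`, its slots, the compilation types `type_a` and `type_b`, and the operator spellings `<>` and `!=` are all assumptions chosen for the example; the disclosure does not name them.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class FunctionDecl:
    """Superset node: one slot for every expression found in any compilation type."""
    name: str
    params: List[str]
    return_type: Optional[str] = None  # left empty when the type has no such expression
    visibility: Optional[str] = None   # left empty when the type has no such expression


# Hypothetical table of which optional slots each compilation type supports.
SUPPORTED_SLOTS: Dict[str, set] = {
    "type_a": {"return_type"},
    "type_b": {"visibility"},
}


def build_function_decl(compilation_type: str, parsed: Dict[str, Any]) -> FunctionDecl:
    """Fill the superset node, emptying slots the parsed compilation type lacks."""
    supported = SUPPORTED_SLOTS[compilation_type]
    return FunctionDecl(
        name=parsed["name"],
        params=list(parsed.get("params", [])),
        return_type=parsed.get("return_type") if "return_type" in supported else None,
        visibility=parsed.get("visibility") if "visibility" in supported else None,
    )


# Lexical-level normalization: two spellings with the same semantics yield one token.
NORMALIZED_TOKENS: Dict[str, str] = {"<>": "NEQ", "!=": "NEQ"}


def normalize_operator(text: str) -> str:
    """Return the normalized token kind, or the text itself if no rule applies."""
    return NORMALIZED_TOKENS.get(text, text)
```

Because every compilation type fills the same `FunctionDecl` shape, a consumer of the tree only needs to handle an empty slot rather than a different node type per language.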
8. A script detection method, comprising:
acquiring a script to be detected;
calling a pre-deployed script security detection interface to perform security detection on the script, wherein the script security detection interface selects a target compilation component based on the compilation type of the script, parses the script based on the target compilation component, outputs an abstract syntax tree corresponding to the script, and detects the script based on the abstract syntax tree, and wherein the abstract syntax trees output by the target compilation components corresponding to different compilation types have the same structure type;
outputting the security detection result of the script.
9. The method of claim 8, wherein the security detection result indicates whether the script contains a webshell and/or malicious code.
10. A computer program product comprising computer instructions, wherein the computer instructions, when executed by a processor, implement the method of any one of claims 1-8.
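As a non-limiting illustration of the detection flow in claims 8 and 9, the following sketch parses a script through a per-type parser and then applies a single rule to the resulting uniform tree. The deny list `SUSPICIOUS_CALLS`, the `call` node kind, the result dictionary, and `stub_parser` are assumptions for the example; the disclosure does not specify the detection rules, and a practical webshell detector would be considerably more involved.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List


@dataclass
class Node:
    """Uniform AST node (hypothetical shape, same for all compilation types)."""
    kind: str
    value: str = ""
    children: List["Node"] = field(default_factory=list)


# Hypothetical deny list used only to demonstrate a language-independent rule.
SUSPICIOUS_CALLS = {"eval", "system"}


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants in depth-first order."""
    yield node
    for child in node.children:
        yield from walk(child)


def detect(script: str, compilation_type: str,
           parsers: Dict[str, Callable[[str], Node]]) -> Dict[str, object]:
    """Parse with the component for compilation_type, then scan the uniform AST."""
    tree = parsers[compilation_type](script)
    hits = sorted({n.value for n in walk(tree)
                   if n.kind == "call" and n.value in SUSPICIOUS_CALLS})
    return {"malicious": bool(hits), "matched_calls": hits}


def stub_parser(script: str) -> Node:
    """Stand-in for a real compilation component; returns a fixed tree."""
    return Node("program", children=[
        Node("call", "eval", [Node("name", "user_input")]),
    ])
```

Since the rule inspects only the shared node structure, the same `detect` function can serve every compilation type for which a parser is registered.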
CN202111442982.6A 2021-11-30 2021-11-30 Method for parsing abstract syntax tree and computer program product Pending CN114443041A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111442982.6A CN114443041A (en) 2021-11-30 2021-11-30 Method for parsing abstract syntax tree and computer program product

Publications (1)

Publication Number Publication Date
CN114443041A true CN114443041A (en) 2022-05-06

Family

ID=81364572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111442982.6A Pending CN114443041A (en) 2021-11-30 2021-11-30 Method for parsing abstract syntax tree and computer program product

Country Status (1)

Country Link
CN (1) CN114443041A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115390852A (en) * 2022-08-26 2022-11-25 支付宝(杭州)信息技术有限公司 Method and device for generating uniform abstract syntax tree and program analysis
CN115469875A (en) * 2022-08-22 2022-12-13 西安衍舆航天科技有限公司 Method and device for compiling domain-specific language DSL based on remote operation
CN117785884A (en) * 2023-12-28 2024-03-29 支付宝(杭州)信息技术有限公司 Graph logic execution plan generation method of graph query statement, data processing method and device

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115469875A (en) * 2022-08-22 2022-12-13 西安衍舆航天科技有限公司 Method and device for compiling domain-specific language DSL based on remote operation
CN115469875B (en) * 2022-08-22 2023-08-15 西安衍舆航天科技有限公司 Compiling method and device of domain-specific language DSL based on remote control operation
CN115390852A (en) * 2022-08-26 2022-11-25 支付宝(杭州)信息技术有限公司 Method and device for generating uniform abstract syntax tree and program analysis
CN117785884A (en) * 2023-12-28 2024-03-29 支付宝(杭州)信息技术有限公司 Graph logic execution plan generation method of graph query statement, data processing method and device
CN117785884B (en) * 2023-12-28 2024-09-03 支付宝(杭州)信息技术有限公司 Graph logic execution plan generation method of graph query statement, data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination