CA3144052A1 - Method and apparatus for recognizing new sql statements in database audit systems - Google Patents
Method and apparatus for recognizing new sql statements in database audit systems Download PDFInfo
- Publication number
- CA3144052A1 CA3144052A1 CA3144052A CA3144052A CA3144052A1 CA 3144052 A1 CA3144052 A1 CA 3144052A1 CA 3144052 A CA3144052 A CA 3144052A CA 3144052 A CA3144052 A CA 3144052A CA 3144052 A1 CA3144052 A1 CA 3144052A1
- Authority
- CA
- Canada
- Prior art keywords
- statement
- values
- sql
- skeleton
- sql statement
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 55
- 238000012550 audit Methods 0.000 title claims abstract description 42
- 238000012545 processing Methods 0.000 claims abstract description 25
- 230000000052 comparative effect Effects 0.000 claims abstract description 11
- 238000003860 storage Methods 0.000 claims description 11
- 238000004590 computer program Methods 0.000 claims description 6
- 238000012502 risk assessment Methods 0.000 claims description 3
- 238000009333 weeding Methods 0.000 claims description 3
- 230000002708 enhancing effect Effects 0.000 abstract description 2
- 230000009286 beneficial effect Effects 0.000 description 6
- 239000000284 extract Substances 0.000 description 3
- 238000013459 approach Methods 0.000 description 2
- 238000004519 manufacturing process Methods 0.000 description 2
- 230000000717 retained effect Effects 0.000 description 2
- 238000001914 filtration Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000012544 monitoring process Methods 0.000 description 1
- 238000012216 screening Methods 0.000 description 1
- 230000001131 transforming effect Effects 0.000 description 1
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A method and an apparatus for recognizing new SQL statements in database audit systems relate to the technical field of big data, and feature a capability of effectively reducing resources occupancy of database audit systems, thereby enhancing system performance. The method includes: receiving and modularization processing a SQL statement in a network message to extract a skeleton statement; performing digest computation on the skeleton statement, and processing a resulting digest value into an integer value; and checking a table for comparing the integer value, and based on a comparative result determining whether the SQL statement is a new SQL statement. The apparatus implements the method.
Description
METHOD AND APPARATUS FOR RECOGNIZING NEW SQL STATEMENTS IN
DATABASE AUDIT SYSTEMS
BACKGROUND OF THE INVENTION
Technical Field [0001] The present invention relates to the technical field of big data, and more particularly to a method and an apparatus for recognizing new SQL statements in database audit systems.
Description of Related Art
DATABASE AUDIT SYSTEMS
BACKGROUND OF THE INVENTION
Technical Field [0001] The present invention relates to the technical field of big data, and more particularly to a method and an apparatus for recognizing new SQL statements in database audit systems.
Description of Related Art
[0002] Database audit systems are mainly used to audit various operations on database servers.
They acquire network messages of database servers by means of bypass images or plugins, and then extract and store SQL statements from the messages to their databases for subsequent operations such as querying, filtering, and analysis, thereby achieving monitoring and audit for database servers.
They acquire network messages of database servers by means of bypass images or plugins, and then extract and store SQL statements from the messages to their databases for subsequent operations such as querying, filtering, and analysis, thereby achieving monitoring and audit for database servers.
[0003] Database audit systems analyze the extracted SQL statements to determine whether database servers under audit have vulnerability or are attacked. From the perspective of technical implementations, the process is to extract, analyze, and store SQL
statements.
Therein, analysis is to audit whether SQL statements have vulnerability or are attacked.
For ensuring accuracy of audit, audit systems have to analyze all SQL
statements. As the number of database servers deployed by an audit system increases, the business flow of the database under audit increasingly grows, making the existing software/hardware platforms incompetent to meet requirements in the industry.
statements.
Therein, analysis is to audit whether SQL statements have vulnerability or are attacked.
For ensuring accuracy of audit, audit systems have to analyze all SQL
statements. As the number of database servers deployed by an audit system increases, the business flow of the database under audit increasingly grows, making the existing software/hardware platforms incompetent to meet requirements in the industry.
[0004] As revealed by analysis, in the actual business flow, there are many identical SQL
statements, and there are also some statements having the same skeleton with the only difference therebetween relying on parameter values. In fact, these identical or similar statements need only to be analyzed once in an audit system. Assuming that a database audit system is focused on different SQL statements, it will require merely a small part Date Recue/Date Received 2021-12-24 of the flow currently caused. To focus an audit system on the difference, it is essential to tell which statements appear for the first time prior to SQL analysis.
statements, and there are also some statements having the same skeleton with the only difference therebetween relying on parameter values. In fact, these identical or similar statements need only to be analyzed once in an audit system. Assuming that a database audit system is focused on different SQL statements, it will require merely a small part Date Recue/Date Received 2021-12-24 of the flow currently caused. To focus an audit system on the difference, it is essential to tell which statements appear for the first time prior to SQL analysis.
[0005] The conventional approaches to recognition of SQL statements mainly include: character matching, AT recognition, and SQL semantic parsing. These known approaches, however, have their shortcomings. For example, character matching is unable to ignore different parameter values. In a case with ID = 3 and ID = 4, with character matching, the two will be regarded as two different statements, making it less effecting in screening out repeated statements. In addition, handing massive extra-long SQL statements is highly demanding in terms of CPU and storage space. AT recognition in practical use takes time to learn for it to become accurate and it necessitates additional hardware servers, leading to increased costs for deployment. SQL semantic parsing is also resource-consuming, and the only purpose of recognizing new statements is to screen out numerous repeated statements and lower subsequent workload of SQL semantic parsing. Therefore, only new statements require semantic parsing.
SUMMARY OF THE INVENTION
SUMMARY OF THE INVENTION
[0006] The objective of the present invention is to provide a method and an apparatus for recognizing new SQL statements in database audit systems, featuring a capability of effectively reduce resource occupancy of database audit systems, thereby enhancing system performance.
[0007] For achieving the foregoing objective, in a first aspect, the present invention provides a method for recognizing new SQL statements in database audit systems, which comprises:
[0008] receiving and modularization processing a SQL statement in a network message to extract a skeleton statement;
[0009] performing digest computation on the skeleton statement, and processing a resulting digest value into an integer value; and
[0010] checking a table for comparing the integer value, and based on a comparative result determining whether the SQL statement is a new SQL statement.
Date Recue/Date Received 2021-12-24
Date Recue/Date Received 2021-12-24
[0011] Preferably, the step of modularization processing a SQL statement to extract a skeleton statement comprises:
[0012] recognizing SQL semantic keywords in the SQL statement, and weeding out contents of the SQL statement other than the SQL semantic keywords so as to obtain the skeleton statement.
[0013] More preferably, before the step of performing digest computation on the skeleton statement, the method further comprises:
[0014] determining whether the skeleton statement contains combination values;
[0015] if the skeleton statement contains combination values, merging the combination values, and using a universal character to replace parameter values contained therein for devaluing; and
[0016] if the skeleton statement does not contain combination values, directly replacing parameter values contained therein with the universal character for devaluing, in which
[0017] the parameter values include variable parameters and/or character strings.
[0018] More preferably, before the step of performing digest computation on the skeleton statement, the method further comprises:
[0019] replacing non-ascii codes of the skeleton statement with the universal character.
[0020] Preferably, the step of performing digest computation on the skeleton statement, and converting digest values so as to obtain integer values comprises:
[0021] using a MD5 algorithm to perform the digest computation on the processed skeleton statement, so as to obtain the digest values; and
[0022] extracting some byte data with fixed positions from the digest values, and converting the byte data into the integer values by means of an integer algorithm.
[0023] Further, the step of checking a table for comparing the integer value, and based on a comparative result determining whether the SQL statement is a new SQL
statement comprises:
statement comprises:
[0024] making the table contain a plurality of key-value pairs that correspond to the skeleton statement in a one-on-one manner, wherein the key values correspond to the integer values of the skeleton statement, and the value values are receipt times of the Date Recue/Date Received 2021-12-24 corresponding SQL statement;
[0025] checking the table for the integer values corresponding to the currently received SQL
statement, if the table contains key values identical to the integer values, determining that the SQL statement is not a new SQL statement, using the integer values as the key values while using the receipt times of the SQL statement as the value values, and storing them into the table in the form of key-value pairs; and
statement, if the table contains key values identical to the integer values, determining that the SQL statement is not a new SQL statement, using the integer values as the key values while using the receipt times of the SQL statement as the value values, and storing them into the table in the form of key-value pairs; and
[0026] if the table does not contain a key value identical to any of the integer values, determining that the SQL statement is a new SQL statement.
[0027] Further, after the step of determining that the SQL statement is not a new SQL statement, the method further comprises:
[0028] determining a difference of receipt times between the SQL statement newly added into the table and the SQL statement that already exists in the table and has the identical key-value pair, and when the difference of the receipt times exceeds a threshold, deleting the key-value pairs pre-existing in the table.
[0029] Preferably, the method of claim 1 further comprises:
[0030] marking a new statement identification of the SQL statement recognized as a new SQL
statement;
statement;
[0031] integrating identification results, the corresponding SQL statement, and the corresponding integer values into an entry of json data, submitting the json data to a message queue for consumption; and
[0032] using a risk engine to consume the json data from the message queue, so as to provide risk analysis services for the new SQL statement.
[0033] As compared to the prior art, the method for recognizing new SQL
statements in database audit systems of the present invention has the following beneficial effects:
statements in database audit systems of the present invention has the following beneficial effects:
[0034] According to the disclosed method for recognizing new SQL statements in database audit systems, for analyzing SQL statements in received network messages, modularized processing is followed by the process of extracting skeleton statements, which are then subject to digest computation. The resulting digest values are converted into integer values to be compared with a table. If there is a key value in the table identical to the Date Recue/Date Received 2021-12-24 integer value, it is determined that the SQL statement is not a new SQL
statement, and the integer values are used as key values while receipt times of SQL
statements are used as value values to be sotted into the table in the form of key-value pairs. If there is not a key value in the table identical to the integer value, it is determined that the SQL statement is a new SQL statement.
statement, and the integer values are used as key values while receipt times of SQL
statements are used as value values to be sotted into the table in the form of key-value pairs. If there is not a key value in the table identical to the integer value, it is determined that the SQL statement is a new SQL statement.
[0035] It is thus clear that the present invention first templates SQL
statements and extracts SQL
syntactic skeletons that are then subject to digest computation. At last, one integer value is used to mark a set of SQL statements having the same semantic meaning. As compared to the known character matching scheme, the scheme of the present invention shiels off changing value value information, and focuses on statement skeletons. As compared to the known SQL semantic parsing scheme, the scheme of the present invention only considers whether statements are identical instead of performing semantic parsing, so as to prevent system resources from being unnecessarily occupied by semantic parsing. As demonstrated by data from actual use in production environments, the method for recognizing new statements as described in the present invention has resource occupancy minimized to an ignorable level in a system, thereby effectively reducing resource consumption of the database audit system and improving system performance.
statements and extracts SQL
syntactic skeletons that are then subject to digest computation. At last, one integer value is used to mark a set of SQL statements having the same semantic meaning. As compared to the known character matching scheme, the scheme of the present invention shiels off changing value value information, and focuses on statement skeletons. As compared to the known SQL semantic parsing scheme, the scheme of the present invention only considers whether statements are identical instead of performing semantic parsing, so as to prevent system resources from being unnecessarily occupied by semantic parsing. As demonstrated by data from actual use in production environments, the method for recognizing new statements as described in the present invention has resource occupancy minimized to an ignorable level in a system, thereby effectively reducing resource consumption of the database audit system and improving system performance.
[0036] In a second aspect, the present invention provides an apparatus for recognizing new SQL
statements in database audit systems to be used in the method for recognizing new SQL
statements in database audit systems as disclosed in the foregoing technical scheme. The apparatus comprises:
statements in database audit systems to be used in the method for recognizing new SQL
statements in database audit systems as disclosed in the foregoing technical scheme. The apparatus comprises:
[0037] a skeleton extracting unit, for receiving and modularization processing a SQL statement in a network message to extract a skeleton statement;
[0038] an integer processing unit, for performing digest computation on the skeleton statement, and processing a resulting digest value into an integer value; and
[0039] a recognizing unit, for checking a table for comparing the integer value, and based on a comparative result determining whether the SQL statement is a new SQL
statement.
statement.
[0040] As compared to the prior art, the disclosed apparatus for recognizing new SQL statements in database audit systems provides beneficial effects that are similar to those provided by Date Recue/Date Received 2021-12-24 the disclosed method for recognizing new SQL statements in database audit systems as enumerated above, and thus no repetitions are made herein.
[0041] In a third aspect, the present invention provides a computer-readable storage medium storing therein a computer program, wherein the computer program when executed by a processor performing the steps of the method for recognizing new SQL
statements in database audit systems as described previously.
statements in database audit systems as described previously.
[0042] As compared to the prior art, the disclosed computer-readable storage medium provides beneficial effects that are similar to those provided by the disclosed method for recognizing new SQL statements in database audit systems as enumerated above, and thus no repetitions are made herein.
BRIEF DESCRIPTION OF THE DRAWINGS
BRIEF DESCRIPTION OF THE DRAWINGS
[0043] The accompanying drawings are provided herein for better understanding of the present invention and form a part of this disclosure. The illustrative embodiments and their descriptions are for explaining the present invention and by no means form any improper limitation to the present invention, wherein:
[0044] FIG. 1 is a flowchart of a method for recognizing new SQL statements in database audit systems according to one embodiment of the present invention;
[0045] FIG. 2 is a flowchart of a process for templating SQL statements according to one embodiment of the present invention;
[0046] FIG. 3 is a flowchart of a process for recognizing a new SQL statement according to one embodiment of the present invention; and
[0047] FIG. 4 is a flowchart of a process for a risk engine to detect a new SQL statement according to one embodiment of the present invention DETAILED DESCRIPTION OF THE INVENTION
[0048] To make the foregoing objectives, features, and advantages of the present invention clearer and more understandable, the following description will be directed to some Date Recue/Date Received 2021-12-24 embodiments as depicted in the accompanying drawings to detail the technical schemes disclosed in these embodiments. It is, however, to be understood that the embodiments referred herein are only a part of all possible embodiments and thus not exhaustive. Based on the embodiments of the present invention, all the other embodiments can be conceived without creative labor by people of ordinary skill in the art, and all these and other embodiments shall be embraced in the scope of the present invention Embodiment 1
[0049] Referring to FIG. 1-FIG. 4, the present embodiment provides a method for recognizing new SQL statements in database audit systems. The method comprises:
[0050] receiving and modularization processing a SQL statement in a network message to extract a skeleton statement; performing digest computation on the skeleton statement, and processing a resulting digest value into an integer value; and checking a table for comparing the integer value, and based on a comparative result determining whether the SQL statement is a new SQL statement.
[0051] According to the disclosed method for recognizing new SQL statements in database audit systems, for analyzing SQL statements in received network messages, modularized processing is followed by the process of extracting skeleton statements, which are then subject to digest computation. The resulting digest values are converted into integer values to be compared with a table. If there is a key value in the table identical to the integer value, it is determined that the SQL statement is not a new SQL
statement, and the integer values are used as key values while receipt times of SQL
statements are used as value values to be sotted into the table in the form of key-value pairs. If there is not a key value in the table identical to the integer value, it is determined that the SQL statement is a new SQL statement.
statement, and the integer values are used as key values while receipt times of SQL
statements are used as value values to be sotted into the table in the form of key-value pairs. If there is not a key value in the table identical to the integer value, it is determined that the SQL statement is a new SQL statement.
[0052] It is thus clear that the present embodiment first templates SQL
statements and extracts SQL syntactic skeletons that are then subject to digest computation. At last, one integer value is used to mark a set of SQL statements having the same semantic meaning. As compared to the known character matching scheme, the scheme of the present embodiment shiels off changing value value information, and focuses on statement Date Recue/Date Received 2021-12-24 skeletons. As compared to the known SQL semantic parsing scheme, the scheme of the present embodiment only considers whether statements are identical instead of performing semantic parsing, so as to prevent system resources from being unnecessarily occupied by semantic parsing. As demonstrated by data from actual use in production environments, the method for recognizing new statements as described in the present embodiment has resource occupancy minimized to an ignorable level in a system, thereby effectively reducing resource consumption of the database audit system and improving system performance.
statements and extracts SQL syntactic skeletons that are then subject to digest computation. At last, one integer value is used to mark a set of SQL statements having the same semantic meaning. As compared to the known character matching scheme, the scheme of the present embodiment shiels off changing value value information, and focuses on statement Date Recue/Date Received 2021-12-24 skeletons. As compared to the known SQL semantic parsing scheme, the scheme of the present embodiment only considers whether statements are identical instead of performing semantic parsing, so as to prevent system resources from being unnecessarily occupied by semantic parsing. As demonstrated by data from actual use in production environments, the method for recognizing new statements as described in the present embodiment has resource occupancy minimized to an ignorable level in a system, thereby effectively reducing resource consumption of the database audit system and improving system performance.
[0053] Referring to FIG. 2, in the embodiment described above, the step of modularization processing a SQL statement to extract a skeleton statement comprises:
[0054] recognizing SQL semantic keywords in the SQL statement, and weeding out contents of the SQL statement other than the SQL semantic keywords so as to obtain the skeleton statement. To be exemplificative, the SQL semantic keywords include SELECT and INSERT.
[0055] In the foregoing embodiment, before the step of performing digest computation on the skeleton statement, the method further comprises:
[0056] determining whether the skeleton statement contains combination values;
if the skeleton statement contains combination values, merging the combination values, and using a universal character to replace parameter values contained therein for devaluing; and if the skeleton statement does not contain combination values, directly replacing parameter values contained therein with the universal character for devaluing, in which, for example, the combination values are VALUES (1,`TOM'), (2,`JIMMV), (3,`SAM'), (4,`LF).
In specific implementations, the universal character may be "?", and the value value replaced with the universal character is VALUES(?). The parameter values include variable parameters and/or character strings. To be exemplificative, the variable parameter may be "1", and the character string may be "TOM".
if the skeleton statement contains combination values, merging the combination values, and using a universal character to replace parameter values contained therein for devaluing; and if the skeleton statement does not contain combination values, directly replacing parameter values contained therein with the universal character for devaluing, in which, for example, the combination values are VALUES (1,`TOM'), (2,`JIMMV), (3,`SAM'), (4,`LF).
In specific implementations, the universal character may be "?", and the value value replaced with the universal character is VALUES(?). The parameter values include variable parameters and/or character strings. To be exemplificative, the variable parameter may be "1", and the character string may be "TOM".
[0057] In the foregoing embodiment, before the step of performing digest computation on the skeleton statement, the method further comprises:
[0058] replacing non-ascii codes in the skeleton statements with the universal characters. For Date Recue/Date Received 2021-12-24 example, for concat('%',VAM,'%'), the Chinese character string is replaced by "?".
[0059] In the foregoing embodiment, the step of performing digest computation on the skeleton statement, and converting digest values so as to obtain integer values comprises:
[0060] using a MD5 algorithm to perform the digest computation on the processed skeleton statement, so as to obtain the digest values; and extracting some byte data with fixed positions from the digest values, and converting the byte data into the integer values by means of an integer algorithm. To be exemplificative, for a digest value that is a 16-bit number, 10055897659947028311, the byte data of the 8th through 15th bits among the 16 bits are taken and transformed into a 64-bit integer value.
[0061] In the foregoing embodiment, the step of checking a table for comparing the integer value, and based on a comparative result determining whether the SQL statement is a new SQL
statement comprises:
statement comprises:
[0062] making the table contain a plurality of key-value pairs that correspond to the skeleton statement in a one-on-one manner, wherein the key values correspond to the integer values of the skeleton statement, and the value values are receipt times of the corresponding SQL statement; checking the table for the integer values corresponding to the currently received SQL statement, if the table contains key values identical to the integer values, determining that the SQL statement is not a new SQL statement, using the integer values as the key values while using the receipt times of the SQL
statement as the value values, and storing them into the table in the form of key-value pairs;
and if the table does not contain a key value identical to any of the integer values, determining that the SQL statement is a new SQL statement.
statement as the value values, and storing them into the table in the form of key-value pairs;
and if the table does not contain a key value identical to any of the integer values, determining that the SQL statement is a new SQL statement.
[0063] In the foregoing embodiment, after the step of determining that the SQL
statement is not a new SQL statement, the method further comprises:
statement is not a new SQL statement, the method further comprises:
[0064] determining a difference of receipt times between the SQL statement newly added into the table and the SQL statement that already exists in the table and has the identical key-value pair, and when the difference of the receipt times exceeds a threshold, deleting the key-value pairs pre-existing in the table. In specific implementations, the system periodically detects key-value pairs in the table. If the key value of a newly added SQL
Date Recue/Date Received 2021-12-24 statement has a matched key-value pair in the table, the difference between the receipt times is calculated according to the value value of the newly added SQL
statement and the corresponding value value in the table. If the difference between the receipt times exceeds the threshold, the pre-existing key-value pair is deleted from the table.
Date Recue/Date Received 2021-12-24 statement has a matched key-value pair in the table, the difference between the receipt times is calculated according to the value value of the newly added SQL
statement and the corresponding value value in the table. If the difference between the receipt times exceeds the threshold, the pre-existing key-value pair is deleted from the table.
[0065] In the foregoing embodiment, after the step of checking a table for comparing the integer value, and based on a comparative result determining whether the SQL statement is a new SQL statement, the method further comprises:
[0066] marking a new statement identification of the SQL statement recognized as a new SQL
statement; integrating identification results, the corresponding SQL
statement, and the corresponding integer values into an entry of json data, submitting the json data to a message queue for consumption; and using a risk engine to consume the json data from the message queue, so as to provide risk analysis services for the new SQL
statement.
statement; integrating identification results, the corresponding SQL
statement, and the corresponding integer values into an entry of json data, submitting the json data to a message queue for consumption; and using a risk engine to consume the json data from the message queue, so as to provide risk analysis services for the new SQL
statement.
[0067] For facilitating understanding, the scheme of the present embodiment will be described exemplarily below:
[0068] In specific implementations, SQL statements are templated. Only SQL
semantic keywords, e.g., SELECT, INSERT, in the SQL statements are retained, and value values, such as numeral values and character string, int the SQL statements are screened off.
Additionally, some particular statements WHERE AND( ), multi-line statements inserted by INSERT are merged so that only SQL syntactic skeletons are retained and extracted into one skeleton statement that does not include any specific numeral value.
Then the extracted skeleton statement is then subject to MD5 digest computation. The highest 8 bits among the 16 bits are taken and converted into a 64-bit integer value.
The purpose of this is to use one long integer value to represent a SQL statement. At last, the long integer value is stored in the HASH table. A specific implementation may be realized through the following steps.
semantic keywords, e.g., SELECT, INSERT, in the SQL statements are retained, and value values, such as numeral values and character string, int the SQL statements are screened off.
Additionally, some particular statements WHERE AND( ), multi-line statements inserted by INSERT are merged so that only SQL syntactic skeletons are retained and extracted into one skeleton statement that does not include any specific numeral value.
Then the extracted skeleton statement is then subject to MD5 digest computation. The highest 8 bits among the 16 bits are taken and converted into a 64-bit integer value.
The purpose of this is to use one long integer value to represent a SQL statement. At last, the long integer value is stored in the HASH table. A specific implementation may be realized through the following steps.
[0069] Step 1 involves receiving SQL statements extracted from network messages, wherein a SQL statement is a complete SQ statement that includes INSERT and/or SELECT
SQL
semantic keywords, such as an INSERT semantic keyword of INSERT INTO
USERINFO 0 (USERID, USERNAME) VALUES (1,`TOM'), (2,`JIMMY'), (3,' SAM'), Date Recue/Date Received 2021-12-24 (4,1F), and a SELECT semantic keyword of SELECT * FROM USERINFO 0 WHERE
ID=1;
SQL
semantic keywords, such as an INSERT semantic keyword of INSERT INTO
USERINFO 0 (USERID, USERNAME) VALUES (1,`TOM'), (2,`JIMMY'), (3,' SAM'), Date Recue/Date Received 2021-12-24 (4,1F), and a SELECT semantic keyword of SELECT * FROM USERINFO 0 WHERE
ID=1;
[0070] Step 2 is about devaluing the SQL statement by replacing all the numeral values or character strings in the SQL statement with"?". For example, WHERE ID=1 is rewritten into WHERE ID=?, and WHERE NAME="titlename" is rewritten into WHERE
NAME="?". It is to be noted that a completer numeral value can be replaced by one question mark only. For example, 123 is replaced by "?", but not"? ? ?".
NAME="?". It is to be noted that a completer numeral value can be replaced by one question mark only. For example, 123 is replaced by "?", but not"? ? ?".
[0071] After the processing of Step 2, the SELECT semantic keyword as mentioned above is rewritten as SELECT * FROM USERINFO 0 WHERE ID=?.
[0072] At Step 3, the combined value of multiple entries of records in the INSERT statement are merged. In an example where the INSERT statement inserts a JSON array object data that contains multiple entries of records, the different record numbers will be recognized as different statements, and thus these records have to be merged into one.
[0073] After Step 3, the INSERT statement is rewritten as INSERT INTO USERINFO
(USERID, USERNAME) VALUES (?, ?).
(USERID, USERNAME) VALUES (?, ?).
[0074] Step 4 involves merging other statements in the SELECT semantic keywords that need a merged value. For example, plural conditions in the statements WHERE AND, WHERE
OR, and WHERE NOT are merged.
OR, and WHERE NOT are merged.
[0075] At Step 5, the non-ascii code is processed. For example, in concat('%',WM,'%'), the Chinese character string is replaced by "?".
[0076] At Step 6, the single skeleton statement obtained after the process of Step 5 is subject to MD5 digest operation. For example, the digest values corresponding to INSERT
statement and the SELECT statement at the previous step, respectively, are 10055897659947028311 and 3935916992245373588.
statement and the SELECT statement at the previous step, respectively, are 10055897659947028311 and 3935916992245373588.
[0077] Step 7 is about simplifying the digest value by taking the 8th through the 15th byte data among the 16 bytes, and transforming the 8-bit value into a 64-bit integer value.
[0078] When a complete SQL statement is simplified into one integer value through the steps shown in FIG. 2, they are adopted into a hashmap table whose key is the simplified integer value of the SQL statement and value is the timestamp of when the SQL
statement was Date Recue/Date Received 2021-12-24 received.
statement was Date Recue/Date Received 2021-12-24 received.
[0079] Step 1: extracting SQL statements from network messages, templating the SQL
statements through for example the step shown in FIG. 2, and then extracting digest values;
statements through for example the step shown in FIG. 2, and then extracting digest values;
[0080] Step 2: using the integer value as the product of digest computation as a key for reading the hashmap storage table, if the key can be read from the hashmap storage table, determining that it is not a new SQL statement, for which case only the timestamp of the node is updated, wherein the timestamp is used for determining timeout, and if the key cannot be read from the hashmap storage table, determining that it is a new SQL statement, for which case the key value and the value value are added into the hashmap storage table;
[0081] Step 3: if it is a new SQL statement, adding a new statement mark, and organizing the new statement mark, the SQL statement, and the SQL statement digest integer value into an entry of json data;
[0082] Step 4: submitting the json data to a message queue for consumption.
[0083] FIG. 4 is a flowchart illustrates how a risk engine processes a new SQL
statement. The risk engine performs security analysis on the SQL statement, and the risk engine only processes SQL statements with new statement marks.
statement. The risk engine performs security analysis on the SQL statement, and the risk engine only processes SQL statements with new statement marks.
[0084] At Step 1, the risk engine reads JSON data form message queues, wherein the JSON data contains a complete SQL statement and a new statement mark.
[0085] At Step 2, the risk engine determines that the SQL statement is a new statement according to the new statement mark, and performs statement analysis to check whether a risk exists.
If the SQL statement is not a new statement, it is not processed.
If the SQL statement is not a new statement, it is not processed.
[0086] To sum up, the present embodiment has the following beneficial effects:
[0087] SQL statement templating: SQL statements are devalued, merged, extracted for statement skeletons to form SQL statement templates, thereby reducing occupation of system resources and improving system performance.
[0088] Digest processing: the processed SQL statement template are subject to digest processing, so that a skeleton statement can be represented by a long integer numeral value, which facilitates fast table looking-up.
Date Recue/Date Received 2021-12-24
Date Recue/Date Received 2021-12-24
[0089] KMP optimizing algorithm: For statement matching required by the template process, the known, intuitive character matching has a time complexity of 0(MxN), yet the KMP
algorithm has a time complexity of 0(M+N). The present embodiment optimizes the KMP algorithm for SQL statements, and has a reduced time complexity of 0(M-2N) or below.
Embodiment 2
algorithm has a time complexity of 0(M+N). The present embodiment optimizes the KMP algorithm for SQL statements, and has a reduced time complexity of 0(M-2N) or below.
Embodiment 2
[0090] The present embodiment provides an apparatus for recognizing new SQL
statements in database audit systems, which comprises:
statements in database audit systems, which comprises:
[0091] a skeleton extracting unit, for receiving and modularization processing a SQL statement in a network message to extract a skeleton statement;
[0092] an integer processing unit, for performing digest computation on the skeleton statement, and processing a resulting digest value into an integer value; and
[0093] a recognizing unit, for checking a table for comparing the integer value, and based on a comparative result determining whether the SQL statement is a new SQL
statement.
statement.
[0094] As compared to the prior art, the disclosed apparatus for recognizing new SQL statements in database audit systems provides beneficial effects that are similar to those provided by the disclosed method for recognizing new SQL statements in database audit systems as enumerated above, and thus no repetitions are made herein.
Embodiment 3
Embodiment 3
[0095] The present embodiment provides a computer-readable storage medium, storing thereon a computer program. When the computer program is executed by a processor, it implements the steps of the method for recognizing new SQL statements in database audit systems as described previously.
[0096] As compared to the prior art, the disclosed apparatus for recognizing new SQL statements in database audit systems provides beneficial effects that are similar to those provided by the disclosed method for recognizing new SQL statements in database audit systems as enumerated above, and thus no repetitions are made herein.
[0097] As will be appreciated by people of ordinary skill in the art, implementation of all or a part of the steps of the method of the present invention as described previously may be Date Recue/Date Received 2021-12-24 realized by having a program instruct related hardware components. The program may be stored in a computer-readable storage medium, and the program is about performing the individual steps of the methods described in the foregoing embodiments.
The storage medium may be a ROM/RAM, a disk, a compact disk, a memory card or the like.
The storage medium may be a ROM/RAM, a disk, a compact disk, a memory card or the like.
[0098] The present invention has been described with reference to the preferred embodiments and it is understood that the embodiments are not intended to limit the scope of the present invention. Moreover, as the contents disclosed herein should be readily understood and can be implemented by a person skilled in the art, all equivalent changes or modifications which do not depart from the concept of the present invention should be encompassed by the appended claims. Hence, the scope of the present invention shall only be defined by the appended claims.
Date Recue/Date Received 2021-12-24
Date Recue/Date Received 2021-12-24
Claims (10)
1. A method for recognizing new SQL statements in database audit systems, wherein comprising:
receiving and modularization processing a SQL statement in a network message to extract a skeleton statement;
performing digest computation on the skeleton statement, and processing a resulting digest value into an integer value; and checking a table for comparing the integer value, and based on a comparative result determining whether the SQL statement is a new SQL statement.
receiving and modularization processing a SQL statement in a network message to extract a skeleton statement;
performing digest computation on the skeleton statement, and processing a resulting digest value into an integer value; and checking a table for comparing the integer value, and based on a comparative result determining whether the SQL statement is a new SQL statement.
2. The method of claim 1, wherein the step of modularization processing a SQL
statement to extract a skeleton statement comprises:
recognizing SQL semantic keywords in the SQL statement, and weeding out contents of the SQL
statement other than the SQL semantic keywords so as to obtain the skeleton statement.
statement to extract a skeleton statement comprises:
recognizing SQL semantic keywords in the SQL statement, and weeding out contents of the SQL
statement other than the SQL semantic keywords so as to obtain the skeleton statement.
3. The method of claim 1 or 2, wherein before the step of performing digest computation on the skeleton statement, the method further comprises:
determining whether the skeleton statement contains combination values;
if the skeleton statement contains combination values, merging the combination values, and using a universal character to replace parameter values contained therein for devaluing; and if the skeleton statement does not contain combination values, directly replacing parameter values contained therein with the universal character for devaluing, in which the parameter values include variable parameters and/or character strings.
determining whether the skeleton statement contains combination values;
if the skeleton statement contains combination values, merging the combination values, and using a universal character to replace parameter values contained therein for devaluing; and if the skeleton statement does not contain combination values, directly replacing parameter values contained therein with the universal character for devaluing, in which the parameter values include variable parameters and/or character strings.
4. The method of claim 1 or 3, wherein before the step of performing digest computation on the skeleton statement, the method further comprises:
replacing non-ascii codes of the skeleton statement with the universal character.
Date Recue/Date Received 2021-12-24
replacing non-ascii codes of the skeleton statement with the universal character.
Date Recue/Date Received 2021-12-24
5. The method of claim 3 or 4, wherein the step of performing digest computation on the skeleton statement, and converting digest values so as to obtain integer values comprises:
using a MD5 algorithm to perform the digest computation on the processed skeleton statement, so as to obtain the digest values; and extracting some byte data with fixed positions from the digest values, and converting the byte data into the integer values by means of an integer algorithm.
using a MD5 algorithm to perform the digest computation on the processed skeleton statement, so as to obtain the digest values; and extracting some byte data with fixed positions from the digest values, and converting the byte data into the integer values by means of an integer algorithm.
6. The method of claim 5, wherein the step of checking a table for comparing the integer value, and based on a comparative result determining whether the SQL statement is a new SQL
statement comprises:
making the table contain a plurality of key-value pairs that correspond to the skeleton statement in a one-on-one manner, wherein the key values correspond to the integer values of the skeleton statement, and the value values are receipt times of the corresponding SQL
statement;
checking the table for the integer values corresponding to the currently received SQL statement, if the table contains key values identical to the integer values, determining that the SQL statement is not a new SQL statement, using the integer values as the key values while using the receipt times of the SQL statement as the value values, and storing them into the table in the form of key-value pairs; and if the table does not contain a key value identical to any of the integer values, determining that the SQL statement is a new SQL statement.
statement comprises:
making the table contain a plurality of key-value pairs that correspond to the skeleton statement in a one-on-one manner, wherein the key values correspond to the integer values of the skeleton statement, and the value values are receipt times of the corresponding SQL
statement;
checking the table for the integer values corresponding to the currently received SQL statement, if the table contains key values identical to the integer values, determining that the SQL statement is not a new SQL statement, using the integer values as the key values while using the receipt times of the SQL statement as the value values, and storing them into the table in the form of key-value pairs; and if the table does not contain a key value identical to any of the integer values, determining that the SQL statement is a new SQL statement.
7. The method of claim 6, wherein after the step of determining that the SQL
statement is not a new SQL statement, the method further comprises:
determining a difference of receipt times of the SQL statements corresponding to the SQL
statement newly added into the table with the identical key values pre-existing in the table, and when the difference of the receipt times exceeds a threshold, deleting the key-value pairs pre-existing in the table.
Date Recue/Date Received 2021-12-24
statement is not a new SQL statement, the method further comprises:
determining a difference of receipt times of the SQL statements corresponding to the SQL
statement newly added into the table with the identical key values pre-existing in the table, and when the difference of the receipt times exceeds a threshold, deleting the key-value pairs pre-existing in the table.
Date Recue/Date Received 2021-12-24
8. The method of claim 1, wherein further comprising:
marking a new statement identification of the SQL statement recognized as a new SQL statement;
integrating identification results, the corresponding SQL statement, and the corresponding integer values into an entry of json data, submitting the json data to a message queue for consumption; and using a risk engine to consume the json data from the message queue, so as to provide risk analysis services for the new SQL statement.
marking a new statement identification of the SQL statement recognized as a new SQL statement;
integrating identification results, the corresponding SQL statement, and the corresponding integer values into an entry of json data, submitting the json data to a message queue for consumption; and using a risk engine to consume the json data from the message queue, so as to provide risk analysis services for the new SQL statement.
9. An apparatus for recognizing new SQL statements in database audit systems, wherein comprising:
a skeleton extracting unit, for receiving and modularization processing a SQL
statement in a network message to extract a skeleton statement;
an integer processing unit, for performing digest computation on the skeleton statement, and processing a resulting digest value into an integer value; and a recognizing unit, for checking a table for comparing the integer value, and based on a comparative result determining whether the SQL statement is a new SQL
statement.
a skeleton extracting unit, for receiving and modularization processing a SQL
statement in a network message to extract a skeleton statement;
an integer processing unit, for performing digest computation on the skeleton statement, and processing a resulting digest value into an integer value; and a recognizing unit, for checking a table for comparing the integer value, and based on a comparative result determining whether the SQL statement is a new SQL
statement.
10. A computer-readable storage medium storing therein a computer program, wherein the computer program when executed by a processor performing a method as described in any of claims 1 through 8.
Date Recue/Date Received 2021-12-24
Date Recue/Date Received 2021-12-24
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011579141.5A CN112686029B (en) | 2020-12-28 | 2020-12-28 | SQL new statement identification method and device for database audit system |
CN202011579141.5 | 2020-12-28 |
Publications (1)
Publication Number | Publication Date |
---|---|
CA3144052A1 true CA3144052A1 (en) | 2022-06-28 |
Family
ID=75452621
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CA3144052A Pending CA3144052A1 (en) | 2020-12-28 | 2021-12-24 | Method and apparatus for recognizing new sql statements in database audit systems |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112686029B (en) |
CA (1) | CA3144052A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114172890B (en) * | 2021-11-03 | 2024-02-27 | 阿里巴巴(中国)有限公司 | File second transmission processing method and device, storage medium and electronic equipment |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101453358B (en) * | 2007-12-06 | 2011-05-11 | 北京启明星辰信息技术股份有限公司 | Sql sentence audit method and system for oracle database binding variable |
AU2014205389A1 (en) * | 2013-01-11 | 2015-06-04 | Db Networks, Inc. | Systems and methods for detecting and mitigating threats to a structured data storage system |
CN103365979A (en) * | 2013-07-03 | 2013-10-23 | 交通银行股份有限公司 | Long-distance double-center online processing method and system based on open database |
CN103886021B (en) * | 2014-02-24 | 2018-10-02 | 上海上讯信息技术股份有限公司 | A kind of database audit result interpretation method and system based on natural language |
CN105373607B (en) * | 2015-11-13 | 2020-04-24 | 国网智能电网研究院 | Method for compressing SQL access log of power business system |
CN105912594B (en) * | 2016-04-05 | 2020-01-07 | 深信服科技股份有限公司 | SQL statement processing method and system |
-
2020
- 2020-12-28 CN CN202011579141.5A patent/CN112686029B/en active Active
-
2021
- 2021-12-24 CA CA3144052A patent/CA3144052A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
CN112686029B (en) | 2024-07-26 |
CN112686029A (en) | 2021-04-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20180232204A1 (en) | Intelligent data extraction | |
CN111352907A (en) | Method and device for analyzing pipeline file, computer equipment and storage medium | |
CN110929125A (en) | Search recall method, apparatus, device and storage medium thereof | |
CN112035599A (en) | Query method and device based on vertical search, computer equipment and storage medium | |
CN111984792A (en) | Website classification method and device, computer equipment and storage medium | |
CN114818643A (en) | Log template extraction method for reserving specific service information | |
CN116866241A (en) | Internet of things terminal detection method, system and storage medium based on DPI | |
CN113688240A (en) | Threat element extraction method, device, equipment and storage medium | |
CA3144052A1 (en) | Method and apparatus for recognizing new sql statements in database audit systems | |
CN114372144A (en) | Gambling domain name identification method based on certificate and domain name resolution | |
CN111460268B (en) | Method and device for determining database query request and computer equipment | |
CN116302089B (en) | Picture similarity-based code clone detection method, system and storage medium | |
CN115801455B (en) | Method and device for detecting counterfeit website based on website fingerprint | |
CN113472686B (en) | Information identification method, device, equipment and storage medium | |
CN112711678A (en) | Data analysis method, device, equipment and storage medium | |
CN115392238A (en) | Equipment identification method, device, equipment and readable storage medium | |
CN116414976A (en) | Document detection method and device and electronic equipment | |
CN114281761A (en) | Data file loading method and device, computer equipment and storage medium | |
CN114528908A (en) | Network request data classification model training method, classification method and storage medium | |
EP1365331A2 (en) | Determination of a semantic snapshot | |
CN115048543B (en) | Image similarity judgment method, image searching method and device | |
CN113064597B (en) | Redundant code identification method, device and equipment | |
CN118467669A (en) | Index construction method, field searching method, device, equipment and medium | |
CN115952459A (en) | Error reporting identification method, device, equipment and storage medium | |
CN117520538A (en) | Mobile payment transaction information sorting method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
EEER | Examination request |
Effective date: 20220916 |
|
EEER | Examination request |
Effective date: 20220916 |
|
EEER | Examination request |
Effective date: 20220916 |
|
EEER | Examination request |
Effective date: 20220916 |
|
EEER | Examination request |
Effective date: 20220916 |
|
EEER | Examination request |
Effective date: 20220916 |