CN116594987A - Database analysis system and method based on big data - Google Patents
Database analysis system and method based on big data
- Publication number
- CN116594987A (CN202310720127.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- database
- analysis
- mining
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/217—Database tuning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/01—Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/03—Data mining
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Fuzzy Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
A database analysis system and method based on big data. Collecting massive data improves the accuracy and comprehensiveness of database analysis. A database monitoring module monitors the collected database running state and SQL execution parameter information in real time and raises alarms; a database performance analysis module compiles statistics on and analyzes the database performance parameters, and troubleshoots and optimizes abnormal data; a database security analysis module evaluates and analyzes the data security of the database and provides early warning of, and preventive measures against, security threats. By collecting, processing, and analyzing massive data, the system provides users with an efficient and accurate database analysis service. Data mining algorithms are applied to discover relationships, rules, and trends in the data, and knowledge and experience in the domain of the data are analyzed to obtain its rules and trends and to discover the knowledge and value it contains.
Description
Technical Field
The invention relates to the field of databases, in particular to a database analysis system and method based on big data.
Background
With the rapid development of the Internet and the Internet of Things, enterprises and organizations need to understand consumers and market conditions in order to make better decisions and plans. Although conventional database analysis systems can meet certain requirements, their analysis efficiency and accuracy need improvement when they face large volumes of data and a rapidly changing market environment. Unstructured data is harder to analyze and process than structured data, and the ability of conventional database analysis systems to process it needs further improvement. Conventional systems are also limited in processing capacity when handling massive data, which may lead to low data analysis efficiency. A database analysis system and method based on big data are therefore proposed.
Disclosure of Invention
The object of the invention is to address the problems of the prior art described above. To achieve this object, the invention provides the following technical solution: a database analysis system based on big data comprises a data collection module, a data screening module, a data analysis module, a data visualization module, and a data mining module, wherein the data collection module is used to collect massive database information from multiple channels by multiple means;
the data collection module is in data connection with the data screening module, and the data screening module is used to clean and process the collected data, remove invalid data, and perform deduplication and format unification;
the data screening module is in data connection with the data analysis module, and the data analysis module is used to analyze the screened data, discover the rules and trends in it, and provide diagnosis and analysis results for the database;
the analytical formula is:
where: $\mu_x$ and $\mu_y$ are the mean square errors of the original data $x$ and of the screened data; $c_1$ and $c_2$ are data format constants for the two; $\sigma_x$ and $\sigma_y$ are their respective variances; the screened data is the data obtained by screening the original data $x$.
The data analysis module is in data connection with the data visualization module, and the data visualization module is used to display the analysis results visually and present them to the user intuitively in the form of charts and reports;
the data visualization module is in data connection with the data mining module, and the data mining module is used to mine and analyze the data in the database and explore potential associations and patterns, thereby assisting the user in business analysis and decision-making.
The system further comprises a database monitoring module, a database performance analysis module, and a database security analysis module, wherein the database monitoring module is in data connection with the database performance analysis module, and the database performance analysis module is in data connection with the database security analysis module.
As a preferred technical solution of the present invention, the database monitoring module monitors the collected database running state and SQL execution parameter information in real time and raises alarms.
As a preferred technical solution of the present invention, the database performance analysis module compiles statistics on and analyzes the database performance parameters, and troubleshoots and optimizes abnormal data.
As a preferred technical solution of the present invention, the database security analysis module evaluates and analyzes the data security of the database and provides early warning of, and preventive measures against, security threats.
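The real-time monitoring and alarming performed by the database monitoring module can be illustrated with a simple threshold check. This is only a minimal sketch: the metric names, the `Sample` type, and the threshold values are hypothetical, since the source does not specify which parameters are monitored or what limits apply.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One monitoring sample; all field names are hypothetical."""
    active_connections: int
    slow_queries: int      # slow queries seen in the sampling window
    avg_query_ms: float    # average SQL execution time in milliseconds


# Hypothetical alarm thresholds; the source does not give any values.
THRESHOLDS = {"active_connections": 200, "slow_queries": 5, "avg_query_ms": 500.0}


def check_sample(sample, thresholds=THRESHOLDS):
    """Return one alarm message for every metric that exceeds its threshold."""
    alarms = []
    for name, limit in thresholds.items():
        value = getattr(sample, name)
        if value > limit:
            alarms.append(f"ALARM {name}={value} exceeds {limit}")
    return alarms
```

In use, a collector would build a `Sample` at each polling interval and forward any returned messages to whatever alerting channel is available.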
The database analysis method based on big data comprises the following steps: S1, data collection: massive database information, including data table structures, data records, and SQL statements, is collected from multiple channels by multiple means; the required data can also be obtained from websites by means of Web crawlers;
S2, data screening: the collected data is cleaned and processed, invalid data is removed, and deduplication and format unification are performed to ensure the accuracy and consistency of the data; the data screening includes condition screening, filter screening, database query statement screening, and data mining algorithm screening;
S3, data analysis: the cleaned data is analyzed to discover the rules and trends in it, and diagnosis and analysis results for the database are provided; the data analysis includes database performance analysis, SQL statement analysis, database architecture analysis, database security analysis, and data mining algorithm analysis;
S4, data visualization: the analysis results are displayed visually; the display modes include chart display, report display, and dynamic display;
S5, data mining: potential associations and patterns are uncovered by mining and analyzing the data in the database, thereby assisting the user in business analysis and decision-making; the data mining process includes statistics mining, data distribution mining, time series mining, data mining, and domain knowledge mining.
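The cleaning described in step S2 (removing invalid data, deduplication, and format unification) can be sketched as follows. The record layout with `table` and `date` keys and the list of accepted date formats are assumptions made for illustration; the source does not define a record schema.

```python
from datetime import datetime

# Assumed input date formats; extend as needed for real data.
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d.%m.%Y")


def normalize_date(text):
    """Return the date as ISO 'YYYY-MM-DD', or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def screen(records):
    """Drop invalid records, unify formats, and remove duplicates (order kept)."""
    seen = set()
    cleaned = []
    for rec in records:
        table = (rec.get("table") or "").strip().lower()
        date = normalize_date(rec.get("date") or "")
        if not table or date is None:   # invalid: missing name or unparseable date
            continue
        key = (table, date)
        if key in seen:                 # duplicate once formats are unified
            continue
        seen.add(key)
        cleaned.append({"table": table, "date": date})
    return cleaned
```

Unifying the format before checking for duplicates matters here: `"Orders"` on `2023/06/16` and `" orders "` on `2023-06-16` only compare equal after normalization.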
As a preferred technical solution of the present invention, the condition screening in S2: the data is screened according to known conditions, namely date, region, and indicator conditions;
the filter screening: the data is screened by setting filter conditions with the AutoFilter or Advanced Filter function of Excel spreadsheet software;
the database query statement screening: the data in the database is screened by condition using SQL query statements;
the formula for calculating the database running state is as follows:
where: $\theta$ is the data operation monitoring target; $v_1$ and $v_2$ are the running values of the data; $\sigma_1$ and $\Sigma_2$ are the data mean vector and covariance.
The data mining algorithm screening: the data is screened intelligently using clustering, classification, and association rule data mining algorithms to discover the rules and patterns in it.
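Condition screening by date, region, and indicator can be expressed directly as an SQL query. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `sales` table, its columns, and the sample rows are hypothetical and serve only to make the example runnable.

```python
import sqlite3

# Hypothetical table and rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2023-06-01", "north", 120.0),
        ("2023-06-02", "south", 80.0),
        ("2023-06-15", "north", 300.0),
        ("2023-07-01", "north", 50.0),
    ],
)


def screen_sales(conn, start, end, region, min_amount):
    """Screen rows by date range, region, and a minimum indicator value."""
    query = """
        SELECT day, region, amount FROM sales
        WHERE day BETWEEN ? AND ? AND region = ? AND amount >= ?
        ORDER BY day
    """
    return conn.execute(query, (start, end, region, min_amount)).fetchall()


rows = screen_sales(conn, "2023-06-01", "2023-06-30", "north", 100.0)
```

Passing the conditions as bound parameters rather than formatting them into the SQL string keeps the query safe when the conditions come from user input.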
As a preferred technical solution of the present invention, the database performance analysis in S3: performance bottlenecks of the database are identified by analyzing its performance indicators, namely response time, processing capacity, and I/O indicators, and the database is optimized accordingly;
the SQL statement analysis: potential problems in SQL statements are identified by analyzing their execution plan information, and the SQL statements are optimized;
the database architecture analysis: defects in the database design are identified by analyzing the architecture of the database, its relational model, and the data types of its fields, and the data model design is optimized;
the database security analysis: security problems of the database are identified by analyzing its security settings and permission controls, and its security policy is optimized;
the data mining algorithm analysis: rules and patterns in the data are discovered by applying data mining algorithms, and the database management strategy is optimized.
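The SQL statement analysis above relies on execution plan information. As one possible illustration, SQLite exposes its plan through `EXPLAIN QUERY PLAN`; the sketch below compares the plan of the same query before and after an index is created. The `orders` table and the index name are hypothetical, and other database engines expose plans through different statements.

```python
import sqlite3


def plan_details(conn, sql, params=()):
    """Return the human-readable detail column of SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]   # the detail text is the last column


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
query = "SELECT * FROM orders WHERE customer_id = ?"

# Without an index the filter requires a full table scan.
before = plan_details(conn, query, (42,))

# After adding an index the planner can search it instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan_details(conn, query, (42,))
```

The exact wording of the plan text varies between SQLite versions, so an analysis tool would typically look for keywords such as a scan versus an index search rather than match whole strings.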
As a preferred technical solution of the present invention, the statistics mining in S5: the distribution and central tendency of the data are mined by calculating its mean, median, mode, and standard deviation; the data distribution mining: the distribution of the data is mined by plotting histograms and probability distribution plots, so that the range and trend of change of the data are understood; the time series mining: regularity and trend characteristics are identified by mining the periodicity, trend, and seasonality of time series data.
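The statistics mining and data distribution mining steps can be sketched with Python's standard `statistics` module. The sample of response times and the bin width are hypothetical; the histogram is returned as counts per bin rather than drawn, and the population standard deviation is used as one reasonable choice.

```python
import statistics
from collections import Counter


def summarize(values):
    """Central tendency and spread: mean, median, mode, standard deviation."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.pstdev(values),   # population standard deviation
    }


def histogram(values, bin_width):
    """Count values per bin; each bin is identified by its lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical query response times in milliseconds.
response_ms = [12, 15, 15, 18, 20, 22, 15, 95]
summary = summarize(response_ms)
bins = histogram(response_ms, 10)
```

In this sample the mean (26.5) sits well above the median (16.5) because of the single 95 ms value, which is the kind of skew the distribution view is meant to reveal.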
As a preferred technical solution of the present invention, S6, the data mining: relationships, rules, and trends in the data are discovered by applying data mining algorithms, including clustering, association rules, and decision trees; the domain knowledge mining: the rules and trends of the data are obtained by analyzing knowledge and experience in the domain of the data, and the knowledge and value in the data are discovered.
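Of the data mining algorithms named above, association rules are the simplest to illustrate. The following is a minimal sketch restricted to rules between pairs of items, scored by support and confidence; it is not a full Apriori implementation, and the thresholds and the example transactions (tables accessed together in one session) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return sorted rules (antecedent, consequent, support, confidence)."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))                 # ignore repeats within a transaction
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                    # share of transactions with both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]   # P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical sessions, each listing the tables it accessed.
sessions = [
    ["orders", "customers"],
    ["orders", "customers", "items"],
    ["orders", "items"],
    ["customers", "orders"],
]
rules = pair_rules(sessions)
```

A rule such as `customers -> orders` with confidence 1.0 says that every session touching `customers` also touched `orders`, which could inform caching or index decisions.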
Compared with the prior art, the invention has the beneficial effects that:
In the solution of the invention:
1. Collecting massive data improves the accuracy and comprehensiveness of database analysis. The data mining module mines and analyzes the data in the database and uncovers potential associations and patterns, thereby assisting the user in business analysis and decision-making.
2. The database monitoring module monitors the collected database running state and SQL execution parameter information in real time and raises alarms. The database performance analysis module compiles statistics on and analyzes the database performance parameters, and troubleshoots and optimizes abnormal data. The database security analysis module evaluates and analyzes the data security of the database and provides early warning of, and preventive measures against, security threats. By collecting, processing, and analyzing massive data, the system provides users with an efficient and accurate database analysis service.
3. Relationships, rules, and trends in the data are discovered by applying data mining algorithms, including clustering, association rules, and decision trees. Through domain knowledge mining, the rules and trends of the data are obtained by analyzing knowledge and experience in the domain of the data, and the knowledge and value in the data are discovered.
Drawings
FIG. 1 is a schematic diagram of a system frame structure provided by the present invention;
FIG. 2 is a schematic flow chart of the method provided by the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. It will be apparent that the described embodiments are some, but not all, embodiments of the invention.
Thus, the following detailed description of the embodiments of the invention is not intended to limit the scope of the invention as claimed, but merely represents some embodiments of the invention. All other embodiments obtained by those skilled in the art without creative effort based on the embodiments of the present invention fall within the protection scope of the present invention. It should be noted that the embodiments of the present invention, and the features and technical solutions in those embodiments, may be combined with each other provided there is no conflict. Like reference numerals and letters denote like items in the following figures, so once an item is defined in one figure, it need not be further defined or explained in subsequent figures.
Example 1: referring to FIGS. 1-2, a database analysis system based on big data comprises a data collection module, a data screening module, a data analysis module, a data visualization module, and a data mining module, wherein the data collection module is used to collect massive database information from multiple channels by multiple means; the data collection module is in data connection with the data screening module, and the data screening module is used to clean and process the collected data, remove invalid data, and perform deduplication and format unification; the data screening module is in data connection with the data analysis module, and the data analysis module is used to analyze the screened data, discover the rules and trends in it, and provide diagnosis and analysis results for the database;
the analytical formula is:
where: $\mu_x$ and $\mu_y$ are the mean square errors of the original data $x$ and of the screened data; $c_1$ and $c_2$ are data format constants for the two; $\sigma_x$ and $\sigma_y$ are their respective variances; the screened data is the data obtained by screening the original data $x$. The data analysis module is in data connection with the data visualization module, and the data visualization module is used to display the analysis results visually and present them to the user intuitively in the form of charts and reports;
the data visualization module is in data connection with the data mining module, and the data mining module is used to mine and analyze the data in the database and explore potential associations and patterns, thereby assisting the user in business analysis and decision-making. The system further comprises a database monitoring module, a database performance analysis module, and a database security analysis module, wherein the database monitoring module is in data connection with the database performance analysis module, and the database performance analysis module is in data connection with the database security analysis module.
The database monitoring module monitors the collected database running state and SQL execution parameter information in real time and raises alarms. The database performance analysis module compiles statistics on and analyzes the database performance parameters, and troubleshoots and optimizes abnormal data. The database security analysis module evaluates and analyzes the data security of the database and provides early warning of, and preventive measures against, security threats.
Example 2: a database analysis method based on big data comprises the following steps: S1, data collection: massive database information, including data table structures, data records, and SQL statements, is collected from multiple channels by multiple means; the required data can also be obtained from websites by means of Web crawlers; S2, data screening: the collected data is cleaned and processed, invalid data is removed, and deduplication and format unification are performed to ensure the accuracy and consistency of the data; the data screening includes condition screening, filter screening, database query statement screening, and data mining algorithm screening;
S3, data analysis: the cleaned data is analyzed to discover the rules and trends in it, and diagnosis and analysis results for the database are provided; the data analysis includes database performance analysis, SQL statement analysis, database architecture analysis, database security analysis, and data mining algorithm analysis; S4, data visualization: the analysis results are displayed visually; the display modes include chart display, report display, and dynamic display;
S5, data mining: potential associations and patterns are uncovered by mining and analyzing the data in the database, thereby assisting the user in business analysis and decision-making; the data mining process includes statistics mining, data distribution mining, time series mining, data mining, and domain knowledge mining.
Condition screening in S2: the data is screened according to known conditions, namely date, region, and indicator conditions; filter screening: the data is screened by setting filter conditions with the AutoFilter or Advanced Filter function of Excel spreadsheet software; database query statement screening: the data in the database is screened by condition using SQL query statements; data mining algorithm screening: the data is screened intelligently using clustering, classification, and association rule data mining algorithms to discover the rules and patterns in it.
Database performance analysis in S3: performance bottlenecks of the database are identified by analyzing its performance indicators, namely response time, processing capacity, and I/O indicators, and the database is optimized accordingly;
SQL statement analysis: potential problems in SQL statements are identified by analyzing their execution plan information, and the SQL statements are optimized;
database architecture analysis: defects in the database design are identified by analyzing the architecture of the database, its relational model, and the data types of its fields, and the data model design is optimized;
database security analysis: security problems of the database are identified by analyzing its security settings and permission controls, and its security policy is optimized;
data mining algorithm analysis: rules and patterns in the data are discovered by applying data mining algorithms, and the database management strategy is optimized.
Statistics mining in S5: the distribution and central tendency of the data are mined by calculating its mean, median, mode, and standard deviation; data distribution mining: the distribution of the data is mined by plotting histograms and probability distribution plots, so that the range and trend of change of the data are understood; time series mining: regularity and trend characteristics are identified by mining the periodicity, trend, and seasonality of time series data.
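The time series mining step separates trend from periodic behaviour. A minimal sketch, assuming an evenly sampled series and a known period: a sliding-window moving average exposes the trend, and the per-position mean within each period exposes the seasonal pattern. The function names and the sample series are hypothetical.

```python
def moving_average(series, window):
    """Sliding-window average; returns len(series) - window + 1 values."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


def seasonal_means(series, period):
    """Average value at each position within the period (period <= len(series))."""
    return [
        sum(series[i::period]) / len(series[i::period])
        for i in range(period)
    ]


# Hypothetical load samples with a period of 3 and a slow upward trend.
load = [10, 20, 30, 12, 22, 32, 14, 24, 34]
trend = moving_average(load, 3)
season = seasonal_means(load, 3)
```

Choosing the window equal to the period makes the moving average cancel the seasonal swing, so the remaining rise from 20.0 to 24.0 reflects the trend alone.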
S6, data mining: relationships, rules, and trends in the data are discovered by applying data mining algorithms, including clustering, association rules, and decision trees; domain knowledge mining: the rules and trends of the data are obtained by analyzing knowledge and experience in the domain of the data, and the knowledge and value in the data are discovered.
Working principle: in use, massive database information, including data table structures, data records, and SQL statements, is collected from multiple channels by multiple means; the required data can also be obtained from websites by means of Web crawlers. The collected data is cleaned and processed, invalid data is removed, and deduplication and format unification are performed to ensure the accuracy and consistency of the data; the data screening includes condition screening, filter screening, database query statement screening, and data mining algorithm screening. The cleaned data is analyzed to discover the rules and trends in it, and diagnosis and analysis results for the database are provided.
The analytical formula is:
where: $\mu_x$ and $\mu_y$ are the mean square errors of the original data $x$ and of the screened data; $c_1$ and $c_2$ are data format constants for the two; $\sigma_x$ and $\sigma_y$ are their respective variances; the screened data is the data obtained by screening the original data $x$.
The data analysis includes database performance analysis, SQL statement analysis, database architecture analysis, database security analysis, and data mining algorithm analysis.
The analysis results are displayed visually; the display modes include chart display, report display, and dynamic display. Potential associations and patterns are uncovered by mining and analyzing the data in the database, thereby assisting the user in business analysis and decision-making; the data mining process includes statistics mining, data distribution mining, time series mining, data mining, and domain knowledge mining.
Condition screening: the data is screened according to known conditions, namely date, region, and indicator conditions; filter screening: the data is screened by setting filter conditions with the AutoFilter or Advanced Filter function of Excel spreadsheet software; database query statement screening: the data in the database is screened by condition using SQL query statements; data mining algorithm screening: the data is screened intelligently using clustering, classification, and association rule data mining algorithms to discover the rules and patterns in it.
Performance bottlenecks of the database are identified by analyzing its performance indicators, namely response time, processing capacity, and I/O indicators, and the database is optimized accordingly; potential problems in SQL statements are identified by analyzing their execution plan information, and the SQL statements are optimized; defects in the database design are identified by analyzing the architecture of the database, its relational model, and the data types of its fields, and the data model design is optimized; security problems of the database are identified by analyzing its security settings and permission controls, and its security policy is optimized; rules and patterns in the data are discovered by applying data mining algorithms, and the database management strategy is optimized.
Statistics mining: the distribution and central tendency of the data are mined by calculating its mean, median, mode, and standard deviation; data distribution mining: the distribution of the data is mined by plotting histograms and probability distribution plots, so that the range and trend of change of the data are understood; time series mining: regularity and trend characteristics are identified by mining the periodicity, trend, and seasonality of time series data; data mining: relationships, rules, and trends in the data are discovered by applying data mining algorithms, including clustering, association rules, and decision trees; domain knowledge mining: the rules and trends of the data are obtained by analyzing knowledge and experience in the domain of the data, and the knowledge and value in the data are discovered.
The above embodiments are intended only to illustrate the present invention and not to limit the technical solutions it describes. Although the present invention has been described in detail in this specification with reference to the above embodiments, it is not limited to those specific embodiments, and modifications or substitutions may be made to it; all technical solutions and modifications thereof that do not depart from the spirit and scope of the invention are intended to be covered by the scope of the appended claims.
Claims (10)
1. A database analysis system based on big data, characterized by comprising a data collection module, a data screening module, a data analysis module, a data visualization module, and a data mining module, wherein the data collection module is used to collect massive database information from multiple channels by multiple means;
the data collection module is in data connection with the data screening module, and the data screening module is used to clean and process the collected data, remove invalid data, and perform deduplication and format unification;
the data screening module is in data connection with the data analysis module, and the data analysis module is used to analyze the screened data, discover the rules and trends in it, and provide diagnosis and analysis results for the database;
the analytical formula is:
where: $\mu_x$ and $\mu_y$ are the mean square errors of the original data $x$ and of the screened data; $c_1$ and $c_2$ are data format constants for the two; $\sigma_x$ and $\sigma_y$ are their respective variances; the screened data is the data obtained by screening the original data $x$.
The data analysis module is in data connection with the data visualization module, and the data visualization module is used to display the analysis results visually and present them to the user intuitively in the form of charts and reports;
the data visualization module is in data connection with the data mining module, and the data mining module is used to mine and analyze the data in the database and explore potential associations and patterns, thereby assisting the user in business analysis and decision-making.
2. The big data based database analysis system of claim 1, further comprising a database monitoring module, a database performance analysis module, and a database security analysis module, wherein the database monitoring module is in data connection with the database performance analysis module, and the database performance analysis module is in data connection with the database security analysis module.
3. The big data based database analysis system of claim 2, wherein the database monitoring module monitors, in real time, the collected database running state and SQL execution parameter information and raises alarms; the calculation formula for the database running state is:
wherein: $\theta$ is the data operation monitoring target; $v_1$, $v_2$ are the running values of the data; $\Sigma_1$, $\Sigma_2$ are the data mean vector and covariance.
4. The big data based database analysis system of claim 3, wherein the database performance analysis module performs statistics and analysis on the performance parameters of the database, and investigates and optimizes abnormal data.
5. The big data based database analysis system of claim 4, wherein the database security analysis module evaluates and analyzes the data security of the database to provide security threat early warning and countermeasures.
6. A database analysis method based on big data, characterized by comprising the following steps: S1, data collection, namely collecting massive database information from multiple channels by multiple means, wherein the database information comprises data table structures, data records and SQL statements, and the required data can be acquired from various websites by Web crawlers;
S2, data screening, namely cleaning and processing the collected data, removing invalid data, and performing deduplication and format unification operations to ensure the accuracy and consistency of the data, wherein the data screening comprises condition screening, filter screening, database query statement screening and data mining algorithm screening;
S3, data analysis, namely analyzing the cleaned data to find rules and trends in it and providing diagnosis and analysis results for the database, wherein the data analysis comprises database performance analysis, SQL statement analysis, database architecture analysis, database security analysis and data mining algorithm analysis;
S4, data visualization, namely visually displaying the analysis results, wherein the display modes comprise chart display, report display and dynamic display;
S5, data mining, namely mining and analyzing the data in the database to uncover potential associations and patterns, thereby assisting a user in business analysis and decision making, wherein the data mining process comprises statistics mining, data distribution mining, time series mining, data mining and domain knowledge mining.
7. The big data based database analysis method of claim 6, wherein the condition screening in S2: the data are screened according to known conditions, namely date, region and index conditions;
the filter screening: the data are screened by setting screening conditions in the automatic filter or advanced filter function of Excel spreadsheet software;
the database query statement screening: the data in the database are screened by condition using SQL database query statements;
the data mining algorithm screening: the data are intelligently screened using clustering, classification and association rule data mining algorithms to find rules and patterns in the data.
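As an illustration only, and not part of the claimed method, the cleaning, condition screening and database query statement screening described above can be sketched in Python. The table name `records`, its columns, the sample rows and the helper names are all hypothetical, and an in-memory SQLite database stands in for whatever database is actually analyzed. Filter screening in spreadsheet software and data mining algorithm screening are not shown.

```python
import sqlite3

# Hypothetical sample records: (date, region, index value).
RAW_ROWS = [
    ("2023-06-01", "north", 10.0),
    ("2023-06-01", "north", 10.0),    # exact duplicate
    ("2023-06-02", " South ", 12.5),  # inconsistent region format
    ("2023-06-03", "south", None),    # invalid: missing value
    ("2023-07-01", "north", 11.0),
]


def clean(rows):
    """Remove invalid rows, unify the region format, and deduplicate."""
    seen = set()
    cleaned = []
    for date, region, value in rows:
        if value is None:
            continue  # drop invalid data
        row = (date, region.strip().lower(), float(value))
        if row not in seen:  # deduplicate while keeping input order
            seen.add(row)
            cleaned.append(row)
    return cleaned


def screen_by_condition(rows, month, region):
    """Condition screening on date (month prefix) and region."""
    return [r for r in rows if r[0].startswith(month) and r[1] == region]


def screen_by_sql(rows, min_value):
    """Database query statement screening with a parameterized SQL query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (date TEXT, region TEXT, value REAL)")
    conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT date, region, value FROM records WHERE value >= ? ORDER BY date",
        (min_value,),
    ).fetchall()
    conn.close()
    return result


cleaned = clean(RAW_ROWS)
june_north = screen_by_condition(cleaned, "2023-06", "north")
high_values = screen_by_sql(cleaned, 11.0)
```

Cleaning runs before either screening step so that both operate on deduplicated records in a uniform format.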
8. The big data based database analysis method of claim 7, wherein the database performance analysis in S3: performance bottlenecks of the database are found by analyzing its performance indexes, namely response time, processing capacity and I/O indexes, and the database is optimized;
the SQL statement analysis: potential problems in an SQL statement are found by analyzing its execution plan information, and the SQL statement is optimized;
the database architecture analysis: defects in the database design are found by analyzing the system structure of the database, the relational model and the data types of the fields, and the data model design is optimized;
the database security analysis: security problems of the database are found by analyzing its security settings and permission control, and the security policy of the database is optimized;
the data mining algorithm analysis: rules and patterns in the data are found by applying data mining algorithms, and the database management strategy is optimized.
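The SQL statement analysis step can be illustrated with a small Python sketch. It is not taken from the patent: it assumes SQLite as the database, uses SQLite's `EXPLAIN QUERY PLAN` statement as the source of execution plan information, and the `orders` table, the index name and the helper functions are hypothetical.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # columns: id, parent, notused, detail


def uses_full_scan(plan):
    """Flag plans that scan a whole table instead of searching an index."""
    return any(step.startswith("SCAN") for step in plan)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 50, float(i)) for i in range(1000)],
)

sql = "SELECT * FROM orders WHERE customer_id = ?"

# Potential problem: without an index the filter needs a full table scan.
plan_before = query_plan(conn, sql, (7,))

# Optimization: index the filtered column, then inspect the plan again.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql, (7,))
conn.close()
```

Comparing `plan_before` with `plan_after` shows the kind of finding this step is meant to produce: a statement that scans the whole table before the index exists and searches the index afterwards.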
9. The big data based database analysis method of claim 8, wherein the statistics mining in S5: the distribution and central tendency of the data are mined by calculating the mean, median, mode and standard deviation of the data;
the data distribution mining: the distribution of the data is mined by drawing histograms and probability distribution plots, so that the range and change trend of the data are understood;
the time series mining: regularity and trend characteristics are found by mining the periodicity, trend and seasonality of time series data.
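A minimal Python sketch of these three mining steps follows, assuming a hypothetical series of query response times in milliseconds. The function names, the fixed-width binning and the trailing moving average are illustrative choices, not the patent's own procedure.

```python
import statistics
from collections import Counter


def describe(values):
    """Central tendency and spread, as named in the statistics mining step."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def histogram(values, bin_width):
    """Count values per fixed-width bin, keyed by the bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def moving_average(series, window):
    """Trailing moving average; smooths a series to expose its trend."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


# Hypothetical response times (ms), in time order.
response_ms = [12, 15, 15, 18, 20, 22, 15, 30, 35, 40]

summary = describe(response_ms)          # statistics mining
bins = histogram(response_ms, 10)        # data distribution mining
trend = moving_average(response_ms, 3)   # time series mining (trend only)
```

The moving average covers only the trend component; detecting periodicity or seasonality would need further techniques such as autocorrelation, which are not shown here.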
10. The big data based database analysis method of claim 9, wherein the data mining in S5: relationships, rules and trends in the data are found by applying data mining algorithms, namely clustering, association rule and decision tree algorithms;
the domain knowledge mining: rules and trends of the data are obtained by analyzing knowledge and experience in the field to which the data belongs, and the knowledge and value in the data are found.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310720127.XA CN116594987A (en) | 2023-06-18 | 2023-06-18 | Database analysis system and method based on big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116594987A (en) | 2023-08-15 |
Family
ID=87595721
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310720127.XA Pending CN116594987A (en) | 2023-06-18 | 2023-06-18 | Database analysis system and method based on big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116594987A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020078039A1 (en) * | 2000-12-18 | 2002-06-20 | Ncr Corporation By Paul M. Cereghini | Architecture for distributed relational data mining systems |
US20110161030A1 (en) * | 2009-12-31 | 2011-06-30 | Semiconductor Manufacturing International (Shanghai) Corporation | Method And Device For Monitoring Measurement Data In Semiconductor Process |
CN102622441A (en) * | 2012-03-09 | 2012-08-01 | 山东大学 | Automatic performance identification tuning system based on Oracle database |
KR20170079648A (en) * | 2015-12-30 | 2017-07-10 | 대한민국(국민안전처 국립재난안전연구원장) | Analysis system for predicting future risks |
KR101765292B1 (en) * | 2016-06-21 | 2017-08-04 | 어니컴 주식회사 | Apparatus and method for providing data analysis tool based on purpose |
CN109272155A (en) * | 2018-09-11 | 2019-01-25 | 郑州向心力通信技术股份有限公司 | A kind of corporate behavior analysis system based on big data |
CN109977661A (en) * | 2019-04-09 | 2019-07-05 | 福建奇点时空数字科技有限公司 | A kind of network safety protection method and system based on big data platform |
CN111949502A (en) * | 2020-08-14 | 2020-11-17 | 中国工商银行股份有限公司 | Database early warning method and device, computing equipment and medium |
CN115640158A (en) * | 2022-10-28 | 2023-01-24 | 合肥长月科技有限公司 | Detection analysis method and device based on database |
Non-Patent Citations (1)
Title |
---|
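Wang Li; Zhang Yong: "Design and implementation of an image database architecture based on a big data platform", Software Engineering (软件工程), vol. 22, no. 02 * |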
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20230815 |