CN108874819B - Data mining method for database

Info

Publication number: CN108874819B
Application number: CN201710329637.9A
Authority: CN
Inventors: 雷晓军; 周京
Original assignee: Shanghai Alcohol Information Technology Co ltd
Current assignee: Shanghai Alcohol Information Technology Co ltd
Priority date: 2017-05-11
Filing date: 2017-05-11
Publication date: 2021-09-03
Anticipated expiration: 2037-05-11
Also published as: CN108874819A

Abstract

A data mining method for a database comprises: converting the data schema of an existing relational database into a proprietary ontology to form a proprietary ontology library; converting the data in the existing relational database into an RDF (Resource Description Framework) knowledge graph corresponding to the proprietary ontology; and then performing node operations on the semantic network formed by the proprietary ontology to obtain the data in the RDF knowledge graph that corresponds to the nodes. The invention simplifies the data mining process so that non-IT staff can obtain the data themselves, which greatly improves labor productivity.

Description

Data mining method for database

Technical Field

The invention relates to the field of semantic search and big data, in particular to a data mining method of a database.

Background

The combination of computers and the internet has created a vast amount of information, and it quickly leaves us feeling overwhelmed. The feeling is justified: while we struggle with an unprecedented volume of information, we keep producing more, and the volume grows geometrically. We want computers to process this massive information effectively, so that it can be put to better use and we can be freed from the flood of information.

Computer information processing was initially limited to data with a simple structure: the volume of data could be large, but its structure was relatively simple. As hardware capacity grew rapidly and computers were applied to complex problems, the structural complexity of data increased greatly. As the internet accumulated data in different ways, data from different sources began to be gathered together, making data processing more complex still.

Databases make our daily work concise and efficient. As databases are used more deeply, the ecosystem of databases in use becomes more and more complicated, and at the same time more and more databases need to be integrated or merged to generate greater benefit. Because database design today follows a bottom-up approach, a database that has become very complex turns into a legacy system whose bottom layer is a huge black hole that people can hardly reach. When such complex and idiosyncratic databases need to be integrated or merged with similar databases, the task becomes extremely laborious, a "mission impossible".

When people think of searching, they think of entering "search terms" and receiving relevant results matched against text or against the textual descriptions of images. Such text is referred to as unstructured data. For structured data stored in a database (that is, row data whose logic can be expressed as a two-dimensional table), the usual practice is to hand the request to the DBA (Database Administrator) or the corresponding IT personnel, who write query statements in the query language of the relational database, such as SQL (Structured Query Language), and then obtain the data and the corresponding data reports. For example, a project at a health management company wants to know, within its managed population, the data of men aged 50 to 60 and women aged 45 to 55 whose blood glucose indicators are close to the diabetic range. The project manager passes this request to the DBA staff, who write the corresponding SQL query statements and extract the relevant data from the database, after which the data is browsed and analyzed. If a problem is found and further data is needed, for example the same data classified by profession, the manager must raise a new request and the DBA staff must run a further query and extraction. This process is cumbersome and prone to human error.
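The conventional workflow described above can be illustrated with a minimal sketch of the kind of query a DBA would have to write by hand. The table name `person`, its columns, the sample rows, and the 6.1 to 6.9 band used for "close to the diabetic range" are all illustrative assumptions; none of them comes from the source.

```python
import sqlite3

# Hypothetical schema and sample data; the source does not specify any of this.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, sex TEXT, age INTEGER, "
    "profession TEXT, fasting_glucose REAL)"
)
conn.executemany(
    "INSERT INTO person (sex, age, profession, fasting_glucose) VALUES (?, ?, ?, ?)",
    [
        ("M", 52, "teacher", 6.5),
        ("M", 40, "driver", 6.6),
        ("F", 50, "nurse", 6.3),
        ("F", 47, "teacher", 5.0),
    ],
)

# The glucose band below is an assumed, illustrative threshold only.
QUERY = """
SELECT id, sex, age, profession, fasting_glucose
FROM person
WHERE fasting_glucose BETWEEN 6.1 AND 6.9
  AND ((sex = 'M' AND age BETWEEN 50 AND 60)
    OR (sex = 'F' AND age BETWEEN 45 AND 55))
ORDER BY id
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Any follow-up request, such as grouping the same rows by profession, requires someone to edit and rerun the SQL, which is the dependency on IT staff that the background identifies.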

Disclosure of Invention

The invention provides a data mining method of a database, which simplifies the data mining process, enables the data to be obtained by non-IT staff and greatly improves the labor productivity.

In order to achieve the above object, the present invention provides a data mining method for a database, comprising the steps of:

step S1, converting the data schema of the existing relational database into a proprietary ontology to form a proprietary ontology library;

step S2, converting the data in the existing relational database into an RDF knowledge graph corresponding to the proprietary ontology;

and step S3, performing node operation on the semantic network formed by the proprietary ontology, and acquiring data in the RDF knowledge graph corresponding to the nodes.

The step S1 specifically includes the following steps:

S1.1, extracting the data schema of the relational database;

S1.2, converting the data schema into a proprietary ontology;

a table in the relational database represents an entity in the ontology, and the fields of the table are the attributes of that entity;

and S1.3, after the proprietary ontology has been edited by experts in the proprietary domain, generating an expert-level proprietary ontology and storing it in the proprietary ontology library.

In step S2, the data originally stored in the tables of the relational database forms a semantic network graph in the RDF knowledge graph.

The step S3 specifically includes the following steps:

S3.1, the classes and the attributes of the proprietary ontology in the proprietary ontology library form a semantic network graph;

S3.2, selecting a plurality of nodes on the semantic network to generate a sub-network;

and S3.3, selecting the data corresponding to the nodes from the RDF knowledge graph according to the sub-network to obtain the search data.

The step of generating a sub-network in step S3.2 specifically includes: selecting a plurality of nodes on the semantic network and filtering out the nodes that are not selected, so that the selected nodes form a sub-network.

After a sub-network has been generated, the semantic network can be reset to its initial state so that the next new sub-network can be generated, or further nodes can be selected on the basis of the current sub-network to generate a new sub-network.

The invention applies the proprietary ontology to data mining and converts structured data into a knowledge graph, so that semantic search can be carried out through keywords. This simplifies the data mining process, allows non-IT staff to obtain the data themselves, and greatly improves labor productivity.

Drawings

Fig. 1 is a flowchart of a data mining method for a database according to the present invention.

Fig. 2 is a specific schematic diagram of a data mining method for a database according to the present invention.

Detailed Description

The preferred embodiment of the present invention is described in detail below with reference to fig. 1 and 2.

Ontologies and proprietary ontologies emerged in the computer science and artificial intelligence communities to deal with such complex data processing. They are the foundation of the third-generation internet, namely the Semantic Web, and the cornerstone of semantic search; the third-generation internet and semantic search are in turn the basis of big data processing. Soon after ontology was introduced into the computer field, the concept was also brought into database design and development, and database design changed from the earlier bottom-up approach to a top-down approach: first, the compositional relationships among the concepts and entities of the domain and their specific attributes are determined and designed, and a proprietary domain ontology is established; the data of the database is then organized closely around that ontology. Database design, development and maintenance of this kind emphasize the completeness of concepts and entities and direct operability by domain experts. Moreover, the evolution of the database is first expressed in the ontology and then implemented in the underlying data system. The ontology-driven database thoroughly changes the situation of databases: database integration and consolidation become a matter of maintaining and updating the ontology, while changes at the bottom level of the database are automated.

According to the top-down concept, as shown in fig. 1, the present invention provides a data mining method for a database, comprising the following steps:

As shown in fig. 2, the step S1 specifically includes the following steps:

S1.1, extracting the data schema of the relational database;

the relational database consists of a series of tables in which data is stored; the tables are determined by the data schema, which is established by the database administrator (DBA);

S1.2, converting the data schema into a proprietary ontology;

the proprietary ontology is established by experts in the proprietary domain;

generally, a table in the relational database represents an entity in the ontology, and the fields of the table are the attributes of that entity; some table fields are foreign keys, that is, primary keys of another table; from the ontology perspective, a foreign key indicates that the two entities are related, one entity being an attribute value of the other; applying the same rule to every table of the database converts the data schema into a rough draft of the proprietary ontology, and an existing proprietary ontology can also take part in the conversion;

S1.3, after the proprietary ontology has been edited by experts in the proprietary domain, generating an expert-level proprietary ontology and storing it in the proprietary ontology library;

the editing refers to adding, modifying and deleting.
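Steps S1.1 and S1.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it uses SQLite as the relational database, represents the draft ontology as a plain dictionary, and the function names `extract_schema` and `schema_to_ontology` as well as the sample tables are hypothetical. The expert editing of step S1.3 is not modeled.

```python
import sqlite3

def extract_schema(conn):
    """Step S1.1 (sketch): read tables, columns and foreign keys from SQLite."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        # foreign_key_list rows: (id, seq, referenced_table, from_col, to_col, ...)
        fks = {
            row[3]: row[2]
            for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')
        }
        schema[table] = {"columns": columns, "foreign_keys": fks}
    return schema

def schema_to_ontology(schema):
    """Step S1.2 (sketch): table -> entity, field -> attribute,
    foreign key -> relation to the referenced entity."""
    ontology = {}
    for table, info in schema.items():
        attributes, relations = [], {}
        for column in info["columns"]:
            if column in info["foreign_keys"]:
                relations[column] = info["foreign_keys"][column]
            else:
                attributes.append(column)
        ontology[table] = {"attributes": attributes, "relations": relations}
    return ontology

# Hypothetical two-table database used only to exercise the sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE profession (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE person (
    id INTEGER PRIMARY KEY,
    age INTEGER,
    profession_id INTEGER REFERENCES profession(id)
);
""")
draft_ontology = schema_to_ontology(extract_schema(conn))
print(draft_ontology)
```

The result is only the rough draft mentioned in the text; a domain expert would still add, modify and delete entries before it is stored in the proprietary ontology library.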

The step S2 specifically includes the following steps:

the data in the relational database is originally stored in tables, and the position of each item of data is indicated by a field of its table; the data is now extracted and becomes the values of the corresponding attributes of the entities in the proprietary ontology; that is, in the relational database the data is arranged in tables, whereas in the RDF knowledge graph the data directly forms a semantic network graph.
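The conversion of step S2 can be sketched as turning each table row into subject-predicate-object triples. The namespace `http://example.org/`, the URI layout, and the function name `row_to_triples` are assumptions made for illustration; the source does not specify how identifiers are formed. Triples are kept as plain Python tuples here, and an RDF library could be substituted for serialization.

```python
BASE = "http://example.org/"  # hypothetical namespace, not from the source

def row_to_triples(table, key, row, foreign_keys=None):
    """Turn one table row (a dict) into (subject, predicate, object) triples.

    table: table name, used as the entity (class) name
    key: name of the primary-key column
    foreign_keys: mapping of column name -> referenced table name
    """
    foreign_keys = foreign_keys or {}
    subject = f"{BASE}{table}/{row[key]}"
    triples = [(subject, "rdf:type", f"{BASE}{table}")]
    for column, value in row.items():
        if column == key or value is None:
            continue  # the key is already encoded in the subject; skip NULLs
        predicate = f"{BASE}{table}#{column}"
        if column in foreign_keys:
            # A foreign key points at another entity, so the object is a node.
            obj = f"{BASE}{foreign_keys[column]}/{value}"
        else:
            obj = value  # an ordinary field value becomes a literal
        triples.append((subject, predicate, obj))
    return triples

person = {"id": 7, "age": 52, "profession_id": 3}
triples = row_to_triples("person", "id", person, {"profession_id": "profession"})
for triple in triples:
    print(triple)
```

Because the foreign-key value becomes a node rather than a literal, rows from different tables link up into a single graph instead of remaining in separate tables.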

As shown in fig. 2, the step S3 specifically includes the following steps:

because a proprietary ontology can have a large number of classes and a correspondingly large number of attributes, the semantic network graph has many nodes and complex relationships; the network graph is therefore rendered on the computer interface using existing JavaScript technology, so that the nodes formed by the classes and attributes can be clicked; when a node representing a class or an attribute is clicked, that node and the relationships connected to it are highlighted and become the focus of attention;

clicking a plurality of nodes on the semantic network filters out the nodes that were not clicked, and the clicked nodes form a sub-network, which represents a part of the whole data;

different sub-networks are generated according to which nodes are selected; after a sub-network has been generated, the semantic network can be reset to its initial state so that the next new sub-network can be generated, or further nodes can be selected on the basis of the current sub-network to generate a new sub-network;
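The node operations of step S3 can be sketched without the graphical interface. In this minimal illustration the class `SemanticNetwork`, the function `fetch`, the node names, and the sample triples are all hypothetical; selecting a node stands in for clicking it, and the retrieval rule (keep triples whose predicate is a selected attribute node) is one simple assumed interpretation of step S3.3.

```python
class SemanticNetwork:
    """Nodes are ontology classes and attributes; edges link a class to an attribute."""

    def __init__(self, edges):
        self.edges = set(edges)
        self.nodes = {node for edge in self.edges for node in edge}
        self.selected = set()

    def select(self, *nodes):
        """Stand-in for clicking nodes; selections accumulate until reset."""
        unknown = set(nodes) - self.nodes
        if unknown:
            raise KeyError(f"unknown nodes: {sorted(unknown)}")
        self.selected |= set(nodes)

    def sub_network(self):
        """Filter out unselected nodes; keep edges whose two ends are both selected."""
        kept = {
            (a, b) for a, b in self.edges
            if a in self.selected and b in self.selected
        }
        return set(self.selected), kept

    def reset(self):
        """Return the network to its initial state (nothing selected)."""
        self.selected.clear()

def fetch(triples, nodes):
    """Assumed S3.3 rule: keep triples whose predicate is a selected node."""
    return [t for t in triples if t[1] in nodes]

net = SemanticNetwork([
    ("person", "person.age"),
    ("person", "person.sex"),
    ("person", "person.profession"),
])
triples = [
    ("person/1", "person.age", 52),
    ("person/1", "person.sex", "M"),
    ("person/2", "person.age", 47),
]

net.select("person", "person.age")
nodes, edges = net.sub_network()
result = fetch(triples, nodes)

net.select("person.sex")            # continue from the current sub-network
wider_nodes, _ = net.sub_network()
net.reset()                         # or start over for the next sub-network
```

Keeping the selection as accumulated state makes both options in the text cheap: continuing adds to the set, and resetting clears it.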

While the present invention has been described in detail with reference to the preferred embodiments, it should be understood that the above description should not be taken as limiting the invention. Various modifications and alterations to this invention will become apparent to those skilled in the art upon reading the foregoing description. Accordingly, the scope of the invention should be determined from the following claims.

Claims

1. A method for mining data of a database, comprising the steps of:

step S3, performing node operation on the semantic network formed by the proprietary ontology to acquire data in the RDF knowledge graph corresponding to the nodes;

the step S1 specifically includes the following steps:

s1.1, extracting a data mode of a relational database;

s1.2, converting the data mode into a proprietary ontology;

2. The method for mining data of a database according to claim 1, wherein in step S2, the data originally stored in the tables of the relational database forms a semantic network graph in the RDF knowledge graph.

3. The method for mining data of a database according to claim 1, wherein said step S3 specifically comprises the steps of:

4. The method for mining data of a database according to claim 3, characterized in that the step of generating a sub-network in step S3.2 comprises: selecting a plurality of nodes on the semantic network and filtering out the nodes that are not selected, the selected nodes forming a sub-network.

5. The method for mining data of a database according to claim 4, wherein after a sub-network is generated, a new sub-network is generated either by resetting the semantic network to its initial state or by continuing to select nodes on the basis of the current sub-network.