
CN115964514A - Data restoration method, data restoration device, electronic device, medium, and program product - Google Patents


Info

Publication number
CN115964514A
Authority
CN
China
Prior art keywords
information
data
parent
superior
field
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310075472.2A
Other languages
Chinese (zh)
Inventor
戎伟峰
伍如意
秦家祥
叶鸿浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202310075472.2A priority Critical patent/CN115964514A/en
Publication of CN115964514A publication Critical patent/CN115964514A/en
Pending legal-status Critical Current

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a knowledge-graph-based data restoration method, apparatus, electronic device, medium, and computer program product, which can be used in the technical field of artificial intelligence. The method comprises: acquiring m pieces of stored procedure information within a time period of t seconds, wherein each piece of stored procedure information comprises a source and a destination of data; extracting superior information, subordinate information, and a parent-child relationship between the superior information and the subordinate information from the m pieces of stored procedure information, wherein the superior information comprises a parent stored procedure-table-field and the subordinate information comprises a child stored procedure-table-field; constructing a knowledge graph from the superior information, the subordinate information, and the parent-child relationship, wherein the nodes of the knowledge graph are constructed from the superior information and the subordinate information and the edges of the knowledge graph are constructed from the parent-child relationship; determining a problem data chain in the knowledge graph according to problem data obtained in advance within the time period of t seconds; and repairing the problem data chain.

Description

Data restoration method, data restoration device, electronic device, medium, and program product
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, an electronic device, a medium, and a computer program product for data restoration based on a knowledge graph.
Background
Production data of a financial market trading system flows into an Oracle database, is processed layer by layer through one or more stored procedures, and is then applied to the business modules and downstream applications. Determining the processing logic of a table field often requires manual analysis and investigation. When production data is wrong, the business modules and downstream applications associated with the financial market trading system are affected, which may trigger production problems. Data errors may have various causes, such as a program logic error, a network interruption, or server downtime.
Disclosure of Invention
In view of the above, the present disclosure provides a method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product for data restoration based on a knowledge graph, which have high intelligence degree, efficiency, and accuracy.
One aspect of the present disclosure provides a data restoration method based on a knowledge graph, including: acquiring m pieces of stored process information in a t second time period, wherein each piece of stored process information comprises a source and a destination of data, m is an integer greater than or equal to 1, and t is greater than or equal to 1; extracting superior information, subordinate information and a parent-child relationship between the superior information and the subordinate information according to the m pieces of stored process information, wherein the superior information comprises a parent stored process-table-field, and the subordinate information comprises a child stored process-table-field; constructing a knowledge graph according to the superior information, the inferior information and the parent-child relationship, wherein nodes of the knowledge graph are constructed according to the superior information and the inferior information, and edges of the knowledge graph are constructed according to the parent-child relationship; determining a problem data chain in the knowledge graph according to the problem data obtained in advance within the t second time period; and repairing the problem data chain.
According to the knowledge-graph-based data restoration method, the superior information, the subordinate information, and the parent-child relationship are extracted from the m pieces of stored procedure information within the time period of t seconds, and a knowledge graph of that stored procedure information is constructed from them. Other data related to the problem data within the time period of t seconds can then be found through the edge relationships of the knowledge graph, so that both the problem data and the related data can be restored. The method depends little on manual work, has a high degree of intelligence, and is efficient and accurate.
In some embodiments, the extracting, from the m pieces of stored procedure information, superior information, inferior information, and a parent-child relationship between the superior information and the inferior information includes: extracting superior information, inferior information and parent-child relationship in each piece of stored process information to obtain an intermediate data result; and integrating m intermediate data results according to an integration rule to obtain a final data result, wherein the final data result comprises superior information and inferior information in the m pieces of stored process information and a parent-child relationship between the superior information and the inferior information.
In some embodiments, the integration rule comprises: merging the m intermediate data results and, for every two intermediate data results, when a table-field in the superior information of one intermediate data result is the same as a table-field in the subordinate information of the other intermediate data result, adding to the final data result a parent-child relationship in which the subordinate information of the other intermediate data result is regarded as superior information and the superior information of the one intermediate data result is regarded as subordinate information.
In some embodiments, the determining a problem data chain in the knowledge-graph according to the problem data in the t second time period obtained in advance comprises: matching nodes in the knowledge graph as problem nodes according to problem data obtained in advance in the t second time period, wherein the problem data comprises a problem table and/or a problem field; and determining a problem data chain by using a graph traversal algorithm according to the problem node.
In some embodiments, the repairing the problem data chain includes: determining an upstream node of the problem data in the problem data chain as a node to be confirmed; determining the problem data and the downstream nodes thereof in the problem data chain as nodes to be repaired; and repairing the node to be repaired.
In some embodiments, the repairing the node to be repaired includes: writing a custom script of the node to be repaired and/or performing task readjustment on the node to be repaired.
Another aspect of the present disclosure provides a knowledge-graph-based data restoration apparatus comprising: an acquisition module, configured to acquire m pieces of stored procedure information within a time period of t seconds, wherein each piece of stored procedure information comprises a source and a destination of data, m is an integer greater than or equal to 1, and t is greater than or equal to 1; an extraction module, configured to extract, from the m pieces of stored procedure information, superior information, subordinate information, and a parent-child relationship between the superior information and the subordinate information, wherein the superior information comprises a parent stored procedure-table-field and the subordinate information comprises a child stored procedure-table-field; a building module, configured to construct a knowledge graph from the superior information, the subordinate information, and the parent-child relationship, wherein the nodes of the knowledge graph are constructed from the superior information and the subordinate information and the edges of the knowledge graph are constructed from the parent-child relationship; a determining module, configured to determine a problem data chain in the knowledge graph according to the problem data obtained in advance within the time period of t seconds; and a repair module, configured to repair the problem data chain.
Another aspect of the present disclosure provides an electronic device comprising one or more processors and one or more memories, wherein the memories are used for storing executable instructions, which when executed by the processors, implement the method as described above.
Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions for implementing the method as described above when executed.
Another aspect of the disclosure provides a computer program product comprising a computer program comprising computer executable instructions for implementing the method as described above when executed.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an exemplary system architecture to which the method and apparatus may be applied, in accordance with an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow diagram of a method of knowledge-graph based data remediation, in accordance with an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow diagram for extracting superior information, inferior information, and parent-child relationships between the superior information and the inferior information from m stored process information, according to an embodiment of the disclosure;
FIG. 4 schematically shows a schematic diagram of a knowledge-graph according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow diagram for determining a problem data chain in a knowledge graph from obtained problem data according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a flow diagram for repairing a problem data chain in accordance with an embodiment of the disclosure;
FIG. 7 schematically illustrates a flow chart for repairing a node to be repaired according to an embodiment of the present disclosure;
FIG. 8 schematically illustrates a block diagram of a knowledge-graph based data remediation device according to an embodiment of the present disclosure;
FIG. 9 schematically shows a block diagram of an extraction module according to an embodiment of the disclosure;
FIG. 10 schematically illustrates a block diagram of the structure of a determination module according to an embodiment of the present disclosure;
FIG. 11 schematically shows a block diagram of a repair module according to an embodiment of the present disclosure;
fig. 12 schematically shows a block diagram of a repair unit according to an embodiment of the present disclosure;
FIG. 13 schematically illustrates a block diagram of a knowledge-graph based data remediation device according to an embodiment of the present disclosure;
FIG. 14 schematically illustrates a workflow diagram of a knowledge-graph based data remediation device according to an embodiment of the present disclosure;
FIG. 15 schematically shows a block diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
In the technical solution of the present disclosure, the acquisition, storage, and application of the personal information of the users concerned all comply with relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated. Likewise, the acquisition, collection, storage, use, processing, transmission, provision, disclosure, and application of data all comply with relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
Where a convention analogous to "at least one of A, B, or C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). The terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly indicating the number of technical features indicated. Thus, features defined as "first" or "second" may explicitly or implicitly include one or more of the described features.
Production data of a financial market trading system flows into an Oracle database, is processed layer by layer through one or more stored procedures, and is then applied to the business modules and downstream applications. Determining the processing logic of a table field often requires manual analysis and investigation. When production data is wrong, the business modules and downstream applications associated with the financial market trading system are affected, which may trigger production problems. Data errors may have various causes, such as a program logic error, a network interruption, or server downtime.
Data errors generally affect only some fields of a table. Stored procedures that update only other fields of the table need to be excluded, and the impact of the erroneous data needs to be analyzed along the stored procedure, table, and field dimensions. Existing emergency schemes can determine the field processing logic and repair the erroneous data, but they depend on how familiar the staff are with the stored procedures of the financial market system, so the analysis may take a long time and data repair is not timely.
Embodiments of the present disclosure provide a method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product for data restoration based on a knowledge graph. The data restoration method based on the knowledge graph comprises the following steps: acquiring m pieces of stored process information in a t second time period, wherein each piece of stored process information comprises a source and a destination of data, m is an integer greater than or equal to 1, and t is greater than or equal to 1; extracting superior information, subordinate information and a parent-child relationship between the superior information and the subordinate information according to the m pieces of stored process information, wherein the superior information comprises a parent stored process-table-field, and the subordinate information comprises a child stored process-table-field; constructing a knowledge graph according to the superior information, the inferior information and the parent-child relationship, wherein nodes of the knowledge graph are constructed according to the superior information and the inferior information, and edges of the knowledge graph are constructed according to the parent-child relationship; determining a problem data chain in the knowledge graph according to problem data obtained in advance within a t second time period; and repairing the problem data chain.
It should be noted that the method, the apparatus, the electronic device, the computer-readable storage medium, and the computer program product for data restoration based on knowledge graph according to the present disclosure may be applied to the technical field of artificial intelligence, and may also be applied to any fields other than the technical field of artificial intelligence, such as the financial field, and the field of the present disclosure is not limited herein.
Fig. 1 schematically illustrates an exemplary system architecture 100 to which the method, apparatus, electronic device, computer-readable storage medium, and computer program product for knowledge-graph based data remediation may be applied, according to embodiments of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the data repairing method based on knowledge graph provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the knowledge-graph based data remediation device provided by embodiments of the present disclosure may generally be disposed in the server 105. The data restoration method based on the knowledge graph provided by the embodiment of the present disclosure may also be performed by a server or a server cluster which is different from the server 105 and can communicate with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the data restoring apparatus based on knowledge graph provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The knowledge-graph-based data restoration method according to the embodiment of the present disclosure will be described in detail with reference to fig. 2 to 7 based on the scenario described in fig. 1.
FIG. 2 schematically shows a flow diagram of a method of knowledge-graph based data remediation, according to an embodiment of the present disclosure.
As shown in FIG. 2, the knowledge-graph-based data restoration method of the embodiment includes operations S210 to S250.
In operation S210, m pieces of stored procedure information within a time period of t seconds are obtained, where each piece of stored procedure information includes a source and a destination of data, m is an integer greater than or equal to 1, and t is greater than or equal to 1. For example, the stored procedure information for tables and fields within a time period of t seconds may be obtained from a business database of a financial market; the number of pieces of stored procedure information may be m, and each piece may include keywords such as INSERT, UPDATE, MERGE, and/or FROM.
In operation S220, superior information, subordinate information, and a parent-child relationship between the superior information and the subordinate information are extracted according to the m pieces of stored procedure information, wherein the superior information includes a parent stored procedure-table-field, and the subordinate information includes a child stored procedure-table-field. It will be appreciated that the parent stored procedure-table-field and the child stored procedure-table-field may be determined from the keywords in the stored procedure information.
For example, if the stored procedure information of stored procedure P1 records INSERT INTO table B (field b) SELECT field a FROM table A, it can be determined that the superior information is P1-table A-field a and the subordinate information is P1-table B-field b. Likewise, if the stored procedure information of P1 records UPDATE table B SET field b = (SELECT field a FROM table A WHERE table B.field c = table A.field d), it can be determined that the superior information is P1-table A-field a and the subordinate information is P1-table B-field b. Finally, if the stored procedure information of P1 records:
MERGE INTO table B
USING table A ON table B
WHEN MATCHED THEN
UPDATE SET table B.field b = table A.field a
WHEN NOT MATCHED THEN
INSERT (table B.field b) VALUES (table A.field a),
it can again be determined that the superior information is P1-table A-field a and the subordinate information is P1-table B-field b.
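The extraction in the first example above can be sketched in a few lines. This is only a minimal illustration, assuming the simplest single-field INSERT INTO ... SELECT form; the regular expression, the function name `extract_lineage`, and the identifiers `T_A`, `T_B`, `F_a`, `F_b` are hypothetical and do not come from the disclosure, which does not say how the keywords are parsed. Real stored procedures (UPDATE, MERGE, multiple fields, aliases) would likely need a proper SQL parser.

```python
import re

# Hypothetical pattern covering only the simplest form described in the text:
#   INSERT INTO <table> (<field>) SELECT <field> FROM <table>
INSERT_SELECT = re.compile(
    r"INSERT\s+INTO\s+(?P<dst_table>\w+)\s*\(\s*(?P<dst_field>\w+)\s*\)\s*"
    r"SELECT\s+(?P<src_field>\w+)\s+FROM\s+(?P<src_table>\w+)",
    re.IGNORECASE,
)


def extract_lineage(procedure, sql):
    """Return (superior, subordinate) pairs found in one stored procedure body."""
    pairs = []
    for match in INSERT_SELECT.finditer(sql):
        # The source of the data is the superior (parent) information.
        superior = f"{procedure}-{match.group('src_table')}-{match.group('src_field')}"
        # The destination of the data is the subordinate (child) information.
        subordinate = f"{procedure}-{match.group('dst_table')}-{match.group('dst_field')}"
        pairs.append((superior, subordinate))
    return pairs


pairs = extract_lineage("P1", "INSERT INTO T_B (F_b) SELECT F_a FROM T_A")
print(pairs)
```

The source of each statement becomes the parent and the destination becomes the child, which mirrors the "source and destination of data" wording of operation S210.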
In operation S230, a knowledge-graph is constructed according to the superior information, the inferior information, and the parent-child relationship, wherein nodes of the knowledge-graph are constructed according to the superior information and the inferior information, and edges of the knowledge-graph are constructed according to the parent-child relationship.
In operation S240, a problem data chain is determined in the knowledge graph according to problem data obtained in advance within a time period of t seconds, where the problem data may be a table or a field in the table, a problem node may be found in the knowledge graph through the problem table and/or the field, and then a node and an edge having a direct association relationship or an indirect association relationship with the problem node are found to determine the problem data chain.
In operation S250, a problem data chain is repaired.
According to the knowledge-graph-based data restoration method, the superior information, the subordinate information, and the parent-child relationship are extracted from the m pieces of stored procedure information within the time period of t seconds, and a knowledge graph of that stored procedure information is constructed from them. Other data related to the problem data within the time period of t seconds can then be found through the edge relationships of the knowledge graph, so that both the problem data and the related data can be restored. The method depends little on manual work, has a high degree of intelligence, and is efficient and accurate.
Fig. 3 schematically shows a flowchart for extracting superior information, inferior information, and parent-child relationship between the superior information and the inferior information from m pieces of stored procedure information according to an embodiment of the present disclosure.
The operation S220 of extracting the upper information, the lower information, and the parent-child relationship between the upper information and the lower information according to the m pieces of stored procedure information includes operations S221 and S222.
In operation S221, the upper level information, the lower level information, and the parent-child relationship in each stored procedure information are extracted, resulting in an intermediate data result.
In operation S222, the m intermediate data results are integrated according to the integration rule to obtain a final data result, where the final data result includes upper information, lower information, and a parent-child relationship between the upper information and the lower information in the m pieces of stored procedure information.
In some specific examples, the integration rule may include merging the m intermediate data results and, for every two intermediate data results, when a table-field in the superior information of one intermediate data result is the same as a table-field in the subordinate information of the other intermediate data result, adding to the final data result a parent-child relationship in which the subordinate information of the other intermediate data result is the superior information and the superior information of the one intermediate data result is the subordinate information.
The following description will be given by taking the stored procedures P1 and P2 as examples. The intermediate data results obtained by extracting the superior information, the inferior information, and the parent-child relationship in the stored process information of the stored process P1 are shown in table 1.
TABLE 1
Parent stored procedure-table-field    Child stored procedure-table-field
P1-T2-F2                               P1-T1-F1
P1-T3-F3                               P1-T2-F2
P1-T4-F5                               P1-T1-F4
The intermediate data results obtained by extracting the superior information, the inferior information, and the parent-child relationship in the stored process information of the stored process P2 are shown in table 2.
TABLE 2
Parent stored procedure-table-field    Child stored procedure-table-field
P2-T5-F6                               P2-T3-F3
P2-T6-F7                               P2-T5-F6
P2-T8-F9                               P2-T7-F8
P2-T9-F10                              P2-T7-F8
The intermediate data results obtained from the stored procedures P1 and P2 are integrated according to the integration rules, and the final data results are shown in table 3.
TABLE 3
Parent stored procedure-table-field    Child stored procedure-table-field
P1-T2-F2                               P1-T1-F1
P1-T3-F3                               P1-T2-F2
P1-T4-F5                               P1-T1-F4
P2-T5-F6                               P2-T3-F3
P2-T6-F7                               P2-T5-F6
P2-T8-F9                               P2-T7-F8
P2-T9-F10                              P2-T7-F8
P2-T3-F3                               P1-T3-F3
In tables 1, 2 and 3, P represents a stored procedure, T represents a table, and F represents a field. Extracting the upper information, the lower information, and the parent-child relationship between the upper information and the lower information from the m pieces of stored procedure information may be facilitated by operations S221 and S222.
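As a rough illustration of the integration rule, the sketch below merges the rows of Tables 1 and 2 and adds one cross-procedure row whenever a parent table-field of one result equals a child table-field of another. The function names `table_field` and `integrate` are hypothetical, and the pairwise loop is just one straightforward reading of the rule, not necessarily how the disclosed method implements it.

```python
# Rows are (parent, child) pairs copied from Tables 1 and 2 of the text.
P1_ROWS = [("P1-T2-F2", "P1-T1-F1"), ("P1-T3-F3", "P1-T2-F2"), ("P1-T4-F5", "P1-T1-F4")]
P2_ROWS = [("P2-T5-F6", "P2-T3-F3"), ("P2-T6-F7", "P2-T5-F6"),
           ("P2-T8-F9", "P2-T7-F8"), ("P2-T9-F10", "P2-T7-F8")]


def table_field(node):
    """Drop the stored procedure prefix: 'P1-T3-F3' -> 'T3-F3'."""
    return node.split("-", 1)[1]


def integrate(intermediate_results):
    """Merge per-procedure rows, then link procedures that share a table-field."""
    final = [row for rows in intermediate_results for row in rows]
    for i, rows_a in enumerate(intermediate_results):
        parents_a = sorted({parent for parent, _ in rows_a})
        for j, rows_b in enumerate(intermediate_results):
            if i == j:
                continue
            children_b = sorted({child for _, child in rows_b})
            for parent in parents_a:
                for child in children_b:
                    if table_field(parent) == table_field(child):
                        # The child of the other result becomes the parent,
                        # and the parent of this result becomes the child.
                        link = (child, parent)
                        if link not in final:
                            final.append(link)
    return final


final_rows = integrate([P1_ROWS, P2_ROWS])
for parent, child in final_rows:
    print(parent, child)
```

With the data above, the only shared table-field is T3-F3, so the single added row is P2-T3-F3 as parent of P1-T3-F3, which is the last row of Table 3.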
According to the data in Table 3, the knowledge graph shown in FIG. 4 can be constructed by using each parent stored procedure-table-field and each child stored procedure-table-field as a node and each parent-child relationship between them as an edge.
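One simple way to hold such a graph in memory is a node set plus an adjacency mapping from each parent to its children. The sketch below assumes this representation; the disclosure does not name a graph store or library, and `build_graph` is a hypothetical helper.

```python
from collections import defaultdict

# (parent, child) rows copied from Table 3 of the text.
TABLE_3 = [
    ("P1-T2-F2", "P1-T1-F1"), ("P1-T3-F3", "P1-T2-F2"), ("P1-T4-F5", "P1-T1-F4"),
    ("P2-T5-F6", "P2-T3-F3"), ("P2-T6-F7", "P2-T5-F6"), ("P2-T8-F9", "P2-T7-F8"),
    ("P2-T9-F10", "P2-T7-F8"), ("P2-T3-F3", "P1-T3-F3"),
]


def build_graph(rows):
    """Return (nodes, children) where children maps a parent node to its child nodes."""
    nodes = set()
    children = defaultdict(list)
    for parent, child in rows:
        nodes.update((parent, child))    # every table cell is a node
        children[parent].append(child)   # every row is a directed edge parent -> child
    return nodes, dict(children)


nodes, children = build_graph(TABLE_3)
print(len(nodes), "nodes,", sum(len(c) for c in children.values()), "edges")
```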
FIG. 5 schematically illustrates a flow diagram for determining a problem data chain in a knowledge graph from problem data obtained in advance within a time period of t seconds, according to an embodiment of the disclosure.
Operation S240 of determining a problem data chain in the knowledge graph according to the problem data obtained in advance within the time period of t seconds includes operations S241 and S242.
In operation S241, nodes in the knowledge-graph are matched as problem nodes according to problem data within a t-second period obtained in advance, the problem data including problem tables and/or problem fields.
In operation S242, a problem data chain is determined using a graph traversal algorithm according to the problem nodes. Taking the knowledge graph in FIG. 4 as an example, assume that F3 is the problem data. Since nodes P1-T3-F3 and P2-T3-F3 both include F3, they are determined to be the problem nodes. Starting from nodes P1-T3-F3 and P2-T3-F3, a graph traversal algorithm is used to find the connected subgraph that includes them, and this connected subgraph is determined to be the problem data chain, which in FIG. 4 is P2-T6-F7 → P2-T5-F6 → P2-T3-F3 → P1-T3-F3 → P1-T2-F2 → P1-T1-F1. Determining a problem data chain in the knowledge graph based on the obtained problem data may be facilitated by operations S241 and S242.
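Operations S241 and S242 can be sketched as a node match followed by a breadth-first search. The text only says "a graph traversal algorithm", so the choice of breadth-first search over an undirected view of the edges is an assumption, as are the helper names `match_problem_nodes` and `problem_chain`.

```python
from collections import defaultdict, deque

# (parent, child) rows copied from Table 3 of the text.
TABLE_3 = [
    ("P1-T2-F2", "P1-T1-F1"), ("P1-T3-F3", "P1-T2-F2"), ("P1-T4-F5", "P1-T1-F4"),
    ("P2-T5-F6", "P2-T3-F3"), ("P2-T6-F7", "P2-T5-F6"), ("P2-T8-F9", "P2-T7-F8"),
    ("P2-T9-F10", "P2-T7-F8"), ("P2-T3-F3", "P1-T3-F3"),
]


def match_problem_nodes(nodes, table=None, field=None):
    """S241: select nodes whose table and/or field equals the problem table/field."""
    matched = set()
    for node in nodes:
        _, node_table, node_field = node.split("-")
        if (table is None or node_table == table) and (field is None or node_field == field):
            matched.add(node)
    return matched


def problem_chain(rows, problem_nodes):
    """S242: breadth-first search for every node connected to a problem node."""
    neighbours = defaultdict(set)
    for parent, child in rows:
        # Ignore edge direction here: upstream and downstream nodes both belong to the chain.
        neighbours[parent].add(child)
        neighbours[child].add(parent)
    seen = set(problem_nodes)
    queue = deque(problem_nodes)
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


all_nodes = {node for row in TABLE_3 for node in row}
problem_nodes = match_problem_nodes(all_nodes, field="F3")
chain = problem_chain(TABLE_3, problem_nodes)
print(sorted(problem_nodes))
print(sorted(chain))
```

Nodes such as P1-T4-F5 or P2-T7-F8 are left out because no edge connects them to either problem node.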
Fig. 6 schematically illustrates a flowchart for repairing a problem data chain according to an embodiment of the disclosure.
Operation S250, repairing the problem data chain, includes operations S251 to S253.
In operation S251, an upstream node of the problem data in the problem data chain is determined as a node to be confirmed.
In operation S252, the problem data in the problem data chain and the downstream node thereof are determined as nodes to be repaired.
In operation S253, the node to be repaired is repaired.
Take the problem data chain P2-T6-F7 → P2-T5-F6 → P2-T3-F3 → P1-T3-F3 → P1-T2-F2 → P1-T1-F1 as an example. The upstream nodes P2-T6-F7 and P2-T5-F6 of the problem node P2-T3-F3 are determined to be nodes to be confirmed. The downstream nodes P1-T2-F2 and P1-T1-F1, together with the problem nodes P1-T3-F3 and P2-T3-F3 themselves, are determined to be nodes to be repaired. Nodes P1-T2-F2, P1-T1-F1, P1-T3-F3, and P2-T3-F3 therefore need to be repaired. For each node to be confirmed, manual inspection can establish whether it is also a problem node, after which corresponding measures are taken. Operations S251 to S253 thus make it straightforward to repair the problem data chain.
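The split into nodes to be confirmed and nodes to be repaired can be illustrated with a short sketch. It assumes the chain is acyclic and that "upstream" and "downstream" mean all ancestors and all descendants of the problem nodes; `reachable` and `classify` are hypothetical helper names.

```python
from collections import defaultdict

# Edges of the example problem data chain, as (parent, child) pairs.
CHAIN_EDGES = [
    ("P2-T6-F7", "P2-T5-F6"),
    ("P2-T5-F6", "P2-T3-F3"),
    ("P2-T3-F3", "P1-T3-F3"),
    ("P1-T3-F3", "P1-T2-F2"),
    ("P1-T2-F2", "P1-T1-F1"),
]
PROBLEM_NODES = {"P1-T3-F3", "P2-T3-F3"}


def reachable(adjacency, starts):
    """Every node that can be reached from any start node by following adjacency."""
    seen, stack = set(), list(starts)
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def classify(edges, problem_nodes):
    children, parents = defaultdict(set), defaultdict(set)
    for parent, child in edges:
        children[parent].add(child)
        parents[child].add(parent)
    # S251 (sketch): ancestors that are not themselves problem nodes.
    to_confirm = reachable(parents, problem_nodes) - problem_nodes
    # S252 (sketch): the problem nodes plus everything derived from them.
    to_repair = reachable(children, problem_nodes) | problem_nodes
    return to_confirm, to_repair


to_confirm, to_repair = classify(CHAIN_EDGES, PROBLEM_NODES)
print("to confirm:", sorted(to_confirm))
print("to repair: ", sorted(to_repair))
```

The result matches the example in the text: P2-T6-F7 and P2-T5-F6 are to be confirmed, and the remaining four nodes are to be repaired.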
Fig. 7 schematically shows a flowchart of repairing a node to be repaired according to an embodiment of the present disclosure.
Operation S253, repairing the node to be repaired, includes operation S2531.
In operation S2531, a custom script is written for the node to be repaired and/or the task of the node to be repaired is rerun. Operation S2531 thus makes it straightforward to repair the node to be repaired.
Based on the above knowledge-graph-based data restoration method, the present disclosure also provides a knowledge-graph-based data restoration device 10, which is described in detail below in conjunction with Figs. 8 to 12.
Fig. 8 schematically shows a block diagram of the knowledge-graph-based data restoration device 10 according to an embodiment of the present disclosure.
The data restoration device 10 based on the knowledge graph comprises an acquisition module 1, an extraction module 2, a construction module 3, a determination module 4 and a restoration module 5.
The acquisition module 1 is configured to perform operation S210: acquiring m pieces of stored procedure information within a t-second time period, wherein each piece of stored procedure information comprises a source and a destination of data, m is an integer greater than or equal to 1, and t is greater than or equal to 1.
The extraction module 2 is configured to perform operation S220: extracting, according to the m pieces of stored procedure information, superior information, subordinate information, and a parent-child relationship between the superior information and the subordinate information, wherein the superior information comprises a parent stored procedure-table-field and the subordinate information comprises a child stored procedure-table-field.
The construction module 3 is configured to perform operation S230: constructing the knowledge graph according to the superior information, the subordinate information, and the parent-child relationship, wherein the nodes of the knowledge graph are constructed according to the superior information and the subordinate information, and the edges of the knowledge graph are constructed according to the parent-child relationship.
The determination module 4 is configured to perform operation S240: determining a problem data chain in the knowledge graph according to the problem data obtained in advance within the t-second time period.
The repair module 5 is configured to perform operation S250: repairing the problem data chain.
Fig. 9 schematically shows a block diagram of the structure of the extraction module 2 according to an embodiment of the present disclosure. The extraction module 2 includes an extraction unit 21 and a first determination unit 22.
The extraction unit 21 is configured to extract the superior information, the subordinate information, and the parent-child relationship in each piece of stored procedure information to obtain an intermediate data result.
The first determination unit 22 is configured to integrate the m intermediate data results according to an integration rule to obtain a final data result, the final data result comprising the superior information, the subordinate information, and the parent-child relationship between them in the m pieces of stored procedure information.
Fig. 10 schematically shows a block diagram of the structure of the determination module 4 according to the embodiment of the present disclosure. The determination module 4 includes a second determination unit 41 and a third determination unit 42.
The second determination unit 41 is configured to match nodes in the knowledge graph as problem nodes according to the problem data obtained in advance within the t-second time period, the problem data comprising a problem table and/or a problem field.
The third determination unit 42 is configured to determine the problem data chain from the problem nodes using a graph traversal algorithm.
Fig. 11 schematically shows a block diagram of the repair module 5 according to an embodiment of the present disclosure. The repair module 5 includes a fourth determination unit 51, a fifth determination unit 52, and a repair unit 53.
The fourth determination unit 51 is configured to determine an upstream node of the problem data in the problem data chain as a node to be confirmed.
The fifth determination unit 52 is configured to determine the problem data in the problem data chain and its downstream nodes as nodes to be repaired.
The repair unit 53 is configured to repair the nodes to be repaired.
Fig. 12 schematically shows a block diagram of the repair unit 53 according to an embodiment of the present disclosure. The repair unit 53 includes a repair element 531.
The repair element 531 is configured to write a custom script for the node to be repaired and/or to rerun the task of the node to be repaired.
According to the knowledge-graph-based data restoration device 10 of the embodiment of the present disclosure, the superior information, the subordinate information, and the parent-child relationship are extracted from the m pieces of stored procedure information within the t-second time period, and a knowledge graph of that stored procedure information is constructed from them. Other data associated with the problem data within the t-second time period can therefore be found through the edge relationships of the knowledge graph, so that both the problem data and its associated data can be repaired. The data restoration method of the present disclosure relies little on manual work, is highly automated, and offers high efficiency and accuracy.
In addition, according to the embodiment of the present disclosure, any number of the acquisition module 1, the extraction module 2, the construction module 3, the determination module 4, and the repair module 5 may be combined and implemented in one module, or any one of them may be split into multiple modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module.
According to the embodiment of the present disclosure, at least one of the acquisition module 1, the extraction module 2, the construction module 3, the determination module 4, and the repair module 5 may be at least partially implemented as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, or an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented by any one of the three implementation manners of software, hardware, and firmware, or by a suitable combination of any of them.
Alternatively, at least one of the acquisition module 1, the extraction module 2, the construction module 3, the determination module 4, and the repair module 5 may be at least partially implemented as a computer program module which, when executed, performs the corresponding function.
The knowledge-graph-based data restoration device according to an embodiment of the present disclosure is described in detail below with reference to Figs. 13 and 14. It is to be understood that the following description is illustrative only and is not intended to limit the present disclosure in any way.
As shown in Fig. 13, the knowledge-graph-based data restoration device according to an embodiment of the present disclosure includes the following five devices.
1. Stored procedure scanning device: the names of all financial market stored procedures are input, the corresponding stored procedure files are downloaded, and the stored procedure information is scanned to obtain the tables and fields involved in each stored procedure, which are recorded as stored procedure_table_field. Based on keywords such as INSERT, SELECT, UPDATE, MERGE, and FROM, the parent-child relationships between fields within each stored procedure can be determined. For example, stored procedure P1 contains the following INSERT statement (the SQL is greatly simplified):
INSERT INTO T1(F1)
SELECT T2.F2
FROM T2
INNER JOIN T3
ON T2.F3=T3.F4
From this statement it follows that the stored procedure_table_field P1_T1_F1 originates from the stored procedure_table_field P1_T2_F2 (P stands for stored procedure, T for table, and F for field).
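To make the scanning step concrete, here is a deliberately narrow sketch that recovers the parent-child pair from the simplified statement above. It assumes a single-column `INSERT INTO t(f) SELECT t2.f2` shape and uses a regular expression rather than a real SQL parser; `PATTERN` and `extract_lineage` are hypothetical names, and production stored procedures would need a proper parser.

```python
import re

# The simplified statement from stored procedure P1.
SQL = """
INSERT INTO T1(F1)
SELECT T2.F2
FROM T2
INNER JOIN T3
ON T2.F3=T3.F4
"""

# Captures target table, target field, source table, source field.
PATTERN = re.compile(
    r"INSERT\s+INTO\s+(\w+)\s*\(\s*(\w+)\s*\)\s*SELECT\s+(\w+)\.(\w+)",
    re.IGNORECASE,
)


def extract_lineage(procedure, sql):
    """Return (parent, child) pairs for single-column INSERT ... SELECT statements."""
    pairs = []
    for tgt_table, tgt_field, src_table, src_field in PATTERN.findall(sql):
        parent = f"{procedure}_{src_table}_{src_field}"  # where the data comes from
        child = f"{procedure}_{tgt_table}_{tgt_field}"   # where the data is written
        pairs.append((parent, child))
    return pairs


lineage = extract_lineage("P1", SQL)
print(lineage)  # [('P1_T2_F2', 'P1_T1_F1')]
```

The join columns (T2.F3 and T3.F4) are ignored in this sketch because the text derives the relationship only from the selected column and the insert target.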
2. Metadata storage device: the stored procedure_table_field information and the parent-child relationship information of each stored procedure, as acquired by the stored procedure scanning device, are stored in a relational database. Taking stored procedures P1 and P2 as an example, the data are as shown in Tables 4 and 5.
TABLE 4
Parent stored procedure-table-field | Child stored procedure-table-field
P1-T2-F2 | P1-T1-F1
P1-T3-F3 | P1-T2-F2
P1-T4-F5 | P1-T1-F4
TABLE 5
Parent stored procedure-table-field | Child stored procedure-table-field
P2-T5-F6 | P2-T3-F3
P2-T6-F7 | P2-T5-F6
P2-T8-F9 | P2-T7-F8
P2-T9-F10 | P2-T7-F8
Stored procedure P1 uses the T3_F3 field, while stored procedure P2 updates the T3_F3 field, which indicates that a parent-child relationship exists between the two. Matching the parent stored procedure_table_field of one stored procedure against the child stored procedure_table_field of another yields the parent-child relationships across stored procedures. Integrating stored procedures P1 and P2 gives the data shown in Table 6.
TABLE 6
Parent stored procedure-table-field | Child stored procedure-table-field
P1-T2-F2 | P1-T1-F1
P1-T3-F3 | P1-T2-F2
P1-T4-F5 | P1-T1-F4
P2-T5-F6 | P2-T3-F3
P2-T6-F7 | P2-T5-F6
P2-T8-F9 | P2-T7-F8
P2-T9-F10 | P2-T7-F8
P2-T3-F3 | P1-T3-F3
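The integration of Tables 4 and 5 into Table 6 can be sketched as below. The rule coded here is one reading of the text: when a field written by one stored procedure (a child node) has the same table-field as a field read by a different stored procedure (a parent node), a cross-procedure edge is added from the former to the latter. The helper names are assumptions made for illustration.

```python
# Per-procedure (parent, child) rows copied from Tables 4 and 5.
TABLE_4 = [("P1-T2-F2", "P1-T1-F1"), ("P1-T3-F3", "P1-T2-F2"), ("P1-T4-F5", "P1-T1-F4")]
TABLE_5 = [("P2-T5-F6", "P2-T3-F3"), ("P2-T6-F7", "P2-T5-F6"),
           ("P2-T8-F9", "P2-T7-F8"), ("P2-T9-F10", "P2-T7-F8")]


def procedure(node):
    """'P1-T3-F3' -> 'P1'."""
    return node.split("-", 1)[0]


def table_field(node):
    """'P1-T3-F3' -> 'T3-F3'."""
    return node.split("-", 1)[1]


def integrate(results):
    """Merge per-procedure rows and append cross-procedure parent-child rows."""
    merged = [row for rows in results for row in rows]
    parents = {p for p, _ in merged}
    children = {c for _, c in merged}
    cross = [
        (c, p)                       # the writer becomes the parent of the reader
        for c in sorted(children)
        for p in sorted(parents)
        if table_field(c) == table_field(p) and procedure(c) != procedure(p)
    ]
    return merged + cross


table_6 = integrate([TABLE_4, TABLE_5])
for parent, child in table_6:
    print(parent, "|", child)
```

For the sample data, the only cross-procedure row produced is P2-T3-F3 | P1-T3-F3, which is the extra row that distinguishes Table 6 from the union of Tables 4 and 5.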
3. Graph conversion device: each stored procedure_table_field is taken as a node of the graph, each parent-child dependency is taken as an edge of the graph, and the graph is drawn from the nodes and edges.
For example, the source trajectory of the P1_T1_F1 node in Table 6 is P2_T6_F7 → P2_T5_F6 → P2_T3_F3 → P1_T3_F3 → P1_T2_F2 → P1_T1_F1; in this way, Table 6 can be converted into a graph.
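A source trajectory of this kind can be reproduced by walking parent links back from the target node until a node with no parent is reached. The sketch below assumes the graph is acyclic and uses the hyphenated node names of Table 6; `source_paths` is a hypothetical helper.

```python
from collections import defaultdict

# (parent, child) rows copied from Table 6.
EDGES = [
    ("P1-T2-F2", "P1-T1-F1"), ("P1-T3-F3", "P1-T2-F2"),
    ("P1-T4-F5", "P1-T1-F4"), ("P2-T5-F6", "P2-T3-F3"),
    ("P2-T6-F7", "P2-T5-F6"), ("P2-T8-F9", "P2-T7-F8"),
    ("P2-T9-F10", "P2-T7-F8"), ("P2-T3-F3", "P1-T3-F3"),
]


def source_paths(edges, target):
    """List every path from a root (a node without parents) down to target."""
    parents = defaultdict(list)
    for parent, child in edges:
        parents[child].append(parent)

    def walk(node, suffix):
        path = [node] + suffix
        if not parents.get(node):          # no parent: path starts here
            yield path
        for parent in parents.get(node, []):
            yield from walk(parent, path)  # depth-first step towards the sources

    return list(walk(target, []))


paths = source_paths(EDGES, "P1-T1-F1")
for path in paths:
    print(" → ".join(path))
```

A node with several parents, such as P2-T7-F8 in Table 6, yields one path per source.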
4. Graph query device: when any table and field information is input, the corresponding stored procedure_table_field nodes are matched and a subgraph is searched using a graph traversal algorithm (such as depth-first traversal or breadth-first traversal), so that the upstream and downstream dependencies of the table and field can be determined and a data flow graph is obtained.
5. Data restoration device: when any table and field information is input, the graph query device is called to locate the stored procedure_table_field node and display the related upstream and downstream stored procedure_table_field nodes. The upstream nodes are defined as nodes to be confirmed, and the node itself and its downstream nodes are defined as nodes to be repaired. A data repair button is provided beside each node, with options that include writing a custom script and rerunning the task. In addition, a one-click repair button is provided; when it is clicked, all the nodes involved are displayed, and a repair mode can be freely selected for each node.
A flowchart of the operation of the knowledge-graph-based data restoration device is shown in Fig. 14. The specific working steps of the device are as follows.
1. The stored procedure scanning device obtains the stored procedure_table_field information involved in each stored procedure and the parent-child relationships within it.
2. The metadata storage device stores the parent-child relationship information of the stored procedure_table_fields, both within each stored procedure and across stored procedures, in a relational database.
3. The graph conversion device converts the stored procedure_table_fields and parent-child relationships stored in the relational database into a graph. Each stored procedure_table_field is taken as a node of the graph, each parent-child dependency is taken as an edge of the graph, and the graph is drawn from the nodes and edges.
4. When the upstream and downstream information of a table field needs to be queried, the graph query device matches the corresponding stored procedure_table_field node and searches the subgraph related to that node using a graph traversal algorithm, so that the upstream and downstream dependencies of the table field can be determined and a data flow graph is obtained.
5. When erroneous data needs to be repaired, the upstream and downstream dependencies of the input erroneous table field are located in the same way as in step 4 to obtain the data flow graph. According to the data flow graph, the upstream nodes are defined as nodes to be confirmed, the node itself and its downstream nodes are defined as nodes to be repaired, and the fields corresponding to the graph are queried to determine the minimum data repair range. Then, depending on the specific situation, either the fields of the corresponding tables are updated directly using the custom script function of the data restoration device, or the corresponding stored procedures are rerun using its task rerun function, so that the erroneous data is repaired.
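Once the nodes to be repaired are known, the minimum repair range in step 5 can be summarised as the table fields to update and the stored procedures to rerun. The following sketch shows one assumed way to derive that summary from node names of the form procedure-table-field; `repair_scope` and the sample list are illustrative only.

```python
from collections import defaultdict

# Nodes to be repaired in the running example (problem nodes plus downstream nodes).
NODES_TO_REPAIR = ["P2-T3-F3", "P1-T3-F3", "P1-T2-F2", "P1-T1-F1"]


def repair_scope(nodes):
    """Group nodes into fields to update per table and stored procedures to rerun."""
    fields_by_table = defaultdict(set)
    procedures = set()
    for node in nodes:
        proc, table, field = node.split("-")
        fields_by_table[table].add(field)  # candidate for a custom update script
        procedures.add(proc)               # candidate for a task rerun
    return dict(fields_by_table), procedures


fields_by_table, procedures = repair_scope(NODES_TO_REPAIR)
print("fields to update:", {t: sorted(f) for t, f in sorted(fields_by_table.items())})
print("procedures to rerun:", sorted(procedures))
```

Whether a given node is handled by a script or by a rerun remains an operator decision in the described device; the sketch only narrows down the candidates.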
The present disclosure proposes a device that can determine the processing logic of financial market table fields and repair erroneous data at minimal time and computation cost. It can quickly determine which table fields need to be updated and which tasks need to be rerun in order to repair the content affected by erroneous data. The method uses graph computing technology: it constructs a relationship graph over the stored procedure, table, and field dimensions and applies a graph-based depth-first traversal algorithm to it. Based on the relationship graph, the positions at which table fields are updated can be located quickly, which reduces the dependence on experienced personnel, speeds up the repair of data errors, and saves time and computation cost.
Fig. 15 schematically shows a block diagram of an electronic device adapted to implement the above method according to an embodiment of the present disclosure.
As shown in fig. 15, an electronic apparatus 900 according to an embodiment of the present disclosure includes a processor 901 which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage portion 908 into a Random Access Memory (RAM) 903. Processor 901 may comprise, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 901 may also include on-board memory for caching purposes. The processor 901 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure.
In the RAM 903, various programs and data necessary for the operation of the electronic apparatus 900 are stored. The processor 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. The processor 901 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 902 and/or the RAM 903. Note that the programs may also be stored in one or more memories other than the ROM 902 and the RAM 903. The processor 901 may also perform various operations of the method flows according to the embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 900 may also include an input/output (I/O) interface 905, which is also connected to the bus 904. The electronic device 900 may also include one or more of the following components connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output portion 907 including components such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. A drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 910 as necessary, so that a computer program read from it is installed into the storage portion 908 as necessary.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement a method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 902 and/or the RAM 903 described above and/or one or more memories other than the ROM 902 and the RAM 903.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method illustrated by the flow chart. The program code is for causing a computer system to carry out the methods of the embodiments of the disclosure when the computer program product is run on the computer system.
The computer program performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure when executed by the processor 901. The systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted in the form of a signal over a network medium, distributed, and downloaded and installed via the communication section 909 and/or installed from the removable medium 911. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. The computer program, when executed by the processor 901, performs the above-described functions defined in the system of the embodiment of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In accordance with embodiments of the present disclosure, program code for executing the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, the "C" language, and the like. The program code may execute entirely on the user computing device, partly on the user computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure can be combined and/or incorporated in various ways, even if such combinations or incorporations are not expressly recited in the present disclosure. In particular, such combinations and/or incorporations may be made without departing from the spirit or teaching of the present disclosure, and all of them fall within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the disclosure, and these alternatives and modifications are intended to fall within the scope of the disclosure.

Claims (10)

1. A knowledge-graph-based data restoration method, characterized by comprising:
acquiring m pieces of stored procedure information within a t-second time period, wherein each piece of stored procedure information comprises a source and a destination of data, m is an integer greater than or equal to 1, and t is greater than or equal to 1;
extracting superior information, subordinate information, and a parent-child relationship between the superior information and the subordinate information according to the m pieces of stored procedure information, wherein the superior information comprises a parent stored procedure-table-field, and the subordinate information comprises a child stored procedure-table-field;
constructing a knowledge graph according to the superior information, the subordinate information, and the parent-child relationship, wherein nodes of the knowledge graph are constructed according to the superior information and the subordinate information, and edges of the knowledge graph are constructed according to the parent-child relationship;
determining a problem data chain in the knowledge graph according to problem data obtained in advance within the t-second time period; and
repairing the problem data chain.
2. The method of claim 1, wherein extracting the superior information, the subordinate information, and the parent-child relationship between the superior information and the subordinate information according to the m pieces of stored procedure information comprises:
extracting the superior information, the subordinate information, and the parent-child relationship in each piece of stored procedure information to obtain an intermediate data result; and
integrating the m intermediate data results according to an integration rule to obtain a final data result, wherein the final data result comprises the superior information and the subordinate information in the m pieces of stored procedure information and the parent-child relationship between the superior information and the subordinate information.
3. The method of claim 2, wherein the integration rule comprises:
merging the m intermediate data results, wherein, for any two intermediate data results, when the table-field in the superior information of one intermediate data result is the same as the table-field in the subordinate information of the other intermediate data result, the subordinate information of the other intermediate data result is taken as superior information and the superior information of the one intermediate data result is taken as subordinate information in the final data result.
4. The method of claim 1, wherein determining a problem data chain in the knowledge graph according to problem data obtained in advance within the t-second time period comprises:
matching nodes in the knowledge graph as problem nodes according to the problem data obtained in advance within the t-second time period, wherein the problem data comprises a problem table and/or a problem field; and
determining the problem data chain from the problem nodes using a graph traversal algorithm.
5. The method of claim 1, wherein repairing the problem data chain comprises:
determining an upstream node of the problem data in the problem data chain as a node to be confirmed;
determining the problem data in the problem data chain and its downstream nodes as nodes to be repaired; and
repairing the nodes to be repaired.
6. The method of claim 5, wherein repairing the nodes to be repaired comprises:
writing a custom script for the node to be repaired and/or rerunning the task of the node to be repaired.
7. A knowledge-graph-based data restoration device, comprising:
an acquisition module configured to acquire m pieces of stored procedure information within a t-second time period, wherein each piece of stored procedure information comprises a source and a destination of data, m is an integer greater than or equal to 1, and t is greater than or equal to 1;
an extraction module configured to extract, according to the m pieces of stored procedure information, superior information, subordinate information, and a parent-child relationship between the superior information and the subordinate information, wherein the superior information comprises a parent stored procedure-table-field, and the subordinate information comprises a child stored procedure-table-field;
a construction module configured to construct a knowledge graph according to the superior information, the subordinate information, and the parent-child relationship, wherein nodes of the knowledge graph are constructed according to the superior information and the subordinate information, and edges of the knowledge graph are constructed according to the parent-child relationship;
a determination module configured to determine a problem data chain in the knowledge graph according to problem data obtained in advance within the t-second time period; and
a repair module configured to repair the problem data chain.
8. An electronic device, comprising:
one or more processors;
one or more memories for storing executable instructions that, when executed by the processor, implement the method of any one of claims 1-6.
9. A computer-readable storage medium, characterized in that the storage medium has stored thereon executable instructions which, when executed by a processor, implement the method according to any one of claims 1 to 6.
10. A computer program product, comprising a computer program comprising one or more executable instructions which, when executed by a processor, implement the method according to any one of claims 1 to 6.
CN202310075472.2A 2023-01-16 2023-01-16 Data restoration method, data restoration device, electronic device, medium, and program product Pending CN115964514A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310075472.2A CN115964514A (en) 2023-01-16 2023-01-16 Data restoration method, data restoration device, electronic device, medium, and program product


Publications (1)

Publication Number Publication Date
CN115964514A (en) 2023-04-14

Family

ID=87361884

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310075472.2A Pending CN115964514A (en) 2023-01-16 2023-01-16 Data restoration method, data restoration device, electronic device, medium, and program product

Country Status (1)

Country Link
CN (1) CN115964514A (en)

Similar Documents

Publication Publication Date Title
US9928155B2 (en) Automated anomaly detection service on heterogeneous log streams
US11238058B2 (en) Search and retrieval of structured information cards
CN111061833B (en) Data processing method and device, electronic equipment and computer readable storage medium
US11599539B2 (en) Column lineage and metadata propagation
US20120023586A1 (en) Determining privacy risk for database queries
US8775226B2 (en) Computing and managing conflicting functional data requirements using ontologies
US10810009B2 (en) Visualizations of software project and contributor activity
US10437840B1 (en) Focused probabilistic entity resolution from multiple data sources
US11605012B2 (en) Framework for processing machine learning model metrics
US20200349128A1 (en) Clustering within database data models
CN114706856A (en) Fault processing method and device, electronic equipment and computer readable storage medium
CN116414855A (en) Information processing method and device, electronic equipment and computer readable storage medium
CN115964514A (en) Data restoration method, data restoration device, electronic device, medium, and program product
CN114186555A (en) Demand identification method, apparatus, electronic device, medium, and computer program
CN113934729A (en) Data management method based on knowledge graph, related equipment and medium
CN113468244A (en) Atmospheric environmental pollution source management system, method, electronic device and storage medium
US20160373402A1 (en) Information Management and Notification System
CN115033416B (en) Method, device, electronic equipment and storage medium for determining abnormal information
CN114254081B (en) Enterprise big data search system, method and electronic equipment
US11803357B1 (en) Entity search engine powered by copy-detection
CN117453697A (en) Information processing method, device, equipment and storage medium
CN116484388A (en) Code entrainment identification method and device, electronic equipment and medium
CN114253946A (en) Tracking method, apparatus, electronic device, medium, and program product
CN115438151A (en) Method, device, equipment and medium for determining standard clauses
CN115905579A (en) Archive compiling and researching data generating method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination