CN113742137A - Data disaster recovery method and system - Google Patents
- Publication number
- CN113742137A CN113742137A CN202111026638.9A CN202111026638A CN113742137A CN 113742137 A CN113742137 A CN 113742137A CN 202111026638 A CN202111026638 A CN 202111026638A CN 113742137 A CN113742137 A CN 113742137A
- Authority
- CN
- China
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
- G06F11/1464—Management of the backup or restore process for networked environments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1448—Management of the data involved in backup or backup restore
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
- G06F11/1469—Backup restoration techniques
Abstract
The invention provides a data disaster recovery method and system. The data disaster recovery system comprises a main platform and a standby platform, wherein the main platform comprises a data service layer, a data acquisition layer and a data calculation layer, and the data service layer comprises a read cluster and a write cluster. The data acquisition layer acquires the current service information of a service system and sends it to the data calculation layer; the data calculation layer receives the current service information, performs data processing on it, converts the resulting current service information calculation result into an HFILE file, and inserts the HFILE file into the write cluster; the write cluster synchronizes each detected HFILE file to the read cluster and the standby platform. The invention solves the prior-art problems that deploying two full clusters duplicates data acquisition, data ETL and data index processing, creates redundant data storage, and allows errors in the many processing steps to make the data at the two sites inconsistent.
Description
Technical Field
The invention relates to the technical field of big data, in particular to a data disaster recovery method and a data disaster recovery system.
Background
According to regulatory requirements, the core deposit and remittance transaction systems of the financial industry must be provided with corresponding disaster recovery backups. Although regulators have not yet imposed specific disaster recovery requirements on big data, institutions have begun to set up corresponding disaster recovery for important externally-facing applications built on big data technology.
The prior art mainly imitates the same-city active-active deployment of traditional business systems: two identical Hadoop clusters with identical application services are deployed in the same city. However, deploying two full clusters not only duplicates data acquisition, data ETL and data index processing and creates redundant data storage, but also makes it easy for errors in the many processing steps to leave the data at the two sites inconsistent when compared.
Disclosure of Invention
In view of this, the invention provides a data disaster recovery method and system to solve the prior-art problems that deploying two full clusters duplicates data acquisition, data ETL and data index processing, creates redundant data storage, and allows errors in the many processing steps to make the data at the two sites inconsistent.
The first aspect of the invention discloses a data disaster recovery system, which comprises a main platform and a standby platform, wherein the main platform comprises a data service layer, a data acquisition layer and a data calculation layer, and the data service layer comprises a read cluster and a write cluster, wherein:
the data acquisition layer is used for acquiring current service information of a service system and sending the current service information to the data calculation layer; wherein, the current service information comprises a plurality of current service data;
the data calculation layer is used for receiving the current service information, performing data processing on the current service information, converting an obtained current service information calculation result into an HFILE file, and inserting the HFILE file into the write cluster;
the write cluster is used for synchronizing the HFILE file to the read cluster and the standby platform when the inserted HFILE file is detected.
Optionally, the main platform further includes a data storage layer, and the data storage layer is configured to:
and acquiring the current service information and storing the current service information.
Optionally, the data calculation layer, in receiving the current service information, performing data processing on the current service information, converting an obtained current service information calculation result into an HFILE file, and inserting the HFILE file into the write cluster, is specifically configured to:
receiving the current service information sent by the data acquisition layer, and carrying out ETL processing on the current service information to obtain a current service information calculation result, wherein the current service information calculation result comprises a calculation result of each piece of current service data;
sequentially obtaining, by using a putlist function, the calculation results of a first preset number of pieces of service data from the current service information calculation result, converting the data format of those calculation results into a target data format to generate an HFILE subfile, and inserting the HFILE subfile into one target Region node in the write cluster, until the calculation results of all the service data are inserted;
the write cluster comprises a plurality of Region nodes, a second preset number of target Region nodes are preset in the Region nodes, the number of the target Region nodes is equal to the number of the HFILE subfiles, and the HFILE file is composed of the HFILE subfiles.
Optionally, the write cluster, in synchronizing the HFILE file to the read cluster and the standby platform when the inserted HFILE file is detected, is specifically configured to:
synchronizing the HFILE subfiles to the read cluster and the standby platform when the inserted HFILE subfiles are detected until all the HFILE subfiles related to the current service information are synchronized to the read cluster and the standby platform.
Optionally, the system further includes:
and the read cluster is used for merging various HFILE subfiles related to the current service information.
The second aspect of the invention discloses a data disaster recovery method, applied to a data disaster recovery system, wherein the data disaster recovery system comprises a main platform and a standby platform, the main platform comprises a data service layer, a data acquisition layer and a data calculation layer, and the data service layer comprises a read cluster and a write cluster; the method comprises the following steps:
the data acquisition layer acquires current service information of a service system and sends the current service information to the data calculation layer; wherein, the current service information comprises a plurality of current service data;
the data calculation layer receives the current service information, performs data processing on the current service information, converts an obtained current service information calculation result into an HFILE file, and inserts the HFILE file into the write cluster;
and when the writing cluster detects the inserted HFILE file, synchronizing the HFILE file to the reading cluster and the standby platform.
Optionally, the main platform further includes a data storage layer, and the method further includes:
and the data storage layer acquires the current service information and stores the current service information.
Optionally, the receiving, by the data calculation layer, the current service information, performing data processing on the current service information, converting an obtained current service information calculation result into an HFILE file, and inserting the HFILE file into the write cluster includes:
receiving the current service information sent by the data acquisition layer, and carrying out ETL processing on the current service information to obtain a current service information calculation result, wherein the current service information calculation result comprises a calculation result of each piece of current service data;
sequentially obtaining, by using a putlist function, the calculation results of a first preset number of pieces of service data from the current service information calculation result, converting the data format of those calculation results into a target data format to generate an HFILE subfile, and inserting the HFILE subfile into one target Region node in the write cluster, until the calculation results of all the service data are inserted;
the write cluster comprises a plurality of Region nodes, a second preset number of target Region nodes are preset in the Region nodes, the number of the target Region nodes is equal to the number of the HFILE subfiles, and the HFILE file is composed of the HFILE subfiles.
Optionally, when the write cluster detects the inserted HFILE file, synchronizing the HFILE file to the read cluster and the standby platform includes:
synchronizing the HFILE subfiles to the read cluster and the standby platform when the inserted HFILE subfiles are detected until all the HFILE subfiles related to the current service information are synchronized to the read cluster and the standby platform.
Optionally, when the write cluster detects the inserted HFILE file, after synchronizing the HFILE file to the read cluster, the method further includes:
and the read cluster combines all the HFILE subfiles related to the current service information.
The invention provides a data disaster recovery method and system. Two Hadoop clusters are set up in the data disaster recovery system and divided into a main platform and a standby platform, and a data service layer, a data acquisition layer and a data calculation layer are set up in the main platform. The data acquisition layer collects the current service information of a service system and sends it to the data calculation layer; after receiving it, the data calculation layer performs data processing on the current service information to obtain a calculation result, converts the result into a corresponding HFILE file, and inserts the HFILE file into the write cluster; when the write cluster detects the inserted HFILE file, it synchronizes the file to the read cluster and the standby platform. In this scheme, all processing of the service information is carried out by the main platform, and the standby platform only stores the results synchronized from the main platform, which solves the prior-art problems that deploying two full Hadoop clusters duplicates data acquisition, data ETL and data index processing, creates redundant data storage, and allows errors in the many processing steps to make the data at the two sites inconsistent.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a schematic structural diagram of a data disaster recovery system according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of another data disaster recovery system according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of another data disaster recovery system according to an embodiment of the present invention;
fig. 4 is a schematic flow chart of a data disaster recovery method according to an embodiment of the present invention;
fig. 5 is a schematic flow chart of another data disaster recovery method according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules, or units, and are not used for limiting the order or interdependence of the functions performed by the devices, modules, or units.
It is noted that references to "a", "an", and "the" modifications in the disclosure are exemplary rather than limiting, and that those skilled in the art will understand that "one or more" unless the context clearly dictates otherwise.
Referring to fig. 1, a schematic structural diagram of a data disaster recovery system provided in an embodiment of the present invention is shown, where the data disaster recovery system includes a main platform 101 and a standby platform 102, the main platform includes a data service layer, a data acquisition layer, and a data calculation layer, and the data service layer includes a read cluster and a write cluster.
In the embodiment of the application, two Hadoop clusters can be divided into a main platform and a standby platform by arranging two Hadoop clusters in the data disaster recovery system, a data service layer, a data acquisition layer and a data calculation layer are arranged in the main platform, and a read cluster and a write cluster are arranged in the data service layer.
And the data acquisition layer is used for acquiring current service information of the service system and sending the current service information to the data calculation layer, wherein the current service information comprises a plurality of pieces of current service data.
In the embodiment of the present application, the current service information collecting manner of the service system by the data obtaining layer may be specifically divided into two manners, one is active data collection, and the other is passive data collection.
The active data acquisition process is specifically: the data acquisition layer either acquires the current service information of the service system in batch by using a big data acquisition technology, or acquires it in real time by using Kafka + Flume.
It should be noted that, when acquiring the current service information of the service system in batch with a big data acquisition technology, the data acquisition layer calls the Sqoop component.
The passive data acquisition process specifically comprises the following steps: and receiving current service information actively sent by the service system, and loading the received current service information.
And the data calculation layer is used for receiving the current service information sent by the data acquisition layer, processing data on the received current service information to obtain a current service information calculation result, converting the obtained current service information calculation result into an HFILE file, and inserting the obtained HFILE file into the write cluster.
In the embodiment of the present application, in order to improve the efficiency of inserting into the write cluster, the data calculation layer receives the current service information sent by the data acquisition layer and performs ETL processing on it to obtain a current service information calculation result; it may then use a putlist function to take the calculation results of a first preset number of pieces of service data from that result, convert their data format into a target data format to generate an HFILE subfile, and insert the HFILE subfile into one target Region node in the write cluster, repeating until the calculation results of all the service data are inserted. The target data format may be a binary format, and ETL is an abbreviation of Extract-Transform-Load, meaning that the current service information can undergo data processing such as extraction, cleaning, conversion and loading.
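The batching step above can be sketched in Python. This is a simplified stand-in, not the patent's implementation: a real HBase bulk load builds actual HFiles through the HBase API, and the function names and serialization here are hypothetical.

```python
from typing import Any, Dict, List


def to_target_format(record: Dict[str, Any]) -> bytes:
    """Stand-in for the 'target data format' conversion; the patent suggests
    a binary format, so we simply serialize each record to UTF-8 bytes."""
    return repr(sorted(record.items())).encode("utf-8")


def build_subfiles(results: List[Dict[str, Any]],
                   first_preset_number: int) -> List[List[bytes]]:
    """Group calculation results into batches of `first_preset_number` records;
    each batch becomes the payload of one HFILE-like subfile."""
    return [
        [to_target_format(r) for r in results[i:i + first_preset_number]]
        for i in range(0, len(results), first_preset_number)
    ]
```

For example, ten calculation results with a first preset number of 5 yield two subfile payloads, each of which would then be inserted into its own target Region node.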
Alternatively, after receiving the current service information sent by the data acquisition layer and performing ETL processing on it to obtain a current service information calculation result, the data calculation layer may use put in an asynchronous manner to sequentially take the calculation results of a first preset number of pieces of service data, convert their data format into a target data format to generate an HFILE subfile, and insert the HFILE subfile into one target Region node in the write cluster, until the calculation results of all the service data are inserted. The target data format may be a binary format.
In this embodiment of the present application, the write cluster includes a plurality of Region nodes, in order to ensure reasonable use of the Region nodes in the write cluster, a second preset number of target Region nodes may be set in advance in the plurality of Region nodes of the write cluster, and the number of the set target Region nodes may be equal to the number of the HFILE subfiles to be generated.
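The one-subfile-per-target-Region constraint can be sketched as follows (a minimal illustration; the function and region names are hypothetical, and a real deployment would address actual HBase Region servers):

```python
from typing import Dict, List


def assign_subfiles_to_regions(subfiles: List[bytes],
                               target_regions: List[str]) -> Dict[str, bytes]:
    """Each HFILE subfile is inserted into exactly one pre-selected target
    Region node, so the second preset number of target Regions must equal
    the number of subfiles to be generated."""
    if len(subfiles) != len(target_regions):
        raise ValueError("target Region count must equal HFILE subfile count")
    return dict(zip(target_regions, subfiles))
```

Pre-selecting as many target Regions as subfiles spreads the insert load evenly and avoids hot-spotting a single Region during the bulk insert.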
Optionally, referring to fig. 2 in conjunction with fig. 1, the main platform provided by the present invention further includes a data storage layer.
The data storage layer is used for acquiring the current service information that the data acquisition layer collects from the service system and storing it, so that the data calculation layer can obtain the current service information from the data storage layer, perform ETL processing on it, convert the obtained current service information calculation result into an HFILE file, and insert the HFILE file into the write cluster.
Furthermore, the data calculation layer can also acquire historical service information of the service system from the data storage layer, correlate the historical service information of the service system with the current service information of the service system, and then perform ETL processing on the current service information by using the correlated historical service information to obtain a current service information calculation result.
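The association of historical and current service information before ETL might look like the simple keyed join below (a sketch only; the field names `id` and `history` are hypothetical, not from the patent):

```python
from typing import Any, Dict, List, Optional


def associate_history(current: List[Dict[str, Any]],
                      history: List[Dict[str, Any]],
                      key: str = "id") -> List[Dict[str, Any]]:
    """Left-join current service records with stored historical records on
    `key`, attaching the prior record so the subsequent ETL step can compute
    results that depend on history."""
    hist_by_key: Dict[Any, Dict[str, Any]] = {h[key]: h for h in history}
    joined: List[Dict[str, Any]] = []
    for rec in current:
        out = dict(rec)
        out["history"] = hist_by_key.get(rec[key])  # None if no prior record
        joined.append(out)
    return joined
```

Current records without a historical counterpart simply carry `None`, so the ETL step can distinguish first-seen data from updates.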
And the write cluster is used for synchronizing the HFILE file to the read cluster and the standby platform when the inserted HFILE file is detected.
Optionally, referring to fig. 3 in combination with fig. 2, a read cluster is preset in the standby platform, and the write cluster may synchronize the HFILE file to the read cluster of the standby platform.
In this embodiment of the present application, when the write cluster detects an inserted HFILE subfile, it synchronizes that subfile to the read cluster of the main platform and the read cluster of the standby platform, until all the HFILE subfiles related to the current service information have been synchronized to both read clusters.
Further, in the embodiment of the present application, in order to reduce the number of HFILE subfiles stored in the read cluster of the main platform or of the standby platform, the related HFILE subfiles in the read cluster may be merged, so as to enable fast retrieval and avoid excessive read latency. That is, the read cluster of the main platform or of the standby platform may merge the HFILE subfiles associated with the current service information.
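Since HFiles store rows in sorted key order, merging subfiles on the read side resembles a k-way merge of sorted runs, which is what an HBase compaction does when it folds many small store files into one. A minimal sketch, assuming each subfile is a row-key-sorted list of (key, value) pairs:

```python
import heapq
from typing import List, Tuple


def merge_subfiles(subfiles: List[List[Tuple[str, bytes]]]
                   ) -> List[Tuple[str, bytes]]:
    """Merge several row-key-sorted subfiles into one sorted file; fewer,
    larger files mean fewer files to consult per read, hence faster retrieval."""
    return list(heapq.merge(*subfiles, key=lambda kv: kv[0]))
```

`heapq.merge` streams the inputs rather than concatenating and re-sorting, mirroring how a real compaction scans its input store files sequentially.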
Furthermore, it should be noted that, in order to improve the data localization rate of the main platform and the standby platform, the read cluster of the main platform or of the standby platform may perform a major_compact on the currently stored data during off-peak service periods; in addition, to avoid the cluster-resource instability caused by merging a large number of HFILE subfiles at once, automatic compaction on the read cluster of the main platform or of the standby platform may be disabled, avoiding the traffic overhead of a large number of compactions.
It should be noted that, when the service system accesses data in the read cluster, reasonably setting the number of cached rows for a batch scan query, for example to 5 times the number of rows of a single query, improves the response efficiency of high-frequency batch data queries and reduces the number of Remote Procedure Calls (RPC).
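The effect of scan caching on the RPC count can be estimated directly: each scanner RPC returns up to `caching` rows, so raising the cache to 5 times the single-query row count divides the number of calls by the same factor. The numbers below are illustrative only, not values from the patent:

```python
import math


def scan_rpc_calls(total_rows: int, caching: int) -> int:
    """Approximate number of scanner RPCs needed to stream `total_rows`
    rows when each call returns up to `caching` rows."""
    return math.ceil(total_rows / caching)


single_query_rows = 100                 # hypothetical rows per single query
batch_caching = 5 * single_query_rows   # the 5x setting described above
```

With these figures, a 10,000-row batch scan drops from 100 RPCs at the single-query cache size to 20 RPCs at the 5x setting.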
The invention provides a data disaster recovery system. Two Hadoop clusters are set up in the system and divided into a main platform and a standby platform, and a data service layer, a data acquisition layer and a data calculation layer are set up in the main platform. The data acquisition layer collects the current service information of a service system and sends it to the data calculation layer; after receiving it, the data calculation layer performs data processing on the current service information to obtain a calculation result, converts the result into a corresponding HFILE file, and inserts the HFILE file into the write cluster, so that when the write cluster detects the inserted HFILE file it synchronizes the file to the read cluster and the standby platform.
According to the technical scheme provided by the invention, the two Hadoop clusters are divided into a main platform and a standby platform, and a write cluster and a read cluster are set up in the main platform. After the main platform finishes processing the collected service information, the obtained result is inserted into the write cluster, and the write cluster then synchronizes the inserted result to the read cluster and the standby platform. In other words, all processing of the service information is carried out by the main platform, and the standby platform only stores the results synchronized from the main platform, which solves the prior-art problems that deploying two full Hadoop clusters duplicates data acquisition, data ETL and data index processing, creates redundant data storage, and allows errors in the many processing steps to make the data at the two sites inconsistent.
Based on the data disaster recovery system shown in fig. 1, the invention also discloses a corresponding data disaster recovery method. The data disaster recovery system comprises a main platform and a standby platform, the main platform comprises a data service layer, a data acquisition layer and a data calculation layer, and the data service layer comprises a read cluster and a write cluster. As shown in fig. 4, a schematic flow chart of a data disaster recovery method provided by an embodiment of the invention, the method specifically comprises the following steps:
s401: the data acquisition layer acquires current service information of the service system and sends the current service information to the data calculation layer; wherein the current service information includes a plurality of pieces of current service data.
In the embodiment of the present application, the current service information collecting manner of the service system by the data obtaining layer may be specifically divided into two manners, one is active data collection, and the other is passive data collection.
The active data acquisition process is specifically: the data acquisition layer either acquires the current service information of the service system in batch by using a big data acquisition technology, or acquires it in real time by using Kafka + Flume.
It should be noted that, when acquiring the current service information of the service system in batch with a big data acquisition technology, the data acquisition layer calls the Sqoop component.
The passive data acquisition process specifically comprises the following steps: and receiving current service information actively sent by the service system, and loading the received current service information.
S402: and the data calculation layer receives the current service information, performs data processing on the current service information, converts the obtained current service information calculation result into an HFILE file, and inserts the HFILE file into the write cluster.
The data calculation layer receives the current service information sent by the data acquisition layer, performs data processing on it to obtain a current service information calculation result, converts the result into an HFILE file, and inserts the HFILE file into the write cluster.
In this embodiment of the present application, in order to improve the efficiency of inserting into the write cluster, after receiving the current service information sent by the data acquisition layer, the data calculation layer performs ETL processing on it to obtain a current service information calculation result; a putlist function may then be used to sequentially take the calculation results of a first preset number of pieces of service data from that result, convert their data format into a target data format to generate an HFILE subfile, and insert the HFILE subfile into one target Region node in the write cluster, until the calculation results of all the service data are inserted. The target data format may be a binary format, and ETL is an abbreviation of Extract-Transform-Load, meaning that the current service information can undergo data processing such as extraction, cleaning, conversion and loading.
Alternatively, after the data calculation layer receives the current service information sent by the data acquisition layer and performs ETL processing on it to obtain the current service information calculation result, it may use put in an asynchronous manner to sequentially take the calculation results of a first preset number of pieces of service data, convert their data format into a target data format to generate an HFILE subfile, and insert the HFILE subfile into one target Region node in the write cluster, until the calculation results of all the service data are inserted. The target data format may be a binary format.
In this embodiment of the present application, the write cluster includes a plurality of Region nodes. To ensure reasonable use of the Region nodes in the write cluster, a second preset number of target Region nodes may be set in advance among the plurality of Region nodes of the write cluster, and the number of target Region nodes set is equal to the number of HFILE subfiles to be generated.
S403: and when the write cluster detects the inserted HFILE file, synchronizing the HFILE file to the read cluster and the standby platform.
In this embodiment of the present application, when the write cluster detects an inserted HFILE subfile, the write cluster synchronizes the HFILE subfile to the read cluster of the host platform and the read cluster of the standby platform, until all HFILE subfiles related to the current service information have been synchronized to the read cluster of the host platform and the read cluster of the standby platform.
Further, in this embodiment of the present application, in order to reduce the number of HFILE subfiles stored in the read cluster of the host platform or in the read cluster of the standby platform, related HFILE subfiles in the read cluster may be merged, so as to enable fast retrieval and avoid excessive read latency. Specifically, the read cluster of the host platform or the read cluster of the standby platform may merge the HFILE subfiles associated with the current service information.
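The effect of merging related subfiles can be illustrated with a minimal sketch. This is not the HBase compaction code, only a simulation of its semantics under two assumptions: each subfile holds key-sorted (key, value) pairs, and the list of subfiles is ordered oldest to newest so that a newer value for the same row key wins.

```python
# Hedged sketch of merging several key-value subfiles into one, in the spirit
# of a compaction: after the merge a reader opens one file instead of many,
# which is what reduces read latency. All names and data are illustrative.
def merge_subfiles(subfiles):
    """Merge (key, value) subfiles ordered oldest to newest; newer values win."""
    merged = {}
    for sub in subfiles:
        for key, value in sub:
            merged[key] = value  # a later (newer) subfile overwrites the same key
    return sorted(merged.items())  # keep the merged file key-sorted

sub_old = [("row1", "v1"), ("row3", "v3")]
sub_new = [("row2", "v2"), ("row3", "v3-updated")]  # row3 updated in a newer subfile
print(merge_subfiles([sub_old, sub_new]))
```

The merged output contains one entry per row key, with `row3` carrying the newer value, mirroring how a compacted file supersedes the subfiles it absorbed.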
Furthermore, it should be noted that, in order to improve the data locality rate of the host platform and the standby platform, the read cluster of the host platform or the read cluster of the standby platform may perform a major_compact on the currently stored data during off-peak service hours. In addition, to avoid unstable cluster resources caused by merging a large number of HFILE subfiles at once, automatic compaction on the read cluster of the host platform or the read cluster of the standby platform may be disabled, avoiding the traffic overhead caused by a large number of compactions.
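The scheduling decision described above can be sketched as a small gate. This is an assumption-laden illustration, not HBase code: the off-peak window bounds and the function names are hypothetical, and the real trigger would invoke the cluster's major-compaction command.

```python
# Illustrative sketch: only trigger a major compaction inside an off-peak
# window, and only when automatic compaction has been turned off (so the
# explicit off-peak trigger is the sole source of compactions).
OFF_PEAK_START, OFF_PEAK_END = 2, 6  # assumed window, e.g. 02:00-06:00 local time

def should_major_compact(hour, auto_compaction_disabled=True):
    """Decide whether to fire an explicit major_compact at the given hour."""
    if not auto_compaction_disabled:
        return False  # the cluster compacts on its own schedule; don't double up
    return OFF_PEAK_START <= hour < OFF_PEAK_END

print(should_major_compact(3))   # inside the off-peak window
print(should_major_compact(14))  # peak hours: skip
```

A cron-style job on the read cluster could evaluate this gate hourly, so heavy merges only consume resources when service traffic is low.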
It should be noted that when the service system accesses data in the read cluster, the response efficiency of high-frequency batch data queries can be improved, and the number of Remote Procedure Call (RPC) round trips reduced, by reasonably setting the row-caching value of the batch-query scan, for example to five times the number of rows in a single query.
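The arithmetic behind that tuning can be made explicit. The sketch below contains no HBase calls; it only models the rule that each scan RPC returns up to `caching` rows, so raising the caching value to five times the single-query row count divides the number of round trips accordingly. The concrete row counts are assumptions.

```python
# Sketch of the RPC-count arithmetic behind scan row caching: each RPC fetches
# up to `caching` rows, so a larger caching value means fewer round trips.
import math

def rpc_round_trips(total_rows, caching):
    """Number of RPCs a scan needs when each RPC returns up to `caching` rows."""
    return math.ceil(total_rows / caching)

single_query_rows = 100   # assumed rows per single query
batch_rows = 5000         # assumed rows in a high-frequency batch query

default_rpcs = rpc_round_trips(batch_rows, single_query_rows)    # caching = 1x
tuned_rpcs = rpc_round_trips(batch_rows, 5 * single_query_rows)  # caching = 5x
print(default_rpcs, tuned_rpcs)  # 50 vs 10 round trips
```

The trade-off is memory: each RPC response must buffer `caching` rows on the client, which is why the multiplier is bounded (here, five) rather than unbounded.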
The invention provides a data disaster recovery method. Two sets of Hadoop clusters are set up in a data disaster recovery system and divided into a host platform and a standby platform, and a data service layer, a data acquisition layer, and a data calculation layer are set up in the host platform. The data acquisition layer acquires current service information of a service system and sends it to the data calculation layer; after receiving the current service information, the data calculation layer processes it to obtain a current service information calculation result, converts the calculation result into a corresponding HFILE file, and inserts the HFILE file into the write cluster, so that when the write cluster detects the inserted HFILE file, it synchronizes the HFILE file to the read cluster and the standby platform. In the technical scheme provided by the invention, the two sets of Hadoop clusters are divided into a host platform and a standby platform, and a write cluster and a read cluster are arranged in the host platform. After the host platform finishes processing the collected service information, the result is inserted into the write cluster, and the write cluster then synchronizes the inserted result to the read cluster and the standby platform. That is, the processing of service information is carried out entirely on the host platform, and the standby platform only needs to store the results synchronized from the host platform. This solves the problems in the prior art that deploying two sets of Hadoop clusters leads to repeated data acquisition, data ETL, and data index processing as well as redundant data storage, and that errors introduced by these duplicated processing steps cause the data at the two sites to become inconsistent.
Referring to fig. 5, another data disaster recovery method provided in an embodiment of the present invention is shown. The data disaster recovery method is applied to a data disaster recovery system, and the data disaster recovery system includes a host platform and a standby platform, where the host platform includes a data service layer, a data acquisition layer, a data calculation layer, and a data storage layer, and the data service layer includes a read cluster and a write cluster. The method specifically includes the following steps:
S501: the data acquisition layer acquires current service information of the service system and sends the current service information to the data storage layer; wherein the current service information includes a plurality of pieces of current service data.
The specific execution process and implementation principle of step S501 are the same as those of step S401 in fig. 4 disclosed in the foregoing embodiment of the present invention; reference may be made to the corresponding parts of fig. 4, and details are not described here again.
S502: and the data storage layer acquires the current service information and stores the current service information.
In the process of specifically executing step S502, the data storage layer acquires the current service information that the data acquisition layer collected from the service system, and stores the acquired current service information, so that the data calculation layer can subsequently acquire the current service information from the data storage layer, perform ETL processing on it, convert the obtained current service information calculation result into an HFILE file, and insert the HFILE file into the write cluster.
S503: and the data calculation layer acquires the current service information from the data storage layer, performs ETL processing on the current service information, converts the acquired current service information calculation result into an HFILE file, and inserts the HFILE file into the write cluster.
The data calculation layer acquires the current service information of the service system from the data storage layer, performs data processing on the acquired current service information to obtain a current service information calculation result, converts the obtained current service information calculation result into an HFILE file, and inserts the obtained HFILE file into the write cluster. For the specific process of converting the obtained current service information calculation result into an HFILE file and inserting it into the write cluster, reference may be made to the corresponding part of step S402 in fig. 4 disclosed in the foregoing embodiment of the present invention, and details are not described here again.
Furthermore, the data calculation layer may also acquire historical service information of the service system from the data storage layer, correlate the historical service information with the current service information of the service system, and then perform ETL processing on the current service information using the correlated historical service information to obtain the current service information calculation result.
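The correlation step can be sketched as a simple key join of current records against stored historical records before the ETL computation. This is a minimal illustration under assumed field names (`key`, `amount`); the real correlation logic and schema are not specified by the source.

```python
# Illustrative sketch of correlating current service records with historical
# ones on a shared key, e.g. to compute a cumulative value per service.
# Field names are hypothetical.
def correlate(current, historical):
    """Attach each current record's historical value (default 0) and a running total."""
    hist_by_key = {r["key"]: r["amount"] for r in historical}
    return [
        {"key": r["key"],
         "current": r["amount"],
         "cumulative": r["amount"] + hist_by_key.get(r["key"], 0)}
        for r in current
    ]

historical = [{"key": "svc-a", "amount": 10}]
current = [{"key": "svc-a", "amount": 5}, {"key": "svc-b", "amount": 7}]
print(correlate(current, historical))
```

Records without history (here `svc-b`) pass through with their current value as the total, so the subsequent ETL step can treat all records uniformly.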
S504: and when the write cluster detects the inserted HFILE file, synchronizing the HFILE file to the read cluster and the standby platform.
The specific execution process and implementation principle of step S504 are the same as those of step S403 in fig. 4 disclosed in the foregoing embodiment of the present invention; reference may be made to the corresponding parts of fig. 4, and details are not described here again.
In this embodiment of the application, a data storage layer is arranged in the host platform, so that after the data acquisition layer collects the current service information of the service system, the data storage layer acquires the collected current service information and stores it.
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the system embodiments are substantially similar to the method embodiments, they are described relatively simply, and reference may be made to the descriptions of the method embodiments for relevant points. The system embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without inventive effort.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that it is obvious to those skilled in the art that various modifications and improvements can be made without departing from the principle of the present invention, and these modifications and improvements should also be considered as the protection scope of the present invention.
Claims (10)
1. A data disaster recovery system, comprising a host platform and a standby platform, wherein the host platform comprises a data service layer, a data acquisition layer, and a data calculation layer, and the data service layer comprises a read cluster and a write cluster, wherein:
the data acquisition layer is used for acquiring current service information of a service system and sending the current service information to the data calculation layer; wherein, the current service information comprises a plurality of current service data;
the data calculation layer is used for receiving the current service information, performing data processing on the current service information, converting an obtained current service information calculation result into an HFILE file, and inserting the HFILE file into the write cluster;
the write cluster is used for synchronizing the HFILE file to the read cluster and the standby platform when the inserted HFILE file is detected.
2. The system of claim 1, wherein the host platform further comprises a data storage layer, and wherein the data storage layer is configured to:
acquire the current service information and store the current service information.
3. The system according to claim 1, wherein the data calculation layer, when receiving the current service information, performing data processing on the current service information, converting an obtained current service information calculation result into an HFILE file, and inserting the HFILE file into the write cluster, is specifically configured to:
receiving the current service information sent by the data acquisition layer, and carrying out ETL processing on the current service information to obtain a current service information calculation result, wherein the current service information calculation result comprises a calculation result of each piece of current service data;
utilizing a putList function to sequentially obtain the calculation results of a first preset number of pieces of service data from the current service information calculation result, converting the data format of the calculation results of the first preset number of pieces of service data into a target data format, generating an HFILE subfile, and inserting the HFILE subfile into one target Region node in the write cluster, until the calculation results of all the service data have been inserted;
wherein the write cluster comprises a plurality of Region nodes, a second preset number of target Region nodes are preset among the plurality of Region nodes, the number of the target Region nodes is equal to the number of the HFILE subfiles, and the HFILE file is composed of the HFILE subfiles.
4. The system of claim 3, wherein the write cluster, when synchronizing the HFILE file to the read cluster and the standby platform upon detecting the inserted HFILE file, is specifically configured to:
synchronize each HFILE subfile to the read cluster and the standby platform when the inserted HFILE subfile is detected, until all the HFILE subfiles related to the current service information have been synchronized to the read cluster and the standby platform.
5. The system of claim 3, further comprising:
and the read cluster is used for merging various HFILE subfiles related to the current service information.
6. A data disaster recovery method, applied to a data disaster recovery system, wherein the data disaster recovery system comprises a host platform and a standby platform, the host platform comprises a data service layer, a data acquisition layer, and a data calculation layer, and the data service layer comprises a read cluster and a write cluster, the method comprising the following steps:
the data acquisition layer acquires current service information of a service system and sends the current service information to the data calculation layer; wherein, the current service information comprises a plurality of current service data;
the data calculation layer receives the current service information, performs data processing on the current service information, converts an obtained current service information calculation result into an HFILE file, and inserts the HFILE file into the write cluster;
and when the writing cluster detects the inserted HFILE file, synchronizing the HFILE file to the reading cluster and the standby platform.
7. The method of claim 6, wherein the host platform further comprises a data storage layer, and the method further comprises:
acquiring, by the data storage layer, the current service information, and storing the current service information.
8. The method of claim 6, wherein the data calculation layer receiving the current service information, performing data processing on the current service information, converting the obtained current service information calculation result into an HFILE file, and inserting the HFILE file into the write cluster comprises:
receiving the current service information sent by the data acquisition layer, and carrying out ETL processing on the current service information to obtain a current service information calculation result, wherein the current service information calculation result comprises a calculation result of each piece of current service data;
utilizing a putList function to sequentially obtain the calculation results of a first preset number of pieces of service data from the current service information calculation result, converting the data format of the calculation results of the first preset number of pieces of service data into a target data format, generating an HFILE subfile, and inserting the HFILE subfile into one target Region node in the write cluster, until the calculation results of all the service data have been inserted;
wherein the write cluster comprises a plurality of Region nodes, a second preset number of target Region nodes are preset among the plurality of Region nodes, the number of the target Region nodes is equal to the number of the HFILE subfiles, and the HFILE file is composed of the HFILE subfiles.
9. The method of claim 8, wherein synchronizing, by the write cluster, the HFILE file to the read cluster and the standby platform when the inserted HFILE file is detected comprises:
synchronizing the HFILE subfiles to the read cluster and the standby platform when the inserted HFILE subfiles are detected until all the HFILE subfiles related to the current service information are synchronized to the read cluster and the standby platform.
10. The method of claim 6, wherein after the write cluster detects the inserted HFILE file and synchronizes the HFILE file to the read cluster, the method further comprises:
and the read cluster combines all the HFILE subfiles related to the current service information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111026638.9A CN113742137B (en) | 2021-09-02 | 2021-09-02 | Data disaster recovery method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113742137A true CN113742137A (en) | 2021-12-03 |
CN113742137B CN113742137B (en) | 2024-10-08 |
Family
ID=78735032
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111026638.9A Active CN113742137B (en) | 2021-09-02 | 2021-09-02 | Data disaster recovery method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113742137B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060218210A1 (en) * | 2005-03-25 | 2006-09-28 | Joydeep Sarma | Apparatus and method for data replication at an intermediate node |
CN105933446A (en) * | 2016-06-28 | 2016-09-07 | 中国农业银行股份有限公司 | Service dual-active implementation method and system of big data platform |
CN109901949A (en) * | 2019-02-25 | 2019-06-18 | 中国工商银行股份有限公司 | The application disaster recovery and backup systems and method of dual-active data center |
Also Published As
Publication number | Publication date |
---|---|
CN113742137B (en) | 2024-10-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||