WO2023046042A1 - Data backup method and database cluster - Google Patents
- Publication number
- WO2023046042A1 (PCT/CN2022/120709)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- log
- data node
- data
- storage device
- physical log
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
Definitions
- the present application relates to the field of databases, and more specifically, relates to a data backup method and a database cluster.
- Data disaster recovery refers to establishing an off-site data system that holds an available copy of key local application data.
- The system thus keeps at least one available copy of business-critical data off-site.
- The main technologies it uses are data backup and data replication; data disaster recovery processing is, in effect, off-site data replication.
- The remote data can be a full real-time replica of the local production data (synchronous replication), or it can lag slightly behind the local data (asynchronous replication).
- Take the data disaster recovery of an Oracle database as an example.
- The Oracle database deploys a primary repository and a standby repository in two computer rooms in the same city (or in different locations).
- The standby repository is a backup of the primary repository, and data protection technology is used to synchronize data between the primary and standby repositories.
- The basic process of data synchronization is as follows: when the primary repository generates a physical log, the log is transmitted to the standby repository by synchronous or asynchronous replication over a pre-configured transmission method, thereby replicating data between the primary repository and the standby repository.
- The physical logs are transmitted over a dedicated network line.
- The dedicated network line makes the network facilities under this architecture costly, and transmission over a dedicated network line is inefficient.
- the embodiment of the present application provides a data backup method, which can improve data transmission efficiency during data backup.
- The present application provides a data backup method applied to a first database cluster that includes a first data node. The method includes: the first data node obtains a first physical log, the first physical log including operation information on data in the first data node; the first data node writes the first physical log into a first storage device, the first storage device being used to transfer the first physical log to a second storage device, so that a second data node in a second database cluster obtains the first physical log from the second storage device. The first storage device is deployed in the first database cluster, the second storage device is deployed in the second database cluster, the first database cluster and the second database cluster are different database clusters, and the second data node serves as the backup node of the first data node.
- synchronous replication of data may be performed between the first storage device and the second storage device, and the first storage device and the second storage device may be storage devices capable of remote and parallel data transmission.
- The data nodes in the first database cluster can quickly synchronize physical logs to the second database cluster (the standby cluster) through the storage devices, bringing the recovery point objective (RPO) closer to 0 while preserving high business performance and improving data transmission efficiency during data backup.
- The first physical log includes operation information on the data in the first data node, and the operation information indicates a modification operation, a write operation, and/or a deletion operation on the data in the first data node.
- The first physical log can be a Redo log, also known as an XLog, which records physical modifications of data pages (in other words, changes to the data) and can be used to restore physical data pages after a transaction is committed.
- After writing the first physical log into the first storage device, the first data node may commit the transaction on the data in the first data node according to the first physical log.
- A transaction commit ensures the persistence of the operation information recorded in the first physical log. If the first data node transferred the first physical log to the first storage device only after completing the transaction commit, the first data node might fail before the commit completes; the first physical log would then never reach the first storage device, and the standby data node could not obtain it for log playback. In the embodiment of this application, the first data node transfers the first physical log to the first storage device before committing the transaction on the data in the first data node. Once the first physical log has been successfully written into the first storage device, it can be considered copied to the standby data node: even if the first data node fails before the transaction commit completes, the first physical log can still be delivered to the standby data node.
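- The ordering described above (write the physical log to the storage device first, commit second) can be sketched as follows. This is a minimal illustration only: `SharedLogDevice`, `PrimaryDataNode`, and the `crash_before_commit` flag are hypothetical names introduced here, and an in-memory list stands in for the first storage device.

```python
class SharedLogDevice:
    """Hypothetical stand-in for the first storage device (in-memory only)."""

    def __init__(self):
        self.records = []

    def write(self, record: bytes) -> bool:
        self.records.append(record)
        return True  # a real device could fail; this sketch always succeeds


class PrimaryDataNode:
    """Hypothetical primary data node that logs before it commits."""

    def __init__(self, device: SharedLogDevice):
        self.device = device
        self.data = {}

    def apply(self, key, value, crash_before_commit=False):
        redo = f"SET {key}={value}".encode()
        # Step 1: write the physical log to the shared storage device first.
        if not self.device.write(redo):
            raise IOError("log write failed; transaction not committed")
        # Step 2: only after the log is on the device, commit locally.
        if crash_before_commit:
            raise RuntimeError("simulated failure before commit")
        self.data[key] = value
```

- In this sketch, a failure between the two steps leaves the log record on the device, so a standby reading from the device can still replay it.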
- The first data node may write the first physical log into the first storage device, and the first storage device may include a storage area (first storage area) allocated for the first data node. For example, if the first data node belongs to the first shard in the first database cluster, the first storage device may include a shared volume for the first shard, and every data node in the first shard can share the shared volume.
- The first storage device may include a first storage space for storing the physical log of the first data node and a second storage space for storing the physical log of the third data node, and the first storage space is different from the second storage space.
- The first database cluster further includes a third data node, and while the third data node writes a second physical log into the first storage device, the first data node may write the first physical log into the first storage device in parallel.
- The third data node may be a master node in the first database cluster. Specifically, while the third data node writes the second physical log into the second storage space, the first data node writes the first physical log into the first storage space in parallel.
- Parallelism here means that the first data node writing the first physical log into the first storage space and the third data node writing its physical log into the second storage space happen at the same time.
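- As a rough illustration of two data nodes writing into disjoint storage spaces of one device at the same time, the sketch below uses two threads and a `bytearray`. The region layout `SPACES`, the node labels, and `write_log` are assumptions made for this example; the application does not specify them.

```python
import threading

device = bytearray(64)  # hypothetical shared storage device
# Disjoint [start, end) regions: one storage space per data node (assumed layout).
SPACES = {"node1": (0, 32), "node3": (32, 64)}


def write_log(node: str, payload: bytes) -> None:
    """Write a node's physical log into that node's own storage space."""
    start, end = SPACES[node]
    if len(payload) > end - start:
        raise ValueError("payload larger than the node's storage space")
    device[start:start + len(payload)] = payload


threads = [
    threading.Thread(target=write_log, args=("node1", b"first physical log")),
    threading.Thread(target=write_log, args=("node3", b"second physical log")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

- Because the two regions do not overlap, neither writer needs to wait for the other in this sketch.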
- The first storage device includes a first storage space for storing the physical log of the first data node. When the storage space available in the first storage space is less than the storage space required by the first physical log, the first data node may determine a target storage space from the first storage space, where the target storage space stores a target physical log, and, based on the target physical log having been replayed by the second data node, replace the target physical log in the target storage space with the first physical log.
- That is, when the first storage space is insufficient, occupied storage in the first storage space is cleared and reused. To prevent a physical log from being cleared before the standby data node has received it, the cleared physical log must be one that the standby data node has already replayed (the standby data node can report this to the first data node after it completes the log playback).
- The storage address of the first storage space includes a head address and a tail address, and the storage space is filled in order from the storage space corresponding to the head address to the storage space corresponding to the tail address.
- Determining the target storage space from the first storage space, based on the available storage space in the first storage space being less than the storage space required by the first physical log, includes: based on the storage space corresponding to the tail address being occupied, determining the storage space corresponding to the head address in the first storage space as the target storage space.
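- The head-to-tail reuse rule above behaves like a ring buffer guarded by replay acknowledgements. The following is a minimal sketch under simplifying assumptions: the storage space is divided into fixed-size slots, each physical log fits in one slot, and the class and method names (`LogSpace`, `ack_replay`, `write`) are hypothetical.

```python
class LogSpace:
    """Fixed number of slots, filled from head to tail, then reused from the head."""

    def __init__(self, slots: int):
        self.slots = [None] * slots  # each slot holds (lsn, payload) or None
        self.next_slot = 0           # next position to write
        self.replayed_lsn = 0        # highest LSN the standby reports as replayed

    def ack_replay(self, lsn: int) -> None:
        """Record feedback from the standby that logs up to `lsn` were replayed."""
        self.replayed_lsn = max(self.replayed_lsn, lsn)

    def write(self, lsn: int, payload: bytes) -> bool:
        """Write a log; overwrite an old one only if the standby has replayed it."""
        target = self.slots[self.next_slot]
        if target is not None and target[0] > self.replayed_lsn:
            return False  # target log not replayed yet, so it must not be cleared
        self.slots[self.next_slot] = (lsn, payload)
        self.next_slot = (self.next_slot + 1) % len(self.slots)
        return True
```

- A caller that receives `False` would wait for further replay feedback before retrying; how the wait is implemented is left open here.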
- the first storage device is a raw device.
- A raw device can also be called a raw partition.
- the first data node may write the first physical log into the first storage device based on direct I/O, which improves read and write performance.
- After the first data node commits the transaction on the data in the first data node according to the first physical log, the first data node may also write a second physical log containing commit information into the first storage device. The first storage device is used to transfer the second physical log to the second storage device, so that the management node in the second database cluster can obtain the second physical log from the second storage device. The commit information indicates that the first data node has completed the transaction commit of the first physical log.
- the commit information can be used as a reference for the global consistency point when the second database cluster performs log playback.
- The commit information may include a transaction commit number, which may be used to identify a committed database transaction.
- a transaction is a logical unit for a data storage node to perform database operations, and consists of a sequence of database operations.
- A transaction in the committed state has been executed successfully, and the data involved in the transaction has been written to the data storage node.
- The first database cluster further includes a fourth data node that serves as a backup node for the first data node, and the method further includes: the fourth data node acquires the first physical log from the first storage device; the fourth data node performs log replay according to the first physical log.
- The fourth data node may first read the control information in the header of the first storage device and, after verifying it, compare the log writing progress on the storage device with its local physical log. If the storage device holds a newer physical log, the fourth data node reads that log, copies it locally, and replays it. If there is nothing to read, it waits in a loop.
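- One pass of the read-verify-compare-replay loop described above might look like the sketch below. The header layout (an 8-byte written LSN followed by a CRC32 of that field), the dictionaries standing in for the device and the local log, and all function names are assumptions made for illustration; the application does not define the control information format.

```python
import struct
import zlib

HEADER = struct.Struct("<QI")  # assumed layout: written LSN, CRC32 of that field


def make_header(written_lsn: int) -> bytes:
    """Build a header for the sketch (what the writer side would maintain)."""
    body = struct.pack("<Q", written_lsn)
    return HEADER.pack(written_lsn, zlib.crc32(body))


def read_progress(header: bytes) -> int:
    """Verify the header and return the log writing progress on the device."""
    written_lsn, crc = HEADER.unpack(header)
    if zlib.crc32(struct.pack("<Q", written_lsn)) != crc:
        raise ValueError("header verification failed")
    return written_lsn


def poll_once(header, device_logs, local_logs, replay) -> int:
    """Copy and replay logs newer than the local copy; return how many were replayed."""
    written = read_progress(header)
    local = max(local_logs, default=0)
    for lsn in range(local + 1, written + 1):
        local_logs[lsn] = device_logs[lsn]  # copy to local storage
        replay(local_logs[lsn])             # then replay
    return max(0, written - local)
```

- A standby would call `poll_once` repeatedly and sleep briefly whenever it returns 0, which corresponds to waiting in a loop when there is nothing to read.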
- The present application provides a data backup method applied to a second database cluster. The second database cluster includes a second data node that serves as the backup node of a first data node; the first data node belongs to a first database cluster; the first database cluster and the second database cluster are different database clusters; a first storage device is deployed in the first database cluster, and a second storage device is deployed in the second database cluster. The method includes: the second data node acquires a first physical log from the second storage device, the first physical log being a physical log from the first data node that is stored in the second storage device via the first storage device, and the first physical log including operation information on the data in the first data node; the second data node performs log playback according to the first physical log.
- The data nodes in the first database cluster can quickly synchronize physical logs to the second database cluster (the standby cluster) through the storage devices, bringing the recovery point objective (RPO) closer to 0 while preserving high business performance and improving data transmission efficiency during data backup.
- the operation information indicates a modification operation, a write operation, and/or a deletion operation on the data in the data node.
- The second storage device may include a storage area (third storage area) allocated for the second data node. Taking the case where the second data node belongs to the first shard in the second database cluster as an example, the second storage device may include a shared volume allocated for the first shard, and every data node in the first shard can share the shared volume.
- The second storage device may include a third storage space for storing the physical log of the second data node and a fourth storage space for storing the physical log of the fifth data node, and the third storage space is different from the fourth storage space.
- While other data nodes read physical logs from the second storage device, the second data node can acquire the first physical log from the second storage device in parallel.
- Parallelism here means that the first data node writing the first physical log into the first storage space and the third data node writing its physical log into the second storage space happen at the same time.
- The management node in the second database cluster can maintain a piece of global information (which can be called a barrier point). The global information is the log sequence number of a physical log that every standby data node has obtained, and this log sequence number is the smallest among the largest physical log sequence numbers for which each master node has currently committed transactions.
- For example, the management node obtains the following: the current transaction commit progress of master node 1 is 1, 2; of master node 2 is 1, 2; of master node 3 is 1, 2, 3; and of master node 4 is 1, 2, 3, 4. The largest committed sequence numbers are therefore 2, 2, 3, and 4, so 2 is the smallest physical log sequence number among the largest physical log sequence numbers for which each master node has currently committed transactions.
- Physical logs with the same sequence number on different primary data nodes among the multiple primary data nodes correspond to the same task, and each primary data node commits transactions for its physical logs in ascending order of sequence number; that is, the sequence number can indicate the transaction commit progress of the master node.
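- The "smallest of the largest committed sequence numbers" rule can be written in one line. The sketch below uses made-up node names and progress values purely for illustration, and `barrier_point` is a hypothetical helper name.

```python
def barrier_point(commit_progress: dict) -> int:
    """Smallest value among each master node's largest committed sequence number."""
    return min(max(lsns) for lsns in commit_progress.values())


# Illustrative values only (not taken from the application).
progress = {
    "master_a": [1, 2, 3],
    "master_b": [1, 2, 3, 4, 5],
    "master_c": [1, 2, 3, 4],
}
point = barrier_point(progress)  # largest values are 3, 5, 4, so the result is 3
```

- Because each master node commits in ascending order of sequence number, every node has already committed everything up to the returned value.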
- The management node may be implemented through cooperation among modules such as CMA, CMS, and ETCD in the second database cluster.
- The coordinating node in the first database cluster can obtain the current transaction commit progress of each primary data node (that is, the log sequence number of the physical log whose transaction commit has completed, which can be called a barrier point), and transmit commit information that includes the transaction commit progress to the second database cluster in the form of a physical log (this step can also be performed by the primary data node).
- The standby data node obtains the physical log carrying the commit information from the second storage device, writes the log to its local disk, parses the newly persisted physical log, stores each parsed barrier point in a hash table, and records the largest barrier point received so far.
- The function of the management node can be implemented by the CMA, CMS, and ETCD working together.
- The CMA queries the maximum barrier value of the CN and DN and reports it to the CMS. The CMS takes the minimum of the largest barrier points on the standby data nodes as the "candidate serial number" (also called the value to be detected) and stores it in ETCD.
- The CMA obtains the "value to be detected" from ETCD, queries the DN, and confirms whether the "value to be detected" exists on the DN.
- The CMS then makes the following judgment: if the physical log corresponding to the "value to be detected" exists on every standby data node, the value is stored in ETCD as the "target serial number" (or simply the target value); otherwise it is discarded. The CMA reads the "target value" from ETCD and updates its local "target value".
- In one reporting round, the CMA needs to report the maximum barrier value, query locally whether the "value to be detected" exists, and update the "target value"; the CMS needs to update the "value to be detected" and the "target value".
- Barrier deletion is the end point of consistency. Barrier deletion occurs during physical log playback: when the playback reaches the barrier point, the playback position is updated and the barrier point is deleted from the hash table, which completes the life cycle of the barrier from generation to deletion.
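- The candidate-then-target promotion carried out by the CMA, CMS, and ETCD can be condensed into two small functions. This sketch deliberately ignores the real component interfaces: the function names are hypothetical, plain dictionaries and sets stand in for the per-node reports, and no ETCD client is used.

```python
def candidate_value(largest_barrier_per_standby: dict) -> int:
    """Minimum of the largest barrier points reported by the standby data nodes."""
    return min(largest_barrier_per_standby.values())


def promote(candidate: int, barriers_per_standby: dict, current_target: int) -> int:
    """Promote the candidate to target only if every standby holds that barrier."""
    if all(candidate in barriers for barriers in barriers_per_standby.values()):
        return candidate
    return current_target  # candidate discarded; target stays unchanged


# Illustrative reports (hypothetical node names and values).
reported_max = {"dn1": 7, "dn2": 5, "dn3": 6}
held = {"dn1": {3, 5, 7}, "dn2": {3, 5}, "dn3": {3, 5, 6}}
candidate = candidate_value(reported_max)
target = promote(candidate, held, current_target=3)
```

- With these sample reports the candidate is 5, and since all three standbys hold barrier 5 it becomes the new target.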
- The management node in the second database cluster maintains a target sequence number as global information. The target sequence number is the smallest sequence number among the plurality of log sequence numbers, and every standby data node among the plurality of standby data nodes has obtained the physical log corresponding to the target sequence number. Each standby data node performs log playback only when the log sequence number of the physical log to be replayed is equal to the target sequence number. This ensures that each standby data node replays up to the target sequence number, so that different standby data nodes are restored to the same position, which guarantees data consistency between the standby data nodes of the distributed database.
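- A standby's replay step gated by the target sequence number could be sketched as below. The sketch reads "played back until the target serial number" as replaying every pending log whose sequence number does not exceed the target; that reading, the function name `replay_until`, and the use of a `dict` as the barrier hash table are all assumptions.

```python
def replay_until(target: int, pending: list, barriers: dict, apply) -> int:
    """Replay pending (lsn, record) pairs, in ascending LSN order, up to `target`.

    `barriers` plays the role of the hash table of received barrier points;
    an entry is deleted once playback reaches it. Returns the last replayed
    LSN, or 0 if nothing was replayed.
    """
    position = 0
    while pending and pending[0][0] <= target:
        lsn, record = pending.pop(0)
        apply(record)
        position = lsn            # update the playback position
        barriers.pop(lsn, None)   # delete the barrier point if one was reached
    return position
```

- Two standbys given the same target stop at the same position even if one of them has already received later logs.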
- the present application provides a first database cluster, where the first database cluster includes a first data node, and the first data node includes:
- a log acquisition module configured to acquire a first physical log, where the first physical log includes operation information on data in the first data node
- a log transfer module configured to write the first physical log into a first storage device, the first storage device being configured to transfer the first physical log to a second storage device, so that a second data node in a second database cluster obtains the first physical log from the second storage device, wherein the first storage device is deployed in the first database cluster, the second storage device is deployed in the second database cluster, the first database cluster and the second database cluster are different database clusters, and the second data node is used as a backup node for the first data node.
- the operation information indicates a modification operation, a write operation, and/or a deletion operation on the data in the data node.
- the first data node further includes:
- a transaction commit module configured to perform transaction commit on the data in the first data node according to the first physical log after transferring the first physical log to the first storage device.
- the first database cluster further includes a third data node; the log transfer module is specifically configured to: while the third data node writes a second physical log into the first storage device, write the first physical log into the first storage device in parallel.
- the first storage device includes a first storage space for storing the physical log of the first data node and a second storage space for storing the physical log of the third data node, the first storage space is different from the second storage space;
- the log transfer module is specifically used for:
- while the third data node writes the second physical log into the second storage space, writing the first physical log into the first storage space in parallel.
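- the first storage device includes a first storage space for storing the physical log of the first data node, and the log transfer module is specifically configured to:
- when the storage space available in the first storage space is less than the storage space required by the first physical log, determine a target storage space from the first storage space, the target storage space storing a target physical log; and, based on the target physical log having been replayed by the second data node, replace the target physical log in the target storage space with the first physical log.
- the storage address of the first storage space includes a head address and a tail address, and the storage space is filled in order from the storage space corresponding to the head address to the storage space corresponding to the tail address;
- the log transfer module is specifically used for: based on the storage space corresponding to the tail address being occupied, determining the storage space corresponding to the head address in the first storage space as the target storage space.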
- the first storage device is a raw device.
- the log transfer module is also used to: after the first data node commits the transaction on the data in the first data node according to the first physical log, write a second physical log containing commit information into the first storage device, the first storage device being used to transfer the second physical log to the second storage device, so that the management node in the second database cluster obtains the second physical log from the second storage device, where the commit information indicates that the first data node has completed the transaction commit of the first physical log.
- the first database cluster further includes a fourth data node, the fourth data node is used as a backup node for the first data node, and the fourth data node includes: a log acquisition module, configured to obtain the first physical log from the first storage device;
- a log playback module configured to perform log playback according to the first physical log.
- the present application provides a second database cluster, where the second database cluster includes a second data node, the second data node is used as a backup node for the first data node, the first data node belongs to the first database cluster, the first database cluster and the second database cluster are different database clusters, the first storage device is deployed in the first database cluster, and the second storage device is deployed in the second database cluster
- the second data node includes:
- a log acquisition module configured to acquire a first physical log from the second storage device, where the first physical log is a physical log from the first data node that is stored in the second storage device via the first storage device, and the first physical log includes operation information on the data in the first data node;
- a log playback module configured to perform log playback according to the first physical log.
- the operation information indicates a modification operation, a write operation, and/or a deletion operation on the data in the data node.
- the second database cluster further includes a fifth data node; the log acquisition module is specifically configured to:
- the second data node obtains the first physical log from the second storage device in parallel.
- the second storage device includes a third storage space for storing the physical log of the second data node and a fourth storage space for storing the physical log of the fifth data node , the third storage space is different from the fourth storage space;
- the log acquisition module is specifically used for:
- the second data node obtains the first physical log from the third storage space in parallel.
- the first database cluster includes multiple primary data nodes including the first data node
- the second database cluster includes multiple standby data nodes including the second data node
- the second database cluster further includes a management node
- the management node includes:
- a commit information acquisition module configured to acquire commit information of the first database cluster from the second storage device, where the commit information includes the log sequence number of the physical log for which each master data node among the plurality of master data nodes most recently completed a transaction commit; the target sequence number is the smallest sequence number among the multiple log sequence numbers, and each standby data node among the multiple standby data nodes has obtained the physical log corresponding to the target sequence number;
- the log playback module is specifically used for the second data node to obtain the target sequence number from the management node;
- physical logs with the same sequence number on different primary data nodes among the multiple primary data nodes correspond to the same task, and each primary data node among the multiple primary data nodes commits transactions for its physical logs in ascending order of sequence number.
- An embodiment of the present application provides a computer-readable storage medium that includes computer-readable instructions; when the computer-readable instructions are run on a computer device, the computer device is caused to execute the method of the above first aspect and any of its optional implementations, and the method of the above second aspect and any of its optional implementations.
- An embodiment of the present application provides a computer program product that includes computer-readable instructions; when the computer-readable instructions are run on a computer device, the computer device executes the method of the above first aspect and any of its optional implementations, and the method of the above second aspect and any of its optional implementations.
- The present application provides a chip system that includes a processor configured to support the above device in implementing the functions involved in the above aspects, for example, sending or processing the data or information involved in the above methods.
- the system-on-a-chip further includes a memory, and the memory is used for storing necessary program instructions and data of the execution device or the training device.
- the system-on-a-chip may consist of chips, or may include chips and other discrete devices.
- FIG. 1 is a schematic diagram of the architecture provided by the embodiment of the present application.
- FIG. 2 is a schematic diagram of the architecture provided by the embodiment of the present application.
- FIG. 3 is a schematic diagram of the architecture provided by the embodiment of the present application.
- FIG. 4 is a schematic flow chart of a data backup method provided in an embodiment of the present application.
- FIG. 5 is a schematic diagram of the storage space provided by the embodiment of the present application.
- FIG. 6 is a schematic diagram of the barrier point processing flow provided by the embodiment of the present application.
- FIG. 7 is a schematic diagram of the first database cluster provided by the embodiment of the present application.
- FIG. 8 is a schematic diagram of the second database cluster provided by the embodiment of the present application.
- FIG. 9 is a schematic structural diagram of a computing device provided by an embodiment of the present application.
- Figure 1 is a schematic diagram of the system logic structure of a data backup system according to an embodiment of the application.
- The system may include a client, a main storage library (such as the first database cluster in the embodiment of the application), and a backup storage library (such as the second database cluster in the embodiment of the application). The main storage library can contain multiple shards (such as shard 1 and shard 2 shown in Figure 1), and each shard can include data nodes (DN). For example, shard 1 shown in Figure 1 includes master node 1 and a backup node, and the backup node of shard 1 can serve as the backup of master node 1; shard 2 includes master node 2 and a backup node, and the backup node of shard 2 can serve as the backup of master node 2.
- the main storage library can include a coordinator node (coordinator node, CN).
- a hardware device 1 can also be deployed on one side of the main storage library, and the hardware device 1 can be a storage device (such as the first storage device in the embodiment of the present application) .
- The backup database can be deployed with multiple shards, such as shard 1 and shard 2 shown in Figure 1, where shard 1 in the backup database can serve as the backup of shard 1 in the main database. Multiple backup nodes in shard 1 can serve as backups for primary node 1, and multiple backup nodes in shard 2 can serve as backups for primary node 2.
- A hardware device 2 can also be deployed on the side of the main repository.
- the hardware device 2 may be a storage device (for example, the second storage device in the embodiment of the present application).
- the primary storage library or the backup storage library can be a storage array or a network storage architecture such as a network attached storage (Network Attached Storage, NAS) or a storage area network (storage area network, SAN) respectively.
- Each storage node (such as the data node and coordination node described above) can be a logical unit number (logical unit number, LUN) or a file system. It should be understood that the embodiment of the present application does not limit the expression forms of the storage repository and the storage node.
- The primary and secondary database systems can also include clients, and the clients can be connected to the primary database system and the standby database through a network, where the network can be the Internet, an intranet, a local area network (LAN), a wide area network (WAN), a storage area network (SAN), etc., or a combination of the above networks.
- the primary node and backup node shown in FIG. 1 can be implemented by the computing device 200 shown in FIG. 2 .
- FIG. 2 is a schematic diagram of a simplified logical structure of a computing device 200. As shown in FIG. 2, the processor 202, the memory unit 204, the input/output interface 206, the communication interface 208, and the storage device 212 are connected to each other through the bus 210.
- the processor 202 is the control center of the computing device 200, and is used to execute related programs to realize the technical solutions provided by the embodiments of the present invention.
- the processor 202 includes one or more central processing units (Central Processing Unit, CPU), for example, the central processing unit 1 and the central processing unit 2 shown in FIG. 2 .
- the computing device 200 may further include multiple processors 202, and each processor 202 may be a single-core processor (including one CPU) or a multi-core processor (including multiple CPUs).
- A component for performing a specific function, for example, the processor 202 or the memory unit 204, can be implemented by configuring a general-purpose component to perform the corresponding function, or by a dedicated component that exclusively performs the specific function, which is not limited in this application.
- The processor 202 can adopt a general-purpose central processing unit, a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for executing related programs, so as to realize the technical solutions provided by this application.
- Processor 202 may be connected to one or more storage schemes via bus 210 .
- the storage scheme may include memory unit 204 and storage device 212 .
- the storage device 212 can be a read-only memory (Read Only Memory, ROM), a static storage device, a dynamic storage device or a random access memory (Random Access Memory, RAM).
- the memory unit 204 may be a random access memory.
- the memory unit 204 may be integrated with the processor 202 or inside the processor 202 , or may be one or more storage units independent of the processor 202 .
- Program codes for execution by the processor 202 or a CPU within the processor 202 may be stored in the storage device 212 or the memory unit 204 .
- Program codes stored in the storage device 212 (for example, an operating system, application software, a backup module, a communication module, or a storage control module) are copied to the memory unit 204 for execution by the processor 202.
- The storage device 212 can be a physical hard disk or a partition thereof (including small computer system interface (SCSI) storage or a global network block device (GNBD) volume), network storage (including network or cluster file systems such as the Network File System, NFS), a file-based virtual storage device (a virtual disk image), or a storage device based on logical volumes. It may include high-speed random access memory (RAM), and may also include non-volatile memories, such as one or more disk memories, flash memories, or other non-volatile memories. In some embodiments, the storage device may further include a remote memory separate from the one or more processors 202, such as a network disk accessed through the communication interface 208 and a communication network.
- The communication network may be the Internet, an intranet, a local area network (LAN), a wide area network (WAN), a storage area network (SAN), etc., or a combination of the above networks.
- Operating systems include tools for controlling and managing routine system tasks (such as memory management, storage device control, and power management), as well as various software components and/or drivers that facilitate communication between the various hardware and software components.
- the input/output interface 206 is used to receive input data and information, and output data such as operation results.
- Communication interface 208 enables communication between computing device 200 and other devices or communication networks using transceiving means such as, but not limited to, transceivers.
- Bus 210 may comprise a path for carrying information between various components of computing device 200 (eg, processor 202 , memory unit 204 , input/output interface 206 , communication interface 208 , and storage device 212 ).
- the bus 210 may use a wired connection manner or a wireless communication manner, which is not limited in this application.
- For the computing device 200 shown in FIG. 2, those skilled in the art should appreciate that the computing device 200 also includes other components necessary for proper operation.
- The computing device 200 can be a general-purpose computing device or a special-purpose computing device, including but not limited to any electronic device such as a portable computing device, a personal desktop computing device, a network server, a tablet computer, a mobile phone, or a personal digital assistant (PDA), or a combination of two or more of the above devices. The present application does not limit the specific implementation form of the computing device 200 in any way.
- the computing device 200 in FIG. 2 is only an example of the computing device 200 , and the computing device 200 may include more or fewer components than those shown in FIG. 2 , or have different component configurations. According to specific needs, those skilled in the art should understand that the computing device 200 may also include hardware devices for implementing other additional functions. Those skilled in the art should understand that the computing device 200 may also only include the components necessary to implement the embodiment of the present invention, and does not necessarily include all the components shown in FIG. 2 . Meanwhile, various components shown in FIG. 2 may be implemented in hardware, software, or a combination of hardware and software.
- the hardware structure shown in FIG. 2 and the above description are applicable to various computing devices provided in the embodiments of the present application, and are suitable for executing various data backup methods provided in the embodiments of the present application.
- FIG. 3 is a product implementation form of the embodiment of the present application, which mainly includes a dual-cluster disaster recovery architecture of a distributed database with log sharing.
- the dual database clusters are respectively deployed in two physical areas, and the program code of the data backup method provided by the embodiment of the present application runs in the host memory of the server during operation.
- the client on the management and control side can issue commands such as building a cluster, establishing a dual-cluster disaster recovery relationship, cluster switching, and cluster status query.
- The OM module in the cluster will control modules such as the CM and the database nodes to complete the related operations and return the execution results.
- a shared volume is a storage device capable of parallel copying data remotely (physical distance), and is used to synchronously transmit redo logs between the active and standby clusters.
- When the database cluster is running, the primary node on each shard of the primary cluster generates logs and writes them to the shared volume, and the logs are synchronized to the corresponding shared volume of the standby cluster.
- the standby data nodes of the primary cluster and the standby cluster read the logs from the shared volume and perform playback.
- FIG. 4 is a schematic flowchart of a data backup method provided in an embodiment of the present application, wherein the method can be applied to a first database cluster, and the first database cluster includes a first data node.
- The method includes:
- the first data node acquires a first physical log, where the first physical log includes operation information on data in the first data node.
- the first database cluster may be a distributed database
- the first database cluster may be a master cluster
- the second database cluster may serve as a backup cluster of the first database cluster.
- The first database cluster may be a database system based on a data-sharding distributed architecture (shared-nothing architecture). Each data node may be configured with a central processing unit (CPU), memory, hard disk, etc., and resources are not shared among the nodes.
- the first data node may be a data node of a shard in the first database cluster, for example, the first data node may be a master data node of a shard in the first database cluster.
- the first data node may be a data node DN in the first database cluster.
- the first database cluster can be deployed with at least one data node, where the coordinator node can be deployed on a computing device.
- Data nodes may be deployed on computing devices. Multiple coordinating nodes can be deployed on different computing devices, or can be deployed on the same computing device. Multiple data nodes can be deployed on different computing devices.
- the coordinator node and the data node can be deployed on different computing devices, or can be deployed on the same computing device.
- the data can be distributed on the data nodes, and the data between the data nodes is not shared.
- The coordinating node receives the query request from the client, generates an execution plan, and sends it to each data node.
- The data node initializes the operators to be used (such as data operation (stream) operators) according to the received plan, and then executes the execution plan delivered by the coordinating node.
- Coordinating nodes and data nodes, as well as data nodes on different physical nodes, can be connected through a network channel, and the network channel can use various communication protocols such as the scalable transmission control protocol (STCP).
- The first data node, as the master node in the first database cluster, can receive a data operation request from the client and generate the first physical log according to the data operation request. The second data node (or the fourth data node described in the subsequent embodiments), as the backup node of the first data node, can obtain the first physical log and perform log playback according to the first physical log, to ensure the consistency of the data on the first data node and the second data node.
- the first physical log includes operation information on the data in the first data node, and the operation information indicates a modification operation, a write operation, and/or
- the first physical log can be a Redo log, also known as an XLog, which can record the physical modification of the data page (or describe it as a change in data), and can be used to restore the physical data page after the transaction is committed.
- log files in the database system may include logical log files and physical log files.
- the logical log in the logical log file is used to record the original logic of the logical operation performed on the database system.
- logical logs are used to record the original logic of logical operations such as data access, data deletion, data modification, data query, database system upgrade, and database system management performed on the database system.
- the logical operation refers to a process of performing logical processing according to a user's data operation command to determine which data operations need to be performed on the data.
- the data operation command is expressed in a structured query language (structured query language, SQL)
- the original logic of the logic operation may be a computer instruction expressed in an SQL statement.
- the physical log in the physical log file is used to record the change of data in the database system (for example, record the change of the data page in the data storage node).
- the content of the physical log records can be understood as data changes caused by logical operations performed on the database system.
- Each shard contains a primary node and multiple standby data nodes.
- Each slice is configured with a shared volume (that is, the storage space in the first storage device and the second storage device described in the subsequent embodiments), ensuring that all nodes in the slice have access to the shared volume.
- the master node of the cluster generates logs and stores them on the storage device corresponding to the shard.
- the storage device between the clusters does not establish a synchronous replication relationship, and does not distinguish between the master and slave ends.
- Select one of the clusters as the disaster recovery cluster, stop that cluster to prevent it from writing data to the shared disk, configure the relevant parameter information of the active and standby clusters, and establish the remote replication relationship of the storage devices; that is, data is synchronously replicated from the master-end storage device of the active cluster to the slave-end storage device of the standby cluster.
- the standby cluster sends a build (reconstruction) request to the primary cluster through the network, completes the transmission and replication of data and logs, starts the cluster, and completes the establishment of the disaster recovery relationship.
- the first data node writes the first physical log into a first storage device, and the first storage device is configured to transfer the first physical log to a second storage device.
- the first data node may write the first physical log into the first storage device.
- the first storage device and the second storage device may be physical devices such as an all-flash storage system.
- the first storage device is a raw device.
- A raw device can also be called a raw partition.
- the first data node may write the first physical log into the first storage device based on direct I/O, which improves read and write performance.
- After the first data node writes the first physical log into the first storage device, the first data node commits the transaction on the data in the first data node.
- The transaction commit can ensure the persistence of the operation information on the data in the first physical log. If the first data node transferred the first physical log to the first storage device only after completing the transaction commit of the first physical log, the first data node might fail before the transaction commit is completed, in which case the first physical log could not be delivered to the first storage device; that is, the standby data node could not obtain the first physical log for log playback. In the embodiment of this application, the first data node transfers the first physical log to the first storage device before committing the transaction on the data in the first data node. Once the first physical log has been written into the first storage device successfully, it can be considered that the first physical log has been copied to the standby data node. Even if the first data node fails before the transaction commit is completed, the first physical log can still be transferred to the standby data node.
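The ordering described above (persist the physical log on the shared storage device first, commit afterwards) can be illustrated with a minimal Python sketch. The class and method names (`SharedLogDevice`, `PrimaryDataNode`, `commit`) and the `fail_before_commit` switch are hypothetical and exist only for illustration; they are not taken from the application.

```python
class SharedLogDevice:
    """Stand-in for the first storage device (hypothetical, in-memory)."""

    def __init__(self):
        self.records = []  # (lsn, payload) tuples visible to standby nodes

    def write(self, lsn, payload):
        self.records.append((lsn, payload))
        return True  # a real device could report a write failure here


class PrimaryDataNode:
    """Stand-in for the first data node (hypothetical)."""

    def __init__(self, device):
        self.device = device
        self.committed = []  # LSNs whose transactions have been committed

    def commit(self, lsn, payload, fail_before_commit=False):
        # Step 1: write the physical log to the storage device BEFORE commit.
        if not self.device.write(lsn, payload):
            raise IOError("physical log could not be persisted")
        # Simulated crash between the log write and the transaction commit.
        if fail_before_commit:
            return False
        # Step 2: only now is the transaction committed.
        self.committed.append(lsn)
        return True
```

With this ordering, a log record reaches the device even if the node fails before the commit step, so a standby node reading the device can still obtain it.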
- The first data node may write the first physical log into the first storage device, and the first storage device may include a storage area (first storage area) allocated for the first data node. For example, if the first data node belongs to the first shard in the first database cluster, the first storage device may include a shared volume for the first shard, and each data node in the first shard can share the shared volume.
- The first storage device may include a first storage space for storing the physical log of the first data node and a second storage space for storing the physical log of the third data node, where the first storage space is different from the second storage space.
- The first database cluster further includes a third data node, and while the third data node writes the second physical log into the first storage device, the first data node may write the first physical log into the first storage device in parallel.
- the third data node may be the master node in the first database cluster. Specifically, when the third data node writes the second physical log into the second storage space, the first data node writes the first physical log into the first storage space in parallel.
- The first storage device includes a first storage space for storing the physical log of the first data node. When the storage space available in the first storage space is less than the storage space required by the first physical log, the first data node may determine a target storage space from the first storage space, where the target storage space stores a target physical log, and, based on the fact that the target physical log has been replayed by the second data node, replace the target physical log in the target storage space with the first physical log.
- When the size of the first storage space is insufficient, occupied storage space in the first storage space is cleared and reused. To prevent a physical log from being cleared before the standby data node has received it, the cleared physical log must be one that has already been played back by the standby data node (this information can be fed back to the first data node after the standby data node completes the log playback).
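The reuse condition above can be sketched as a small check in Python. This is only an illustration under stated assumptions: the function name `can_overwrite`, the idea that each standby node reports a single "replayed up to" log sequence number, and the dictionary layout are all hypothetical.

```python
def can_overwrite(target_lsn, replayed_lsn_by_standby):
    """Return True if the log at target_lsn may be cleared and reused.

    target_lsn: sequence number of the physical log occupying the target space.
    replayed_lsn_by_standby: mapping of standby node name to the highest
        sequence number that node has reported as replayed (assumed feedback).
    """
    if not replayed_lsn_by_standby:
        # No feedback yet: be conservative and keep the log.
        return False
    # Every standby must already have replayed the log that would be cleared.
    return all(replayed >= target_lsn
               for replayed in replayed_lsn_by_standby.values())
```

Returning `False` when no feedback has arrived is a conservative design choice in this sketch, not something the text specifies.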
- The storage addresses of the first storage space include a head address and a tail address, and the storage space is written in order from the storage space corresponding to the head address to the storage space corresponding to the tail address. When the storage space corresponding to the tail address in the first storage space is occupied, the first data node may determine the storage space corresponding to the head address in the first storage space as the target storage space.
- an area (exemplarily, the size is 16MB) can be divided at the head of the storage device for writing control information (control info), and the control information can include check code, log write location, file size and other information.
- The physical log can be written starting from the position after the first 16 MB, and the physical log storage area can be recycled. When the write position (head) reaches the end of the log area (tail), writing can resume from the 16 MB offset.
- The master node (for example, the first data node) generates a physical log, copies the physical log from the local directory to the storage device, and updates the control information while writing. After the physical log is written to the storage device, the log is considered to be persisted successfully, and the transaction is then committed.
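The wrap-around of the write position can be sketched as follows. This is a minimal Python illustration assuming a 16 MB control area at the start of the device and assuming that a record reaching the end of the log area is split and continued at the 16 MB offset; the function name `advance_write_position` and the splitting behaviour are assumptions, not details stated in the text.

```python
CONTROL_AREA_SIZE = 16 * 1024 * 1024  # control info occupies the first 16 MB


def advance_write_position(write_pos, record_len, device_size):
    """Return (segments, new_write_pos) for a record of record_len bytes.

    segments is a list of (offset, length) pieces to write on the device.
    When the end of the device is reached, writing resumes at the 16 MB
    offset, so the control area is never overwritten.
    """
    if not CONTROL_AREA_SIZE <= write_pos < device_size:
        raise ValueError("write position is outside the log area")
    if record_len > device_size - CONTROL_AREA_SIZE:
        raise ValueError("record is larger than the whole log area")

    segments = []
    remaining = record_len
    pos = write_pos
    while remaining > 0:
        chunk = min(remaining, device_size - pos)
        segments.append((pos, chunk))
        remaining -= chunk
        pos += chunk
        if pos == device_size:
            pos = CONTROL_AREA_SIZE  # wrap around to the start of the log area
    return segments, pos
```

A real implementation would also have to confirm that the overwritten region has already been replayed by the standby node before reusing it.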
- each log has a unique LSN, or in other words, the log and the LSN are one-to-one, so a log can be uniquely determined according to the LSN.
- The first data node may also write a second physical log that includes commit information into the first storage device, and the first storage device is used to transfer the second physical log to the second storage device, so that the management node in the second database cluster can obtain the second physical log from the second storage device. The commit information indicates that the first data node has completed the transaction commit of the first physical log.
- the commit information can be used as a reference for the global consistency point when the second database cluster performs log playback.
- The commit information may include a transaction commit number, which may be used to identify a committed database transaction (also simply called a transaction).
- a transaction is a logical unit for a data storage node to perform database operations, and consists of a sequence of database operations.
- a transaction in the submitted state indicates that the transaction has been successfully executed, and the data involved in the transaction has been written to the data storage node.
- The first database cluster further includes a fourth data node, which serves as a backup node for the first data node. For example, the fourth data node and the first data node can be data nodes in the same shard. The fourth data node can then obtain the first physical log from the first storage device and perform log playback according to the first physical log.
- The fourth data node may first read the control information in the header of the first storage device and, after verification, compare the write progress of the log on the storage device with the local physical log. If there is a newer physical log on the storage device, the fourth data node reads the physical log, copies it to local storage, and plays it back. If there is no data to read, it waits in a loop.
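One iteration of the standby read loop described above can be sketched in Python. The control-information layout (an 8-byte write position followed by a CRC32 check code), the use of `zlib.crc32`, and the names `pack_control_info`, `read_control_info`, and `poll_once` are all assumptions made for illustration; the text only states that the control information contains a check code and the log write location.

```python
import struct
import zlib


def pack_control_info(written_lsn):
    """Encode the write progress plus a CRC32 check code (assumed layout)."""
    body = struct.pack("<Q", written_lsn)
    return body + struct.pack("<I", zlib.crc32(body))


def read_control_info(raw):
    """Return the written LSN, or None if verification fails."""
    if len(raw) < 12:
        return None
    body = raw[:8]
    (check_code,) = struct.unpack("<I", raw[8:12])
    if zlib.crc32(body) != check_code:
        return None
    return struct.unpack("<Q", body)[0]


def poll_once(raw_control, device_log, local_log):
    """Copy records newer than the local log; return the LSNs to replay.

    device_log and local_log map LSN -> record bytes (in-memory stand-ins).
    An empty result means there is nothing to read and the caller should wait.
    """
    written_lsn = read_control_info(raw_control)
    if written_lsn is None:
        return []  # verification failed; try again on the next loop
    local_lsn = max(local_log, default=0)
    new_lsns = [lsn for lsn in sorted(device_log)
                if local_lsn < lsn <= written_lsn]
    for lsn in new_lsns:
        local_log[lsn] = device_log[lsn]  # copy locally; replay would follow
    return new_lsns
```

A caller would invoke `poll_once` repeatedly and sleep briefly whenever it returns an empty list.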
- synchronous replication of data may be performed between the first storage device and the second storage device, and the first storage device and the second storage device may be storage devices capable of remote and parallel data transmission.
- the first storage device may transfer the first physical log to the second storage device.
- The second storage device may include a third storage space for storing the physical log of the second data node and a fourth storage space for storing the physical log of the fifth data node, where the third storage space is different from the fourth storage space. The first storage device may transfer the first physical log to the third storage space in the second storage device.
- a second data node in the second database cluster acquires the first physical log from the second storage device.
- the second database cluster can be a distributed database
- the first database cluster can be the master cluster
- the second database cluster can be used as the backup cluster of the first database cluster
- The second database cluster can be a database system based on a data-sharding distributed architecture (shared-nothing architecture). Each data node can be configured with a central processing unit (CPU), memory, hard disk, etc., and resources are not shared among the data nodes. The second data node may be a data node of a shard in the second database cluster.
- the second data node may be a data node DN in the second database cluster.
- the second database cluster can be deployed with at least one data node, where the coordinator node can be deployed on a computing device.
- Data nodes may be deployed on computing devices.
- Multiple coordinating nodes may be deployed on different computing devices, or may be deployed on the same computing device.
- Multiple data nodes can be deployed on different computing devices.
- the coordinator node and the data node can be deployed on different computing devices, or can be deployed on the same computing device.
- The first data node, as the master node in the first database cluster, can receive a data operation request from the client and generate the first physical log according to the data operation request. The second data node, as the backup node of the first data node, can obtain the first physical log from the second storage device and perform log playback according to the first physical log, so as to ensure the consistency of the data on the first data node and the second data node.
- the second data node may obtain the first physical log from a third storage space, where the third storage space is a storage space allocated for the second data node in the second storage device.
- The second storage device may include a storage area (third storage area) for the second data node. Taking the case where the second data node belongs to the first shard in the second database cluster as an example, the second storage device may include a shared volume allocated for the first shard, and each data node in the first shard may share the shared volume.
- The second storage device may include a third storage space for storing the physical log of the second data node and a fourth storage space for storing the physical log of the fifth data node, where the third storage space is different from the fourth storage space.
- While other data nodes read physical logs from the second storage device, the second data node can acquire the first physical log from the second storage device in parallel.
- The so-called parallelism can be understood as follows: the action of the first data node writing the first physical log into the first storage space and the action of the third data node writing the physical log into the second storage space happen simultaneously in time.
- The first database cluster includes multiple data nodes including the first data node, and the second database cluster further includes a management node. The management node can also obtain, from the second storage device, the commit information from the first database cluster. The commit information includes the log sequence number of the physical log of the latest transaction commit completed by each data node among the plurality of data nodes, and the target sequence number is the smallest sequence number among the plurality of log sequence numbers.
- the existing technology adopts a distributed consistency mechanism based on storage devices and generates a global barrier log to ensure that the farthest recovery point common to different shards can be found, but it cannot solve the problem of data synchronization failure caused by network problems in storage devices.
- The management node in the second database cluster can maintain a piece of global information (which can be called a barrier point). This global information is a log sequence number for the physical logs obtained by the standby data nodes, namely the smallest log sequence number among the largest physical log sequence numbers for which each master node has currently committed transactions.
- For example, the management node obtains the following: the current transaction commit progress of master node 1 is 1, 2; that of master node 2 is 1, 2; that of master node 3 is 1, 2, 3; and that of master node 4 is 1, 2, 3, 4. The maximum committed physical log sequence numbers of the four master nodes are therefore 2, 2, 3, and 4, and 2 is the smallest physical log sequence number among the maximum physical log sequence numbers for which each master node has currently committed transactions.
- Physical logs with the same sequence number on different primary data nodes among the multiple primary data nodes correspond to the same task, and each primary data node commits the transactions of its physical logs in ascending order of sequence number; that is, the sequence number can indicate the transaction commit progress of the master node.
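The "smallest of the largest committed sequence numbers" rule can be written down in a few lines of Python. The function name `global_barrier`, the node names, and the sequence numbers used in the usage note are hypothetical and serve only to illustrate the rule.

```python
def global_barrier(progress_by_master):
    """Return the smallest of the per-node largest committed sequence numbers.

    progress_by_master maps a master node name to the list of log sequence
    numbers whose transactions that node has committed so far.
    """
    if not progress_by_master:
        raise ValueError("no master node progress reported")
    # Largest committed sequence number on each node, then the minimum of those.
    return min(max(lsns) for lsns in progress_by_master.values())
```

For instance, with hypothetical progress `{"dn_a": [5, 6, 7], "dn_b": [5], "dn_c": [5, 6]}`, the per-node maxima are 7, 5, and 6, so the function returns 5: every node is known to have committed up to that point.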
- The management node can be an operation and maintenance management module (operation manager, OM), a cluster management module (cluster manager, CM), a cluster management agent (CM agent, CMA), a cluster management service (CM server, CMS), a global transaction manager (GTM), etc.
- The coordinating node in the first database cluster can obtain the current transaction commit progress of each primary data node (that is, the log sequence number of the physical log whose transaction commit has completed, which can be called a barrier point), and transmit commit information including the transaction commit progress to the second database cluster in the form of a physical log (this step can also be completed by the primary data node).
- The standby data node obtains the physical log carrying the commit information from the second storage device, writes the log to its local disk, and parses the newly written physical log. The parsed barrier point is stored in a hash table, and the largest barrier point currently received is recorded; the largest barrier point corresponds to the log sequence number of the latest transaction commit of the primary data node.
- the function of the management node can be realized by the cooperation of CMA, CMS, and ETCD.
- the CMA queries the maximum barrier values of the CN and DN and reports them to the CMS, and the CMS can take the minimum of the largest barrier points on the standby data nodes as the "candidate sequence number" (also called the value to be detected) and store it in ETCD;
- the CMA obtains the "value to be detected" from ETCD, queries the DN, and confirms whether the point exists in the DN
- the CMS performs the following judgment: if the physical log corresponding to the "value to be detected" exists in every standby data node, the value can be stored in ETCD as the "target sequence number" (or simply the target value); otherwise it is discarded. The CMA then reads the "target value" from ETCD and updates its local "target value".
- in one report, the CMA needs to query and report the maximum barrier value, locally query whether the "value to be detected" exists, and update the "target value"; the CMS needs to update the "value to be detected" and the "target value".
- Barrier deletion is the end point of consistency and occurs during physical log playback. When playback reaches a barrier point, the playback position is updated and the barrier point is deleted from the hash table, which completes the entire process from barrier generation to barrier deletion.
- the standby cluster needs to obtain the minimum barrier point among the current maximum barrier points of each fragment (that is, the log sequence number of the physical log of the latest transaction submission of each primary data node among multiple primary data nodes), and the database backup set can be restored to the minimum barrier point.
- obtaining the minimum barrier point can be divided into four stages: barrier generation, barrier parsing and storage, barrier advancement, and barrier deletion.
- barrier generation is the premise of consistency. Barrier points can be initiated by any CN node, but the first CN is responsible for generating them.
- if the CN that initiates barrier generation is not the first CN, it notifies the first CN to generate the barrier point.
- after the barrier point is generated, the CN and/or DN nodes add it to the physical log.
- Barrier parsing and storage is the basis of consistency. After the corresponding standby data node in the standby cluster receives the log through the storage device, it writes the log to the local disk. It then parses the newly written log, stores the parsed barrier points in the hash table, and records the maximum barrier point received so far.
- the hash table is created before the log parsing thread is created, and released when the cluster is uninstalled. The parsed barrier points are stored in the hash table, and these barriers will be deleted when playing back the physical log.
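The parsing, storage, and deletion stages described above can be sketched as follows. This is a minimal illustration, not the implementation in the application: `LogRecord`, `BarrierTable`, and the `is_barrier` flag are hypothetical names, and a Python `set` stands in for the hash table.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class LogRecord:
    lsn: int                  # log sequence number of the physical log
    is_barrier: bool = False  # hypothetical flag marking a barrier point


@dataclass
class BarrierTable:
    points: Set[int] = field(default_factory=set)  # stands in for the hash table
    max_barrier: Optional[int] = None              # largest barrier point received so far

    def parse(self, records):
        """Store every barrier point found in newly written logs."""
        for rec in records:
            if rec.is_barrier:
                self.points.add(rec.lsn)
                if self.max_barrier is None or rec.lsn > self.max_barrier:
                    self.max_barrier = rec.lsn

    def replay(self, rec):
        """Delete a barrier point once playback reaches it."""
        if rec.is_barrier:
            self.points.discard(rec.lsn)


records = [LogRecord(1), LogRecord(2, is_barrier=True),
           LogRecord(3), LogRecord(4, is_barrier=True)]
table = BarrierTable()
table.parse(records)
for rec in records[:2]:  # play back up to and including barrier point 2
    table.replay(rec)
```

After this run only barrier point 4 remains in the table, while the recorded maximum is still 4, since deletion during playback does not change the largest point received.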
- Barrier advancement is the key to consistency, and this part can be carried out through the cooperation of CN, DN, CMA, CMS, and ETCD, as shown in Figure 6.
- the advancement of the barrier consistency point can include five cycles: in the first cycle, the CMA queries the maximum barrier values of the CN and DN and reports them to the CMS; the CMS collects them, takes the minimum as the "value to be detected", and stores it in ETCD; the CMA obtains the "value to be detected" from ETCD, queries the CN and DN to confirm whether the point exists in the DN, and reports the result to the CMS, which makes its judgment after collecting all the results.
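One advancement round can be approximated with the sketch below. It collapses the CMA and CMS roles into a single function and uses a plain dictionary in place of ETCD; the function name `advance_barrier`, the keys `candidate` and `target`, and the sample barrier points are all assumptions made for illustration. Each node is assumed to hold at least one barrier point.

```python
def advance_barrier(node_barriers, etcd):
    """Run one advancement round and return the current target value.

    node_barriers maps a standby node name to the set of barrier points it has parsed.
    etcd is a plain dict standing in for the ETCD key-value store.
    """
    # Each node reports its largest barrier point; the minimum becomes the
    # "value to be detected".
    candidate = min(max(points) for points in node_barriers.values())
    etcd["candidate"] = candidate
    # The candidate is promoted to the "target value" only if every node holds it;
    # otherwise it is discarded and the previous target is kept.
    if all(candidate in points for points in node_barriers.values()):
        etcd["target"] = candidate
    return etcd.get("target")


etcd = {}
nodes = {"dn1": {10, 20, 30}, "dn2": {10, 20}, "dn3": {10, 20, 30, 40}}
target = advance_barrier(nodes, etcd)

# A round in which one node lacks the candidate leaves the old target in place.
etcd_stale = {"target": 10}
kept = advance_barrier({"dn1": {10, 30}, "dn2": {10, 20}}, etcd_stale)
```

In the first round the candidate 20 exists on every node and becomes the target; in the second round `dn1` does not hold 20, so the earlier target 10 is retained.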
- a target sequence number is maintained as global information by the management node in the second database cluster. The target sequence number is the smallest sequence number among the plurality of log sequence numbers, and each standby data node among the plurality of standby data nodes has obtained the physical log corresponding to the target sequence number. Each standby data node performs log playback only when the log sequence number of the physical log to be played back is equal to the target sequence number. This ensures that each standby data node plays back up to the target sequence number, so that different standby data nodes are restored to the same position and data consistency between different standby data nodes in the distributed database is guaranteed.
- the second data node performs log replay according to the first physical log.
- the second data node obtains the target sequence number from the management node and, after determining that the log sequence number of the first physical log is equal to the target sequence number, performs log playback according to the first physical log.
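The effect of gating playback on the target sequence number can be sketched as below. The text above states the condition as equality with the target sequence number; this sketch adopts one possible reading, namely that logs at or below the target may be played back in ascending order while logs beyond it are held back. That reading, the function name `replay_up_to`, and the sample sequence numbers are assumptions.

```python
def replay_up_to(pending_lsns, target_lsn, applied):
    """Play back pending physical logs in ascending order, stopping at target_lsn.

    Played-back sequence numbers are appended to applied; the logs that must
    wait for a later target are returned.
    """
    remaining = []
    for lsn in sorted(pending_lsns):
        if lsn <= target_lsn:
            applied.append(lsn)    # safe to play back: no later than the target
        else:
            remaining.append(lsn)  # held back until the target advances
    return remaining


target_lsn = 3
applied_a, applied_b = [], []
remaining_a = replay_up_to([1, 2, 3, 4, 5], target_lsn, applied_a)
remaining_b = replay_up_to([1, 2, 3], target_lsn, applied_b)
```

Both standby nodes end at the same position (sequence number 3) even though the first one had already received later logs, which is the consistency property the target sequence number is meant to provide.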
- when the second database cluster needs to become the primary cluster, this can be realized through the failover process or the switchover process.
- the failover process is to perform a failover switchover when an abnormality occurs in the main cluster, that is, the standby cluster is upgraded to the main cluster and continues to provide production services.
- the client on the control side issues the cluster failover command to check the status of the storage device.
- switch to RPO 0; interrupt the synchronization relationship of the storage devices and remove the write protection on the standby cluster's storage device so that the storage device is readable and writable; stop the standby cluster and, for the CN nodes of the standby cluster, overwrite the local log with the redo log in the storage device; update the relevant parameter information stored in etcd of the standby cluster; the OM modifies the mode parameters of the CM, CN, and DN and starts the cluster in primary cluster mode.
- the switchover process is a planned cluster role switch initiated by the user when the active and standby clusters are running normally, that is, the active cluster is downgraded to the standby cluster, and the standby cluster is promoted to the active cluster to replace the original active cluster to provide production services.
- the client on the management and control side first sends a cluster switchover command to the main cluster to check the status of the storage device.
- the OM modifies the mode parameters of the CM, CN, and DN and starts the cluster in standby cluster mode; the status of the storage device is checked, and the storage devices perform a primary/standby switch, that is, the data replication direction becomes synchronous transmission from the original standby cluster to the original primary cluster; the standby cluster is stopped and, for the CN nodes of the standby cluster, the local log is overwritten with the redo log; the OM modifies the mode parameters of the CM, CN, and DN and starts the original cluster in primary cluster mode.
- An embodiment of the present application provides a data backup method, the method is applied to a first database cluster, the first database cluster includes a first data node, and the method includes: the first data node obtains a first physical log , the first physical log includes operation information on data in the first data node; the first data node writes the first physical log into a first storage device, and the first storage device is used to store The first physical log is transferred to the second storage device, so that the second data node in the second database cluster obtains the first physical log from the second storage device, wherein the first storage device is deployed in The first database cluster, the second storage device is deployed in the second database cluster, the first database cluster and the second database cluster are different database clusters, and the second data node is used as A backup node of the first data node.
- the data nodes in the first database cluster (primary cluster) can quickly synchronize the physical log to the second database cluster (standby cluster) through the storage device, thereby coming closer to a recovery point objective (RPO) of 0 while maintaining high service performance and improving the data transmission efficiency during data backup.
- FIG. 7 is a schematic structural diagram of a first database cluster 700 according to an embodiment of the present application.
- the first database cluster 700 may include a first data node 70, and the first data node 70 may include:
- a log obtaining module 701 configured to obtain a first physical log, where the first physical log includes operation information on the data in the first data node 70;
- for a specific description of the log acquisition module 701, reference may be made to the description of step 401 in the above embodiment, and details are not repeated here.
- the log acquisition module 701 may be implemented by the processor 202 and the memory unit 204 shown in FIG. 2 . More specifically, the processor 202 may execute related codes in the memory unit 204 to obtain the first physical log.
- a log transfer module 702 configured to write the first physical log into a first storage device, where the first storage device is configured to transfer the first physical log to a second storage device, so that the second data node 80 obtains the first physical log from the second storage device, wherein the first storage device is deployed in the first database cluster, the second storage device is deployed in the second database cluster, the first database cluster and the second database cluster are different database clusters, and the second data node 80 is used as a backup node for the first data node 70.
- for a specific description of the log delivery module 702, reference may be made to the description of step 402 in the above embodiment, and details are not repeated here.
- the log delivery module 702 may be implemented by the processor 202 , the memory unit 204 and the communication interface 208 shown in FIG. 2 . More specifically, the processor 202 may execute the communication module and the backup module in the memory unit 204, so that the communication interface 208 writes the first physical log into the first storage device.
- the operation information indicates a modification operation, a write operation, and/or a deletion operation on the data in the data node.
- the first data node 70 further includes:
- the transaction commit module 703 is configured to perform transaction commit on the data in the first data node 70 according to the first physical log after transferring the first physical log to the first storage device.
- the first database cluster further includes a third data node; the log transfer module 702 is specifically configured to:
- the first storage device includes storage space for storing physical logs
- the log delivery module 702 is specifically configured to:
- the target physical log in the target storage space is replaced by the first physical log.
- the storage address of the storage space includes a head address and a tail address, and the storage order of the storage space is configured from the storage space corresponding to the head address to the storage space corresponding to the tail address storage;
- the log transfer module 702 is specifically used for:
- the first storage device is a raw device.
- the log transfer module 702 is further configured to:
- after the first data node 70 commits the transaction on the data in the first data node 70 according to the first physical log, the first data node 70 writes a second physical log containing commit information into the first storage device, where the first storage device is used to transfer the second physical log to the second storage device, so that the management node in the second database cluster can obtain the second physical log from the second storage device, and the commit information indicates that the first data node 70 has completed the transaction commit of the first physical log.
- the first database cluster further includes a fourth data node, the fourth data node is used as a backup node for the first data node 70, and the fourth data node includes: a log obtaining module, configured to obtain the first physical log from the first storage device;
- a log playback module configured to perform log playback according to the first physical log.
- FIG. 8 is a schematic structural diagram of a second database cluster 800 according to an embodiment of the present application.
- the second database cluster 800 may include a second data node 80, and the second data node 80 is used as a first
- a log obtaining module 801 configured to obtain a first physical log from the second storage device, where the first physical log is a physical log from the first data node 70 that is stored in the second storage device, and the first physical log includes operation information on the data in the first data node 70;
- for a specific description of the log acquisition module 801, reference may be made to the description of step 403 in the above embodiment, and details are not repeated here.
- the log acquisition module 801 may be implemented by the processor 202 , the memory unit 204 and the communication interface 208 shown in FIG. 2 . More specifically, the processor 202 may execute the communication module in the memory unit 204, so that the communication interface 208 obtains the first physical log from the second storage device.
- the log playback module 802 is configured to perform log playback according to the first physical log.
- for a specific description of the log playback module 802, reference may be made to the description of step 404 in the above embodiment, and details are not repeated here.
- the operation information indicates a modification operation, a write operation, and/or a deletion operation on the data in the data node.
- the second database cluster further includes a fifth data node; the log acquisition module is specifically configured to:
- the second data node 80 obtains the first physical log from the second storage device in parallel.
- the first database cluster includes a plurality of data nodes including the first data node 70, and the second database cluster further includes a management node, and the management node includes:
- a commit information acquiring module configured to acquire, from the second storage device, commit information from the first database cluster, where the commit information includes the log sequence number of the physical log whose transaction commit was most recently completed by each data node among the plurality of data nodes, and the target sequence number is the smallest sequence number among the plurality of log sequence numbers;
- the log playback module is specifically used for the second data node 80 to obtain the target serial number from the management node;
- physical logs with the same sequence number on different primary data nodes among the multiple primary data nodes correspond to the same task, and each primary data node among the multiple primary data nodes commits the transactions of its physical logs in ascending order of sequence number.
- the embodiment of the present application also provides a computing device, which may be a node in the first database cluster or a node in the second database cluster described in the above embodiments.
- the computing device may be a server or a terminal.
- the foregoing database management node and/or data storage node may be deployed in the computing device.
- the computing device 90 includes: a processor 901 , a communication interface 902 and a memory 903 .
- the processor 901 , the communication interface 902 and the memory 903 are connected to each other through a bus 904 .
- the memory 903 is used to store computer instructions.
- when the processor 901 executes the computer instructions in the memory 903, it can realize the functions of the computer instructions.
- the data recovery method provided in the embodiment of the present application can be implemented.
- when the database management node is deployed in the computer device and the processor 901 executes the computer instructions in the memory 903, the functions of the first data node and the fourth data node in the data backup method provided by the embodiment of the present application can be realized.
- when the data storage node is deployed in the computer device and the processor 901 executes the computer instructions in the memory 903, the function of the second data node in the data backup method provided by the embodiment of the present application can be realized.
- the bus 904 can be divided into an address bus, a data bus, a control bus, and the like.
- only one thick line is used to represent the bus in FIG. 9, but this does not mean that there is only one bus or one type of bus.
- the processor 901 may be a hardware chip, and the hardware chip may be an application-specific integrated circuit (application-specific integrated circuit, ASIC), a programmable logic device (programmable logic device, PLD) or a combination thereof.
- the aforementioned PLD may be a complex programmable logic device (complex programmable logic device, CPLD), a field-programmable gate array (field-programmable gate array, FPGA), a general array logic (generic array logic, GAL) or any combination thereof.
- it may also be a general-purpose processor, for example, a central processing unit (central processing unit, CPU), a network processor (network processor, NP), or a combination of a CPU and NP.
- the memory 903 may include a volatile memory (volatile memory), such as a random-access memory (random-access memory, RAM). It may also include a non-volatile memory (non-volatile memory), such as a flash memory (flash memory), a hard disk drive (hard disk drive, HDD) or a solid-state drive (solid-state drive, SSD). Combinations of the above types of memory may also be included.
- the embodiment of the present application also provides a storage medium, which is a non-volatile computer-readable storage medium, and the instructions in the storage medium are used to implement the data backup method provided in the embodiment of the present application.
- the embodiment of the present application also provides a computer program product including instructions, and the instructions included in the computer program product are used to realize the data backup method provided in the embodiment of the present application.
- the computer program product can be stored on the storage medium.
- the embodiment of the present application also provides a chip, the chip includes a programmable logic circuit and/or program instructions, which are used to implement the data backup method provided in the embodiment of the present application when the chip is running.
- the device embodiments described above are only illustrative, and the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physically separated.
- a unit can be located in one place, or it can be distributed to multiple network units. Part or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- the connection relationship between the modules indicates that they have communication connections, which can be specifically implemented as one or more communication buses or signal lines.
- the essence of the technical solution of this application, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disc, and includes several instructions that cause a computer device (which can be a personal computer, training device, network device, etc.) to execute the methods of the embodiments of this application.
- all or part of them may be implemented by software, hardware, firmware or any combination thereof.
- when implemented using software, it may be implemented in whole or in part in the form of a computer program product.
- the computer program product includes one or more computer instructions.
- the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
- the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means.
- the computer-readable storage medium may be any available medium that can be stored by a computer, or a data storage device such as a training device or a data center integrated with one or more available media.
- the available medium may be a magnetic medium (such as a floppy disk, a hard disk, or a magnetic tape), an optical medium (such as a DVD), or a semiconductor medium (such as a solid state disk (Solid State Disk, SSD)), etc.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Embodiments of the present application disclose a data backup method. The method comprises: a main data node writes a first physical log into a first storage device, and the first storage device is configured to transmit the first physical log to a second storage device, so that a second data node in a second database cluster obtains the first physical log from the second storage device, wherein the first storage device is deployed in different database clusters, and the second data node is used as a backup node of the first data node. According to the present application, the data nodes in the first database cluster (a main cluster) can rapidly synchronize a physical log to the second database cluster (a standby cluster) by means of the storage device, thereby improving the data transmission efficiency during data backup.
Description
This application claims priority to Chinese patent application No. 202111117550.8, filed with the China Patent Office on September 23, 2021 and entitled "A Data Backup Method and Database Cluster", the entire contents of which are incorporated herein by reference.
This application relates to the field of databases, and more specifically, to a data backup method and a database cluster.
With the rapid development of information technologies such as cloud computing and big data, more and more enterprises are centralizing their applications, data, and systems. Centralizing data also brings risk: how to keep an enterprise's core business online, that is, running without interruption, when a catastrophic emergency occurs has become a primary concern. The financial and banking industries have high requirements for data security and need to guarantee both data security and service availability. A dual-cluster disaster recovery solution is therefore needed, in which the standby cluster can continue to provide services when the primary cluster fails, so that data is protected and quickly recovered when a natural or man-made disaster strikes.
Data disaster recovery refers to establishing an off-site data system that is a usable replica of local critical application data. When a disaster strikes the local data and the entire application system, at least one usable copy of the business-critical data is kept off-site. The main technologies used are data backup and data replication; data disaster recovery processing is in effect off-site data replication. The off-site data may be a fully real-time replica of the local production data (synchronous replication), or may lag slightly behind the local data (asynchronous replication).
Take data disaster recovery for an Oracle database as an example. To ensure high availability and high reliability, a primary repository and a standby repository are deployed in two equipment rooms in the same city (or in different locations). The standby repository is a backup of the primary repository, and data protection technology is used to synchronize data between the primary repository and the standby repository.
The basic data synchronization process is as follows: when the primary repository generates a physical log, the log is transmitted to the standby repository by synchronous or asynchronous replication through a preconfigured transmission mode, thereby replicating data between the primary and standby repositories.
In the prior art, physical logs are transmitted over a dedicated network line. However, a dedicated network line makes the network infrastructure of this architecture costly, and its transmission efficiency is low.
Summary of the invention
The embodiments of this application provide a data backup method that can improve data transmission efficiency during data backup.
In a first aspect, this application provides a data backup method applied to a first database cluster that includes a first data node. The method includes: the first data node obtains a first physical log, where the first physical log includes operation information on data in the first data node; and the first data node writes the first physical log into a first storage device, where the first storage device is configured to transfer the first physical log to a second storage device so that a second data node in a second database cluster obtains the first physical log from the second storage device. The first storage device is deployed in the first database cluster, the second storage device is deployed in the second database cluster, the first database cluster and the second database cluster are different database clusters, and the second data node serves as a backup node of the first data node.
In a possible implementation, data may be synchronously replicated between the first storage device and the second storage device, and the first storage device and the second storage device may be storage devices with remote and parallel data transmission capabilities.
In the above manner, the data nodes in the first database cluster (primary cluster) can quickly synchronize physical logs to the second database cluster (standby cluster) through the storage devices, thereby coming closer to a recovery point objective (RPO) of 0 while maintaining high service performance and improving the data transmission efficiency during data backup.
In a possible implementation, the first physical log includes operation information on the data in the first data node, and the operation information indicates a modification operation, a write operation, and/or a deletion operation on the data in the data node. The first physical log may be a redo log, also called an XLog, which records physical modifications to data pages (in other words, how the data changed) and can be used to restore physical data pages after a transaction is committed.
In a possible implementation, after writing the first physical log into the first storage device, the first data node commits the transaction on the data in the first data node according to the first physical log.
Transaction commit ensures that the operation information on the data in the first physical log is persisted. If the first data node transferred the first physical log to the first storage device only after completing the transaction commit of the first physical log, the first data node might fail before the commit completes, in which case the first physical log could not be transferred to the first storage device and the standby data node could not obtain the first physical log for log playback. In the embodiments of this application, the first data node transfers the first physical log to the first storage device before committing the transaction on the data in the first data node. Once the first physical log has been successfully written into the first storage device, it can be considered to have been replicated to the standby data node; even if the first data node fails before the transaction commit completes, the first physical log can still be transferred to the standby data node.
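The ordering argument above (write the physical log to the storage device first, commit afterwards) can be made concrete with a small sketch. `SharedStorage`, `PrimaryDataNode`, and the `crash_before_commit` switch are hypothetical names used only to simulate a failure between the two steps; an in-memory list stands in for the first storage device.

```python
class SharedStorage:
    """In-memory stand-in for the first storage device."""

    def __init__(self):
        self.logs = []

    def write(self, log):
        self.logs.append(log)


class PrimaryDataNode:
    def __init__(self, storage):
        self.storage = storage
        self.committed = []

    def handle(self, log, crash_before_commit=False):
        # Step 1: write the physical log to the storage device first.
        self.storage.write(log)
        # Step 2: commit locally. A failure here loses the local commit but
        # not the log, which a standby node can still fetch and play back.
        if crash_before_commit:
            raise RuntimeError("simulated failure before commit")
        self.committed.append(log)


storage = SharedStorage()
node = PrimaryDataNode(storage)
node.handle("xlog-1")
try:
    node.handle("xlog-2", crash_before_commit=True)
except RuntimeError:
    pass  # the node failed, but the log already reached the storage device
```

After the simulated failure the storage device holds both logs while the node has committed only the first, which matches the case the paragraph above is designed to cover.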
In a possible implementation, the first data node may write the first physical log into the first storage device, and the first storage device may include a storage area allocated to the first data node (a first storage area). For example, if the first data node belongs to a first shard in the first database cluster, the first storage device may include a shared volume allocated to the first shard, and the data nodes in the first shard can share this volume. Specifically, the first storage device may include a first storage space for storing the physical logs of the first data node and a second storage space for storing the physical logs of the third data node, the first storage space being different from the second storage space.
In a possible implementation, the first database cluster further includes a third data node, and while the third data node writes a second physical log to the first storage device, the first data node may write the first physical log to the first storage device in parallel. The third data node may be a primary node in the first database cluster. Specifically, while the third data node writes the second physical log to the second storage space, the first data node writes the first physical log to the first storage space in parallel.
Here, "in parallel" means that the first data node's action of writing the first physical log to the first storage space and the third data node's action of writing its physical log to the second storage space occur at the same time.
Because different storage areas are allocated to different primary nodes in the first database cluster, the primary nodes can write physical logs in parallel, which improves the concurrency of data writes and therefore the efficiency of log transfer during data backup.
Because the storage space in a storage device is limited, in a possible implementation the first storage device includes a first storage space for storing the physical logs of the first data node. When the storage space available in the first storage space is smaller than the storage space required by the first physical log, the first data node may determine a target storage space within the first storage space, the target storage space storing a target physical log, and, on the basis that the target physical log has already been replayed by the second data node, replace the target physical log in the target storage space with the first physical log. In other words, when the first storage space is too small, occupied space in it is cleared and reused. To prevent a physical log from being cleared before the standby data node has obtained it, the first data node only clears physical logs that the standby data node has already replayed (the standby data node may report this to the first data node after completing log replay). In this way, a circular read/write mechanism saves storage space on the storage device while ensuring that the standby data node can obtain every physical log.
In a possible implementation, the storage addresses of the first storage space include a head address and a tail address, and the storage space is filled in order from the storage space corresponding to the head address to the storage space corresponding to the tail address;
determining the target storage space from the first storage space, on the basis that the storage space available in the first storage space is smaller than the storage space required by the first physical log, includes: on the basis that the storage space corresponding to the tail address in the first storage space is occupied, determining the storage space corresponding to the head address in the first storage space as the target storage space.
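The circular reuse of the first storage space can be sketched as follows. This is a simplified illustration under assumptions that are not in this application: the storage space is modeled as a fixed number of equal-sized slots, index 0 plays the role of the head address and the last index the tail address, and the names `CircularLogSpace`, `mark_replayed`, and `write` are hypothetical.

```python
class CircularLogSpace:
    """Hypothetical fixed-size log area that wraps from tail back to head."""

    def __init__(self, slots):
        self.slots = [None] * slots  # index 0 = head, last index = tail
        self.next_slot = 0
        self.replayed = set()        # LSNs the standby reports as replayed

    def mark_replayed(self, lsn):
        # Feedback from the standby data node after it finishes replay.
        self.replayed.add(lsn)

    def write(self, lsn, payload):
        target = self.next_slot
        old = self.slots[target]
        # Overwrite an occupied slot only if its log was already replayed.
        if old is not None and old[0] not in self.replayed:
            return False  # the caller has to wait for the standby
        self.slots[target] = (lsn, payload)
        # After the tail slot is used, the next write wraps to the head.
        self.next_slot = (target + 1) % len(self.slots)
        return True


space = CircularLogSpace(slots=2)
ok1 = space.write(1, b"log-1")
ok2 = space.write(2, b"log-2")       # tail slot is now occupied
blocked = space.write(3, b"log-3")   # head slot holds log 1, not yet replayed
space.mark_replayed(1)
ok3 = space.write(3, b"log-3")       # head slot can now be replaced
```

The refusal in the `blocked` case is the point of the design: space is reclaimed only for logs that the standby has confirmed, so no log is lost to reuse.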
In a possible implementation, the first storage device is a raw device. A raw device, also called a raw partition, is a device file that has not been formatted and is not read through a file system. The application is responsible for reading from and writing to it, without file system buffering, and the device is not directly managed by the operating system. Data can be written directly from the memory of the data node to the storage device, omitting the intermediate copy through the operating system cache, so I/O efficiency is higher. The first data node may write the first physical log to the first storage device using direct I/O, which improves read and write performance.
In a possible implementation, after the first data node commits the transaction on the data in the first data node according to the first physical log, the first data node may further write a second physical log containing commit information to the first storage device. The first storage device is used to transfer the second physical log to the second storage device, so that a management node in the second database cluster obtains the second physical log from the second storage device. The commit information indicates that the first data node has completed the transaction commit for the first physical log. The second database cluster may use the commit information as a reference for the global consistency point during log replay.
The commit information may include a transaction commit number, which can be used to identify a committed database transaction (also simply called a transaction). A transaction is the logical unit in which a data storage node performs database operations and consists of a sequence of database operations. A transaction in the committed state has been executed successfully, and the data involved in the transaction has been written to the data storage node.
In a possible implementation, the first database cluster further includes a fourth data node that serves as a backup node of the first data node, and the method further includes: the fourth data node obtains the first physical log from the first storage device; and the fourth data node performs log replay according to the first physical log.
It should be understood that the fourth data node may first read the control information in the header of the first storage device and verify it, and then compare the write progress of the log on the storage device with that of its local physical log. If the storage device holds newer physical logs, the fourth data node reads them, copies them to local storage, and replays them; if there is no data to read, it waits in a loop.
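One iteration of this read loop can be sketched as below. The layout of the control information is not specified in this application, so the sketch assumes a hypothetical header holding the latest written sequence number and a CRC32 checksum; the dictionary standing in for the storage device and the names `make_header`, `header_is_valid`, and `poll_once` are likewise illustrative assumptions.

```python
import zlib


def make_header(write_lsn):
    # Hypothetical control information: latest written LSN plus a checksum.
    return {"write_lsn": write_lsn, "crc": zlib.crc32(str(write_lsn).encode())}


def header_is_valid(header):
    return zlib.crc32(str(header["write_lsn"]).encode()) == header["crc"]


def poll_once(device, local_logs, replay):
    """Run one iteration of the standby loop; return the number of logs copied."""
    header = device["header"]
    if not header_is_valid(header):
        return 0  # verification failed; retry on the next iteration
    local_lsn = local_logs[-1]["lsn"] if local_logs else 0
    if header["write_lsn"] <= local_lsn:
        return 0  # nothing newer on the device; the caller waits and retries
    newer = [r for r in device["logs"]
             if local_lsn < r["lsn"] <= header["write_lsn"]]
    for record in newer:
        local_logs.append(record)  # copy to local storage
        replay(record)             # then replay
    return len(newer)


device = {"header": make_header(2), "logs": [{"lsn": 1}, {"lsn": 2}]}
local_logs, replayed = [], []
first = poll_once(device, local_logs, replayed.append)
second = poll_once(device, local_logs, replayed.append)
```

A caller would wrap `poll_once` in a loop with a short sleep whenever it returns 0, which corresponds to the loop-wait behavior described above.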
In a second aspect, this application provides a data backup method applied to a second database cluster. The second database cluster includes a second data node that serves as a backup node of a first data node; the first data node belongs to a first database cluster, and the first database cluster and the second database cluster are different database clusters. A first storage device is deployed in the first database cluster, and a second storage device is deployed in the second database cluster. The method includes: the second data node obtains a first physical log from the second storage device, where the first physical log is a physical log that comes from the first data node and is stored in the second storage device after being transferred via the first storage device, and the first physical log includes operation information on data in the first data node; and the second data node performs log replay according to the first physical log.
In this way, the data nodes in the first database cluster (the primary cluster) can quickly synchronize physical logs to the second database cluster (the standby cluster) through the storage devices, which brings the system closer to a recovery point objective (RPO) of 0 while maintaining high service performance, and improves data transfer efficiency during data backup.
In a possible implementation, the operation information indicates a modification operation, a write operation, and/or a deletion operation on the data in the data node.
Similar to the first storage device, the second storage device may include a storage area allocated to the second data node (the third storage area). Taking the case where the second data node belongs to a first shard of the second database cluster as an example, the second storage device may include a shared volume allocated to the first shard, and the data nodes in the first shard may share that volume. Specifically, the second storage device may include a third storage space for storing the physical logs of the second data node and a fourth storage space for storing the physical logs of a fifth data node, the third storage space being different from the fourth storage space. Furthermore, similar to the way the first data node writes the first physical log to the first storage device, the second data node may obtain the first physical log from the second storage device in parallel while other data nodes read physical logs from the second storage device.
Here, "in parallel" means that the first data node's action of writing the first physical log to the first storage space and the third data node's action of writing its physical log to the second storage space occur at the same time.
Because different storage areas are allocated to different primary nodes in the second database cluster, the standby data nodes of different shards can read physical logs in parallel, which improves the concurrency of data reads and therefore the efficiency of log transfer during data backup.
In a possible implementation, to ensure distributed consistency among the standby data nodes, their log replay progress must be the same, that is, the log sequence numbers (LSNs) they have replayed must match (the log sequence number indicates replay progress: the larger the sequence number, the further the progress). In the embodiments of this application, the management node in the second database cluster may maintain a piece of global information (which may be called a barrier point). The global information may include the log sequence number of a physical log that every standby data node has obtained, and that log sequence number is the smallest among the largest log sequence numbers for which each primary node has currently completed transaction commit. For example, suppose the management node learns that primary node 1 has committed sequence numbers 1 and 2, primary node 2 has committed 1 and 2, primary node 3 has committed 1, 2, and 3, and primary node 4 has committed 1, 2, 3, and 4. The largest committed sequence numbers are then 2, 2, 3, and 4, so 2 is the smallest of the largest physical log sequence numbers committed by the primary nodes as obtained by the management node.
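The "smallest of the largest committed sequence numbers" rule can be written down directly. The function name `barrier_point` and the progress values below are hypothetical and chosen only for illustration; they are not data from this application.

```python
def barrier_point(commit_progress):
    """Return the smallest of the per-node largest committed sequence numbers."""
    return min(max(lsns) for lsns in commit_progress.values())


# Hypothetical commit progress reported by three primary nodes.
progress = {
    "primary_a": [1, 2, 3],
    "primary_b": [1, 2, 3, 4],
    "primary_c": [1, 2, 3, 4, 5],
}
barrier = barrier_point(progress)
```

Taking the minimum guarantees that every primary node has already committed the chosen sequence number, so it is a position all nodes have in common.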
In a possible implementation, physical logs with the same sequence number on different primary data nodes among the multiple primary data nodes correspond to the same task, and each primary data node commits the transactions of its physical logs in ascending order of sequence number; that is, the sequence number indicates the transaction commit progress of a primary node.
It should be understood that the functions of the management node described above may be implemented through cooperation among modules such as CMA, CMS, and ETCD in the second database cluster.
The coordinator node in the first database cluster can obtain the current transaction commit progress of each primary data node (that is, the log sequence number of the physical log whose transaction commit has completed, which may be called a barrier point) and pass commit information containing this progress to the second database cluster in the form of a physical log (this step may also be performed by a primary data node). After obtaining the physical log carrying the commit information from the second storage device, a standby data node may write the physical log to its local disk, parse the newly persisted physical log, store the parsed barrier points in a hash table, and record the largest barrier point received so far; the largest barrier point is the log sequence number of the physical log most recently committed by the primary data node. The functions of the management node may be implemented by CMA, CMS, and ETCD working together. The CMA queries the maximum barrier values of the CN and DNs and reports them to the CMS. The CMS may take the minimum of the maximum barrier points across the standby data nodes as the "candidate sequence number" (also called the value to be probed) and store it in ETCD. The CMA obtains the value to be probed from ETCD, queries the DNs to confirm whether that point exists on every DN (that is, determines whether each of the multiple standby data nodes has obtained the physical log corresponding to the candidate sequence number), and reports the result to the CMS. The CMS then decides as follows: if the physical log corresponding to the value to be probed exists on every standby data node, the value is stored in ETCD as the "target sequence number" (or simply the target value); otherwise it is discarded. The CMA reads the target value from ETCD and updates its local target value. In one reporting round, the CMA performs three steps: it reports the maximum barrier value, queries locally whether the value to be probed exists, and updates the target value; the CMS updates the value to be probed and the target value. Barrier deletion is the end point of consistency and takes place during physical log replay: when replay reaches a barrier point, the replay position is updated and the barrier point is deleted from the hash table, which completes the barrier's life cycle from generation to deletion.
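The two decisions made on the CMS side in the flow above (choose a candidate, then promote it to the target only if every standby holds it) can be sketched as two small functions. This is a simplified model under stated assumptions: each standby's hash table of barrier points is represented as a Python set, the communication through the CMA and ETCD is omitted, and the names `propose_candidate` and `confirm_target` are hypothetical. The standby `dn3` is shown missing point 20 purely to exercise the discard branch.

```python
def propose_candidate(max_barrier_per_standby):
    # CMS role: the smallest of the per-standby maximum barrier points.
    return min(max_barrier_per_standby.values())


def confirm_target(candidate, barriers_per_standby):
    # CMS role: accept the candidate only if every standby holds that point.
    if all(candidate in points for points in barriers_per_standby.values()):
        return candidate
    return None  # discarded; no new target is stored in this round


# Hypothetical barrier hash tables on three standby data nodes.
standbys = {"dn1": {10, 20, 30}, "dn2": {10, 20}, "dn3": {10, 30}}
max_barriers = {name: max(points) for name, points in standbys.items()}

candidate = propose_candidate(max_barriers)
target_first_round = confirm_target(candidate, standbys)

standbys["dn3"].add(20)  # dn3 later receives the log carrying point 20
target_second_round = confirm_target(candidate, standbys)
```

Separating the proposal from the confirmation mirrors the probe step in the text: a value is only promoted once its presence on every standby has been checked.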
In the embodiments of this application, the management node in the second database cluster maintains a target sequence number as global information. The target sequence number is the smallest of the multiple log sequence numbers, and each of the multiple standby data nodes has obtained the physical log corresponding to the target sequence number. Each standby data node performs log replay only when the log sequence number of the physical log currently awaiting replay equals the target sequence number, which ensures that every standby data node replays up to the target sequence number. Different standby data nodes are therefore restored to the same position, which guarantees data consistency among the standby data nodes in the distributed database.
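The effect of gating replay on the target sequence number can be illustrated with a short sketch. It rests on an interpretive assumption: the text states that replay happens when the pending log's sequence number equals the target, and the sketch models the stated outcome (every standby replays up to the target and stops) by replaying all pending logs whose sequence number does not exceed the target. The function name `replay_until` and the sample data are hypothetical.

```python
def replay_until(pending, target_lsn, apply):
    """Replay queued logs in LSN order, holding back anything past target_lsn."""
    remaining = []
    for record in sorted(pending, key=lambda r: r["lsn"]):
        if record["lsn"] <= target_lsn:
            apply(record)
        else:
            remaining.append(record)  # waits for a later target
    return remaining


# Two hypothetical standbys that have received different amounts of log.
standby_a = [{"lsn": n} for n in (1, 2, 3, 4, 5)]
standby_b = [{"lsn": n} for n in (1, 2, 3)]

applied_a, applied_b = [], []
rest_a = replay_until(standby_a, 3, applied_a.append)
rest_b = replay_until(standby_b, 3, applied_b.append)
```

Although standby A has received more log than standby B, both stop at sequence number 3, which is the same-position property the paragraph describes.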
In a third aspect, this application provides a first database cluster. The first database cluster includes a first data node, and the first data node includes:
a log acquisition module, configured to obtain a first physical log, where the first physical log includes operation information on data in the first data node;
a log transfer module, configured to write the first physical log to a first storage device, where the first storage device is used to transfer the first physical log to a second storage device, so that a second data node in a second database cluster obtains the first physical log from the second storage device, the first storage device is deployed in the first database cluster, the second storage device is deployed in the second database cluster, the first database cluster and the second database cluster are different database clusters, and the second data node serves as a backup node of the first data node.
In a possible implementation, the operation information indicates a modification operation, a write operation, and/or a deletion operation on the data in the data node.
In a possible implementation, the first data node further includes:
a transaction commit module, configured to commit the transaction on the data in the first data node according to the first physical log after the first physical log has been transferred to the first storage device.
In a possible implementation, the first database cluster further includes a third data node, and the log transfer module is specifically configured to:
write the first physical log to the first storage device in parallel while the third data node writes a second physical log to the first storage device.
In a possible implementation, the first storage device includes a first storage space for storing the physical logs of the first data node and a second storage space for storing the physical logs of the third data node, the first storage space being different from the second storage space;
the log transfer module is specifically configured to:
write the first physical log to the first storage space in parallel while the third data node writes the second physical log to the second storage space.
In a possible implementation, the first storage device includes a first storage space for storing the physical logs of the first data node, and the log transfer module is specifically configured to:
determine a target storage space from the first storage space on the basis that the storage space available in the first storage space is smaller than the storage space required by the first physical log, where the target storage space stores a target physical log; and
replace the target physical log in the target storage space with the first physical log on the basis that the target physical log has already been replayed by the second data node.
In a possible implementation, the storage addresses of the first storage space include a head address and a tail address, and the storage space is filled in order from the storage space corresponding to the head address to the storage space corresponding to the tail address;
the log transfer module is specifically configured to:
determine, on the basis that the storage space corresponding to the tail address in the first storage space is occupied, the storage space corresponding to the head address in the first storage space as the target storage space.
In a possible implementation, the first storage device is a raw device.
In a possible implementation, the log transfer module is further configured to:
write, after the first data node commits the transaction on the data in the first data node according to the first physical log, a second physical log containing commit information to the first storage device, where the first storage device is used to transfer the second physical log to the second storage device, so that a management node in the second database cluster obtains the second physical log from the second storage device, and the commit information indicates that the first data node has completed the transaction commit for the first physical log.
In a possible implementation, the first database cluster further includes a fourth data node that serves as a backup node of the first data node, and the fourth data node includes: a log acquisition module, configured to obtain the first physical log from the first storage device;
a log replay module, configured to perform log replay according to the first physical log.
In a fourth aspect, this application provides a second database cluster. The second database cluster includes a second data node that serves as a backup node of a first data node; the first data node belongs to a first database cluster, and the first database cluster and the second database cluster are different database clusters. A first storage device is deployed in the first database cluster, and a second storage device is deployed in the second database cluster. The second data node includes:
a log acquisition module, configured to obtain a first physical log from the second storage device, where the first physical log is a physical log that comes from the first data node and is stored in the second storage device after being transferred via the first storage device, and the first physical log includes operation information on data in the first data node;
a log replay module, configured to perform log replay according to the first physical log.
In a possible implementation, the operation information indicates a modification operation, a write operation, and/or a deletion operation on the data in the data node.
In a possible implementation, the second database cluster further includes a fifth data node, and the log acquisition module is specifically configured to:
obtain the first physical log from the second storage device in parallel while the fifth data node obtains a physical log from the second storage device.
In a possible implementation, the second storage device includes a third storage space for storing the physical logs of the second data node and a fourth storage space for storing the physical logs of the fifth data node, the third storage space being different from the fourth storage space;
the log acquisition module is specifically configured to:
obtain the first physical log from the third storage space in parallel while the fifth data node obtains a physical log from the fourth storage space.
In a possible implementation, the first database cluster includes multiple primary data nodes including the first data node, the second database cluster includes multiple standby data nodes including the second data node, the second database cluster further includes a management node, and the management node includes:
a commit information acquisition module, configured to obtain, from the second storage device, commit information from the first database cluster, where the commit information includes the log sequence number of the physical log most recently committed by each of the multiple primary data nodes, the target sequence number is the smallest of the multiple log sequence numbers, and each of the multiple standby data nodes has obtained the physical log corresponding to the target sequence number;
the log replay module is specifically configured so that the second data node obtains the target sequence number from the management node; and
after determining that the log sequence number of the first physical log equals the target sequence number, performs log replay according to the first physical log.
In a possible implementation, physical logs with the same sequence number on different primary data nodes among the multiple primary data nodes correspond to the same task, and each primary data node commits the transactions of its physical logs in ascending order of sequence number.
In a fifth aspect, an embodiment of this application provides a computer-readable storage medium comprising computer-readable instructions which, when run on a computer device, cause the computer device to perform the method of the first aspect or any optional method thereof, and the method of the second aspect or any optional method thereof.
In a sixth aspect, an embodiment of this application provides a computer program product comprising computer-readable instructions which, when run on a computer device, cause the computer device to perform the method of the first aspect or any optional method thereof, and the method of the second aspect or any optional method thereof.
In a seventh aspect, this application provides a chip system. The chip system includes a processor configured to support the above device in implementing the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods. In a possible design, the chip system further includes a memory configured to store program instructions and data necessary for the execution device or the training device. The chip system may consist of a chip, or may include a chip and other discrete components.
本申请在上述各方面提供的实现方式的基础上,还可以进行进一步组合以提供更多实现方式。On the basis of the implementation manners provided in the foregoing aspects, the present application may further be combined to provide more implementation manners.
FIG. 1 is a schematic diagram of an architecture provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of an architecture provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of an architecture provided by an embodiment of the present application;
FIG. 4 is a schematic flowchart of a data backup method provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of a storage space provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of a barrier point processing flow provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of a first database cluster provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of a second database cluster provided by an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a computing device provided by an embodiment of the present application.
The embodiments of the present application are described below with reference to the accompanying drawings. The terms used in the detailed description are intended only to explain specific embodiments of the present application and are not intended to limit the present application.
The terms "first", "second", and the like in the specification, the claims, and the above drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate, and that this is merely the manner adopted in the embodiments of the present application to distinguish objects having the same attributes. In addition, the terms "comprise" and "have", and any variations thereof, are intended to cover non-exclusive inclusion, so that a process, method, system, product, or device comprising a series of units is not necessarily limited to those units, but may include other units that are not expressly listed or that are inherent to the process, method, product, or device.
FIG. 1 is a schematic diagram of the logical structure of a data backup system according to an embodiment of the present application. The system may include a client, a primary repository (for example, the first database cluster in the embodiments of the present application), and a standby repository (for example, the second database cluster in the embodiments of the present application). The primary repository may contain multiple shards (for example, shard 1 and shard 2 shown in FIG. 1), and each shard may include data nodes (DN). For example, shard 1 shown in FIG. 1 contains primary node 1 and standby nodes, where the standby nodes of shard 1 serve as backups of primary node 1; shard 2 contains primary node 2 and standby nodes, where the standby nodes of shard 2 serve as backups of primary node 2. The primary repository may also contain a coordinator node (CN). In addition, hardware device 1 may be deployed on the primary repository side, and hardware device 1 may be a storage device (for example, the first storage device in the embodiments of the present application).
The standby database, as the backup of the primary database, may be deployed with corresponding shards, for example, shard 1 and shard 2 shown in FIG. 1. Shard 1 in the standby database may serve as the backup of shard 1 in the primary database: the multiple standby nodes in shard 1 may serve as backups of primary node 1, and the multiple standby nodes in shard 2 may serve as backups of primary node 2. In addition, hardware device 2 may be deployed on the standby repository side, and hardware device 2 may be a storage device (for example, the second storage device in the embodiments of the present application).
The primary repository and the standby repository may each be a storage array or a network storage architecture such as network attached storage (NAS) or a storage area network (SAN). Each storage node (for example, the data nodes and the coordinator node described above) may be a logical unit number (LUN) or a file system. It should be understood that the embodiments of the present application do not limit the form taken by the repositories and storage nodes.
Although not shown in FIG. 1, the primary/standby database system may further include a client. The client may be connected to the primary database system and the standby database through a network, where the network may be the Internet, an intranet, a local area network (LAN), a wide area network (WAN), a storage area network (SAN), or the like, or a combination of these networks.
The primary nodes and standby nodes shown in FIG. 1 may be implemented by the computing device 200 shown in FIG. 2.
FIG. 2 is a simplified schematic diagram of the logical structure of the computing device 200. As shown in FIG. 2, the computing device 200 includes a processor 202, a memory unit 204, an input/output interface 206, a communication interface 208, a bus 210, and a storage device 212. The processor 202, the memory unit 204, the input/output interface 206, the communication interface 208, and the storage device 212 are communicatively connected to one another through the bus 210.
The processor 202 is the control center of the computing device 200 and is configured to execute related programs to implement the technical solutions provided by the embodiments of the present application. Optionally, the processor 202 includes one or more central processing units (CPU), for example, central processing unit 1 and central processing unit 2 shown in FIG. 2. Optionally, the computing device 200 may include multiple processors 202, and each processor 202 may be a single-core processor (containing one CPU) or a multi-core processor (containing multiple CPUs). Unless otherwise specified, in the embodiments of the present application a component that performs a specific function, for example the processor 202 or the memory unit 204, may be implemented either by configuring a general-purpose component to perform that function or by a dedicated component that performs that function exclusively; the present application does not limit this. The processor 202 may be a general-purpose central processing unit, a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the technical solutions provided by the present application.
The processor 202 may be connected to one or more storage components through the bus 210. The storage components may include the memory unit 204 and the storage device 212. The storage device 212 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory unit 204 may be a random access memory. The memory unit 204 may be integrated with the processor 202 or inside the processor 202, or may be one or more storage units independent of the processor 202.
Program code to be executed by the processor 202, or by a CPU inside the processor 202, may be stored in the storage device 212 or the memory unit 204. Optionally, program code stored in the storage device 212 (for example, an operating system, application software, a backup module, a communication module, or a storage control module) is copied into the memory unit 204 for execution by the processor 202.
The storage device 212 may be a physical hard disk or a partition thereof (including Small Computer System Interface (SCSI) storage or a Global Network Block Device (GNBD) volume), a network storage protocol (including network or cluster file systems such as the Network File System (NFS)), a file-based virtual storage device (a virtual disk image), or a storage device based on logical volumes. It may include high-speed random access memory (RAM), and may also include non-volatile memory, such as one or more magnetic disk memories, flash memory, or other non-volatile memory. In some embodiments, the storage device may further include remote storage separate from the one or more processors 202, for example a network disk accessed over a communication network through the communication interface 208. The communication network may be the Internet, an intranet, a local area network (LAN), a wide area network (WAN), a storage area network (SAN), or the like, or a combination of these networks.
The operating system (for example, Darwin, RTXC, LINUX, UNIX, OS X, WINDOWS, or an embedded operating system such as VxWorks) includes various software components and/or drivers for controlling and managing general system tasks (such as memory management, storage device control, and power management) and for facilitating communication between hardware and software components.
The input/output interface 206 is configured to receive input data and information and to output data such as operation results.
The communication interface 208 uses a transceiving apparatus, such as but not limited to a transceiver, to implement communication between the computing device 200 and other devices or communication networks.
The bus 210 may include a path that carries information between the components of the computing device 200 (for example, the processor 202, the memory unit 204, the input/output interface 206, the communication interface 208, and the storage device 212). Optionally, the bus 210 may use a wired connection or a wireless communication mode; the present application does not limit this.
It should be noted that although the computing device 200 shown in FIG. 2 shows only the processor 202, the memory unit 204, the input/output interface 206, the communication interface 208, the bus 210, and the storage device 212, those skilled in the art should understand that, in a specific implementation, the computing device 200 also includes other components necessary for normal operation.
The computing device 200 may be a general-purpose computing device or a special-purpose computing device, including but not limited to any electronic device such as a portable computing device, a personal desktop computing device, a network server, a tablet computer, a mobile phone, or a personal digital assistant (PDA), or a combination of two or more of these. The present application does not limit the specific implementation form of the computing device 200.
In addition, the computing device 200 of FIG. 2 is only one example of a computing device 200. The computing device 200 may contain more or fewer components than those shown in FIG. 2, or may have a different component configuration. Those skilled in the art should understand that, depending on specific needs, the computing device 200 may also contain hardware components that implement other additional functions, and that the computing device 200 may alternatively contain only the components necessary to implement the embodiments of the present application, without containing all the components shown in FIG. 2. The components shown in FIG. 2 may be implemented in hardware, software, or a combination of hardware and software.
The hardware structure shown in FIG. 2 and the above description apply to the various computing devices provided by the embodiments of the present application, and are suitable for executing the various data backup methods provided by the embodiments of the present application.
Referring to FIG. 3, FIG. 3 shows a product implementation form of an embodiment of the present application, which mainly comprises a dual-cluster disaster recovery architecture for a distributed database with log sharing. The two database clusters are deployed in two separate physical regions. At run time, the program code of the data backup method provided by the embodiments of the present application runs in the host memory of the servers. Taking the application scenario shown in FIG. 3 as an example, a client on the management and control side can issue instructions such as building a cluster, establishing the disaster recovery relationship between the two clusters, switching clusters, and querying cluster status. After receiving an instruction, the OM module in the cluster controls modules such as the CM module and the database nodes to complete the related operations and returns the execution result. The shared volume is a storage device capable of replicating data in parallel to a remote (physically distant) site, and is used to synchronously transfer redo logs between the primary and standby clusters. While the database clusters are running, the primary node of each shard in the primary cluster generates logs and writes them to the shared volume; the logs are synchronized to the corresponding shared volume of the standby cluster; and the standby data nodes of both the primary cluster and the standby cluster read the logs from the shared volume and replay them.
Referring to FIG. 4, FIG. 4 is a schematic flowchart of a data backup method provided by an embodiment of the present application. The method may be applied to a first database cluster that includes a first data node, and the method includes:
401. The first data node obtains a first physical log, where the first physical log includes operation information on data in the first data node.
In a possible implementation, the first database cluster may be a distributed database. The first database cluster may be the primary cluster, and the second database cluster may serve as the standby cluster of the first database cluster.
For example, the first database cluster may be a database system with a distributed architecture based on data sharding (a shared-nothing architecture). Each data node may be configured with a central processing unit (CPU), memory, a hard disk, and so on, and the nodes do not share resources with one another. The first data node may be a data node of a shard in the first database cluster; for example, the first data node may be the primary data node of a shard in the first database cluster.
In a possible implementation, the first data node may be a data node (DN) in the first database cluster. The first database cluster may be deployed with at least one data node. Coordinator nodes may be deployed on computing devices, and data nodes may be deployed on computing devices. Multiple coordinator nodes may be deployed on different computing devices or on the same computing device. Multiple data nodes may be deployed on different computing devices. A coordinator node and a data node may be deployed on different computing devices or on the same computing device.
In a possible implementation, data may be distributed across the data nodes, and data is not shared between data nodes. When a service is executed, the coordinator node receives a query request from the client, generates an execution plan, and delivers it to each data node. Each data node initializes the operators it needs (for example, stream operators) according to the received plan, and then executes the execution plan delivered by the coordinator node. The coordinator node and the data nodes, as well as data nodes located on different physical nodes, may be connected through a network channel, and the network channel may use various communication protocols such as the scalable transmission control protocol (STCP).
In a possible implementation, the first data node, as a primary node in the first database cluster, may receive a data operation request from the client and generate the first physical log according to the data operation request. As a standby node of the first data node, the second data node (or the fourth data node described in subsequent embodiments) may obtain the first physical log and replay it, thereby keeping the data on the first data node and the second data node consistent.
In a possible implementation, the first physical log includes operation information on data in the first data node, and the operation information indicates a modification operation, a write operation, and/or a deletion operation on the data in the data node. The first physical log may be a redo log, also called an XLog. It records the physical modifications of data pages (in other words, how the data has changed) and can be used to restore the physical data pages after a transaction has been committed.
In a possible implementation, the log files in a database system may include logical log files and physical log files. A logical log in a logical log file records the original logic of the logical operations performed on the database system, for example, data access, data deletion, data modification, data query, database system upgrade, and database system management. A logical operation is the process of performing logical processing according to a user's data operation command in order to determine which data operations need to be performed on the data. When the data operation command is expressed in structured query language (SQL), the original logic of the logical operation may be computer instructions expressed as SQL statements. A physical log in a physical log file records the changes to the data in the database system (for example, the changes to data pages in a data storage node). The content recorded in a physical log can be understood as the data changes caused by performing logical operations on the database system.
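The difference between the two log types can be illustrated with a small sketch. Where a logical log would store the statement text, a physical log stores the resulting page changes, and a standby node converges on the primary's state by replaying those changes in order. The record layout (`RedoRecord` with an LSN, page identifier, offset, and bytes), the page size, and the `replay` helper are all hypothetical and are not taken from the application.

```python
from dataclasses import dataclass

PAGE_SIZE = 16  # hypothetical page size, kept tiny for illustration


@dataclass(frozen=True)
class RedoRecord:
    """Hypothetical physical log record: which bytes changed on which page."""
    lsn: int       # log sequence number, defines replay order
    page_id: int   # data page that was modified
    offset: int    # byte offset of the change inside the page
    data: bytes    # new bytes written at that offset


def replay(pages, records):
    """Apply physical log records to a set of pages in LSN order."""
    for rec in sorted(records, key=lambda r: r.lsn):
        page = pages.setdefault(rec.page_id, bytearray(PAGE_SIZE))
        page[rec.offset:rec.offset + len(rec.data)] = rec.data
    return pages


records = [
    RedoRecord(lsn=1, page_id=7, offset=0, data=b"alice"),
    RedoRecord(lsn=2, page_id=7, offset=0, data=b"bob"),
    RedoRecord(lsn=3, page_id=9, offset=4, data=b"xyz"),
]

# The primary applies the changes as they happen; the standby receives the
# same records (here in a different arrival order) and replays them by LSN.
primary_pages = replay({}, records)
standby_pages = replay({}, list(reversed(records)))
```

Because replay is ordered by LSN, the standby reaches the same page contents as the primary regardless of the order in which the records arrive.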
It should be understood that before the databases are initialized, no primary/standby relationship is distinguished between the first database cluster and the second database cluster. At initialization, a database may be initialized separately in each of two Regions. At this point the two clusters are independent of each other and have no primary or standby role, and each shard contains one primary node and multiple standby data nodes. Each shard is configured with a shared volume (that is, storage space in the first storage device and the second storage device described in subsequent embodiments), and all nodes in the shard have access to that shared volume. The primary node of a shard generates logs and stores them on the storage device corresponding to that shard. At this point no synchronous replication relationship has been established between the storage devices of the two clusters, and no master end or slave end is distinguished. One of the clusters is then selected as the disaster recovery standby cluster and is stopped so that it cannot write data to the shared disk. The relevant parameters of the primary and standby clusters are configured, and a remote replication relationship is established between the storage devices, so that data is synchronously replicated from the master-end storage device of the primary cluster to the slave-end storage device of the standby cluster. The standby cluster then sends a build (rebuild) request to the primary cluster over the network, the transfer and replication of data and logs is completed, the cluster is started, and the disaster recovery relationship is established.
402. The first data node writes the first physical log into a first storage device, where the first storage device is configured to transfer the first physical log to a second storage device.
In a possible implementation, after obtaining the first physical log, the first data node may write the first physical log into the first storage device.
The first storage device and the second storage device may be physical devices such as all-flash storage systems.
In a possible implementation, the first storage device is a raw device. A raw device, also called a raw partition, is a device file that has not been formatted and is not read through a file system. The application is responsible for reading from and writing to it, without file system buffering, and it is a device that is not directly managed by the operating system. Data can be written directly from the memory of the data node to the storage device, which removes the step of copying from the operating system cache to the storage device and yields higher I/O efficiency. The first data node may write the first physical log into the first storage device using direct I/O, which improves read and write performance.
In a possible implementation, after writing the first physical log into the first storage device, the first data node commits the transaction on the data in the first data node according to the first physical log.
Transaction commit ensures that the operation information on the data in the first physical log is persisted. If the first data node transferred the first physical log to the first storage device only after completing the transaction commit for the first physical log, then, because the first data node may fail before the transaction commit completes, the first physical log might never reach the first storage device, and the standby data node would be unable to obtain the first physical log for replay. In the embodiments of the present application, the first data node transfers the first physical log to the first storage device before committing the transaction on the data in the first data node. Once the first physical log has been successfully written into the first storage device, it can be regarded as having been replicated to the standby data node. Even if the first data node fails before the transaction commit completes, the first physical log can still be delivered to the standby data node.
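The ordering described above (write the log to the storage device first, commit afterwards) can be sketched as follows. This is a minimal in-memory illustration, not the application's implementation: `SharedLogDevice`, `PrimaryNode`, and the `crash_before_commit` flag are hypothetical names introduced only to show that a log written before a failed commit remains available to the standby side.

```python
class SharedLogDevice:
    """Hypothetical in-memory stand-in for the first storage device."""

    def __init__(self):
        self.records = []

    def write(self, record: bytes) -> None:
        self.records.append(record)


class PrimaryNode:
    """Hypothetical primary data node that writes the log before committing."""

    def __init__(self, device: SharedLogDevice):
        self.device = device
        self.committed = []

    def apply(self, record: bytes, crash_before_commit: bool = False) -> None:
        # Step 402: the physical log reaches the storage device first.
        self.device.write(record)
        if crash_before_commit:
            # Simulated failure between the log write and the commit.
            raise RuntimeError("primary failed before commit")
        # Only after the write succeeds is the transaction committed.
        self.committed.append(record)


device = SharedLogDevice()
primary = PrimaryNode(device)

primary.apply(b"redo-1")
try:
    primary.apply(b"redo-2", crash_before_commit=True)
except RuntimeError:
    pass  # the primary is gone, but the log already sits on the device
```

In this sketch the second record was never committed on the primary, yet it is present on the device and could still be replayed by a standby node.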
In a possible implementation, the first data node may write the first physical log into the first storage device, and the first storage device may include a storage area allocated to the first data node (the first storage space). Taking the case where the first data node belongs to a first shard in the first database cluster as an example, the first storage device may include a shared volume allocated to the first shard, and all data nodes in the first shard may share that shared volume. Specifically, the first storage device may include a first storage space for storing physical logs of the first data node and a second storage space for storing physical logs of a third data node of the first database cluster, where the first storage space is different from the second storage space.
In a possible implementation, the first database cluster further includes a third data node, and while the third data node writes a second physical log into the first storage device, the first data node may write the first physical log into the first storage device in parallel. The third data node may be a primary node in the first database cluster. Specifically, while the third data node writes the second physical log into the second storage space, the first data node writes the first physical log into the first storage space in parallel.
Because different storage areas are allocated to different primary nodes in the first database cluster, the primary nodes can write their physical logs in parallel. This increases the concurrency of data writes and therefore improves the efficiency of log transfer during data backup.
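A rough sketch of why disjoint storage spaces permit parallel writes is given below. The device is modeled as a single `bytearray`, and each primary node is assigned its own fixed region; the region size, the node labels `dn1` and `dn3`, and the `write_log` helper are assumptions made for illustration only.

```python
import threading

REGION_SIZE = 64  # hypothetical size of each node's storage space

# One device, two non-overlapping regions: the first storage space for the
# first data node and the second storage space for the third data node.
device = bytearray(2 * REGION_SIZE)
regions = {"dn1": 0, "dn3": REGION_SIZE}

first_log = b"first physical log"
second_log = b"second physical log"


def write_log(node: str, payload: bytes) -> None:
    """Write a node's log into its own region; regions never overlap."""
    if len(payload) > REGION_SIZE:
        raise ValueError("payload does not fit in the node's region")
    base = regions[node]
    device[base:base + len(payload)] = payload


threads = [
    threading.Thread(target=write_log, args=("dn1", first_log)),
    threading.Thread(target=write_log, args=("dn3", second_log)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Since each writer touches only its own byte range, neither write needs to wait for the other, and no lock shared between the two nodes is required in this sketch.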
由于存储设备中的存储空间的大小有限,在一种可能的实现中,所述第一存储设备包括用于存储所述第一数据节点的物理日志的第一存储空间,第一数据节点基于所述第一存储空间中可用的存储空间小于所述第一物理日志所需的存储空间时,可以从所述第一存储 空间中确定目标存储空间,所述目标存储空间存储有目标物理日志,并基于所述目标物理日志已被所述第二数据节点执行日志回放,将所述目标存储空间中的目标物理日志替换为所述第一物理日志。也就是在第一存储空间的大小不足时,将第一存储空间中已经被占用的存储空间清空并重复使用,且为了防止备数据节点还未拿到物理日志时就被清空,第一数据节点清空的物理日志一定是备数据节点已经进行了日志回放的物理日志(该信息可以是备数据节点完成日志回放后反馈至第一数据节点的),通过上述方式,基于循环读写机制,在保证备数据节点可以拿到全部的物理日志的前提下,节省了存储设备的存储空间。Since the size of the storage space in the storage device is limited, in a possible implementation, the first storage device includes a first storage space for storing the physical log of the first data node, and the first data node is based on the When the storage space available in the first storage space is less than the storage space required by the first physical log, the target storage space may be determined from the first storage space, the target storage space stores the target physical log, and Based on the fact that the target physical log has been replayed by the second data node, the target physical log in the target storage space is replaced by the first physical log. That is, when the size of the first storage space is insufficient, the occupied storage space in the first storage space is emptied and reused, and in order to prevent the standby data node from being emptied before receiving the physical log, the first data node The cleared physical log must be the physical log that has been played back by the standby data node (this information can be fed back to the first data node after the standby data node completes the log playback). Through the above method, based on the cyclic read and write mechanism, the guarantee On the premise that the standby data node can obtain all physical logs, the storage space of the storage device is saved.
In a possible implementation, the storage addresses of the first storage space include a head address and a tail address, and the first storage space is written in order from the storage space corresponding to the head address to the storage space corresponding to the tail address. When the storage space corresponding to the tail address is occupied, the first data node may determine the storage space corresponding to the head address as the target storage space.
Referring to FIG. 5, an area (for example, 16 MB) may be reserved at the start of the storage device for writing control information (control info), which may include a checksum, the log write position, the file size, and similar information. Physical logs are written starting after the first 16 MB, and the physical log storage area can be used cyclically: when the write position (head) reaches the end (tail) of the log area, writing resumes from the 16 MB offset. The primary node (for example, the first data node) generates a physical log, copies it from its local directory to the storage device, and updates the control information at the same time. Once the physical log has been written to the storage device, the log is considered successfully persisted, and the transaction is then committed. However, because the storage device has limited space, a heavily loaded primary could overwrite physical logs that the standby has not yet read to its local disk, leaving the standby unavailable. To avoid this, the primary needs to know the maximum log sequence number (LSN) of the standby's current local log and uses this LSN to decide whether it can continue writing to the storage device. Therefore, if standby cluster nodes are connected, the primary ensures that a log is overwritten only after at least one standby cluster node has copied it.
Each log has a unique LSN; that is, logs and LSNs are in one-to-one correspondence, so a log can be uniquely identified by its LSN.
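The circular log area and the overwrite guard described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `SharedLogArea`, its fields, and the use of a monotonically increasing byte position as a stand-in for the LSN are all assumptions made for the example.

```python
from dataclasses import dataclass

CONTROL_REGION = 16 * 1024 * 1024  # control info occupies the first 16 MB


@dataclass
class SharedLogArea:
    """Hypothetical model of the circular physical-log area on the storage device."""
    capacity: int            # bytes available for log records after the control region
    written: int = 0         # total bytes ever written by the primary (assumed LSN stand-in)
    standby_copied: int = 0  # position up to which the standby reported copying the log

    def device_offset(self, logical_pos: int) -> int:
        # Map a monotonically increasing position onto the ring behind the control region.
        return CONTROL_REGION + logical_pos % self.capacity

    def can_write(self, size: int) -> bool:
        # The bytes not yet copied by the standby, plus the new record, must fit in the ring.
        return self.written + size - self.standby_copied <= self.capacity

    def write(self, size: int) -> int:
        if not self.can_write(size):
            raise BufferError("standby has not copied the log that would be overwritten")
        start = self.device_offset(self.written)
        self.written += size
        return start

    def ack(self, copied_up_to: int) -> None:
        # Feedback from the standby; the acknowledged position never moves backwards.
        self.standby_copied = max(self.standby_copied, min(copied_up_to, self.written))
```

In this sketch the primary calls `ack` whenever the standby reports progress and checks `can_write` before each write, so a record is only overwritten after it has been copied.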
In a possible implementation, after the first data node commits the transaction on its data according to the first physical log, the first data node may further write a second physical log containing commit information to the first storage device. The first storage device is configured to transfer the second physical log to the second storage device, so that the management node in the second database cluster obtains the second physical log from the second storage device. The commit information indicates that the first data node has completed the transaction commit for the first physical log. The second database cluster may use the commit information as a reference for the global consistency point during log replay.
The commit information may include a transaction commit number, which may be used to identify a committed database transaction. A transaction is the logical unit in which a data storage node performs database operations and consists of a sequence of database operations. A transaction in the committed state has been executed successfully, and the data involved in it has been written to the data storage node.
In a possible implementation, the first database cluster further includes a fourth data node that serves as a backup node of the first data node; for example, the fourth data node may be a data node in the same shard as the first data node. The fourth data node may obtain the first physical log from the first storage device and replay the log according to the first physical log.
It should be understood that the fourth data node may first read the control information at the start of the first storage device and, after verifying it, compare the write progress of the log on the storage device with that of its local physical log. If the storage device holds newer physical logs, the fourth data node reads them, copies them to its local disk, and replays them; if there is no data to read, it waits in a loop.
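One iteration of this read loop can be sketched as below. The representation of the control information as a `(write position, CRC32)` pair and the function names `make_control` and `poll_once` are assumptions for illustration; the source only says that the control information contains a checksum and the log write position.

```python
import zlib
from typing import Optional, Tuple


def make_control(write_pos: int) -> Tuple[int, int]:
    """Hypothetical control info: (write position, CRC32 of that position)."""
    return write_pos, zlib.crc32(str(write_pos).encode())


def poll_once(control: Tuple[int, int], local_pos: int) -> Optional[Tuple[int, int]]:
    """Return the (start, end) range to copy and replay, or None to keep waiting."""
    write_pos, checksum = control
    # Verify the control information before trusting the write position.
    if zlib.crc32(str(write_pos).encode()) != checksum:
        raise ValueError("control info failed verification")
    # Newer log exists on the storage device: copy it locally, then replay it.
    if write_pos > local_pos:
        return local_pos, write_pos
    # Nothing new: the caller sleeps and polls again.
    return None
```

A standby would call `poll_once` repeatedly, sleeping between calls when it returns `None`.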
In a possible implementation, data may be replicated synchronously between the first storage device and the second storage device, which may be storage devices capable of remote and parallel data transfer. The first storage device may transfer the first physical log to the second storage device.
In a possible implementation, the second storage device may include a third storage space for storing the physical logs of the second data node and a fourth storage space for storing the physical logs of the fifth data node, where the third storage space and the fourth storage space are different. The first storage device may transfer the first physical log to the third storage space in the second storage device.
403. A second data node in the second database cluster obtains the first physical log from the second storage device.
In a possible implementation, similar to the first database cluster, the second database cluster may be a distributed database. The first database cluster may be the primary cluster, and the second database cluster may serve as its backup cluster. For example, the second database cluster may be a database system with a data-sharding-based distributed architecture (shared-nothing architecture), in which each data node may be configured with a central processing unit (CPU), memory, hard disks, and so on, and the storage nodes do not share resources. The second data node may be a data node of a shard in the second database cluster.
In a possible implementation, the second data node may be a data node (DN) in the second database cluster. The second database cluster may be deployed with at least one data node. Coordinator nodes and data nodes may be deployed on computing devices. Multiple coordinator nodes may be deployed on different computing devices or on the same computing device. Multiple data nodes may be deployed on different computing devices. A coordinator node and a data node may be deployed on different computing devices or on the same computing device.
In a possible implementation, the first data node, as a primary node in the first database cluster, may receive a data operation request from a client and generate the first physical log according to the request. The second data node, as the backup node of the first data node, may obtain the first physical log from the second storage device and replay the log according to the first physical log, which keeps the data on the first data node and the second data node consistent. Specifically, the second data node may obtain the first physical log from the third storage space, which is the storage space allocated to the second data node in the second storage device.
Similar to the first storage device, the second storage device may include a storage area allocated to the second data node (the third storage space). Taking the case where the second data node belongs to a first shard in the second database cluster as an example, the second storage device may include a shared volume allocated to the first shard, and the data nodes in the first shard may share this volume. Specifically, the second storage device may include a third storage space for storing the physical logs of the second data node and a fourth storage space for storing the physical logs of the fifth data node, where the third storage space and the fourth storage space are different. Further, similar to the way the first data node writes the first physical log to the first storage device, the second data node may obtain the first physical log from the second storage device in parallel while other data nodes read physical logs from the second storage device.
Here, "in parallel" can be understood to mean that, in time, the action of the first data node writing the first physical log to the first storage space and the action of the third data node writing a physical log to the second storage space occur simultaneously.
Because different storage areas are allocated to different primary nodes in the second database cluster, the standby data nodes of different shards can read physical logs in parallel. This improves read concurrency and, in turn, the log transfer efficiency during data backup.
In a possible implementation, the first database cluster includes multiple data nodes, including the first data node, and the second database cluster further includes a management node. The management node may further obtain commit information originating from the first database cluster from the second storage device. The commit information includes, for each of the multiple data nodes, the log sequence number of the physical log whose transaction commit that data node most recently completed, and the target sequence number is the smallest of these log sequence numbers.
The prior art uses a storage-device-based distributed consistency mechanism that generates global barrier logs to ensure that the farthest recovery point common to the different shards can be found, but it cannot handle data synchronization failures caused by network problems on the storage device.
In a possible implementation, to ensure distributed consistency among the standby data nodes, their log replay progress must be the same, that is, the log sequence number (LSN) up to which they replay must be consistent (the LSN indicates replay progress: the larger the LSN, the further along the replay). In this embodiment of the application, the management node in the second database cluster may maintain a piece of global information (which may be called a barrier point). The global information may be the LSN of a physical log that every standby data node has obtained, and this LSN is the smallest among the largest LSNs of physical logs that each primary node has currently committed. For example, suppose the management node learns that the current commit progress of primary node 1 is 1, 2; of primary node 2 is 1, 2; of primary node 3 is 1, 2, 3; and of primary node 4 is 1, 2, 3, 4. The largest committed LSNs are then 2, 2, 3, and 4, so 2 is the smallest of the largest committed physical log sequence numbers obtained by the management node.
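The barrier point defined above is a minimum over per-node maxima. The following minimal sketch computes it; the function name `barrier_point`, the node names, and the sample values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def barrier_point(committed: Dict[str, List[int]]) -> int:
    """Smallest of the largest committed LSNs across all primary nodes."""
    # Each node's commit progress is summarized by its largest committed LSN;
    # the barrier is the LSN that every node is guaranteed to have reached.
    return min(max(lsns) for lsns in committed.values())


# Hypothetical commit progress reported by three primary nodes.
progress = {
    "dn1": [1, 2, 3],
    "dn2": [1, 2, 3, 4, 5],
    "dn3": [1, 2, 3, 4],
}
print(barrier_point(progress))
```

With this sample input the per-node maxima are 3, 5, and 4, so the barrier point is 3.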
In a possible implementation, physical logs with the same sequence number on different primary data nodes among the multiple primary data nodes correspond to the same task, and each primary data node commits the transactions for its physical logs in ascending order of sequence number. In other words, the sequence number indicates the transaction commit progress of a primary node.
It should be understood that the management node may be an operation manager (OM), a cluster manager (CM), a cluster manager agent (CM agent, CMA), a cluster manager server (CM Server, CMS), a global transaction manager (GTM), or the like.
The coordinator node in the first database cluster may obtain the current transaction commit progress of each primary data node (that is, the LSN of the physical log whose transaction commit has completed, which may be called a barrier point) and pass commit information containing this progress to the second database cluster in the form of a physical log (this step may also be performed by the primary data nodes). After a standby data node obtains a physical log carrying commit information from the second storage device, it may write the physical log to its local disk, parse the newly flushed physical log, store the parsed barrier points in a hash table, and record the largest barrier point received so far; the largest barrier point is the LSN of the physical log most recently committed by the primary data node. The function of the management node may be implemented jointly by the CMA, the CMS, and ETCD. The CMA queries the maximum barrier values of the CNs and DNs and reports them to the CMS. The CMS may take the minimum of the largest barrier points on the standby data nodes as the "candidate sequence number" (also called the value to be probed) and store it in ETCD. The CMA obtains the value to be probed from ETCD, queries the DNs to confirm whether all of them hold this point (that is, whether each of the standby data nodes has obtained the physical log corresponding to the candidate sequence number), and reports the result to the CMS. The CMS then decides: if the physical log corresponding to the value to be probed exists on every standby data node, the value is stored in ETCD as the "target sequence number" (target value for short); otherwise it is discarded. The CMA reads the target value from ETCD and updates its local target value. In one reporting round, the CMA performs three steps: reporting the maximum barrier value, checking locally whether the value to be probed exists, and updating the target value; the CMS updates the value to be probed and the target value. Barrier deletion is the end point of consistency. It happens during physical log replay: when replay reaches a barrier point, the replay position is updated and the barrier point is deleted from the hash table, which completes the barrier life cycle from generation to deletion.
For example, in a multi-shard scenario there are multiple synchronization links between the shared storage devices, and distributed consistency must still be ensured when the links progress at different rates. The standby cluster needs to obtain the smallest barrier point among the current largest barrier points of the shards (that is, among the LSNs of the physical logs most recently committed by each of the primary data nodes), and the standby database cluster can then recover to this smallest barrier point. The process can be divided into four stages: barrier generation, barrier parsing and storage, barrier advancement, and barrier deletion.
Barrier generation is the precondition for consistency. Generation of a barrier point may be initiated by any CN, but the first CN is responsible for generating it. If the CN that initiates generation is not the first CN, it notifies the first CN to generate the barrier. After generation, the CN and/or DN nodes add the barrier to their physical logs.
Barrier parsing and storage is the foundation of consistency. After the corresponding standby data node in the standby cluster receives a log through the storage device, it writes the log to its local disk. It then parses the newly flushed log, stores the parsed barrier points in a hash table, and records the largest barrier point received so far. The hash table is created before the log parsing thread is created and is released when the cluster is uninstalled. The hash table holds the parsed barrier points, which are deleted when the physical log is replayed.
Barrier advancement is the key to consistency. It may be carried out through cooperation among the CNs, DNs, CMA, CMS, and ETCD, as shown in FIG. 6. Advancing the barrier consistency point may include five cycles. First, the CMA queries the maximum barrier values of the CNs and DNs and reports them to the CMS. Second, once the CMS has collected all reports, it compares them and stores the minimum in ETCD as the "value to be probed". Third, the CMA obtains the value to be probed from ETCD, queries the CNs and DNs to confirm whether all of them hold this point, and reports the result to the CMS. Fourth, once the CMS has collected all results, it decides: if every shard confirms that the value to be probed exists, the value is stored in ETCD as the "target value"; otherwise it is discarded. Fifth, the CMA reads the target value from ETCD and updates its local target value. In one reporting round, the CMA performs three steps: reporting the maximum barrier value, checking locally whether the value to be probed exists, and updating the target value; the CMS updates the value to be probed and the target value.
Barrier deletion is the end point of consistency. It happens during physical log replay: when replay reaches a barrier point, the replay position is updated and the barrier point is deleted from the hash table, which completes the barrier life cycle from generation to deletion.
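The advancement and deletion stages can be condensed into a single-process sketch. The function names `advance_barrier` and `replay_barrier`, and the modeling of each standby node's hash table as a Python set, are assumptions for illustration; the real flow is distributed across the CMA, CMS, and ETCD and is not reproduced here.

```python
from typing import Dict, Optional, Set


def advance_barrier(standby_barriers: Dict[str, Set[int]],
                    current_target: Optional[int]) -> Optional[int]:
    """One advancement round: returns the new target value, or the old one if unchanged."""
    # Without a report from every node there is nothing to compare yet.
    if not standby_barriers or any(not b for b in standby_barriers.values()):
        return current_target
    # Cycles 1-2: collect each node's largest barrier; the minimum is the value to be probed.
    candidate = min(max(b) for b in standby_barriers.values())
    # Cycles 3-4: accept the candidate only if every node has parsed that barrier point.
    if all(candidate in b for b in standby_barriers.values()):
        return candidate  # Cycle 5: nodes adopt this as their local target value.
    return current_target  # Otherwise the candidate is discarded.


def replay_barrier(table: Set[int], barrier: int) -> None:
    """Barrier deletion: drop the barrier point from the table once replay reaches it."""
    table.discard(barrier)
```

The confirmation step matters in this sketch because a node may report a larger maximum without ever having parsed the candidate itself, in which case the previous target is kept.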
In this embodiment of the application, the management node in the second database cluster maintains a target sequence number as global information. The target sequence number is the smallest of the multiple log sequence numbers, and each of the multiple standby data nodes has obtained the physical log corresponding to the target sequence number. Each standby data node replays a log only when the log sequence number of the physical log currently to be replayed equals the target sequence number. This ensures that every standby data node replays up to the target sequence number, so that the different standby data nodes are restored to the same position and the data on different standby data nodes in the distributed database is consistent.
404. The second data node replays the log according to the first physical log.
In a possible implementation, the second data node obtains the target sequence number from the management node and, after determining that the log sequence number of the first physical log equals the target sequence number, replays the log according to the first physical log.
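The effect of gating replay on the target sequence number can be sketched as follows. The sketch reads the condition as "replay up to and including the target, and hold back anything newer"; that reading, the function name `replay_up_to`, and the `(LSN, record)` tuple format are assumptions for illustration only.

```python
from typing import Iterable, List, Tuple


def replay_up_to(pending: Iterable[Tuple[int, str]], target_lsn: int) -> List[str]:
    """Return the records a standby may replay given the current target sequence number."""
    applied = []
    # Replay strictly in ascending LSN order.
    for lsn, record in sorted(pending):
        if lsn > target_lsn:
            break  # Newer logs wait until the management node advances the target.
        applied.append(record)
    return applied
```

Because every standby stops at the same target, all of them end at the same position regardless of how far ahead their local logs are.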
When the second database cluster needs to become the primary cluster, either because the first database cluster has failed or because the user adjusts the roles manually, this can be achieved through a failover procedure or a switchover procedure.
The failover procedure is performed when the primary cluster becomes abnormal: the standby cluster is promoted to primary and continues to provide production services. The management-side client issues a cluster failover command and checks the status of the storage device; if the status is normal, a switch with RPO = 0 can be performed. The synchronization relationship between the storage devices is interrupted, and the write protection on the standby cluster's storage device is removed so that the device becomes readable and writable. The standby cluster is stopped and, for the CN nodes of the standby cluster, the local logs are overwritten with the redo logs in the storage device. The relevant parameter information stored in the standby cluster's etcd is updated. The OM modifies the mode parameters of the CM, CNs, and DNs and starts the cluster in primary cluster mode.
The switchover procedure is a planned cluster role switch initiated by the user while both the primary and standby clusters are running normally: the primary cluster is demoted to standby, and the standby cluster is promoted to primary and provides production services in place of the original primary cluster. The management-side client first issues a cluster switchover command to the primary cluster and checks the status of the storage device; if the status is normal, a switch with RPO = 0 can be performed. The primary cluster is shut down, and the OM modifies the mode parameters of the CM, CNs, and DNs and starts that cluster in standby cluster mode. The status of the storage device is checked, and the storage devices switch primary and secondary roles, so that data is now replicated synchronously from the original standby cluster to the original primary cluster. The standby cluster is stopped and, for the CN nodes of the standby cluster, the local logs are overwritten with the redo logs in the storage device. The OM modifies the mode parameters of the CM, CNs, and DNs and starts the original standby cluster in primary cluster mode.
An embodiment of this application provides a data backup method applied to a first database cluster that includes a first data node. The method includes: the first data node obtains a first physical log, where the first physical log includes operation information on the data in the first data node; and the first data node writes the first physical log to a first storage device, where the first storage device is configured to transfer the first physical log to a second storage device, so that a second data node in a second database cluster obtains the first physical log from the second storage device. The first storage device is deployed in the first database cluster, the second storage device is deployed in the second database cluster, the first database cluster and the second database cluster are different database clusters, and the second data node serves as a backup node of the first data node. In this way, the data nodes in the first database cluster (the primary cluster) can quickly synchronize physical logs to the second database cluster (the standby cluster) through the storage devices, which brings the system closer to a recovery point objective (RPO) of 0 while maintaining high service performance and improving data transfer efficiency during data backup.
Referring to FIG. 7, FIG. 7 is a schematic structural diagram of a first database cluster 700 according to an embodiment of this application. The first database cluster 700 may include a first data node 70, and the first data node 70 may include:
a log obtaining module 701, configured to obtain a first physical log, where the first physical log includes operation information on the data in the first data node 70;
For a detailed description of the log obtaining module 701, refer to the description of step 401 in the foregoing embodiment; details are not repeated here.
In a specific implementation, the log obtaining module 701 may be implemented by the processor 202 and the memory unit 204 shown in FIG. 2. More specifically, the processor 202 may execute the relevant code in the memory unit 204 to obtain the first physical log.
a log transfer module 702, configured to write the first physical log to a first storage device, where the first storage device is configured to transfer the first physical log to a second storage device, so that a second data node 80 in a second database cluster obtains the first physical log from the second storage device, the first storage device is deployed in the first database cluster, the second storage device is deployed in the second database cluster, the first database cluster and the second database cluster are different database clusters, and the second data node 80 serves as a backup node of the first data node 70.
For a detailed description of the log transfer module 702, refer to the description of step 402 in the foregoing embodiment; details are not repeated here.
In a specific implementation, the log transfer module 702 may be implemented by the processor 202, the memory unit 204, and the communication interface 208 shown in FIG. 2. More specifically, the processor 202 may execute the communication module and the backup module in the memory unit 204, so that the communication interface 208 writes the first physical log to the first storage device.
In a possible implementation, the operation information indicates a modification operation, a write operation, and/or a deletion operation on data in the data node.
In a possible implementation, the first data node 70 further includes:
A transaction commit module 703, configured to, after the first physical log is transferred to the first storage device, commit a transaction on the data in the first data node 70 according to the first physical log.
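The ordering handled by the transaction commit module 703 (first the physical log reaches the first storage device, then the transaction is committed) can be illustrated with a minimal sketch. The names `PhysicalLog`, `DataNode`, and `write_then_commit` are hypothetical and do not come from this application, and an in-memory list stands in for the first storage device.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PhysicalLog:
    lsn: int        # log sequence number
    operation: str  # e.g. "modify", "write", or "delete"


@dataclass
class DataNode:
    # Assumption: a plain list stands in for the first storage device.
    storage: List[PhysicalLog] = field(default_factory=list)
    committed_lsns: List[int] = field(default_factory=list)
    events: List[str] = field(default_factory=list)

    def write_then_commit(self, log: PhysicalLog) -> None:
        # Step 1: write the physical log into the storage device.
        self.storage.append(log)
        self.events.append(f"write:{log.lsn}")
        # Step 2: only afterwards commit the transaction locally.
        self.committed_lsns.append(log.lsn)
        self.events.append(f"commit:{log.lsn}")
```

With this ordering, a log that has been committed on the first data node is already present on the storage device from which it is transferred to the second database cluster.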
In a possible implementation, the first database cluster further includes a third data node, and the log transfer module 702 is specifically configured to:
while the third data node writes a second physical log into the first storage device, write the first physical log into the first storage device in parallel.
In a possible implementation, the first storage device includes a storage space for storing physical logs, and the log transfer module 702 is specifically configured to:
based on the available space in the storage space being smaller than the space required by the first physical log, determine a target storage space within the storage space, where the target storage space stores a target physical log; and
based on the target physical log having already been replayed by the second data node 80, replace the target physical log in the target storage space with the first physical log.
In a possible implementation, the storage addresses of the storage space include a head address and a tail address, and the storage order of the storage space is configured to run from the storage space corresponding to the head address to the storage space corresponding to the tail address;
The log transfer module 702 is specifically configured to:
based on the storage space corresponding to the tail address being occupied, determine the storage space corresponding to the head address as the target storage space.
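The head-to-tail reuse described above behaves like a ring buffer that overwrites a slot only after the log in it has been replayed by the backup node. The following is a minimal sketch under stated assumptions: slots are of fixed size, the backup node reports the highest log sequence number it has replayed, and the names `RingLogSpace`, `mark_replayed`, and `write` are hypothetical rather than taken from this application.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    lsn: int
    payload: bytes


class RingLogSpace:
    """Slots are written from the head address to the tail address, then reused from the head."""

    def __init__(self, capacity: int) -> None:
        self.slots: List[Optional[Slot]] = [None] * capacity
        self.next_index = 0      # next slot to write; wraps to the head after the tail
        self.replayed_lsn = -1   # assumption: highest LSN the backup node reports as replayed

    def mark_replayed(self, lsn: int) -> None:
        self.replayed_lsn = max(self.replayed_lsn, lsn)

    def write(self, lsn: int, payload: bytes) -> bool:
        target = self.slots[self.next_index]
        # The target physical log may be replaced only if it has already been replayed.
        if target is not None and target.lsn > self.replayed_lsn:
            return False
        self.slots[self.next_index] = Slot(lsn, payload)
        self.next_index = (self.next_index + 1) % len(self.slots)
        return True
```

In this sketch a rejected write simply returns `False`; how the writer waits or retries in that case is not specified here and is left open.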
In a possible implementation, the first storage device is a raw device.
In a possible implementation, the log transfer module 702 is further configured to:
after the first data node 70 commits the transaction on the data in the first data node 70 according to the first physical log, write a second physical log containing commit information into the first storage device, where the first storage device is configured to transfer the second physical log to the second storage device, so that a management node in the second database cluster acquires the second physical log from the second storage device, and the commit information indicates that the first data node 70 has completed the transaction commit of the first physical log.
In a possible implementation, the first database cluster further includes a fourth data node, the fourth data node serves as a backup node of the first data node 70, and the fourth data node includes: a log acquisition module, configured to acquire the first physical log from the first storage device; and
a log replay module, configured to perform log replay according to the first physical log.
Referring to FIG. 8, FIG. 8 is a schematic structural diagram of a second database cluster 800 according to an embodiment of this application. The second database cluster 800 may include a second data node 80. The second data node 80 serves as a backup node of a first data node 70, the first data node 70 belongs to a first database cluster, the first database cluster and the second database cluster are different database clusters, a first storage device is deployed in the first database cluster, and a second storage device is deployed in the second database cluster. The second data node 80 includes:
A log acquisition module 801, configured to acquire a first physical log from the second storage device, where the first physical log is a physical log that comes from the first data node 70 and is stored in the second storage device after being transferred via the first storage device, and the first physical log includes operation information on data in the first data node 70;
For a detailed description of the log acquisition module 801, refer to the description of step 403 in the foregoing embodiment; details are not repeated here.
In a specific implementation, the log acquisition module 801 may be implemented by the processor 202, the memory unit 204, and the communication interface 208 shown in FIG. 2. More specifically, the processor 202 may execute the communication module in the memory unit 204, so that the communication interface 208 acquires the first physical log from the second storage device.
A log replay module 802, configured to perform log replay according to the first physical log.
For a detailed description of the log replay module 802, refer to the description of step 404 in the foregoing embodiment; details are not repeated here.
In a possible implementation, the operation information indicates a modification operation, a write operation, and/or a deletion operation on data in the data node.
In a possible implementation, the second database cluster further includes a fifth data node, and the log acquisition module is specifically configured to:
while the fifth data node acquires a physical log from the second storage device, acquire the first physical log from the second storage device in parallel.
In a possible implementation, the first database cluster includes a plurality of data nodes including the first data node 70, the second database cluster further includes a management node, and the management node includes:
A commit information acquisition module, configured to acquire, from the second storage device, commit information from the first database cluster, where the commit information includes, for each data node among the plurality of data nodes, the log sequence number of the physical log for which that data node most recently completed a transaction commit, and a target sequence number is the smallest of these log sequence numbers;
The log replay module is specifically configured to: obtain the target sequence number from the management node; and
after determining that the log sequence number of the first physical log is equal to the target sequence number, perform log replay according to the first physical log.
In a possible implementation, physical logs that have the same sequence number on different primary data nodes among the plurality of primary data nodes correspond to the same task, and each of the primary data nodes commits transactions for physical logs in ascending order of sequence number.
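The target sequence number rule above can be written down in a few lines. This is a minimal sketch, assuming the commit information is available as a mapping from a node identifier to the latest committed log sequence number; the function names and the node identifiers are hypothetical.

```python
from typing import Dict


def target_sequence_number(latest_committed: Dict[str, int]) -> int:
    """Return the smallest of the latest committed log sequence numbers
    reported for the data nodes of the first database cluster."""
    return min(latest_committed.values())


def may_replay(log_lsn: int, target: int) -> bool:
    """The backup node replays a physical log once its sequence number
    equals the target sequence number obtained from the management node."""
    return log_lsn == target
```

For example, if three data nodes report latest committed sequence numbers 7, 5, and 9, the target sequence number is 5, so the log with sequence number 5 may be replayed while the log with sequence number 6 may not yet be. Taking the minimum keeps the backup cluster from replaying past a point that some data node in the first cluster has not yet committed.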
An embodiment of this application further provides a computing device, which may be a node in the first database cluster or a node in the second database cluster described in the foregoing embodiments. The computing device may be a server, a terminal, or the like. The foregoing database management node and/or data storage node may be deployed in the computing device. As shown in FIG. 9, the computing device 90 includes a processor 901, a communication interface 902, and a memory 903. The processor 901, the communication interface 902, and the memory 903 are connected to one another through a bus 904.
The memory 903 is configured to store computer instructions. When the processor 901 executes the computer instructions in the memory 903, the functions of the computer instructions can be implemented. For example, when the processor 901 executes the computer instructions in the memory 903, the data recovery method provided in the embodiments of this application can be implemented. For another example, when the database management node is deployed in the computing device and the processor 901 executes the computer instructions in the memory 903, the functions of the first data node and the fourth data node in the data backup method provided in the embodiments of this application can be implemented. For still another example, when the data storage node is deployed in the computing device and the processor 901 executes the computer instructions in the memory 903, the functions of the second data node in the data backup method provided in the embodiments of this application can be implemented.
In FIG. 9, the bus 904 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is drawn in FIG. 9, but this does not mean that there is only one bus or only one type of bus.
In FIG. 9, the processor 901 may be a hardware chip, and the hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. Alternatively, the processor 901 may be a general-purpose processor, for example, a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP.
In FIG. 9, the memory 903 may include a volatile memory, for example, a random-access memory (RAM). It may also include a non-volatile memory, for example, a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD). It may further include a combination of the foregoing types of memory.
An embodiment of this application further provides a storage medium. The storage medium is a non-volatile computer-readable storage medium, and the instructions in the storage medium are used to implement the data backup method provided in the embodiments of this application.
An embodiment of this application further provides a computer program product including instructions, and the instructions included in the computer program product are used to implement the data backup method provided in the embodiments of this application. The computer program product may be stored on the storage medium.
An embodiment of this application further provides a chip. The chip includes a programmable logic circuit and/or program instructions, and is configured to implement, when running, the data backup method provided in the embodiments of this application.
It should also be noted that the apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided in this application, a connection between modules indicates that there is a communication connection between them, which may be implemented as one or more communication buses or signal lines.
From the description of the foregoing implementations, a person skilled in the art can clearly understand that this application may be implemented by software plus necessary general-purpose hardware, or by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, and dedicated components. In general, any function performed by a computer program can readily be implemented by corresponding hardware, and the specific hardware structure used to implement the same function may take various forms, such as an analog circuit, a digital circuit, or a dedicated circuit. For this application, however, a software implementation is the preferred implementation in most cases. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc of a computer, and includes several instructions that cause a computer device (which may be a personal computer, a training device, a network device, or the like) to perform the methods described in the embodiments of this application.
The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can store data on, or a data storage device, such as a training device or a data center, that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid-state disk (SSD)), or the like.
Claims (28)
- A data backup method, wherein the method is applied to a first database cluster, the first database cluster includes a first data node, and the method includes: the first data node acquires a first physical log, where the first physical log includes operation information on data in the first data node; and the first data node writes the first physical log into a first storage device, where the first storage device is configured to transfer the first physical log to a second storage device, so that a second data node in a second database cluster acquires the first physical log from the second storage device, the first storage device is deployed in the first database cluster, the second storage device is deployed in the second database cluster, the first database cluster and the second database cluster are different database clusters, and the second data node serves as a backup node of the first data node.
- The method according to claim 1, wherein the operation information indicates a modification operation, a write operation, and/or a deletion operation on data in the data node.
- The method according to claim 1 or 2, wherein the method further includes: after writing the first physical log into the first storage device, the first data node commits a transaction on the data in the first data node according to the first physical log.
- The method according to any one of claims 1 to 3, wherein the first database cluster further includes a third data node, and the first data node writing the first physical log into the first storage device includes: while the third data node writes a second physical log into the first storage device, the first data node writes the first physical log into the first storage device in parallel.
- The method according to claim 4, wherein the first storage device includes a first storage space for storing physical logs of the first data node and a second storage space for storing physical logs of the third data node, and the first storage space is different from the second storage space; and the first data node writing the first physical log into the first storage device in parallel while the third data node writes the second physical log into the first storage device includes: while the third data node writes the second physical log into the second storage space, the first data node writes the first physical log into the first storage space in parallel.
- The method according to any one of claims 1 to 5, wherein the first storage device includes a first storage space for storing physical logs of the first data node, and the writing of the first physical log into the first storage device includes: based on the available space in the first storage space being smaller than the space required by the first physical log, determining a target storage space within the first storage space, where the target storage space stores a target physical log; and based on the target physical log having already been replayed by the second data node, replacing the target physical log in the target storage space with the first physical log.
- The method according to claim 6, wherein the storage addresses of the first storage space include a head address and a tail address, and the storage order of the storage space runs from the storage space corresponding to the head address to the storage space corresponding to the tail address; and the determining of the target storage space within the first storage space based on the available space in the first storage space being smaller than the space required by the first physical log includes: based on the storage space corresponding to the tail address in the first storage space being occupied, determining the storage space corresponding to the head address in the first storage space as the target storage space.
- The method according to any one of claims 1 to 7, wherein the method further includes: after the first data node commits the transaction on the data in the first data node according to the first physical log, the first data node writes a second physical log containing commit information into the first storage device, where the first storage device is configured to transfer the second physical log to the second storage device, so that a management node in the second database cluster acquires the second physical log from the second storage device, and the commit information indicates that the first data node has completed the transaction commit of the first physical log.
- The method according to any one of claims 1 to 8, wherein the first database cluster further includes a fourth data node, the fourth data node serves as a backup node of the first data node, and the method further includes: the fourth data node acquires the first physical log from the first storage device; and the fourth data node performs log replay according to the first physical log.
- A data backup method, wherein the method is applied to a second database cluster, the second database cluster includes a second data node, the second data node serves as a backup node of a first data node, the first data node belongs to a first database cluster, the first database cluster and the second database cluster are different database clusters, a first storage device is deployed in the first database cluster, a second storage device is deployed in the second database cluster, and the method includes: the second data node acquires a first physical log from the second storage device, where the first physical log is a physical log that comes from the first data node and is stored in the second storage device after being transferred via the first storage device, and the first physical log includes operation information on data in the first data node; and the second data node performs log replay according to the first physical log.
- The method according to claim 10, wherein the operation information indicates a modification operation, a write operation, and/or a deletion operation on data in the data node.
- The method according to claim 10 or 11, wherein the second database cluster further includes a fifth data node, and the second data node acquiring the first physical log from the second storage device includes: while the fifth data node acquires a physical log from the second storage device, the second data node acquires the first physical log from the second storage device in parallel.
- The method according to claim 12, wherein the second storage device includes a third storage space for storing physical logs of the second data node and a fourth storage space for storing physical logs of the fifth data node, and the third storage space is different from the fourth storage space; and the second data node acquiring the first physical log from the second storage device in parallel while the fifth data node acquires a physical log from the second storage device includes: while the fifth data node acquires a physical log from the fourth storage space, the second data node acquires the first physical log from the third storage space in parallel.
- The method according to any one of claims 10 to 13, wherein the first database cluster includes a plurality of primary data nodes including the first data node, the second database cluster includes a plurality of standby data nodes including the second data node, the second database cluster further includes a management node, and the method further includes: the management node acquires, from the second storage device, commit information from the first database cluster, where the commit information includes, for each primary data node among the plurality of primary data nodes, the log sequence number of the physical log for which that primary data node most recently completed a transaction commit, a target sequence number is the smallest of these log sequence numbers, and each standby data node among the plurality of standby data nodes has acquired the physical log corresponding to the target sequence number; and the second data node performing log replay according to the first physical log includes: the second data node obtains the target sequence number from the management node; and after determining that the log sequence number of the first physical log is equal to the target sequence number, the second data node performs log replay according to the first physical log.
- The method according to claim 14, wherein physical logs that have the same sequence number on different data nodes among the plurality of primary data nodes correspond to the same task, and each of the primary data nodes commits transactions for physical logs in ascending order of sequence number.
- A first database cluster, wherein the first database cluster includes a first data node, and the first data node includes: a log acquisition module, configured to acquire a first physical log, where the first physical log includes operation information on data in the first data node; and a log transfer module, configured to write the first physical log into a first storage device, where the first storage device is configured to transfer the first physical log to a second storage device, so that a second data node in a second database cluster acquires the first physical log from the second storage device, the first storage device is deployed in the first database cluster, the second storage device is deployed in the second database cluster, the first database cluster and the second database cluster are different database clusters, and the second data node serves as a backup node of the first data node.
- The first database cluster according to claim 16, wherein the first data node further includes: a transaction commit module, configured to, after the first physical log is transferred to the first storage device, commit a transaction on the data in the first data node according to the first physical log.
- The first database cluster according to claim 16 or 17, wherein the first database cluster further includes a third data node, and the log transfer module is specifically configured to: while the third data node writes a second physical log into the first storage device, write the first physical log into the first storage device in parallel.
- The first database cluster according to claim 18, wherein the first storage device includes a first storage space for storing physical logs of the first data node and a second storage space for storing physical logs of the third data node, and the first storage space is different from the second storage space; and the log transfer module is specifically configured to: while the third data node writes the second physical log into the second storage space, write the first physical log into the first storage space in parallel.
- The first database cluster according to any one of claims 16 to 19, wherein the first storage device includes a first storage space for storing physical logs of the first data node, and the log transfer module is specifically configured to: based on the available storage space in the first storage space being smaller than the storage space required by the first physical log, determine a target storage space from the first storage space, where the target storage space stores a target physical log; and based on log playback of the target physical log having already been performed by the second data node, replace the target physical log in the target storage space with the first physical log.
- The first database cluster according to claim 20, wherein the storage addresses of the first storage space include a head address and a tail address, and the storage order of the storage space runs from the storage space corresponding to the head address to the storage space corresponding to the tail address; the log transfer module is specifically configured to: based on the storage space corresponding to the tail address in the first storage space being occupied, determine the storage space corresponding to the head address in the first storage space as the target storage space.
- A second database cluster, characterized in that the second database cluster includes a second data node, the second data node serves as a backup node for a first data node, the first data node belongs to a first database cluster, the first database cluster and the second database cluster are different database clusters, a first storage device is deployed in the first database cluster, a second storage device is deployed in the second database cluster, and the second data node includes: a log acquisition module, configured to acquire a first physical log from the second storage device, where the first physical log is a physical log that originates from the first data node and is stored in the second storage device after being transferred via the first storage device, and the first physical log includes operation information on data in the first data node; and a log playback module, configured to perform log playback according to the first physical log.
- The second database cluster according to claim 22, wherein the second database cluster further includes a fourth data node; the log acquisition module is specifically configured to: when the fifth data node obtains a physical log from the second storage device, obtain, by the second data node, the first physical log from the second storage device in parallel.
- The second database cluster according to claim 23, wherein the second storage device includes a third storage space for storing physical logs of the second data node and a fourth storage space for storing physical logs of the fifth data node, the third storage space and the fourth storage space being different; the log acquisition module is specifically configured to: when the fifth data node obtains a physical log from the fourth storage space, obtain, by the second data node, the first physical log from the third storage space in parallel.
- The second database cluster according to any one of claims 22 to 24, wherein the first database cluster includes multiple primary data nodes including the first data node, the second database cluster includes multiple standby data nodes including the second data node, the second database cluster further includes a management node, and the management node includes: a commit information acquisition module, configured to acquire, from the second storage device, commit information from the first database cluster, where the commit information includes, for each primary data node among the multiple primary data nodes, the log sequence number of the physical log whose transaction commit that node most recently completed, the target sequence number is the smallest of the multiple log sequence numbers, and each standby data node among the multiple standby data nodes has obtained the physical log corresponding to the target sequence number; the log playback module is specifically configured to: obtain, by the second data node, the target sequence number from the management node; and after determining that the log sequence number of the first physical log is equal to the target sequence number, perform log playback according to the first physical log.
- The second database cluster according to claim 25, wherein physical logs with the same sequence number on different primary data nodes among the multiple primary data nodes correspond to the same task, and each of the multiple primary data nodes commits transactions for its physical logs in ascending order of sequence number.
- A computer-readable storage medium, characterized in that it includes computer-readable instructions which, when run on a computer device, cause the computer device to execute the method according to any one of claims 1 to 15.
- A computer program product, characterized in that it includes computer-readable instructions which, when run on a computer device, cause the computer device to execute the method according to any one of claims 1 to 15.
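
Two mechanisms in the claims above lend themselves to a short illustration: the reuse of the first storage space (writing proceeds from the head address to the tail address, wraps back to the head once the tail is occupied, and replaces a stored log only if the backup node has already played it back), and the target sequence number derived by the management node (the smallest of the latest committed log sequence numbers across the primary data nodes). The following is a minimal Python sketch under stated assumptions, not the implementation disclosed in the application: the names `PhysicalLog`, `LogSpace`, `replayed_lsn`, and `target_sequence_number` are hypothetical, the storage space is modelled as fixed-size slots, and "has obtained the log for the target sequence number" is approximated by comparing against each standby node's highest obtained sequence number.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PhysicalLog:
    lsn: int        # log sequence number
    payload: bytes  # operation information on the data node's data


class LogSpace:
    """Hypothetical fixed-size storage space for one data node's physical logs."""

    def __init__(self, slots: int) -> None:
        self.slots: List[Optional[PhysicalLog]] = [None] * slots
        self.next_slot = 0      # next write position, moving from head to tail
        self.replayed_lsn = -1  # highest LSN the backup node has played back (assumed bookkeeping)

    def write(self, log: PhysicalLog) -> bool:
        """Store a log; replace an occupied slot only if its log was already played back."""
        target = self.slots[self.next_slot]
        if target is not None and target.lsn > self.replayed_lsn:
            return False  # target log not yet played back, so it must not be replaced
        self.slots[self.next_slot] = log
        # Once the tail slot has been used, wrap around to the head slot.
        self.next_slot = (self.next_slot + 1) % len(self.slots)
        return True


def target_sequence_number(
    committed: Dict[str, int], received: Dict[str, int]
) -> Optional[int]:
    """Smallest latest-committed LSN across primaries, if every standby has reached it.

    committed: primary node name -> LSN of its most recently committed physical log
    received:  standby node name -> highest LSN it has obtained (assumes in-order delivery)
    """
    target = min(committed.values())
    if all(lsn >= target for lsn in received.values()):
        return target
    return None
```

In this sketch a caller would retry `write` after the backup node advances `replayed_lsn`, and a standby node would play back a log only once its sequence number matches the value returned by `target_sequence_number`.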
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111117550.8A CN115858236A (en) | 2021-09-23 | 2021-09-23 | Data backup method and database cluster |
CN202111117550.8 | 2021-09-23 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023046042A1 (en) | 2023-03-30 |
Family
ID=85652386
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/120709 WO2023046042A1 (en) | 2021-09-23 | 2022-09-23 | Data backup method and database cluster |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115858236A (en) |
WO (1) | WO2023046042A1 (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116955015A (en) * | 2023-09-19 | 2023-10-27 | 恒生电子股份有限公司 | Data backup system and method based on data storage service |
CN117033087A (en) * | 2023-10-10 | 2023-11-10 | 武汉吧哒科技股份有限公司 | Data processing method, device, storage medium and management server |
CN117171266A (en) * | 2023-08-28 | 2023-12-05 | 北京逐风科技有限公司 | Data synchronization method, device, equipment and storage medium |
CN117667515A (en) * | 2023-12-08 | 2024-03-08 | 广州鼎甲计算机科技有限公司 | Backup management method and device for main and standby clusters, computer equipment and storage medium |
CN117857568A (en) * | 2023-12-25 | 2024-04-09 | 慧之安信息技术股份有限公司 | Edge equipment capacity-increasing configuration method and system based on cloud edge cooperation |
CN118410115A (en) * | 2024-07-03 | 2024-07-30 | 上海联鼎软件股份有限公司 | Automatic double-activity disaster recovery method and device for ORACL database and storage medium |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116760693B (en) * | 2023-08-18 | 2023-10-27 | 天津南大通用数据技术股份有限公司 | Method and system for switching main and standby nodes of database |
CN117194566B (en) * | 2023-08-21 | 2024-04-19 | 泽拓科技(深圳)有限责任公司 | Multi-storage engine data copying method, system and computer equipment |
CN118484345B (en) * | 2024-07-15 | 2024-09-20 | 浪潮云信息技术股份公司 | Distributed file system hot backup method based on linux operating system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106570007A (en) * | 2015-10-09 | 2017-04-19 | 阿里巴巴集团控股有限公司 | Method and equipment for data synchronization of distributed caching system |
CN110377577A (en) * | 2018-04-11 | 2019-10-25 | 北京嘀嘀无限科技发展有限公司 | Method of data synchronization, device, system and computer readable storage medium |
US10936545B1 (en) * | 2013-12-20 | 2021-03-02 | EMC IP Holding Company LLC | Automatic detection and backup of primary database instance in database cluster |
CN112905390A (en) * | 2021-03-31 | 2021-06-04 | 恒生电子股份有限公司 | Log data backup method, device, equipment and storage medium |
- 2021-09-23: CN application CN202111117550.8A (CN115858236A), status: Pending
- 2022-09-23: WO application PCT/CN2022/120709 (WO2023046042A1), status: Application Filing
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117171266A (en) * | 2023-08-28 | 2023-12-05 | 北京逐风科技有限公司 | Data synchronization method, device, equipment and storage medium |
CN117171266B (en) * | 2023-08-28 | 2024-05-14 | 北京逐风科技有限公司 | Data synchronization method, device, equipment and storage medium |
CN116955015A (en) * | 2023-09-19 | 2023-10-27 | 恒生电子股份有限公司 | Data backup system and method based on data storage service |
CN116955015B (en) * | 2023-09-19 | 2024-01-23 | 恒生电子股份有限公司 | Data backup system and method based on data storage service |
CN117033087A (en) * | 2023-10-10 | 2023-11-10 | 武汉吧哒科技股份有限公司 | Data processing method, device, storage medium and management server |
CN117033087B (en) * | 2023-10-10 | 2024-01-19 | 武汉吧哒科技股份有限公司 | Data processing method, device, storage medium and management server |
CN117667515A (en) * | 2023-12-08 | 2024-03-08 | 广州鼎甲计算机科技有限公司 | Backup management method and device for main and standby clusters, computer equipment and storage medium |
CN117857568A (en) * | 2023-12-25 | 2024-04-09 | 慧之安信息技术股份有限公司 | Edge equipment capacity-increasing configuration method and system based on cloud edge cooperation |
CN118410115A (en) * | 2024-07-03 | 2024-07-30 | 上海联鼎软件股份有限公司 | Automatic double-activity disaster recovery method and device for ORACL database and storage medium |
CN118410115B (en) * | 2024-07-03 | 2024-09-06 | 上海联鼎软件股份有限公司 | Automatic double-activity disaster recovery method and device for ORACLE database and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN115858236A (en) | 2023-03-28 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
WO2023046042A1 (en) | Data backup method and database cluster | |
US11836155B2 (en) | File system operation handling during cutover and steady state | |
US11734306B2 (en) | Data replication method and storage system | |
US12073091B2 (en) | Low overhead resynchronization snapshot creation and utilization | |
US10929428B1 (en) | Adaptive database replication for database copies | |
US7299378B2 (en) | Geographically distributed clusters | |
US10452680B1 (en) | Catch-up replication with log peer | |
JP4461147B2 (en) | Cluster database using remote data mirroring | |
US11461192B1 (en) | Automatic recovery from detected data errors in database systems | |
US20240061603A1 (en) | Co-located Journaling and Data Storage for Write Requests | |
WO2024051027A1 (en) | Data configuration method and system for big data | |
US11461018B2 (en) | Direct snapshot to external storage | |
US11681592B2 (en) | Snapshots with selective suspending of writes | |
US20230252045A1 (en) | Life cycle management for standby databases | |
US11265374B2 (en) | Cloud disaster recovery | |
CN117992467A (en) | Data processing system, method and device and related equipment | |
CN117931831A (en) | Data processing system, data processing method, data processing device and related equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22872086; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: PCT application non-entry in European phase | Ref document number: 22872086; Country of ref document: EP; Kind code of ref document: A1 |