CN115599747B - Metadata synchronization method, system and equipment of distributed storage system - Google Patents
- Publication number
- CN115599747B (application CN202210432189.6A)
- Authority
- CN
- China
- Prior art keywords
- metadata
- change operation
- node
- metadata service
- operation log
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F16/182—Distributed file systems
- G06F16/13—File access structures, e.g. distributed indices
- G06F16/178—Techniques for file synchronisation in file systems
- G06F16/1815—Journaling file systems
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application provides a metadata synchronization method, system, and device for a distributed storage system. A metadata service master node and metadata service slave nodes are determined through a consensus protocol cluster. When a metadata change occurs, the master node encapsulates the metadata change operation into a change operation log; the change operation logs are written sequentially into segments of the consensus protocol cluster; after a write succeeds, the change operation log and its corresponding metadata are updated into the local storage engine of the master node; and when a new segment is created in the consensus protocol cluster, or after a preset interval elapses, the change operation log and its corresponding metadata are synchronized into the local storage engines of the slave nodes according to a preset synchronization rule. The metadata service can then read metadata directly from the local storage engine without a network call or a consensus round, which reduces latency and improves synchronization efficiency.
Description
Technical Field
The present disclosure relates to the field of data storage technologies, and in particular, to a metadata synchronization method, system, and device for a distributed storage system.
Background
A distributed storage system connects multiple independent servers through a network to form a distributed cluster; the storage resources of all servers in the cluster, such as mechanical disks and solid-state disks, form a resource pool under unified management that provides service externally. Distributed storage systems typically allocate storage objects such as virtual volumes, iSCSI LUNs, or files from the storage resource pool and provide them to storage consumers, and the data capacity of a virtual volume or file may exceed the total storage capacity of any single server. For example, a virtual volume may be 64TB while the physical disk capacity of a single server in the cluster is only 32TB. To support virtual volumes whose data volume exceeds the capacity of a single server, a distributed storage system subdivides storage objects such as virtual volumes or files into finer-grained data slices, e.g., dividing a 64TB volume into many fixed-size slices of 256MB, 4MB, or 1MB, and places the slices on multiple servers in the cluster, so that one storage object can use the storage resources of many servers. For data safety and improved read performance, distributed storage systems also apply data redundancy on top of data slicing, typically using replica or erasure-coding techniques. Taking replicas as an example, with a replica count of 3, the distributed storage system allocates a large storage object from the unified resource pool, divides it into finer-grained data slices, and places the 3 replicas of each slice on 3 different servers in the cluster according to a placement policy.
To read and write a data object such as a volume or file normally, the system must know which data slices of the object hold the required data and on which servers the replicas of those slices reside. This positioning information is important metadata of the distributed storage system. In addition, the metadata of a distributed storage system includes file and directory attributes, information about the data nodes that make up the cluster, and so on.
Metadata is critical to a distributed storage system: if metadata is lost, the system's service data cannot be accessed, which severely affects users. Such metadata is therefore typically also persisted in the cluster with multiple copies. Moreover, the consistency requirements for metadata are very strict and data inconsistency cannot be tolerated, so the metadata stored on each server in the cluster must remain strongly consistent whenever it is updated.
To keep multiple copies of metadata consistent, a common approach is to implement replica synchronization with a Paxos/Raft distributed consensus algorithm, with all metadata access going through a distributed consensus mechanism (for example, hosting all metadata in etcd, ZooKeeper, or Cassandra and reading and writing it directly through consensus). In this mode, every data access must pass through a consensus round: updates can only be written on the master node and succeed only after a majority of slave nodes have applied them, and even reads must be served by the Leader of the Raft module.
The main problem with a metadata synchronization mechanism built directly on a distributed consensus protocol is that metadata queries are expensive. A consensus protocol cluster generally offers only single-object, key-value-granularity queries; each query is an independent action that must go through a consensus confirmation round. Range queries, or more complex conditional queries with data semantics, therefore require fetching a larger result set at high cost and filtering it afterward, and each small-object query incurs the latency of the underlying consensus algorithm. Yet in a distributed storage system, metadata reads are typically far more frequent than writes, so the performance of metadata read requests is critical to overall storage performance.
Disclosure of Invention
An objective of the embodiments of the present application is to provide a metadata synchronization method, system and device for a distributed storage system, so as to solve the problems of low metadata synchronization efficiency and low metadata read request performance at present. The specific technical scheme is as follows:
in a first aspect, there is provided a metadata synchronization method for a distributed storage system, the method comprising:
determining a metadata service master node and a metadata service slave node through a consensus protocol cluster;
when a metadata change occurs, encapsulating the metadata change operation into a change operation log by the metadata service master node;
writing the change operation log into segments of the consensus protocol cluster in sequence;
after the write succeeds, updating the change operation log and its corresponding metadata into a local storage engine of the metadata service master node;
and when a new segment is created in the consensus protocol cluster, or after a preset interval elapses, synchronizing the change operation log and its corresponding metadata into a local storage engine of the metadata service slave node according to a preset synchronization rule.
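As a rough illustration, the master-node side of these steps can be sketched as follows. The class `MetadataMaster`, the constant `SEGMENT_CAPACITY`, and the use of plain Python lists and dicts to stand in for the consensus protocol cluster's segments and the local storage engine are all assumptions for illustration, not the patent's actual implementation:

```python
class MetadataMaster:
    """Sketch of the master-node write path: encapsulate a change as a log,
    write it into a segment of the (simulated) consensus cluster, then apply
    it to the (simulated) local storage engine."""

    SEGMENT_CAPACITY = 4  # illustrative maximum number of logs per segment

    def __init__(self, consensus_segments, local_engine):
        self.segments = consensus_segments  # list of segments (lists of logs)
        self.local = local_engine           # dict standing in for the local engine
        self.commit_op_seq = 0              # latest log seq written to the cluster

    def apply_change(self, key, value):
        # Step 1: encapsulate the metadata change as a change operation log.
        self.commit_op_seq += 1
        log = {"seq": self.commit_op_seq, "key": key, "value": value}
        # Step 2: write the log sequentially into the current segment,
        # opening a new segment when the current one is full.
        if not self.segments or len(self.segments[-1]) >= self.SEGMENT_CAPACITY:
            self.segments.append([])
        self.segments[-1].append(log)
        # Step 3: only after the consensus write succeeds, update the log
        # and its metadata into the master's local storage engine.
        self.local[key] = value
        return log
```

A slave node would later pull these segments and replay the logs into its own local engine according to the preset synchronization rule.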
Optionally, the determining the metadata service master node and the metadata service slave node through the consensus protocol cluster includes:
creating a node for each metadata service node in the same directory of the consensus protocol cluster, and ordering the nodes by creation time;
and determining the metadata service node corresponding to the node ranked first as the metadata service master node, with the remaining metadata service nodes as metadata service slave nodes.
Optionally, the method further comprises:
deleting a node representing the metadata service master node when the metadata service master node fails or a network partition occurs;
and determining the metadata service node corresponding to the node currently ranked first as a new metadata service master node.
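The election and failover described above (sequence nodes under one directory, smallest node wins) can be mimicked with a small in-memory stand-in for the consensus-cluster directory. `ElectionDirectory` and its method names are hypothetical; in practice this role would be played by, e.g., ZooKeeper sequential ephemeral znodes:

```python
class ElectionDirectory:
    """In-memory stand-in for the consensus-cluster directory used for election."""

    def __init__(self):
        self._counter = 0
        self.nodes = {}  # sequence-node name -> metadata service identifier

    def register(self, service_id):
        # Each metadata service creates a sequence node; names sort by creation order.
        self._counter += 1
        name = "node-%010d" % self._counter
        self.nodes[name] = service_id
        return name

    def remove(self, name):
        # The node of a failed or partitioned service is deleted.
        self.nodes.pop(name, None)

    def master(self):
        # The service owning the first-ranked (smallest) node is the master.
        return self.nodes[min(self.nodes)] if self.nodes else None
```

When the master's node is removed, the next-ranked service becomes the new master, matching the failover behavior described above.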
Optionally, the preset synchronization rule is:
acquiring the latest change operation log sequence number from the local storage engine of the metadata service slave node;
pulling all segment information from the consensus protocol cluster;
sorting all segments by the sequence number of the first change operation log in each segment;
finding the first segment whose first-log sequence number is not smaller than the latest change operation log sequence number;
judging whether that segment is the last one and whether the sequence number of its first log is larger than the latest change operation log sequence number;
if yes, taking the previous segment as the target segment;
if not, taking the found segment as the target segment;
and starting from the target segment, synchronizing the change operation logs of all subsequent segments.
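The target-segment selection can be written as a pure function over the first-log sequence numbers of the segments. Reading "the last segment of the segments" as the previous segment is an interpretation of the machine-translated text, so treat this sketch as an assumption rather than the patent's definitive logic:

```python
def pick_target_segment(first_seqs, replay_op_seq):
    """Return the index (in sorted order) of the target segment to start from.

    first_seqs: sequence number of the first change operation log in each segment.
    replay_op_seq: latest log sequence number already applied locally.
    """
    seqs = sorted(first_seqs)
    # First segment whose first-log sequence number is not smaller than replay_op_seq.
    idx = next((i for i, s in enumerate(seqs) if s >= replay_op_seq), None)
    if idx is None:
        return len(seqs) - 1  # none found: fall back to the last segment
    # If it is the last segment and it starts beyond replay_op_seq, the
    # still-missing logs live in the previous segment, so start one earlier.
    if idx == len(seqs) - 1 and seqs[idx] > replay_op_seq:
        return max(idx - 1, 0)
    return idx
```

For example, with segments starting at logs 1, 11, and 21 and a locally applied sequence number of 15, synchronization starts from the segment beginning at 11.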
Optionally, after synchronizing the change operation log and the metadata corresponding to the change operation log into the local storage engine of the metadata service slave node according to a preset synchronization rule, the method further includes:
and recovering and processing the change operation log synchronized to each metadata service slave node in the common protocol cluster through the metadata service slave node corresponding to the node arranged at the head.
Optionally, the metadata service slave node corresponding to the node arranged at the top can execute the change operation log recycling operation at fixed time.
Optionally, the method further comprises:
when a new metadata service node joins the consensus protocol cluster, synchronizing the full metadata from the local storage engines of the other metadata service nodes.
In a second aspect, the present application provides a metadata synchronization system for a distributed storage system, the system comprising:
the determining unit is used for determining a metadata service master node and a metadata service slave node through the consensus protocol cluster;
the encapsulation unit is used for encapsulating the metadata change operation into a change operation log by means of the metadata service master node when a metadata change occurs;
the writing unit is used for writing the change operation log into segments of the consensus protocol cluster in sequence;
the updating unit is used for updating the change operation log and its corresponding metadata into a local storage engine of the metadata service master node after the write succeeds;
and the synchronization unit is used for synchronizing the change operation log and its corresponding metadata into a local storage engine of the metadata service slave node according to a preset synchronization rule when a new segment is created in the consensus protocol cluster or after a preset interval elapses.
In a third aspect, the present application provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of the first aspects when executing a program stored on a memory.
In a fourth aspect, the present application provides a computer-readable storage medium having a computer program stored therein, which when executed by a processor, implements the method steps of any of the first aspects.
In a fifth aspect, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the metadata synchronization method of a distributed storage system as described in any one of the above.
The beneficial effects of the embodiments of the present application include:
the embodiment of the application provides a metadata synchronization method, a metadata synchronization system and metadata synchronization equipment of a distributed storage system, wherein a metadata service master node and a metadata service slave node are determined through a consensus protocol cluster; when metadata change occurs, the metadata change operation is packaged into a change operation log by utilizing a metadata service main node; sequentially writing the change operation log into segments of the consensus protocol cluster; after the writing is successful, updating the change operation log and the corresponding metadata thereof into a local storage engine of the metadata master node; when new segments are created or separated by preset time periods in the consensus protocol cluster, the change operation log and the corresponding metadata thereof are synchronized into a local storage engine of the metadata service slave node according to preset synchronization rules. The method and the system do not directly store metadata in the common protocol cluster, only select a master node and synchronize metadata change operation logs by means of the common protocol cluster, and finally store the metadata in a local storage engine. The metadata service can directly read metadata from the local storage engine, network calling and consensus processes are not needed, delay is reduced, the local metadata is processed through the local storage engine in each service node, the data states of other nodes are not needed to be considered, and various caching mechanisms and data organization modes can be adopted according to requirements to further improve performance. In addition, the strong consistency of metadata is ensured by means of the existing mechanism inside the consensus protocol cluster. All metadata does not need to be loaded into the memory, so that the system resource consumption is reduced, and the system can process a larger amount of metadata.
Of course, not all of the above-described advantages need be achieved simultaneously in practicing any one of the products or methods of the present application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flowchart of a metadata synchronization method for a distributed storage system according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a metadata service cluster according to an embodiment of the present application;
fig. 3 is a schematic diagram of a master node election process according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of metadata service slave node synchronization according to an embodiment of the present application;
FIG. 5 is a flowchart illustrating metadata full synchronization according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a metadata synchronization system of a distributed storage system according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Description of the embodiments
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiments of the present application provide a metadata synchronization method for a distributed storage system. The method is described in detail below, as shown in fig. 1, with the following specific steps:
step S101: the metadata service master node and the metadata service slave node are determined by a consensus protocol cluster.
In this step, the consensus protocol cluster may be ZooKeeper, etcd, or the like.
As shown in fig. 2, there is only one metadata service master node, while there may be multiple metadata service slave nodes; the consensus protocol cluster, the master node, and the slave nodes together form the metadata service cluster of the distributed storage system. Each metadata service node contains a local storage engine, such as MySQL or LevelDB, in which the metadata is stored. The metadata change operation logs are synchronized by means of the consensus protocol cluster. The read and conditional-query performance of these local storage engines is typically much higher than that of a consensus-based cluster.
In the embodiments of the present application, the consensus protocol cluster serves two functions: first, providing an election service for the metadata service cluster based on a consensus algorithm, determining a unique metadata service master node and several metadata service slave nodes; second, maintaining the metadata change operation logs, specifically in key-value form.
Metadata read and write requests are handled only by the metadata service master node. When metadata changes, the master node first writes a change operation log into the consensus protocol cluster, and updates its local storage engine after confirming that the write succeeded. On reads, because the local storage engine already holds the complete metadata, the consensus round can be skipped and the metadata read directly from the local storage engine. Compared with existing schemes that store and manage metadata directly in a consensus protocol cluster, metadata here is not stored in the cluster: the cluster is used only for master election and for the metadata change operation logs, and the metadata ultimately lives in the local storage engine. The metadata service can read metadata from the local storage engine without network calls or consensus rounds, reducing latency; moreover, each node's local storage engine need not consider the data state of other nodes when processing local metadata, and can adopt various caching mechanisms and data organizations as needed to further improve read performance.
Optionally, the determining the metadata service master node and the metadata service slave node through the consensus protocol cluster includes:
creating a node for each metadata service node in the same directory of the consensus protocol cluster, and sequencing according to the creation time;
and determining the metadata service nodes corresponding to the nodes arranged at the first position as metadata service master nodes, and determining the other nodes as metadata service slave nodes.
Optionally, the method further comprises:
deleting a node representing the metadata service master node when the metadata service master node fails or a network partition occurs;
other metadata service slave nodes also receive the message notification, check whether the node list is arranged at the top, and determine the metadata service node corresponding to the node arranged at the top currently as a new metadata service master node. After the metadata service master node generates network partition, the metadata service master node can lose the identity of the master node because the metadata service master node cannot be normally connected with the consensus protocol cluster, so that two metadata service master nodes can be avoided when the network partition occurs.
In another embodiment, when a metadata service slave node fails or a network partition occurs, the node representing it in the consensus protocol cluster is automatically deleted; the other metadata service nodes also receive the message notification, but this has no significant effect on them.
In addition, even without failures or network partitions, each metadata service node continuously watches the directory in the consensus protocol cluster. Whenever the nodes in the directory change, the metadata service receives a notification from the cluster and checks whether its own node is ranked first: if so, it becomes the master node and serves external requests; if not, it continues as a slave node synchronizing the master node's metadata changes.
As shown in fig. 3, a specific master node election process is provided, and the steps are as follows:
step S301: a node starts;
step S302: the node creates a sequence node in the consensus protocol cluster;
step S303: judging whether its node's sequence number is 0 (i.e., ranked first); if yes, executing step S304, otherwise executing step S305;
step S304: the node becomes the metadata service master node;
step S305: the node becomes a metadata service slave node;
step S306: receiving a membership change notification sent by the consensus protocol cluster;
step S307: judging whether its own node's sequence number is still 0; if yes, returning to step S304; if no, returning to step S305.
Step S102: when metadata change occurs, the metadata change operation is packaged into a change operation log by the metadata service master node.
The metadata service master node is the external interface for metadata changes; change operations are initiated only at the master node. The master node locally maintains two integer values, commit_op_seq and replay_op_seq, while each metadata service slave node locally maintains only replay_op_seq. commit_op_seq is the sequence number of the latest change operation log that the master node has stored into the consensus protocol cluster; replay_op_seq is the sequence number of the latest change operation log that a node has applied to its local storage. When the master node starts, it reads the sequence number replay_op_seq of the latest locally applied log from its local storage engine and initializes commit_op_seq to replay_op_seq. Before starting to serve external requests, the master node synchronizes all newer change operation logs from the consensus protocol cluster and updates replay_op_seq and commit_op_seq accordingly.
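The startup catch-up just described might look like the following sketch; the function name and the use of a dict for the local storage engine are illustrative assumptions:

```python
def master_startup(local_engine, consensus_logs):
    """Recover commit_op_seq/replay_op_seq on master startup and catch up on
    any logs the consensus cluster holds beyond the local state (sketch)."""
    # replay_op_seq: sequence number of the latest log applied locally.
    replay_op_seq = local_engine.get("replay_op_seq", 0)
    # Initially set the committed sequence number equal to the replayed one.
    commit_op_seq = replay_op_seq
    for log in sorted(consensus_logs, key=lambda l: l["seq"]):
        if log["seq"] <= replay_op_seq:
            continue  # already applied locally
        local_engine[log["key"]] = log["value"]
        replay_op_seq = commit_op_seq = log["seq"]
    local_engine["replay_op_seq"] = replay_op_seq
    return commit_op_seq, replay_op_seq
```

After this catch-up, both counters point at the newest log in the cluster and the master can begin serving requests.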
Step S103: and writing the change operation log into segments of the consensus protocol cluster in sequence.
The metadata change operation logs are stored in a fixed directory of the consensus protocol cluster, referred to as the data directory. The change operation logs in the data directory are grouped into segments; each segment stores at most a fixed number of logs, each change operation log has a sequence number, and the logs are ordered by write order.
Step S104: and after the writing is successful, updating the change operation log and the corresponding metadata thereof into a local storage engine of the metadata master node.
Step S105: and when a new segment is created or separated by a preset time length in the consensus protocol cluster, synchronizing the change operation log and the corresponding metadata thereof into a local storage engine of the metadata service slave node according to a preset synchronization rule.
When a metadata service slave node starts, it reads the sequence number replay_op_seq of the latest locally applied log from its local storage engine. When a new segment is created in the consensus protocol cluster, the slave node receives a notification and synchronizes the previous segment from the cluster. The slave node may also synchronize change operation logs from the consensus protocol cluster at intervals of a few seconds.
In the embodiments of the present application, metadata synchronization takes the segment as its basic unit: while a segment is not yet full, slave nodes do not immediately synchronize new metadata changes. When a new segment is created, the slave nodes receive the event notification, synchronize the metadata change operation logs of the previous segment from the consensus protocol cluster, and update their local metadata accordingly. Synchronizing at segment granularity avoids a broadcast storm: if every change operation log were synchronized individually, every update would trigger a read and an event notification on every service node in the metadata service cluster, significantly degrading cluster performance.
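One synchronization pass on a slave node, triggered by a new-segment notification or a timer, might look like this; the names and data shapes are illustrative assumptions, with dicts standing in for the local storage engine:

```python
def slave_sync(local_engine, segments):
    """Apply every change operation log newer than the slave's replay_op_seq."""
    replay = local_engine.get("replay_op_seq", 0)
    for segment in segments:          # segments are already in write order
        for log in segment:
            if log["seq"] > replay:   # skip logs already applied locally
                local_engine[log["key"]] = log["value"]
                replay = log["seq"]
    local_engine["replay_op_seq"] = replay
    return replay
```

A real implementation would first narrow `segments` down to the target segment and its successors rather than scanning everything, as the preset synchronization rule describes.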
Optionally, the preset synchronization rule is:
acquiring the latest change operation log sequence number from the local storage engine of the metadata service slave node;
pulling all segment information from the consensus protocol cluster;
sorting all segments by the sequence number of the first change operation log in each segment;
finding the first segment whose first-log sequence number is not smaller than the latest change operation log sequence number;
judging whether that segment is the last one and whether the sequence number of its first log is larger than the latest change operation log sequence number;
if yes, taking the previous segment as the target segment;
if not, taking the found segment as the target segment;
and starting from the target segment, synchronizing the change operation logs of all subsequent segments.
As shown in fig. 4, a specific implementation procedure of metadata service slave node synchronization is provided, which includes the following steps:
step S401: the metadata service obtains the serial number replay_op_seq of the latest log applied locally from the local storage engine after the slave node is started;
step S402: judging whether a new segment is created or 10 seconds are required, if so, executing step S403, otherwise, repeatedly executing step S402;
step S403: pulling all segment information from the consensus protocol cluster;
step S404: ordering all segments according to the sequence number of the first log in the segments;
step S405: finding the first segment whose first log sequence number is not smaller than replay_op_seq, and falling back to the last segment when no such segment exists;
step S406: judging whether the segment is the last one and whether the sequence number of its first log is larger than replay_op_seq; if yes, executing step S407; if no, executing step S408;
step S407: taking the segment preceding that segment as the target segment;
step S408: taking the segment as a target segment;
step S409: starting from the target segment, the change operation logs of all segments are synchronized.
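The trigger in step S402 (a new-segment event notification from the consensus protocol cluster, or a 10-second timer) can be sketched with a hypothetical event/callback interface; `do_sync` stands in for steps S403-S409, and both wake-up paths deliberately fall through to the same routine:

```python
import threading

def run_sync_trigger(new_segment_event, do_sync, stop_event, period_s=10.0):
    """Drive step S402: wake on a new-segment notification, or every
    period_s seconds, then run the synchronization path (S403-S409)."""
    while not stop_event.is_set():
        # Event-or-timeout wait: a new-segment notification and the periodic
        # timer both lead to the same synchronization routine.
        new_segment_event.wait(timeout=period_s)
        new_segment_event.clear()
        if not stop_event.is_set():
            do_sync()
```

The names `run_sync_trigger` and `do_sync` are illustrative; a real slave node would wire the event to the consensus protocol cluster's watch notification.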
Since the consensus protocol cluster loads the metadata change operation logs into memory, and memory space is limited, it cannot meet the storage requirements of a large amount of metadata; the size of the metadata change log kept in the consensus protocol cluster therefore needs to be limited. A metadata reclamation mechanism for the consensus protocol cluster is introduced: a metadata change is deleted from the consensus protocol cluster after it has been applied locally by every metadata service node. The metadata service slave node with the smallest node serial number in the metadata service cluster is responsible for clearing useless change operation logs from the consensus protocol cluster. After synchronizing the change operation log and the metadata corresponding to the change operation log into the local storage engine of the metadata service slave node according to a preset synchronization rule, the method further comprises:
reclaiming, by the metadata service slave node corresponding to the node ranked first, the change operation logs in the consensus protocol cluster that have already been synchronized to every metadata service slave node.
In the embodiment of the application, each metadata service node monitors the same data directory in the consensus protocol cluster and is notified when the number of segments changes. When the metadata service master node creates a new segment in the consensus protocol cluster data directory, the other metadata service slave nodes are notified by the consensus protocol cluster. The metadata service slave node corresponding to the node ranked first then performs the metadata log reclamation operation, deleting from the consensus protocol cluster all segments preceding those already synchronized locally.
Optionally, the metadata service slave node also starts a timed task, so that the metadata service slave node corresponding to the node ranked first performs the change operation log reclamation operation periodically.
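A non-authoritative sketch of the reclamation rule described above: only the node with the smallest serial number acts, and a segment is deleted once every metadata service node has applied its logs locally. The `(segment_id, last_log_seq)` shape of the segment list and the per-node applied-sequence map are assumptions for illustration:

```python
def reclaim_segments(my_node_id, all_node_ids, applied_seq_by_node, segments):
    """Return the ids of segments that can be deleted from the consensus
    protocol cluster.

    segments: list of (segment_id, last_log_seq) pairs (assumed shape).
    applied_seq_by_node: latest applied log sequence number per node.
    """
    if my_node_id != min(all_node_ids):
        return []  # only the first-ranked node performs reclamation
    # A segment is reclaimable once its last log sequence number is not
    # larger than the smallest applied sequence number across all nodes.
    min_applied = min(applied_seq_by_node.values())
    return [seg_id for seg_id, last_seq in segments if last_seq <= min_applied]
```

Gating on the minimum node serial number ensures exactly one node issues deletions, avoiding concurrent removal of the same segments.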
Because change operation logs in the consensus protocol cluster are cleared, a new metadata service node may be unable to synchronize the complete metadata change operation log from the consensus protocol cluster after joining the metadata service cluster, and can only copy the data of an existing metadata service node; a metadata full synchronization mechanism is therefore introduced in the embodiment of the present application. Specifically, during synchronization of change operation logs from the consensus protocol cluster, if the smallest change operation log sequence number in the consensus protocol cluster is found to be larger than replay_op_seq + 1 of the local storage engine, the complete change operation logs and metadata cannot be obtained from the consensus protocol cluster, and the metadata and change operation logs must instead be synchronized from other metadata service nodes by full synchronization. In addition, if an individual change operation log is found to be damaged while synchronizing change operation logs from the consensus protocol cluster, so that the data cannot be parsed as protobuf or cannot be written into the local storage engine normally, full synchronization is also required. A metadata service node newly added to the cluster first synchronizes the full metadata from other metadata service nodes and then performs incremental data changes. Optionally, the method further comprises:
when a new metadata service node joins the consensus protocol cluster, the full metadata is synchronized from the local storage engines of the other metadata service nodes.
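The two triggers for full synchronization described above (a gap in the retained log chain, or a log that cannot be parsed or applied) reduce to a simple check; this sketch is illustrative and the parameter names are assumptions:

```python
def needs_full_sync(min_cluster_seq, replay_op_seq, log_corrupted=False):
    """Decide whether full synchronization from a peer node is required.

    Full synchronization is needed when the smallest change operation log
    sequence number still retained in the consensus protocol cluster is
    larger than replay_op_seq + 1 (the reclaimed logs left a gap in the
    incremental chain), or when a pulled log could not be parsed as
    protobuf or written to the local storage engine (log_corrupted).
    """
    return log_corrupted or min_cluster_seq > replay_op_seq + 1
```

When this returns true, the node falls back to copying the full metadata from an existing metadata service node instead of replaying incremental logs.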
As shown in fig. 5, a specific flow of metadata full synchronization is provided, and the steps are as follows:
step S501: obtaining addresses of all metadata service nodes from the consensus protocol cluster;
step S502: requesting the metadata version number version from one of the unprocessed metadata service nodes;
step S503: judging whether version is equal to the latest version number known locally; if so, executing step S504; if not, returning to the step S502;
step S504: creating two temporary directories, sync and old;
step S505: pulling the full data from the metadata service node selected in step S502 and placing it in the local sync directory;
step S506: renaming the local metadata directory to the backup directory old, and renaming the sync directory to the name of the local metadata directory;
step S507: restarting and reinitializing the local metadata service.
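Steps S504-S506 amount to a pull-then-rename swap, which keeps the live metadata directory intact if the pull fails partway. The following sketch assumes a `pull_full_data` callback that fills a directory with the peer's full metadata (step S505); directory names follow the sync/old convention of step S504:

```python
import os

def full_sync_swap(meta_dir, pull_full_data):
    """Pull full metadata into a temporary sync directory, keep the old
    directory as a backup, then rename sync into place (steps S504-S506)."""
    parent = os.path.dirname(os.path.abspath(meta_dir))
    sync_dir = os.path.join(parent, "sync")   # S504: temporary directories
    old_dir = os.path.join(parent, "old")
    os.makedirs(sync_dir, exist_ok=True)
    pull_full_data(sync_dir)                  # S505: full data from the peer
    if os.path.exists(meta_dir):
        os.rename(meta_dir, old_dir)          # S506: keep the old copy as backup
    os.rename(sync_dir, meta_dir)             # S506: promote the synced copy
    return old_dir
```

Because the swap is done with renames, a crash before the final rename leaves either the old or the new directory fully intact, never a half-written mix.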
In a second aspect, based on the same technical concept, the present application provides a metadata synchronization system of a distributed storage system, as shown in fig. 6, the system including:
a determining unit 601, configured to determine a metadata service master node and a metadata service slave node through a consensus protocol cluster;
a packaging unit 602, configured to package, when a metadata change occurs, a metadata change operation into a change operation log by using the metadata service master node;
a writing unit 603, configured to sequentially write the change operation log into segments of the consensus protocol cluster;
an updating unit 604, configured to update the change operation log and the metadata corresponding to the change operation log to a local storage engine of the metadata master node after the writing is successful;
and the synchronization unit 605 is configured to synchronize the change operation log and the metadata corresponding to the change operation log to a local storage engine of the metadata service slave node according to a preset synchronization rule when a new segment is created in the consensus protocol cluster or a preset duration elapses.
Based on the same technical concept, the embodiment of the present invention further provides an electronic device, as shown in fig. 7, including a processor 701, a communication interface 702, a memory 703 and a communication bus 704, where the processor 701, the communication interface 702 and the memory 703 communicate with each other through the communication bus 704;
a memory 703 for storing a computer program;
the processor 701 is configured to implement the steps of the metadata synchronization method of the distributed storage system when executing the program stored in the memory 703.
The communication bus mentioned above for the electronic device may be a peripheral component interconnect standard (Peripheral Component Interconnect, PCI) bus or an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, etc. The communication bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The Memory may include random access Memory (Random Access Memory, RAM) or may include Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present invention, there is also provided a computer readable storage medium having stored therein a computer program which, when executed by a processor, implements the steps of the metadata synchronization method of any of the above-described distributed storage systems.
In yet another embodiment of the present invention, a computer program product containing instructions that, when run on a computer, cause the computer to perform the metadata synchronization method of any of the distributed storage systems of the above embodiments is also provided.
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), etc.
It should be noted that in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not only include those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing is merely a specific embodiment of the application to enable one skilled in the art to understand or practice the application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (9)
1. A method for metadata synchronization for a distributed storage system, the method comprising:
determining a metadata service master node and a metadata service slave node through a consensus protocol cluster;
when a metadata change occurs, the metadata change operation is packaged into a change operation log by utilizing the metadata service master node;
writing the change operation log into segments of the consensus protocol cluster in sequence;
after the writing is successful, the change operation log and the corresponding metadata thereof are updated to a local storage engine of the metadata master node;
when a new segment is created in the consensus protocol cluster or a preset time length elapses, synchronizing the change operation log and the corresponding metadata thereof into a local storage engine of the metadata service slave node according to a preset synchronization rule;
the preset synchronization rule is as follows:
acquiring the latest change operation log sequence number from the local storage engine of the metadata service slave node;
pulling all segment information from the consensus protocol cluster;
sorting all segments according to the sequence number of the first change operation log in the segments;
finding the first segment whose first change operation log sequence number is not smaller than the latest change operation log sequence number;
judging whether the segment is the last one and whether the sequence number of its first change operation log is larger than the latest change operation log sequence number;
if yes, taking the segment preceding that segment as the target segment;
if not, taking the segment as a target segment;
starting from the target segment, the change operation logs of all segments are synchronized.
2. The method of claim 1, wherein the determining the metadata service master node and the metadata service slave node via the consensus protocol cluster comprises:
creating a node for each metadata service node in the same directory of the consensus protocol cluster, and sequencing according to the creation time;
and determining the metadata service nodes corresponding to the nodes arranged at the first position as metadata service master nodes, and determining the other nodes as metadata service slave nodes.
3. The method according to claim 2, wherein the method further comprises:
deleting the node representing the metadata service master node when the metadata service master node fails or a network partition occurs;
and determining the metadata service node corresponding to the node currently ranked first as a new metadata service master node.
4. The method of claim 1, wherein after synchronizing the change operation log and its corresponding metadata into the metadata service slave node's local storage engine according to a preset synchronization rule, the method further comprises:
and reclaiming, by the metadata service slave node corresponding to the node ranked first, the change operation logs in the consensus protocol cluster that have already been synchronized to every metadata service slave node.
5. The method of claim 4, wherein the metadata service slave node corresponding to the node ranked first periodically performs the change operation log reclamation operation.
6. The method according to claim 4, wherein the method further comprises:
when a new metadata service node joins the consensus protocol cluster, the full amount of metadata is synchronized from the local storage engines of the other metadata service nodes.
7. A metadata synchronization system for a distributed storage system, the system comprising:
the determining unit is used for determining a metadata service master node and a metadata service slave node through the consensus protocol cluster;
the encapsulation unit is used for encapsulating the metadata change operation into a change operation log by utilizing the metadata service master node when a metadata change occurs;
the writing unit is used for writing the change operation log into segments of the consensus protocol cluster in sequence;
the updating unit is used for updating the change operation log and the corresponding metadata thereof into a local storage engine of the metadata master node after the writing is successful;
the synchronization unit is used for synchronizing the change operation log and the corresponding metadata thereof to a local storage engine of the metadata service slave node according to a preset synchronization rule when a new segment is created in the consensus protocol cluster or a preset time length elapses;
the preset synchronization rule is as follows:
acquiring the latest change operation log sequence number from the local storage engine of the metadata service slave node;
pulling all segment information from the consensus protocol cluster;
sorting all segments according to the sequence number of the first change operation log in the segments;
finding the first segment whose first change operation log sequence number is not smaller than the latest change operation log sequence number;
judging whether the segment is the last one and whether the sequence number of its first change operation log is larger than the latest change operation log sequence number;
if yes, taking the segment preceding that segment as the target segment;
if not, taking the segment as a target segment;
starting from the target segment, the change operation logs of all segments are synchronized.
8. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for carrying out the method steps of any one of claims 1-6 when executing a program stored on a memory.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the method steps of any of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210432189.6A CN115599747B (en) | 2022-04-22 | 2022-04-22 | Metadata synchronization method, system and equipment of distributed storage system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115599747A CN115599747A (en) | 2023-01-13 |
CN115599747B true CN115599747B (en) | 2023-06-06 |
Family
ID=84842075
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210432189.6A Active CN115599747B (en) | 2022-04-22 | 2022-04-22 | Metadata synchronization method, system and equipment of distributed storage system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115599747B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115794499B (en) * | 2023-02-03 | 2023-05-16 | 创云融达信息技术(天津)股份有限公司 | Method and system for dual-activity replication data among distributed block storage clusters |
CN116561221B (en) * | 2023-04-21 | 2024-03-19 | 清华大学 | Method for supporting distributed time sequence database copy consensus protocol of Internet of things scene |
CN116302140B (en) * | 2023-05-11 | 2023-09-22 | 京东科技信息技术有限公司 | Method and device for starting computing terminal based on storage and calculation separation cloud primary number bin |
CN116633946B (en) * | 2023-05-29 | 2023-11-21 | 广州经传多赢投资咨询有限公司 | Cluster state synchronous processing method and system based on distributed protocol |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015192661A1 (en) * | 2014-06-19 | 2015-12-23 | 中兴通讯股份有限公司 | Method, device, and system for data synchronization in distributed storage system |
CN111949633A (en) * | 2020-08-03 | 2020-11-17 | 杭州电子科技大学 | ICT system operation log analysis method based on parallel stream processing |
WO2021051581A1 (en) * | 2019-09-17 | 2021-03-25 | 平安科技(深圳)有限公司 | Server cluster file synchronization method and apparatus, electronic device, and storage medium |
WO2021226905A1 (en) * | 2020-05-14 | 2021-11-18 | 深圳市欢太科技有限公司 | Data storage method and system, and storage medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107426265A (en) * | 2016-03-11 | 2017-12-01 | 阿里巴巴集团控股有限公司 | The synchronous method and apparatus of data consistency |
CN108280080B (en) * | 2017-01-06 | 2022-02-22 | 阿里巴巴集团控股有限公司 | Data synchronization method and device and electronic equipment |
CN108322533B (en) * | 2018-01-31 | 2019-02-19 | 广州鼎甲计算机科技有限公司 | Configuration and synchronization method between distributed type assemblies node based on operation log |
CN111858097A (en) * | 2020-07-22 | 2020-10-30 | 安徽华典大数据科技有限公司 | Distributed database system and database access method |
2022-04-22: CN CN202210432189.6A patent/CN115599747B/en active Active
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115599747B (en) | Metadata synchronization method, system and equipment of distributed storage system | |
US10579364B2 (en) | Upgrading bundled applications in a distributed computing system | |
CN108509462B (en) | Method and device for synchronizing activity transaction table | |
EP3803618B1 (en) | Distributed transactions in cloud storage with hierarchical namespace | |
WO2019231689A1 (en) | Multi-protocol cloud storage for big data and analytics | |
US7783607B2 (en) | Decentralized record expiry | |
EP2416236B1 (en) | Data restore system and method | |
CN113268472B (en) | Distributed data storage system and method | |
US10628298B1 (en) | Resumable garbage collection | |
WO2017050064A1 (en) | Memory management method and device for shared memory database | |
CN112334891B (en) | Centralized storage for search servers | |
CN116400855A (en) | Data processing method and data storage system | |
US11429311B1 (en) | Method and system for managing requests in a distributed system | |
CN114297196A (en) | Metadata storage method and device, electronic equipment and storage medium | |
US9871863B2 (en) | Managing network attached storage | |
CN114281765A (en) | Metadata processing method and equipment in distributed file system | |
CN107102898B (en) | Memory management and data structure construction method and device based on NUMA (non Uniform memory Access) architecture | |
US10073874B1 (en) | Updating inverted indices | |
CN114785662B (en) | Storage management method, device, equipment and machine-readable storage medium | |
CN114780043A (en) | Data processing method and device based on multilayer cache and electronic equipment | |
CN115292394A (en) | Data processing method, data processing device, computer equipment and storage medium | |
CN113778975A (en) | Data processing method and device based on distributed database | |
CN112860788B (en) | Transaction processing method, device, computer system and readable storage medium | |
CN117255101B (en) | Data processing method, device, equipment and medium of distributed storage system | |
CN115604290B (en) | Kafka message execution method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||
CP03 | Change of name, title or address |
Address after: 8b, building 1, No. 48, Zhichun Road, Haidian District, Beijing 100086 Patentee after: Beijing Zhiling Haina Technology Co.,Ltd. Country or region after: China Address before: 8b, building 1, No. 48, Zhichun Road, Haidian District, Beijing 100086 Patentee before: Beijing zhilinghaina Technology Co.,Ltd. Country or region before: China |