
CN115599747B - Metadata synchronization method, system and equipment of distributed storage system - Google Patents


Info

Publication number
CN115599747B
CN115599747B (application CN202210432189.6A)
Authority
CN
China
Prior art keywords
metadata
change operation
node
metadata service
operation log
Prior art date
Legal status
Active
Application number
CN202210432189.6A
Other languages
Chinese (zh)
Other versions
CN115599747A (en)
Inventor
罗杰彬
徐文豪
王弘毅
张凯
Current Assignee
Beijing Zhiling Haina Technology Co ltd
Original Assignee
SmartX Inc
Priority date
Filing date
Publication date
Application filed by SmartX Inc filed Critical SmartX Inc
Priority to CN202210432189.6A priority Critical patent/CN115599747B/en
Publication of CN115599747A publication Critical patent/CN115599747A/en
Application granted granted Critical
Publication of CN115599747B publication Critical patent/CN115599747B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/178 Techniques for file synchronisation in file systems
    • G06F16/182 Distributed file systems
    • G06F16/1815 Journaling file systems
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The application provides a metadata synchronization method, system, and device for a distributed storage system, wherein a metadata service master node and metadata service slave nodes are determined through a consensus protocol cluster; when a metadata change occurs, the metadata service master node encapsulates the metadata change operation into a change operation log; the change operation logs are written sequentially into segments of the consensus protocol cluster; after the write succeeds, the change operation log and its corresponding metadata are updated into the local storage engine of the metadata service master node; and when a new segment is created in the consensus protocol cluster or a preset time interval elapses, the change operation log and its corresponding metadata are synchronized into the local storage engines of the metadata service slave nodes according to a preset synchronization rule. The metadata service can read metadata directly from the local storage engine, without a network call or a consensus round, which reduces latency and improves synchronization efficiency.

Description

Metadata synchronization method, system and equipment of distributed storage system
Technical Field
The present disclosure relates to the field of data storage technologies, and in particular, to a metadata synchronization method, system, and device for a distributed storage system.
Background
A distributed storage system connects multiple independent servers through a network to form a distributed cluster, and the storage resources of all servers in the cluster, such as mechanical disks and solid state disks, form a resource pool under unified management that serves external clients. Distributed storage systems typically allocate storage objects such as virtual volumes, iSCSI LUNs, or files from the storage resource pool and provide them to storage consumers, and the data capacity of a virtual volume or file may exceed the total storage capacity of any single server. For example, a virtual volume may be 64TB while the physical disk capacity of a single server in the cluster is only 32TB. To support virtual volumes whose data volume exceeds the storage capacity of a single server, a distributed storage system divides storage objects such as virtual volumes or files into finer-grained data shards, e.g., splitting a 64TB volume into many fixed small-sized shards of 256MB, 4MB, or 1MB, and places the shards on multiple servers in the cluster, so that one storage object can use the storage resources of multiple servers. For data safety and improved read performance, distributed storage systems also add data redundancy on top of the shards, typically using replica or erasure coding techniques. Taking replicas as an example, with a replica count of 3, the distributed storage system allocates a larger storage object from the unified resource pool, divides it into multiple finer-grained data shards, and places the 3 replicas of each shard on 3 different servers in the cluster according to a placement policy.
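The shard arithmetic above can be sketched as follows; this is an illustrative sketch only (the function, server names, and round-robin placement policy are our assumptions, not the patent's):

```python
# Illustrative sketch (not from the patent): splitting a virtual volume into
# fixed-size shards and placing 3 replicas on distinct servers round-robin.
TB = 1024 ** 4
MB = 1024 ** 2

def place_shards(volume_bytes, shard_bytes, servers, replicas=3):
    """Return {shard_index: [server, ...]} with replicas on distinct servers."""
    assert len(servers) >= replicas, "need at least as many servers as replicas"
    num_shards = (volume_bytes + shard_bytes - 1) // shard_bytes  # ceiling division
    placement = {}
    for i in range(num_shards):
        placement[i] = [servers[(i + r) % len(servers)] for r in range(replicas)]
    return placement

layout = place_shards(64 * TB, 256 * MB, ["s1", "s2", "s3", "s4"])
print(len(layout))          # 262144 shards for a 64 TB volume at 256 MB each
print(layout[0])            # ['s1', 's2', 's3']
```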
For normal reads and writes of data objects such as volumes or files, the system must know which data shards of the object hold the required data, and on which servers the replicas of those shards reside. This positioning information is important metadata of the distributed storage system. In addition, the metadata of a distributed storage system also includes file and directory attributes, information about the data nodes that form the cluster, and so on.
Metadata is critical to a distributed storage system: if it is lost, the service data of the system cannot be accessed, severely impacting user services. Such metadata is therefore usually persisted in the cluster with multiple copies. Moreover, the consistency requirements on metadata are very strict; inconsistency cannot be tolerated, so the metadata stored by each server in the cluster must remain strongly consistent whenever metadata is updated.
To ensure consistency across multiple copies of metadata, a common approach is to implement replica synchronization with a Paxos/Raft distributed consistency algorithm, completing all metadata access through a distributed consensus mechanism (for example, hosting all metadata in etcd, ZooKeeper, or Cassandra and reading and writing it directly through consensus). In this mode, every data access must go through a consensus round: updates can only be written on the master node and succeed only after reaching a majority of slave nodes, and reads must also be served by the Leader of the Raft module.
The main problem of a metadata synchronization mechanism built directly on a distributed consistency protocol is that metadata queries are relatively expensive. A consensus protocol cluster generally provides only single-object queries at key-value granularity; each query is an independent action that must pass through a consensus confirmation process, so range queries or more complex conditional queries with data semantics must fetch a larger result set at high cost and then filter it a second time. Each small-object query goes through a consensus round whose cost depends on the specific algorithm, which is time-consuming. Yet in a distributed storage system, metadata reads are typically far more frequent than writes, so the performance of metadata read requests is critical to overall storage performance.
Disclosure of Invention
An objective of the embodiments of the present application is to provide a metadata synchronization method, system, and device for a distributed storage system, so as to solve the current problems of low metadata synchronization efficiency and poor metadata read request performance. The specific technical scheme is as follows:
in a first aspect, there is provided a metadata synchronization method for a distributed storage system, the method comprising:
determining a metadata service master node and a metadata service slave node through a consensus protocol cluster;
when a metadata change occurs, encapsulating the metadata change operation into a change operation log by the metadata service master node;
writing the change operation log into segments of the consensus protocol cluster in sequence;
after the writing succeeds, updating the change operation log and its corresponding metadata into the local storage engine of the metadata service master node;
and when a new segment is created in the consensus protocol cluster or a preset time interval elapses, synchronizing the change operation log and its corresponding metadata into the local storage engine of the metadata service slave node according to a preset synchronization rule.
Optionally, the determining the metadata service master node and the metadata service slave node through the consensus protocol cluster includes:
creating a node for each metadata service node in the same directory of the consensus protocol cluster, and sequencing according to the creation time;
and determining the metadata service nodes corresponding to the nodes arranged at the first position as metadata service master nodes, and determining the other nodes as metadata service slave nodes.
Optionally, the method further comprises:
deleting a node representing the metadata service master node when the metadata service master node fails or a network partition occurs;
and determining the metadata service node corresponding to the node currently ranked first as a new metadata service master node.
Optionally, the preset synchronization rule is:
acquiring the latest change operation log sequence number from the local storage engine of the metadata service slave node;
pulling all segment information from the consensus protocol cluster;
sorting all segments according to the sequence number of the first change operation log in the segments;
finding a first segment not smaller than the latest change operation log sequence number;
judging whether the segment is the last one and whether the sequence number of its first log is greater than the latest change operation log sequence number;
if yes, taking the previous segment as the target segment;
if not, taking the segment as the target segment;
starting from the target segment, the change operation logs of all segments are synchronized.
Optionally, after synchronizing the change operation log and its corresponding metadata into the local storage engine of the metadata service slave node according to the preset synchronization rule, the method further includes:
reclaiming, through the metadata service slave node corresponding to the node ranked first, the change operation logs in the consensus protocol cluster that have already been synchronized to every metadata service slave node.
Optionally, the metadata service slave node corresponding to the node ranked first may periodically execute the change operation log reclamation operation.
Optionally, the method further comprises:
when a new metadata service node joins the consensus protocol cluster, synchronizing the full metadata from the local storage engines of the other metadata service nodes.
In a second aspect, the present application provides a metadata synchronization system for a distributed storage system, the system comprising:
the determining unit is used for determining a metadata service master node and a metadata service slave node through the consensus protocol cluster;
the encapsulation unit is used for encapsulating the metadata change operation into a change operation log through the metadata service master node when a metadata change occurs;
the writing unit is used for writing the change operation log into segments of the consensus protocol cluster in sequence;
the updating unit is used for updating the change operation log and its corresponding metadata into the local storage engine of the metadata service master node after the writing succeeds;
and the synchronization unit is used for synchronizing the change operation log and its corresponding metadata into the local storage engine of the metadata service slave node according to a preset synchronization rule when a new segment is created in the consensus protocol cluster or a preset time interval elapses.
In a third aspect, the present application provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of the first aspects when executing a program stored on a memory.
In a fourth aspect, the present application provides a computer-readable storage medium having a computer program stored therein, which when executed by a processor, implements the method steps of any of the first aspects.
In a fifth aspect, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the metadata synchronization method of a distributed storage system as described in any one of the above.
The beneficial effects of the embodiment of the application are that:
the embodiment of the application provides a metadata synchronization method, a metadata synchronization system and metadata synchronization equipment of a distributed storage system, wherein a metadata service master node and a metadata service slave node are determined through a consensus protocol cluster; when metadata change occurs, the metadata change operation is packaged into a change operation log by utilizing a metadata service main node; sequentially writing the change operation log into segments of the consensus protocol cluster; after the writing is successful, updating the change operation log and the corresponding metadata thereof into a local storage engine of the metadata master node; when new segments are created or separated by preset time periods in the consensus protocol cluster, the change operation log and the corresponding metadata thereof are synchronized into a local storage engine of the metadata service slave node according to preset synchronization rules. The method and the system do not directly store metadata in the common protocol cluster, only select a master node and synchronize metadata change operation logs by means of the common protocol cluster, and finally store the metadata in a local storage engine. The metadata service can directly read metadata from the local storage engine, network calling and consensus processes are not needed, delay is reduced, the local metadata is processed through the local storage engine in each service node, the data states of other nodes are not needed to be considered, and various caching mechanisms and data organization modes can be adopted according to requirements to further improve performance. In addition, the strong consistency of metadata is ensured by means of the existing mechanism inside the consensus protocol cluster. All metadata does not need to be loaded into the memory, so that the system resource consumption is reduced, and the system can process a larger amount of metadata.
Of course, not all of the above-described advantages need be achieved simultaneously in practicing any one of the products or methods of the present application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flowchart of a metadata synchronization method for a distributed storage system according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a metadata service cluster according to an embodiment of the present application;
fig. 3 is a schematic diagram of a master node election process according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of metadata service slave node synchronization according to an embodiment of the present application;
FIG. 5 is a flowchart illustrating metadata full synchronization according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a metadata synchronization system of a distributed storage system according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Description of the embodiments
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment of the present application provides a metadata synchronization method of a distributed storage system, and in the following, a detailed description will be given of the metadata synchronization method of the distributed storage system provided in the embodiment of the present application, as shown in fig. 1, with specific steps as follows:
step S101: the metadata service master node and the metadata service slave node are determined by a consensus protocol cluster.
In this step, the consensus protocol cluster may be ZooKeeper, etcd, etc.
As shown in fig. 2, there is exactly one metadata service master node, while there may be multiple metadata service slave nodes; the consensus protocol cluster, the metadata service master node, and the metadata service slave nodes together form the metadata service cluster of the distributed storage system. Each metadata service node runs a local storage engine, such as MySQL or LevelDB, in which the metadata is stored, and the metadata change operation logs are synchronized by means of the consensus protocol cluster. The read and conditional-query performance of these local storage engines is typically much higher than that of a consensus-based cluster.
In the embodiment of the application, the consensus protocol cluster serves two functions: one is to provide an election service for the metadata service cluster based on a consensus algorithm, determining a unique metadata service master node and several metadata service slave nodes; the other is to maintain the metadata change operation logs, specifically in key-value form.
Read and write requests for metadata are completed only by the metadata service master node. When metadata changes, the master node first writes a change operation log into the consensus protocol cluster and, after confirming the write succeeded, applies the update in its local storage engine. On reads, because the local storage engine already holds complete metadata, the consensus protocol process can be skipped entirely and the metadata read from the local storage engine. Compared with the existing scheme of storing and managing metadata directly in the consensus protocol cluster, metadata is not stored in the cluster; the cluster is used only for master election and for the metadata change operation logs, while the metadata ultimately resides in a local storage engine. The metadata service can read metadata from the local storage engine without network calls or a consensus round, reducing latency; moreover, each node's local storage engine handles local metadata without considering the data state of other nodes, and can adopt various caching mechanisms and data organizations as needed to further improve read performance.
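The write path and local-read split described above can be sketched as follows; the class names, the in-memory stand-ins for the consensus cluster and the local storage engine, and the method signatures are all illustrative assumptions:

```python
# Hedged sketch of the write/read split: writes go through the consensus
# cluster's log first, reads are served purely from the local storage engine.
class MetadataMaster:
    def __init__(self, consensus, local_engine):
        self.consensus = consensus        # consensus protocol cluster client
        self.local = local_engine         # dict standing in for LevelDB/MySQL
        self.commit_op_seq = 0

    def update(self, key, value):
        op = {"seq": self.commit_op_seq + 1, "key": key, "value": value}
        self.consensus.append_log(op)     # 1) write change op log to the cluster
        self.commit_op_seq = op["seq"]    # 2) only after success, apply locally
        self.local[key] = value
        return op["seq"]

    def read(self, key):
        # Reads skip the network/consensus round trip entirely.
        return self.local.get(key)

class FakeConsensus:
    def __init__(self): self.log = []
    def append_log(self, op): self.log.append(op)

m = MetadataMaster(FakeConsensus(), {})
m.update("/vol1/shard0", ["s1", "s2", "s3"])
print(m.read("/vol1/shard0"))   # ['s1', 's2', 's3']
```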
Optionally, the determining the metadata service master node and the metadata service slave node through the consensus protocol cluster includes:
creating a node for each metadata service node in the same directory of the consensus protocol cluster, and sequencing according to the creation time;
and determining the metadata service nodes corresponding to the nodes arranged at the first position as metadata service master nodes, and determining the other nodes as metadata service slave nodes.
Optionally, the method further comprises:
deleting the node representing the metadata service master node when the master node fails or a network partition occurs;
the other metadata service slave nodes also receive the message notification and check whether their own node is now ranked first in the node list, and the metadata service node corresponding to the node currently ranked first becomes the new metadata service master node. After a network partition, the former master node loses its master identity because it can no longer connect to the consensus protocol cluster normally, which prevents two metadata service master nodes from coexisting.
In another embodiment, when a metadata service slave node fails or a network partition occurs, the node representing it in the consensus protocol cluster is automatically deleted; the other metadata service nodes also receive the message notification, but it has no significant effect on them.
In addition, even when no metadata service node fails and no network partition occurs, each metadata service node continuously watches the directory in the consensus protocol cluster. Whenever the nodes in the directory change, the metadata service receives a notification from the cluster and checks whether its own node is ranked first: if so, it promotes itself to master node and serves external requests; if not, it continues as a slave node to synchronize the master node's metadata changes.
As shown in fig. 3, a specific master node election process is provided, and the steps are as follows:
step S301: starting a node;
step S302: creating a node in the consensus protocol cluster;
step S303: judging whether its node sequence number is 0; if yes, executing step S304, otherwise executing step S305;
step S304: becomes a metadata service master node;
step S305: becomes a metadata service slave node;
step S306: receiving a member change notification sent by the consensus protocol cluster;
step S307: judging whether its own node sequence number is still 0; if yes, returning to step S304, and if no, returning to step S305.
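Steps S301-S307 can be sketched as follows, with the consensus directory modeled as a mapping from node names to creation sequence numbers (the node names and the `elect` helper are illustrative, not from the patent):

```python
# Illustrative election sketch: each service registers a sequenced node under a
# shared directory; the smallest sequence number becomes the master node.
def elect(directory, my_node):
    """directory: {node_name: seq}; returns 'master' or 'slave' for my_node."""
    ordered = sorted(directory, key=directory.get)   # order by creation sequence
    return "master" if ordered[0] == my_node else "slave"

nodes = {"meta-a": 3, "meta-b": 1, "meta-c": 7}
print(elect(nodes, "meta-b"))   # master: meta-b has the smallest sequence
print(elect(nodes, "meta-c"))   # slave
del nodes["meta-b"]             # master fails; its directory node is removed
print(elect(nodes, "meta-a"))   # master: meta-a is now ranked first
```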
Step S102: when metadata change occurs, the metadata change operation is packaged into a change operation log by the metadata service master node.
The metadata service master node is the external interface for metadata changes; change operations are initiated only from the master node. The master node locally maintains two integer values, commit_op_seq and replay_op_seq, while each metadata service slave node locally maintains replay_op_seq. commit_op_seq is the sequence number of the latest change operation log the master node has stored into the consensus protocol cluster, and replay_op_seq is the sequence number of the latest change operation log each node has applied to its local storage. When the master node starts, it reads the sequence number replay_op_seq of the latest locally applied log from its local storage engine and initializes commit_op_seq, the latest log sequence number successfully written into the consensus protocol cluster, to replay_op_seq. Before it begins serving external requests, the master node synchronizes all newer change operation logs from the consensus protocol cluster and updates replay_op_seq and commit_op_seq accordingly.
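A minimal sketch of this startup recovery, using the patent's commit_op_seq/replay_op_seq names but with illustrative stand-ins for the local storage engine and the consensus cluster:

```python
# Sketch of the master-node startup described above. commit_op_seq and
# replay_op_seq follow the patent's naming; FakeEngine and FakeCluster are
# illustrative stand-ins for the local storage engine and consensus cluster.
def master_startup(local_engine, consensus):
    replay_op_seq = local_engine.latest_applied_seq()  # newest log applied locally
    commit_op_seq = replay_op_seq
    # Catch up: apply any logs the consensus cluster holds beyond local state.
    for op in consensus.logs_after(replay_op_seq):
        local_engine.apply(op)
        replay_op_seq = commit_op_seq = op["seq"]
    return commit_op_seq, replay_op_seq

class FakeEngine:
    def __init__(self, seq): self.seq = seq
    def latest_applied_seq(self): return self.seq
    def apply(self, op): self.seq = op["seq"]

class FakeCluster:
    def __init__(self, ops): self.ops = ops
    def logs_after(self, seq): return [o for o in self.ops if o["seq"] > seq]

cluster = FakeCluster([{"seq": s} for s in (1, 2, 3, 4, 5)])
print(master_startup(FakeEngine(3), cluster))   # (5, 5) after replaying seqs 4 and 5
```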
Step S103: and writing the change operation log into segments of the consensus protocol cluster in sequence.
The metadata change operation logs are stored in a fixed directory of the consensus protocol cluster, referred to as the data directory. The change operation logs in the data directory are grouped into segments, each segment storing at most a fixed number of logs; each change operation log has a sequence number, and the logs are ordered by write order.
Step S104: and after the writing is successful, updating the change operation log and the corresponding metadata thereof into a local storage engine of the metadata master node.
Step S105: when a new segment is created in the consensus protocol cluster or a preset time interval elapses, synchronizing the change operation log and its corresponding metadata into the local storage engine of the metadata service slave node according to a preset synchronization rule.
When a metadata service slave node starts, it reads the sequence number replay_op_seq of the latest locally applied log from its local storage engine; when a new segment is created in the consensus protocol cluster, the slave node receives a notification and synchronizes the last segment from the cluster. The slave node may also synchronize change operation logs from the consensus protocol cluster every few seconds.
In the embodiment of the application, metadata synchronization takes the segment as its basic unit: while a segment is not yet full, metadata service slave nodes do not synchronize new metadata changes immediately. When a new segment is created, the slave nodes receive the event notification, synchronize the metadata change operation logs of the previous segment from the consensus protocol cluster, and update their local metadata accordingly. Synchronizing at segment granularity avoids a broadcast storm: if every change operation log were synchronized individually, every update would trigger a read and an event notification on every service node in the metadata service cluster, severely affecting cluster performance.
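The segment-granularity batching can be sketched as follows; the segment capacity and the class are illustrative (the patent only says each segment stores at most a certain number of logs):

```python
# Sketch of segment-granularity batching: slaves are notified only when a new
# segment is created, not on every log append (capacity and names illustrative).
SEGMENT_CAPACITY = 4

class SegmentedLog:
    def __init__(self):
        self.segments = [[]]         # list of segments, each a list of ops
        self.notifications = 0       # how many "new segment" events were fired

    def append(self, op):
        if len(self.segments[-1]) >= SEGMENT_CAPACITY:
            self.segments.append([])     # new segment -> notify slaves once
            self.notifications += 1
        self.segments[-1].append(op)

log = SegmentedLog()
for seq in range(1, 11):                 # 10 appends
    log.append({"seq": seq})
print(len(log.segments), log.notifications)   # 3 segments, 2 notifications
```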
Optionally, the preset synchronization rule is:
acquiring the latest change operation log sequence number from the local storage engine of the metadata service slave node;
pulling all segment information from the consensus protocol cluster;
sorting all segments according to the sequence number of the first change operation log in the segments;
finding a first segment not smaller than the latest change operation log sequence number;
judging whether the segment is the last one and whether the sequence number of its first log is greater than the latest change operation log sequence number;
if yes, taking the previous segment as the target segment;
if not, taking the segment as a target segment;
starting from the target segment, the change operation logs of all segments are synchronized.
As shown in fig. 4, a specific implementation procedure of metadata service slave node synchronization is provided, which includes the following steps:
step S401: after the metadata service slave node starts, obtaining the sequence number replay_op_seq of the latest locally applied log from the local storage engine;
step S402: judging whether a new segment has been created or 10 seconds have elapsed; if yes, executing step S403, otherwise repeating step S402;
step S403: pulling all segment information from the consensus protocol cluster;
step S404: ordering all segments according to the sequence number of the first log in the segments;
step S405: finding the first segment whose first log sequence number is not smaller than replay_op_seq; if none exists, using the last segment;
step S406: judging whether the segment is the last one and whether the sequence number of its first log is greater than replay_op_seq; if yes, executing step S407, and if no, executing step S408;
step S407: taking the previous segment as the target segment;
step S408: taking the segment as a target segment;
step S409: starting from the target segment, the change operation logs of all segments are synchronized.
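Under our reading of steps S404-S408 (treating "the last segment of the segment" as the preceding segment), the target-segment search can be sketched as:

```python
# Sketch of the target-segment search in steps S401-S409; segments are given as
# lists of op sequence numbers (the data structure is illustrative).
def find_target_segment(segments, replay_op_seq):
    """Return the index of the segment to start synchronizing from."""
    segments = sorted(segments, key=lambda s: s[0])    # S404: sort by first seq
    # S405: first segment whose first seq is >= replay_op_seq; else the last one
    idx = next((i for i, s in enumerate(segments) if s[0] >= replay_op_seq),
               len(segments) - 1)
    # S406-S408: if the found segment starts beyond what we have applied, step
    # back one segment so that no intervening log is skipped
    if idx > 0 and segments[idx][0] > replay_op_seq:
        idx -= 1
    return idx

segs = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(find_target_segment(segs, 5))   # 1: segment [4,5,6] still holds unapplied logs
print(find_target_segment(segs, 9))   # 2: only the last segment is needed
```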
Since the consensus protocol cluster loads the metadata change operation logs into memory, and memory space is limited, it cannot meet the storage requirements of a large volume of metadata, so the size of the metadata change logs in the cluster must be bounded. A reclamation mechanism is therefore introduced: metadata changes in the consensus protocol cluster are deleted after they have been applied locally by every metadata service node. The metadata service slave node with the smallest node sequence number in the metadata service cluster is responsible for clearing obsolete change operation logs in the consensus protocol cluster. After synchronizing the change operation log and its corresponding metadata into the local storage engine of the metadata service slave node according to the preset synchronization rule, the method further includes:
reclaiming, through the metadata service slave node corresponding to the node ranked first, the change operation logs in the consensus protocol cluster that have already been synchronized to every metadata service slave node.
In the embodiment of the application, each metadata service node watches the same data directory in the consensus protocol cluster and is notified when the number of segments changes. When the metadata service master node creates a new segment in the consensus protocol cluster data directory, the other metadata service slave nodes are notified by the consensus protocol cluster. The metadata service slave node corresponding to the node ranked first then performs the metadata log reclamation operation, deleting from the consensus protocol cluster all segments that precede the segment already synchronized locally.
Optionally, the metadata service slave node also starts a timed task, so that the metadata service slave node corresponding to the node ranked first periodically performs the change operation log reclamation operation.
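The reclamation rule above can be illustrated with a small sketch. The function name, its arguments, and the idea of gathering every node's applied sequence number into one list are assumptions made for the example, not details fixed by the text:

```python
# Illustrative sketch: the slave with the smallest node number deletes
# every segment that all metadata service nodes have already applied.

def reclaim_segments(segment_starts, applied_seqs, my_node_id, node_ids):
    """segment_starts: first-log sequence number of each segment in the
    consensus protocol cluster; applied_seqs: the replay_op_seq reported
    by every metadata service node. Returns deletable segment starts."""
    if my_node_id != min(node_ids):      # only the first-ranked slave reclaims
        return []
    safe_seq = min(applied_seqs)         # every node has applied logs up to here
    starts = sorted(segment_starts)
    # A segment can be deleted only when the NEXT segment still starts at
    # or below safe_seq, i.e. the whole segment was applied by every node.
    return [s for i, s in enumerate(starts[:-1]) if starts[i + 1] <= safe_seq]

# Segments start at logs 1, 100, 200; the slowest node has applied up to
# log 150, so only the segment beginning at log 1 is fully obsolete.
print(reclaim_segments([1, 100, 200], [150, 180], 2, [2, 5]))  # -> [1]
```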
Because change operation logs in the consensus protocol cluster are cleared, a new metadata service node may be unable to synchronize the complete metadata change operation log from the consensus protocol cluster after joining the metadata service cluster, and can only copy the data of an existing metadata service node; the embodiment of the present application therefore introduces a metadata full synchronization mechanism. Specifically, while synchronizing change operation logs from the consensus protocol cluster, if the smallest change operation log sequence number in the consensus protocol cluster is found to be larger than the replay_op_seq+1 of the local storage engine, the complete change operation log and metadata cannot be synchronized from the consensus protocol cluster, and the metadata and change operation logs must instead be synchronized in full from another metadata service node. In addition, if an individual change operation log is found to be damaged while synchronizing from the consensus protocol cluster, so that it cannot be parsed as protobuf or cannot be written into the local storage engine, full synchronization is also required. A metadata service node newly added to the cluster first synchronizes the full metadata from another metadata service node and then applies incremental data changes. Optionally, the method further comprises:
when a new metadata service node joins the consensus protocol cluster, the full amount of metadata is synchronized from the local storage engines of the other metadata service nodes.
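The two triggers for full synchronization described above, a sequence-number gap and a corrupted log, can be expressed as a simple check. Names here are illustrative, not from the patent; the boolean flag stands in for the protobuf parse and local-write checks mentioned in the text:

```python
# Illustrative check for when incremental sync from the consensus protocol
# cluster is impossible and full sync from another node is required.

def needs_full_sync(min_cluster_seq, replay_op_seq, log_parses_ok=True):
    # Gap: the oldest log still kept in the consensus protocol cluster is
    # newer than the next log this node needs (replay_op_seq + 1).
    if min_cluster_seq > replay_op_seq + 1:
        return True
    # Corruption: a log cannot be parsed as protobuf or cannot be written
    # into the local storage engine.
    return not log_parses_ok

print(needs_full_sync(10, 3))        # logs 4..9 already reclaimed -> True
print(needs_full_sync(4, 3))         # log 4 is exactly the next one -> False
print(needs_full_sync(4, 3, False))  # a log is corrupted -> True
```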
As shown in fig. 5, a specific flow of metadata full synchronization is provided, and the steps are as follows:
step S501: obtaining addresses of all metadata service nodes from the consensus protocol cluster;
step S502: requesting the metadata version number version from one of the unprocessed metadata service nodes;
step S503: judging whether version is equal to the locally known latest version number; if so, executing step S504; if not, returning to step S502;
step S504: creating two temporary directories, sync and old;
step S505: pulling the full data from the metadata service node selected in step S502 and placing it in the local sync directory;
step S506: renaming the local metadata directory to old as a backup, and renaming the sync directory to the local metadata directory's name;
step S507: restarting and reinitializing the local metadata service.
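Steps S504 to S507 amount to a pull-then-swap of directories, which might look like the following sketch; pull_full_data and restart_service are placeholders for the real data transfer and service restart, and the paths are illustrative:

```python
# Minimal sketch of full synchronization (steps S504-S507): pull a full
# copy into a temporary "sync" directory, keep the previous state in
# "old", then swap "sync" into place as the live metadata directory.
import os
import shutil
import tempfile

def full_sync(meta_dir, pull_full_data, restart_service):
    base = os.path.dirname(os.path.abspath(meta_dir))
    sync_dir = os.path.join(base, "sync")         # step S504
    old_dir = os.path.join(base, "old")
    for d in (sync_dir, old_dir):
        shutil.rmtree(d, ignore_errors=True)      # clear stale temp dirs
    os.makedirs(sync_dir)
    pull_full_data(sync_dir)                      # step S505
    if os.path.exists(meta_dir):
        os.rename(meta_dir, old_dir)              # step S506: keep a backup
    os.rename(sync_dir, meta_dir)                 # sync becomes live
    restart_service()                             # step S507

# Example with a scratch directory and stub callbacks:
root = tempfile.mkdtemp()
meta = os.path.join(root, "metadata")
os.makedirs(meta)
open(os.path.join(meta, "stale"), "w").close()
full_sync(meta,
          lambda d: open(os.path.join(d, "fresh"), "w").close(),
          lambda: None)
print(sorted(os.listdir(meta)))  # -> ['fresh']
```

Renaming rather than copying keeps the swap fast, and the old directory preserves the previous state if the restart fails.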
In a second aspect, based on the same technical concept, the present application provides a metadata synchronization system of a distributed storage system, as shown in fig. 6, the system including:
a determining unit 601, configured to determine a metadata service master node and a metadata service slave node through a consensus protocol cluster;
a packaging unit 602, configured to package, when a metadata change occurs, a metadata change operation into a change operation log by using the metadata service master node;
a writing unit 603, configured to sequentially write the change operation log into segments of the consensus protocol cluster;
an updating unit 604, configured to update the change operation log and the metadata corresponding to the change operation log to a local storage engine of the metadata master node after the writing is successful;
and the synchronization unit 605 is configured to synchronize the change operation log and the metadata corresponding to the change operation log to a local storage engine of the metadata service slave node according to a preset synchronization rule when a new segment is created in the consensus protocol cluster or a preset duration elapses.
Based on the same technical concept, the embodiment of the present invention further provides an electronic device, as shown in fig. 7, including a processor 701, a communication interface 702, a memory 703 and a communication bus 704, where the processor 701, the communication interface 702, and the memory 703 complete communication with each other through the communication bus 704,
a memory 703 for storing a computer program;
the processor 701 is configured to implement the steps of the metadata synchronization method of the distributed storage system when executing the program stored in the memory 703.
The communication bus of the above electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. The communication bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in the figure, but this does not mean there is only one bus or only one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The Memory may include random access Memory (Random Access Memory, RAM) or may include Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; but also digital signal processors (Digital Signal Processing, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
In yet another embodiment of the present invention, there is also provided a computer readable storage medium having stored therein a computer program which, when executed by a processor, implements the steps of the metadata synchronization method of any of the above-described distributed storage systems.
In yet another embodiment of the present invention, a computer program product containing instructions that, when run on a computer, cause the computer to perform the metadata synchronization method of any of the distributed storage systems of the above embodiments is also provided.
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), etc.
It should be noted that in this document, relational terms such as "first" and "second" and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing is merely a specific embodiment of the application to enable one skilled in the art to understand or practice the application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A method for metadata synchronization for a distributed storage system, the method comprising:
determining a metadata service master node and a metadata service slave node through a consensus protocol cluster;
when metadata change occurs, the metadata change operation is packaged into a change operation log by utilizing the metadata service main node;
writing the change operation log into segments of the consensus protocol cluster in sequence;
after the writing is successful, the change operation log and the corresponding metadata thereof are updated to a local storage engine of the metadata master node;
when a new segment is created in the consensus protocol cluster or a preset time length elapses, synchronizing the change operation log and the corresponding metadata thereof into a local storage engine of the metadata service slave node according to a preset synchronization rule;
the preset synchronization rule is as follows:
acquiring the latest change operation log sequence number from a local storage engine of the metadata service slave node;
pulling all segment information from the consensus protocol cluster;
sorting all segments according to the sequence number of the first change operation log in the segments;
finding the first segment whose first change operation log sequence number is not smaller than the latest change operation log sequence number;
judging whether the sequence number of the first log of the segment is larger than the latest change operation log sequence number and whether a preceding segment exists;
if yes, taking the preceding segment as the target segment;
if not, taking the segment as the target segment;
starting from the target segment, the change operation logs of all segments are synchronized.
2. The method of claim 1, wherein the determining the metadata service master node and the metadata service slave node via the consensus protocol cluster comprises:
creating a node for each metadata service node in the same directory of the consensus protocol cluster, and sequencing according to the creation time;
and determining the metadata service nodes corresponding to the nodes arranged at the first position as metadata service master nodes, and determining the other nodes as metadata service slave nodes.
3. The method according to claim 2, wherein the method further comprises:
deleting a node representing the metadata service master node when the metadata service master node fails or a network partition occurs;
and determining the metadata service node corresponding to the node currently ranked first as a new metadata service master node.
4. The method of claim 1, wherein after synchronizing the change operation log and its corresponding metadata into the metadata service slave node's local storage engine according to a preset synchronization rule, the method further comprises:
reclaiming, by the metadata service slave node corresponding to the node ranked first, the change operation logs in the consensus protocol cluster that have already been synchronized to every metadata service slave node.
5. The method of claim 4, wherein the metadata service slave node corresponding to the node ranked first periodically performs the change operation log reclamation operation.
6. The method according to claim 4, wherein the method further comprises:
when a new metadata service node joins the consensus protocol cluster, the full amount of metadata is synchronized from the local storage engines of the other metadata service nodes.
7. A metadata synchronization system for a distributed storage system, the system comprising:
the determining unit is used for determining a metadata service master node and a metadata service slave node through the consensus protocol cluster;
the encapsulation unit is used for encapsulating the metadata change operation into a change operation log by utilizing the metadata service main node when metadata change occurs;
the writing unit is used for writing the change operation log into segments of the consensus protocol cluster in sequence;
the updating unit is used for updating the change operation log and the corresponding metadata thereof into a local storage engine of the metadata master node after the writing is successful;
the synchronization unit is used for synchronizing the change operation log and the corresponding metadata thereof to a local storage engine of the metadata service slave node according to a preset synchronization rule when a new segment is created in the consensus protocol cluster or a preset time length elapses;
the preset synchronization rule is as follows:
acquiring the latest change operation log sequence number from a local storage engine of the metadata service slave node;
pulling all segment information from the consensus protocol cluster;
sorting all segments according to the sequence number of the first change operation log in the segments;
finding the first segment whose first change operation log sequence number is not smaller than the latest change operation log sequence number;
judging whether the sequence number of the first log of the segment is larger than the latest change operation log sequence number and whether a preceding segment exists;
if yes, taking the preceding segment as the target segment;
if not, taking the segment as the target segment;
starting from the target segment, the change operation logs of all segments are synchronized.
8. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for carrying out the method steps of any one of claims 1-6 when executing a program stored on a memory.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the method steps of any of claims 1-6.
CN202210432189.6A 2022-04-22 2022-04-22 Metadata synchronization method, system and equipment of distributed storage system Active CN115599747B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210432189.6A CN115599747B (en) 2022-04-22 2022-04-22 Metadata synchronization method, system and equipment of distributed storage system


Publications (2)

Publication Number Publication Date
CN115599747A CN115599747A (en) 2023-01-13
CN115599747B true CN115599747B (en) 2023-06-06

Family

ID=84842075

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210432189.6A Active CN115599747B (en) 2022-04-22 2022-04-22 Metadata synchronization method, system and equipment of distributed storage system

Country Status (1)

Country Link
CN (1) CN115599747B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115794499B (en) * 2023-02-03 2023-05-16 创云融达信息技术(天津)股份有限公司 Method and system for dual-activity replication data among distributed block storage clusters
CN116561221B (en) * 2023-04-21 2024-03-19 清华大学 Method for supporting distributed time sequence database copy consensus protocol of Internet of things scene
CN116302140B (en) * 2023-05-11 2023-09-22 京东科技信息技术有限公司 Method and device for starting computing terminal based on storage and calculation separation cloud primary number bin
CN116633946B (en) * 2023-05-29 2023-11-21 广州经传多赢投资咨询有限公司 Cluster state synchronous processing method and system based on distributed protocol

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015192661A1 (en) * 2014-06-19 2015-12-23 中兴通讯股份有限公司 Method, device, and system for data synchronization in distributed storage system
CN111949633A (en) * 2020-08-03 2020-11-17 杭州电子科技大学 ICT system operation log analysis method based on parallel stream processing
WO2021051581A1 (en) * 2019-09-17 2021-03-25 平安科技(深圳)有限公司 Server cluster file synchronization method and apparatus, electronic device, and storage medium
WO2021226905A1 (en) * 2020-05-14 2021-11-18 深圳市欢太科技有限公司 Data storage method and system, and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107426265A (en) * 2016-03-11 2017-12-01 阿里巴巴集团控股有限公司 The synchronous method and apparatus of data consistency
CN108280080B (en) * 2017-01-06 2022-02-22 阿里巴巴集团控股有限公司 Data synchronization method and device and electronic equipment
CN108322533B (en) * 2018-01-31 2019-02-19 广州鼎甲计算机科技有限公司 Configuration and synchronization method between distributed type assemblies node based on operation log
CN111858097A (en) * 2020-07-22 2020-10-30 安徽华典大数据科技有限公司 Distributed database system and database access method


Also Published As

Publication number Publication date
CN115599747A (en) 2023-01-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 8b, building 1, No. 48, Zhichun Road, Haidian District, Beijing 100086

Patentee after: Beijing Zhiling Haina Technology Co.,Ltd.

Country or region after: China

Address before: 8b, building 1, No. 48, Zhichun Road, Haidian District, Beijing 100086

Patentee before: Beijing zhilinghaina Technology Co.,Ltd.

Country or region before: China