CN113641467B - Distributed block storage implementation method of virtual machine
- Publication number: CN113641467B
- Application number: CN202111213142.2A
- Authority: CN (China)
- Legal status: Active
Classifications
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F9/5077—Logical partitioning of resources; Management or configuration of virtualized resources
- G06F2009/45562—Creating, deleting, cloning virtual machine instances
- G06F2009/45579—I/O management, e.g. providing access to device drivers or storage
Abstract
The invention discloses a distributed block storage implementation method for a virtual machine, applied to a distributed storage system comprising one master control node, a plurality of client nodes, and a plurality of data storage nodes. A client node maps a virtual disk to the virtual machine and sends requests to operate on the virtual disk to the master control node; the master control node processes and stores the virtual disk's information; and the data storage nodes provide physical storage space for the virtual disk. The method comprises the following steps: the master control node receives a command to create a virtual disk, selects a data storage node on which to create a fragment according to a preset rule, and sends the corresponding data storage node address to the client node; the client node then creates the fragment on that data storage node, the fragment file being in Qcow2 format. By virtualizing a plurality of Qcow2 files into one large virtual disk, the method achieves balanced I/O without hot spots, high performance, and simple, easily maintained code.
Description
Technical Field
The invention relates to the field of distributed storage of virtual machines, in particular to a distributed block storage implementation method of a virtual machine.
Background
With the rapid development of cloud computing technology, Infrastructure as a Service (IaaS) is becoming ever more important as the foundation of cloud computing. Virtual machine services are the core of IaaS, so the importance of, and the requirements on, the storage services backing virtual machines keep growing.
A cloud operator's virtual machines need storage that is simultaneously highly reliable, scalable, and cheap. Traditional virtual machine storage falls into three major categories: open-system Direct Attached Storage (DAS), Network Attached Storage (NAS), and Storage Area Network (SAN). However, traditional storage struggles to meet the storage requirements of virtual machines in an IaaS scenario. First, it is hard to scale out without limit and its reliability is insufficient. Moreover, all three kinds of storage are usually vendors' closed-source technologies: they are expensive, and cloud operators cannot operate and maintain them on their own.
Some open source distributed block storage software exists today, such as the widely used Sheepdog, developed by NTT Laboratories in Japan in 2009 to provide distributed block storage for QEMU/KVM virtual machines. It is easy to deploy, and its code is simple and easy to maintain. The Sheepdog architecture is shown in FIG. 1: a virtual machine's I/O is forwarded by the Qemu process to a gateway process, which then forwards it over the network to the object manager processes on other nodes.
However, such open source distributed storage technology has the following disadvantages, described here taking Sheepdog as an example:
1. Sheepdog uses a consistent hashing algorithm, which splits block-stored data into small pieces and spreads them evenly across all nodes. However, this has several drawbacks:
1) Data placement cannot be fully controlled. For example, to improve reliability data is usually stored as three replicas, with one replica required to live on SSD (to improve read performance) and the other two on mechanical hard disks; Sheepdog cannot realize this. 2) When the number of storage nodes is small, the hash algorithm easily produces data imbalance, with the amount of data stored per node differing widely (a toy illustration follows this list). 3) When a node goes down, the load on its neighboring nodes increases.
2. The huge number of fragments leads to poor storage performance. Each fragment is a small file in the storage back end. To make snapshots easy, Sheepdog assembles one large virtual disk from 4 MB fragments, so a virtual disk consists of an enormous number of fragments: a full 8 TB disk holds roughly two million of them. The operating system cannot keep that many file handles open, so every read or write must run open, read/write, and close in sequence. On open the kernel must locate the fragment in the file system, and on close the operating system must flush the fragment's (file's) cache to the physical disk, so performance is low.
3. The huge number of fragments makes management complex. When some hosts fail and the software cannot recover automatically, manual recovery is all but impossible in the face of so many fragments.
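The imbalance in point 2) of drawback 1 is easy to reproduce. Below is a toy illustration in Python (not Sheepdog's actual implementation, which adds virtual nodes to soften the effect): with only three nodes on a plain consistent-hash ring, the arcs between node positions are uneven, so the shares of objects the nodes receive can differ widely.

```python
import hashlib

def ring_pos(key: str) -> int:
    # place keys and nodes on a 64-bit hash ring
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

nodes = sorted((ring_pos(f"node-{i}"), f"node-{i}") for i in range(3))
counts = {name: 0 for _, name in nodes}
for k in range(100_000):
    p = ring_pos(f"object-{k}")
    # an object belongs to the first node clockwise from its ring position
    owner = next((name for pos, name in nodes if pos >= p), nodes[0][1])
    counts[owner] += 1
print(counts)  # typically a badly skewed split rather than ~33% per node
```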
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a distributed block storage implementation method for a virtual machine that virtualizes a plurality of Qcow2 files into one large virtual disk and simultaneously achieves balanced reads and writes, no hot spots, high performance, and simple, easily maintained code.
The technical scheme adopted by the invention to overcome these problems is as follows: a distributed block storage implementation method for a virtual machine, applied to a distributed storage system comprising at least 1 master control node, a plurality of client nodes, and a plurality of data storage nodes, wherein a client node maps a virtual disk to the virtual machine and sends requests to operate the virtual disk to the master control node, the master control node serves as the central node that processes and stores the virtual disk's information, and the data storage nodes provide physical storage space for the virtual disk. The method comprises at least the following steps: after receiving a command to create a virtual disk, the master control node selects a data storage node on which to create the first fragment according to a preset rule and sends the corresponding data storage node address to the client node; the client node creates the fragment on that data storage node based on the received address, the fragment file being in Qcow2 format.
Further, the method also includes: if the data address the client node continues writing to is not within the first fragment, the client node sends a command to create a second fragment to the master control node; the master control node selects a data storage node according to a preset rule to store the second fragment, sends the corresponding data storage node address to the client node, and records it; the client node establishes a connection with that data storage node based on the received address, sends it a message to create the new fragment, and then continues writing data; the data storage node creates the corresponding second fragment based on the received message and writes the data.
If the client keeps writing data, new fragments continue to be created according to these steps, as sketched below.
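A minimal, runnable sketch of this allocation flow, with the master and data nodes simulated in-process; all names are hypothetical, and offsets are mapped to fragments linearly here, ignoring the interleaved grouping described below:

```python
SHARD_SIZE = 1 << 30  # 1 GB, the minimum fragment size used in this method

class DataNode:
    def __init__(self, name, free):
        self.name, self.free, self.shards = name, free, {}
    def create_shard(self, sid):
        self.shards[sid] = bytearray()        # stands in for a Qcow2 file
    def write(self, sid, off, data):
        buf = self.shards[sid]
        if len(buf) < off + len(data):        # grow the backing buffer
            buf.extend(b"\0" * (off + len(data) - len(buf)))
        buf[off:off + len(data)] = data

class Master:
    def __init__(self, nodes):
        self.nodes, self.placement = nodes, {}   # fragment id -> data node
    def allocate_shard(self, sid):
        # preset rule: pick the node with the most remaining capacity
        node = max(self.nodes, key=lambda n: n.free)
        self.placement[sid] = node            # master records the address
        return node                           # (capacity bookkeeping omitted)

class Client:
    def __init__(self, master):
        self.master, self.shard_node = master, {}
    def write(self, offset, data):
        sid = offset // SHARD_SIZE            # which fragment covers this offset
        if sid not in self.shard_node:        # fragment missing: ask the master,
            node = self.master.allocate_shard(sid)
            self.shard_node[sid] = node
            node.create_shard(sid)            # then the data node creates it
        self.shard_node[sid].write(sid, offset % SHARD_SIZE, data)

client = Client(Master([DataNode("n1", 10), DataNode("n2", 12)]))
client.write(SHARD_SIZE + 5, b"hello")        # falls in the second fragment
```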
Further, the virtual disk comprises a plurality of groups, each group comprises a plurality of fragments, and the addresses of the data blocks at corresponding positions of the fragments within each group are interleaved.
Further, the configuration information for the number of fragments per group, the fragment size, and the data block size of the virtual disk is stored in the first fragment of the virtual disk.
The number of fragments in a group, the fragment size, and the block size are all configurable. These settings can be placed in the extended attribute of the virtual disk's first fragment, so the data distribution scheme of each virtual disk can differ.
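As one possible realization (the attribute name and JSON encoding below are assumptions, and filesystem extended attributes require Linux), the layout parameters could be written to fragment 0 like this:

```python
import json, os

path = "vdisk-0000.qcow2"            # hypothetical fragment 0 of one disk
open(path, "ab").close()             # ensure the file exists for the demo
layout = {"shards_per_group": 4, "shard_size": 1 << 30, "block_size": 8 << 20}
os.setxattr(path, "user.vdisk.layout", json.dumps(layout).encode())
print(json.loads(os.getxattr(path, "user.vdisk.layout")))
```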
Further, the fragments of the virtual disk may be distributed on the same data storage node or on different ones.
Further, the client node hosts at least the distributed block storage driver, and the data storage node hosts at least the agent process. The distributed block storage driver maps the virtual disk to the virtual machine, sends virtual disk operation requests to the master control node, receives the corresponding fragment information, computes the address of the fragment to be operated on from the virtual disk's data distribution, and forwards data to the agent processes of the various data storage nodes according to that fragment address and the virtual disk operation command;
the agent process is used for operating the fragments of the data storage nodes based on the request of the distributed block storage driver.
Further, the method also includes: if the client node requests to open the virtual disk, the client node requests all fragment information of the virtual disk from the master control node and, using the obtained addresses of the data storage nodes holding the fragments, sends commands to those nodes in turn to open the corresponding fragments. Each data storage node returns the opened fragment handle to the client node, which stores it; if the client then requests reads or writes, it sends the fragment handle to the data storage node, which reads and writes the fragment data directly through the received handle, as sketched below.
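A sketch of the agent-side handle table this flow implies (hypothetical names; plain POSIX file I/O through Python's os module stands in for the agent's real implementation): each fragment file is opened once, an opaque handle goes back to the client, and later reads and writes reuse that handle instead of wrapping every I/O in open/close.

```python
import itertools, os

class Agent:
    """Sketch of the data storage node's fragment handle table."""
    def __init__(self):
        self._ids = itertools.count(1)
        self.handles = {}                      # handle id -> open fd
    def open_shard(self, path):
        handle = next(self._ids)
        self.handles[handle] = os.open(path, os.O_RDWR | os.O_CREAT)
        return handle                          # returned to and cached by client
    def read(self, handle, off, size):
        return os.pread(self.handles[handle], size, off)
    def write(self, handle, off, data):
        return os.pwrite(self.handles[handle], data, off)
    def close_shard(self, handle):
        os.close(self.handles.pop(handle))     # cache flushed once, at close
```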
Further, the method also includes: if the client node requests to read or write the virtual disk, the fragments to be accessed are calculated from the virtual disk's data distribution, and the read/write requests are sent to those fragments using the data storage node addresses obtained from the master control node.
Further, the method also includes: if the client node deletes the virtual disk, the client node requests all fragment information of the virtual disk from the master control node and, based on the obtained information, sends messages to the data storage nodes so that they delete the fragments corresponding to the virtual disk.
Further, the method also includes: if the client node takes a snapshot of the virtual disk, the distributed block storage driver of the client node requests all fragment information of the virtual disk from the master control node to obtain the data storage node addresses of all its fragments, and then sends snapshot messages to the corresponding data storage nodes so that their agent processes snapshot the fragments of the virtual disk.
The invention has the beneficial effects that:
1. Data distribution is fully controlled: the placement algorithm is flexible, and fragment locations are completely controllable.
2. The open, standard Qcow2 format is used as the storage back end, and Qcow2 supports snapshots. The method is therefore simple to implement, the amount of code is small, and software maintenance costs drop sharply.
3. The storage back end uses "fragmentation + interleaving", so reads and writes are spread evenly over fragments on different hosts, achieving read-write balance and avoiding read-write hot spots.
4. Large fragments are used, each at least 1 GB. This reduces the number of fragments in the system, so all fragment locations can be recorded directly in the master. When the cluster fails it is easy to see which virtual disks have damaged fragments, making manual repair feasible; and the node holding a given piece of data need not be recomputed on every read and write.
5. The entire client read-write path runs in user mode. The path is short and efficient, performing especially well on new storage media such as SSD and NVMe. User-mode programs are also easy to upgrade and maintain, whereas upgrading kernel-mode programs is very troublesome.
Drawings
FIG. 1 is a Sheepdog architecture diagram;
FIG. 2 is a diagram illustrating roles of a distributed storage system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a distributed block store I/O flow according to an embodiment of the present invention;
fig. 4 is a schematic diagram of virtual disk data distribution according to an embodiment of the present invention.
Detailed Description
For further understanding of the invention, some of the terms mentioned in the present application will first be explained:
and (3) block storage: all data in the block device is partitioned into blocks of a fixed size, each block being assigned a number for addressing. The block store, which is typically a hard disk, may have non-sequential access to the stored data.
Distributed block storage: single-machine block storage is limited by one machine's capacity, cannot scale without limit, and is vulnerable to single-machine failures. Cloud computing therefore generally uses distributed storage: data is kept in multiple copies, each on a different host node, improving reliability and expansion capability.
Fragment: to achieve I/O balancing, unlimited capacity expansion, and similar goals, distributed block storage generally divides a virtual disk by address offset into many small fragments for storage.
User mode/kernel mode: an operating system is divided into user mode and kernel mode. To improve reliability, user-mode programs cannot perform privileged operations directly; they perform privileged operations such as I/O through interfaces provided by the kernel. When user mode calls a kernel-mode interface, the CPU switches modes to protect the kernel, and switches back to user mode when the call finishes. The cost of this switching is reduced performance.
Qemu process: Qemu is a widely used piece of open-source software that can emulate a wide variety of hardware environments, such as a hard disk, and then run a guest virtual machine operating system on the emulated hardware.
Qcow2: the Qcow2 image format is one of the disk image formats supported by the open-source software Qemu. It is a file format for presenting an emulated hard disk to a virtual machine, and it supports snapshot operations.
The invention is described in further detail below with reference to the figures and the specific embodiments, which are only exemplary and do not limit the scope of the invention.
The method for implementing distributed block storage of the virtual machine is applied to a distributed storage system which at least comprises 1 main control node, a plurality of client nodes and a plurality of data storage nodes, as shown in figure 2.
The three roles (master control node, client node, and data storage node) can be deployed separately on different machines or together on the same machine. There is only one master control node, while client nodes and data storage nodes can scale horizontally. The master control node corresponds to the Master in FIG. 2, the client node to the Client, and the data storage node to the Chunk Server.
In some embodiments the master control node may also be built as a master-slave structure, and the client nodes and data storage nodes may scale to thousands of machines.
FIG. 3 is a diagram illustrating a flow of distributed block store I/O according to an embodiment of the present invention.
The distributed block storage driver is deployed on the client node, where operations such as creating a virtual disk, deleting it, expanding it, or taking a snapshot can be performed. For each such operation the client node sends a request to the master control node, which processes and stores the virtual disk's information.
The deployed distributed block storage driver is used for mapping the virtual disk to the virtual machine, sending a virtual disk operation request to the main control node, receiving corresponding fragment information, calculating a to-be-operated fragment address based on data distribution of the virtual disk, and forwarding data to agent processes of different data storage nodes based on the to-be-operated fragment address and an operation command of the virtual disk.
The master control node gathers the locations and state information of all fragments of every virtual disk. It keeps in memory which fragments each virtual disk is composed of, which data storage node each fragment resides on, and so on.
In some embodiments, the fragment information held by the master node is uploaded to it by the data storage nodes at startup, and the master control node assigns the locations of newly created fragments to different data storage nodes according to a set rule.
The data storage node runs the agent process and is the real storage back end for the virtual disk's fragments. A virtual disk is assembled from a number of fragments according to a set rule. The agent process handles the various commands sent by clients, such as open, read, write, close, and delete.
In one embodiment of the invention, each fragment is a Qcow2 file of 1 GB or more in a file system.
The distributed block storage implementation method of the invention comprises at least the following steps: after the master control node receives a command to create a virtual disk, it selects a data storage node on which to create the first fragment according to a preset rule and sends the corresponding data storage node address to the client node; the client node creates the first fragment on that data storage node based on the received address, the fragment file being in Qcow2 format.
In an embodiment of the invention the fragment file is stored in Qcow2 format. Compared with the fragments of other distributed storage systems, Qcow2 offers the following advantages:
1) Qcow2 is a mature, reliable storage format widely used in private clouds. Its structure is simple and easy to implement in code; some embodiments can even reuse open source code directly. 2) Because Qcow2 supports snapshots, snapshotting a virtual disk stored in this format is very simple: all fragments of the virtual disk are snapshotted at the same time. 3) Qcow2 itself supports thin allocation, so no additional thin-allocation algorithm needs to be implemented.
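Because each fragment is an ordinary Qcow2 image, the standard qemu-img tool can create and snapshot it. A minimal sketch, assuming qemu-img is installed and using a made-up fragment filename:

```python
import subprocess

shard = "vdisk-0000.qcow2"   # hypothetical fragment file name
# thinly allocated: a 1 GB image occupies almost no space until written
subprocess.run(["qemu-img", "create", "-f", "qcow2", shard, "1G"], check=True)
# internal snapshot: taking a virtual-disk snapshot amounts to doing this
# on every fragment of the disk at the same time
subprocess.run(["qemu-img", "snapshot", "-c", "snap1", shard], check=True)
```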
In addition, the preset rule may select the data storage node on which to create a fragment based on read-write performance, remaining capacity, or degree of idleness.
The following describes the method through the process of creating a virtual disk in an embodiment of the invention.
The master control node receives a new-virtual-disk command from the client node and, based on the remaining capacity of the data storage nodes, selects the one with the most remaining capacity to store the new fragment. The master records the created virtual disk and the data storage node address of the new fragment, and sends that address to the client node.
Using the received address, the client node establishes a connection with the data storage node when data is first written, and the data storage node automatically creates the first fragment.
If the client node continues writing data but the data address is not in the first fragment (for example, it falls in the second fragment), the client node sends a command to create the second fragment to the master control node. The master selects another data storage node, according to the idleness of the data storage nodes, to store the second fragment, sends the corresponding data storage node address to the client node, and records it. The client node connects to that data storage node, sends it a create-fragment message, and then continues writing data; the data storage node creates the second fragment based on the received message and writes the data.
When the client node writes data, the distributed block storage driver checks, against the virtual disk's preset data distribution, whether the fragment covering the write position exists, and creates it if not. When subsequent write messages arrive at the data storage node, its agent process stores the data in Qcow2 format.
In some embodiments, when a disk is created the master control node fully controls the placement of all fragments: the distribution of a virtual disk's fragments is governed entirely by the program in the master node, which may use an I/O balancing algorithm or a special policy (for example by capacity, or by disk type) to control fragment placement.
FIG. 4 shows the data distribution of a virtual disk in an embodiment of the invention. The virtual disk is assembled from a number of fragments, that is, a number of Qcow2 files combine into one virtual disk.
In the embodiment shown in FIG. 4, the virtual disk is divided by address offset into N fragments for storage, built in a "multi-fragment + interleaved distribution" manner. Four fragments form a group, and the data within each fragment is divided into 8 MB blocks. Data at virtual disk offsets 0-8 MB is stored at offsets 0-8 MB of fragment 0, data at 8-16 MB at offsets 0-8 MB of fragment 1, data at 32-40 MB at offsets 8-16 MB of fragment 0, and so on: the virtual disk's file storage is interleaved. The configuration information (fragments per group, fragment size, and data block size) is kept in the extended attribute of fragment 0, so each virtual disk can use a different data distribution. Each fragment may reside on a different data storage node or on the same one, and the number of fragments per group, the fragment size, and the block size can all be configured to actual needs. A sketch of this address mapping follows.
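A sketch of the "multi-fragment + interleaved distribution" address mapping, using the example parameters of this embodiment (4 fragments per group, 8 MB blocks, 1 GB fragments); in practice these values are read from the extended attribute of fragment 0 and may differ per virtual disk:

```python
MB = 1 << 20
SHARDS_PER_GROUP = 4
BLOCK = 8 * MB
SHARD = 1024 * MB
GROUP = SHARDS_PER_GROUP * SHARD     # disk address space covered per group

def locate(offset: int) -> tuple[int, int]:
    """Map a virtual-disk offset to (fragment index, offset inside it)."""
    group, in_group = divmod(offset, GROUP)
    block, in_block = divmod(in_group, BLOCK)
    shard = group * SHARDS_PER_GROUP + block % SHARDS_PER_GROUP
    row = block // SHARDS_PER_GROUP  # blocks already laid down in this fragment
    return shard, row * BLOCK + in_block

assert locate(1 * MB) == (0, 1 * MB)    # 1 MB  -> fragment 0, offset 1 MB
assert locate(42 * MB) == (1, 10 * MB)  # 42 MB -> fragment 1, offset 10 MB,
                                        # matching the forwarding example below
```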
The "first fragment" and "second fragment" in these embodiments do not refer to fixed fragments; in the disk creation process described above, the first fragment corresponds to fragment 0 and the second fragment to fragment 1.
Building the virtual disk with multiple fragments in an interleaved distribution has the following advantages. Reads and writes are often sequential, that is, the virtual machine accesses consecutive offsets. If, for example, 0-32 MB is read sequentially, the requests proceed in parallel across the four fragments of the first group shown in FIG. 4, instead of all concentrating on one fragment, which would leave the I/O of that fragment's data storage node busy while the other nodes sit idle. System read-write throughput improves because each request can be split into several smaller requests completed in parallel by several machines. Each fragment is also made large, at least 1 GB, so reads and writes no longer need open, read/write, close every time: once opened, a fragment need not be closed, which improves performance. Open source software such as Sheepdog cannot do this. With 4 MB fragments, keeping everything open would mean about a million file handles for 4 TB of data, and closing a handle forces the operating system to flush the written data from cache to the hard disk, where scattered writes are much slower than sequential ones and consume substantial resources. With the technical scheme of the invention, 4 TB of data needs only about 4,000 file handles, so fragments need not be closed after opening; each access is a direct read/write, with no handle churn and no cache flushing, giving higher performance. Nor could Sheepdog simply adopt fragments as large as 1 GB: after a fragment snapshot, writing to that fragment would incur a very large delay. Its snapshots are copy-on-write (at snapshot time only a record is made, and the actual backup is deferred until data is next written), so before the write can proceed a complete copy of the fragment must be made, and copying a large file takes long enough to cause severe I/O delay. Sheepdog is therefore forced to use small fragments such as 4 MB to avoid huge delays.
In an embodiment of the invention, when a client node requests to open a virtual disk for I/O, the distributed block storage driver on the client node requests all fragment information of the virtual disk from the master control node and, based on the information obtained, sends open requests to the agent processes of the data storage nodes holding the fragments, as shown in FIG. 3. All fragments of the virtual disk are opened and their handles stored; if the client then requests reads or writes, the stored handles are used directly.
In an embodiment of the invention, when the virtual disk is read or written, all fragment locations and handles are already held from opening the disk, so the distributed block storage driver computes from the position to be accessed, using the data location mapping rule of FIG. 4, which fragment the request must be forwarded to. For example, a request at offset 1 MB is forwarded directly to fragment 0, while a request at offset 42 MB is forwarded to fragment 1 with its offset changed to 10 MB (the mapping sketch above verifies this case).
In an embodiment of the invention, when a client node deletes a virtual disk, the distributed block storage driver of the client node requests all fragment information of the virtual disk from the master control node to obtain the data storage node addresses of all its fragments, and then sends delete messages to the corresponding data storage nodes so that their agent processes delete the fragments of the virtual disk.
In another embodiment of the invention, when a client node takes a snapshot of a virtual disk, the distributed block storage driver of the client node requests all fragment information of the virtual disk from the master control node to obtain the data storage node addresses of all its fragments, and then sends snapshot messages to the corresponding data storage nodes so that their agent processes snapshot the fragments of the virtual disk.
The foregoing merely illustrates the principles and preferred embodiments of the invention and many variations and modifications may be made by those skilled in the art in light of the foregoing description, which are within the scope of the invention.
Claims (8)
1. A distributed block storage implementation method of a virtual machine is applied to a distributed storage system at least comprising 1 main control node, a plurality of client nodes and a plurality of data storage nodes, wherein the client nodes are used for mapping virtual disks to the virtual machine and sending requests for operating the virtual disks to the main control nodes, the main control nodes are used as central nodes for processing and storing information of the virtual disks, and the data storage nodes are used for providing physical storage space for the virtual disks, and the distributed block storage implementation method of the virtual machine at least comprises the following steps:
after receiving the command of creating the virtual disk, the master control node selects a data storage node for creating the first fragment based on a preset rule and sends a corresponding data storage node address to the client node;
the client node creates a first fragment at the data storage node based on the received data storage node address, wherein the fragment file is in a Qcow2 format;
the virtual disk comprises a plurality of groups, each group comprises a plurality of fragments, and the address of the data block at the corresponding position of each fragment in each group is in staggered distribution;
the client node is at least used for deploying the distributed block storage driver, and the data storage node is at least used for deploying the agent process;
the distributed block storage driver is used for mapping the virtual disk to the virtual machine, sending a virtual disk operation request to the main control node, receiving corresponding fragment information, calculating a fragment address to be operated based on data distribution of the virtual disk, and forwarding data to agent processes of different data storage nodes based on the fragment address to be operated and an operation command of the virtual disk;
the agent process is used for operating the fragments of the data storage nodes based on the request of the distributed block storage driver.
2. The method for implementing distributed block storage of a virtual machine according to claim 1, further comprising: if the data address continuously written by the client node is not in the first fragment, the client node sends a command for creating a second fragment to the main control node;
the master control node selects a data storage node according to a preset rule for storing the second fragment, sends the data storage node address of the corresponding second fragment to the client node, and records the data storage node address of the second fragment;
the client node establishes connection with the corresponding data storage node based on the received data storage node address to which the second fragment belongs, sends a message for creating a new fragment to the data storage node, and then continues to write data;
the data storage node creates the corresponding second fragment based on the received message and writes the data.
3. The method according to claim 1, wherein configuration information of the number of fragments, the size of the fragments, and the size of the data blocks in each group of the virtual disk is stored in an extended attribute of a first fragment of the virtual disk.
4. The method of claim 3, wherein the fragments of the virtual disk are distributed on the same or on different data storage nodes.
5. The method for implementing distributed block storage of a virtual machine according to claim 1, further comprising:
if the client node requests to open the virtual disk, the client node requests all fragment information of the virtual disk from the main control node, and sequentially sends commands to the data storage node based on the address of the data storage node to which the obtained fragments belong so as to open the corresponding fragments, the data storage node returns the opened fragment handle to the client node, the client node stores the received fragment handle, if the client requests to read and write again, the fragment handle is sent to the data storage node, and the data storage node directly reads and writes fragment data based on the received fragment handle.
6. The method for implementing distributed block storage of a virtual machine according to claim 1, further comprising:
if the client node requests to read and write the virtual disk, the fragments needing to be read and written are calculated based on the data distribution of the virtual disk, and the read and write requests are sent to the corresponding fragments based on the data storage node addresses where the fragments to be read and written are inquired from the main control node.
7. The method for implementing distributed block storage of a virtual machine according to claim 1, further comprising:
and if the client node deletes the virtual disk, the client node requests all fragment information of the virtual disk from the main control node and sends a message to the data storage nodes based on the obtained fragment information, so that the data storage nodes delete the fragments corresponding to the virtual disk.
8. The method for implementing distributed block storage of a virtual machine according to claim 1, further comprising:
if the client node carries out snapshot on the virtual disk, the distributed block storage driver of the client node requests all fragment information of the virtual disk from the main control node to obtain data storage node addresses of all fragments of the virtual disk, and the distributed block storage driver of the client node sends snapshot information to corresponding data storage nodes based on the received data storage node addresses, so that the agent process of the data storage nodes carries out snapshot on the fragments corresponding to the virtual disk.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111213142.2A CN113641467B (en) | 2021-10-19 | 2021-10-19 | Distributed block storage implementation method of virtual machine |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111213142.2A CN113641467B (en) | 2021-10-19 | 2021-10-19 | Distributed block storage implementation method of virtual machine |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113641467A CN113641467A (en) | 2021-11-12 |
CN113641467B (en) | 2022-02-11
Family
ID=78427365
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111213142.2A | Distributed block storage implementation method of virtual machine | 2021-10-19 | 2021-10-19
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113641467B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114647388B * | 2022-05-24 | 2022-08-12 | Hangzhou Youyun Technology Co., Ltd. | Distributed block storage system and management method
CN115146318B * | 2022-09-02 | 2022-11-29 | Kylin Software Co., Ltd. | Virtual disk safe storage method
CN117130980B * | 2023-10-24 | 2024-02-27 | Hangzhou Youyun Technology Co., Ltd. | Virtual machine snapshot management method and device
CN117591246B * | 2024-01-18 | 2024-05-03 | Hangzhou Youyun Technology Co., Ltd. | Method and device for realizing WEB terminal of KVM virtual machine
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103516755A (en) * | 2012-06-27 | 2014-01-15 | Huawei Technologies Co., Ltd. | Virtual storage method and equipment thereof
CN103761059A (en) * | 2014-01-24 | 2014-04-30 | Institute of Information Engineering, Chinese Academy of Sciences | Multi-disk storage method and system for mass data management
WO2018054079A1 (en) * | 2016-09-23 | 2018-03-29 | Huawei Technologies Co., Ltd. | Method for storing file, first virtual machine and namenode
CN112148206A (en) * | 2019-06-28 | 2020-12-29 | Beijing Kingsoft Cloud Network Technology Co., Ltd. | Data reading and writing method and device, electronic equipment and medium
CN112527492A (en) * | 2019-09-18 | 2021-03-19 | Huawei Technologies Co., Ltd. | Data storage method and device in distributed storage system
Also Published As
Publication number | Publication date |
---|---|
CN113641467A (en) | 2021-11-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CP03 | Change of name, title or address | |
Address after: Room 611-612, Zhuoxin Building, No. 3820 South Ring Road, Puyan Street, Binjiang District, Hangzhou City, Zhejiang Province, 310000
Patentee after: Hangzhou Youyun Technology Co., Ltd. (China)
Address before: Room 611-612, Zhuoxin Building, 3820 South Ring Road, Puyan Street, Binjiang District, Hangzhou City, Zhejiang Province, 310053
Patentee before: Hangzhou Youyun Technology Co., Ltd. (China)