CN113312300B

CN113312300B - Nonvolatile memory caching method integrating data transmission and storage

Info

Publication number: CN113312300B
Application number: CN202110670041.1A
Authority: CN
Inventors: 康亮; 童飞文; 马名; 马可
Original assignee: Shanghai Phegda Technology Co ltd; SHANGHAI DRAGONNET TECHNOLOGY CO LTD
Current assignee: Shanghai Phegda Technology Co ltd; SHANGHAI DRAGONNET TECHNOLOGY CO LTD
Priority date: 2021-06-17
Filing date: 2021-06-17
Publication date: 2024-05-03
Anticipated expiration: 2041-06-17
Also published as: CN113312300A

Abstract

The invention relates to a nonvolatile memory caching method that integrates data transmission and storage. Nonvolatile memory data on a server node is accessed directly during network transmission using RDMA, and the corresponding cache operation is executed based on the data request and the mapping between cache resources and hard disk space. Compared with the prior art, the invention achieves zero memory copy across the whole storage system access path. It establishes a read/write-separated cache system in which the read cache uses linear mapping and the write cache uses a log. When dirty cache data is swapped out to the back-end hard disk, duplicate write requests are removed and random write requests are sorted and merged, giving the method high speed and high performance.

Description

Nonvolatile memory caching method integrating data transmission and storage

Technical Field

The invention relates to a data storage method, in particular to a nonvolatile memory caching method integrating data transmission and storage.

Background

Nonvolatile memory (NVRAM) is a nonvolatile storage medium with memory access characteristics in both hardware and software: in hardware it sits in a memory slot, and in software it supports access by memory address. Currently, the price per unit capacity and the performance of NVRAM fall between those of memory and solid-state disks (SSDs), as does the capacity of a single NVRAM module. In current servers, NVRAM capacity cannot be large because of the limited number of memory slots, price, and other factors, so NVRAM is well suited to serve as a cache for hard disks (particularly SSDs).

A storage device or system must guarantee atomic operations to some extent to avoid data corruption on failure. The atomic operation granularity of a block device is generally one sector (512 bytes), while that of a nonvolatile memory device is generally one CPU cache line length (8 bytes). When nonvolatile memory is used as storage, software must therefore at least emulate sector-level atomic operations.

Currently, nonvolatile cache software is generally built on the block device system, and the cache medium must be accessed through a block device interface. That is, the nonvolatile memory must be emulated as a block device, and data access is then implemented by memory copy. This approach has the following disadvantages:

1. The block device interface must run through the generic block device layer, which increases NVRAM access latency;

2. Remote Direct Memory Access (RDMA) transfers cannot access block devices directly and must pass through operating system memory;

3. Data transfer between memory and NVRAM requires CPU copies, which consumes a large amount of CPU resources and slows the response of the storage system;

4. Emulating a block device on NVRAM consumes additional CPU resources to solve sector-level atomicity, and this approach guarantees only the atomicity of each sector in a write request, not of the whole write request;

5. Existing caching technology generally uses a combined read/write cache with small cache granularity, so data cannot be merged effectively, and heavy data swapping can make the hard disk a performance bottleneck;

6. A combined read/write cache cannot guarantee the integrity of a write request; in some cases only part of the write request is applied.

Disclosure of Invention

The invention aims to overcome the defects of the prior art by providing a high-speed, high-performance nonvolatile memory caching method that integrates data transmission and storage. The method fuses the transmission protocol into data storage, achieves zero memory copy in the cache system, guarantees the atomicity of write requests, minimizes system latency, reduces CPU resource consumption, and exploits the memory characteristics of NVRAM, thereby improving the performance of the cache system.

The aim of the invention can be achieved by the following technical scheme:

A non-volatile memory caching method integrating data transmission and storage is characterized in that non-volatile memory data of a server node is directly accessed in a network transmission process through an RDMA technology, and corresponding caching operation is executed based on a data request and a mapping relation between cache resources and a hard disk space.

Further, the nonvolatile memory is provided with a corresponding cache resource pool.

The nonvolatile memory is divided into a data area having a plurality of data blocks (chunks) for storing cache data, a metadata area in which 2 metadata blocks, meta0 and meta1, are created for each chunk, and an index area in which one valid index is created for each chunk.

One quadruple (index, meta0, meta1, chunk) forms one cache data resource, and all quadruples in the nonvolatile memory form the cache resource pool.

Further, an atomic update method is adopted to update the metadata of the data area.

Further, in the hard disk space, each hard disk is allocated a unique hard disk ID. According to the size of the data block, each hard disk is divided into a plurality of logical spaces by linear mapping, a Hashtable is built for each hard disk keyed by logical space offset, and each logical space has at least one cache data block.

Further, the data request contains at least one triplet (ID, offset, length), where ID indicates the hard disk ID, offset indicates the offset of the accessed position on the hard disk, and length indicates the length of the accessed data.

After the data request is received, the required logical space is queried or created based on the hard disk ID and the corresponding hard disk Hashtable, the associated nonvolatile memory data block is queried, the data is mapped onto the cache data block, the memory address of the data request within the cache data block is calculated, an RDMA operation is established, and that memory address on the nonvolatile memory data block is read or written directly.

Further, when the data request is a read request, the following caching steps are executed:

101) Determine whether a corresponding read cache data block R_chunk exists in the required logical space; if yes, execute step 102); if not, request a cache data resource from the cache resource pool, construct a read cache data block R_chunk, record the hard disk ID, the logical space offset, and an initialization bitmap in the corresponding metadata, and execute step 102);

102) According to the bitmap in the metadata of R_chunk, calculate by linear mapping whether the cache area required by the current read request holds valid data;

103) Determine whether the cache area required by the current read request contains data marked invalid in the bitmap; if so, load the data segments into R_chunk using the piecewise linear mapping data loading mode; if not, execute step 104);

104) Calculate the memory address of the current read request in R_chunk, and read that address directly using RDMA.

Further, the piecewise linear mapping data loading mode specifically includes:

Calculating the valid bits of the bitmap required in the logical space from the offset and length of the read request, and then calculating the bitmap of the data segments to be loaded by combining them with the initialized bitmap in the read cache data block.

Further, when the data request is a write request, the following caching steps are executed:

201) Determine whether a corresponding write cache data block Wchunk exists in the required logical space; if yes, execute step 202); if not, request a cache data resource from the cache resource pool, construct a write cache data block Wchunk, and record the hard disk ID, the logical space offset, and a globally unique ID in the corresponding metadata, wherein the globally unique ID is based on the cache pool allocation sequence ID;

202) According to the length in the current write request, calculate whether the remaining space of the write cache data block Wchunk is sufficient to append the write request in log writing mode; if yes, execute step 204); if not, execute step 203);

203) Mark the current write cache data block Wchunk as Schunk, immediately request a new cache data resource from the cache resource pool as Wchunk, start a background thread to synchronize the data in Schunk to the hard disk, and return to step 202);

204) Calculate, by the log writing mode, the address at which the data of the write request is to be appended, and then write to that memory address directly using RDMA.

Further, synchronizing the data in Schunk to the hard disk specifically comprises:

1a) Creating a linear address mapping table for the logical space;

1b) Determining the log-written data by retrieving and checking the data in the write request log headers (WLH) on the write cache data block;

1c) Mapping the log data addresses into the linear address mapping table in log order, wherein a later-written log overwrites the address space of an earlier-written log;

1d) Merging and updating the hard disk data according to the order of valid addresses in the linear address mapping table.

Further, in the log writing mode, the write request is written into the cache data block as a log, and each log is divided into two parts, namely a write request log header (WLH) and write request data (DATA), specifically:

2a) Appending a write request log header (WLH) in which the write request status is marked as ongoing;

2b) Appending the write request data (DATA);

2c) Changing the write request status flag in the WLH to complete;

2d) Updating the WLH integrity check value.

Compared with the prior art, the invention has the following beneficial effects:

(1) The invention accesses the nonvolatile memory data of the server node directly during network transmission using RDMA, makes full use of the memory characteristics of nonvolatile memory, removes unnecessary software stack overhead, reduces storage system latency, and achieves zero memory copy across the whole storage system access path.

(2) Because the invention uses RDMA, all data access in the cache system is zero memory copy, which greatly reduces CPU resource consumption.

(3) The invention adopts a read/write-separated cache design, so read and write cache data can be accessed in different ways: the read cache uses piecewise linear mapping and the write cache uses log appending. When hard disk data is loaded, it can be loaded on demand, so less invalid data is loaded; if the system fails during a write request, the atomicity of the whole write request is still guaranteed.

(4) When dirty cache data is swapped out to the back-end hard disk, duplicate write requests are removed and random write requests are sorted and merged.

Drawings

FIG. 1 is a schematic diagram of the principles of the present invention;

FIG. 2 is a diagram showing a nonvolatile memory data area layout according to the present invention;

FIG. 3 is a metadata update diagram of the present invention;

FIG. 4 is a diagram illustrating a cache resource map according to the present invention;

FIG. 5 is a data segment loading diagram of the present invention;

FIG. 6 is a data synchronization diagram of the present invention.

Detailed Description

The invention will now be described in detail with reference to the drawings and specific examples. The present embodiment is implemented on the basis of the technical scheme of the present invention, and a detailed implementation and a specific operation process are given, but the protection scope of the present invention is not limited to the following examples.

The embodiment provides a nonvolatile memory caching method that integrates data transmission and storage. As shown in FIG. 1, the data transmission layer and the storage layer are fused: data on a server node is accessed directly during network transmission using Remote Direct Memory Access (RDMA), and the corresponding cache operation is executed based on the data request and the mapping between cache resources and hard disk space.

The method supports segmented loading of cache data and supports sorting, merging, and swapping out of cache data. It relies on the following key techniques:

1) Creating a cache resource pool

Format the nonvolatile memory. As shown in FIG. 2, the nonvolatile memory is divided into a data area, a metadata area, and an index area. The data area is divided into data blocks (chunks) of the same fixed size, and the chunk is the basic unit of cache allocation. The metadata area stores the metadata of each data block in the data area; to make metadata updates transactional, two metadata spaces are allocated in the metadata area for each data block, that is, 2 metadata blocks (meta0 and meta1) are created for each chunk, one valid and the other invalid. The index area stores the validity index of the metadata area: one valid index (index) is created for each chunk, its size is exactly the atomic granularity of the nonvolatile memory, and it records which metadata block in the metadata area is valid.

Based on the above division, quadruples (index, meta0, meta1, chunk) are obtained. One quadruple represents one cache data resource, and all quadruples in the nonvolatile memory form the cache resource pool.

As shown in FIG. 2, each chunk has two corresponding metadata blocks (meta0, meta1) in the metadata area. When the metadata of a chunk is updated, as shown in FIG. 3, the valid metadata block is not overwritten; instead, the invalid metadata block is written, and the index in the index area is then modified to point to the newly written metadata block. Because every index update in the index area is an atomic operation, the metadata update is atomic.
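The two-slot update described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `CacheResource`, the dictionary-based metadata, and the 4096-byte chunk are hypothetical, and a plain attribute assignment stands in for the single atomic NVRAM store (plus flush) that a real implementation would need.

```python
from dataclasses import dataclass, field


@dataclass
class CacheResource:
    """One (index, meta0, meta1, chunk) quadruple; all names are illustrative."""
    index: int = 0                                          # valid metadata slot (0 or 1)
    meta: list = field(default_factory=lambda: [{}, {}])    # meta0 and meta1
    chunk: bytearray = field(default_factory=lambda: bytearray(4096))

    def read_meta(self) -> dict:
        # Readers always follow the index to the valid metadata block.
        return self.meta[self.index]

    def update_meta(self, new_meta: dict) -> None:
        spare = 1 - self.index
        # Step 1: write the new metadata into the currently invalid slot.
        self.meta[spare] = dict(new_meta)
        # Step 2: flip the index. On real NVRAM this would be one atomic
        # store, and it is the point at which the update takes effect.
        self.index = spare


res = CacheResource()
res.update_meta({"disk_id": 7, "offset": 0})
res.update_meta({"disk_id": 7, "offset": 0, "bitmap": 0b1})
```

If a failure happens before the index flip, readers still see the old metadata block intact; after the flip they see the new one, so no partially written metadata is ever visible.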

2) Constructing cache resource to hard disk space mapping

A unique ID within the cache system is allocated to each hard disk, and all hard disks are organized into a red-black tree keyed by hard disk ID.

According to the size of the data block (chunk), each hard disk is divided into a plurality of logical spaces by linear mapping. Each logical space manages a variable number of allocated NVRAM data blocks, and the logical space ID consists of the hard disk ID and the offset of the logical space within the hard disk.

Physical resources are dynamically created and allocated to logical spaces according to read and write requests. A Hashtable is built for each hard disk, keyed by the offset of the logical space within the hard disk, for querying logical space information.

According to the read and write requests of a logical space, different NVRAM blocks are allocated to it as cache data blocks, and the allocated NVRAM serves as the RDMA memory access address for data transmission. The transmission is thus itself an NVRAM memory access operation.

The metadata area of an NVRAM read-only cache data block contains a 64-bit valid bit flag, so that data can be loaded according to the amount required by the read request and its validity marked. An NVRAM write-only cache data block is written in log mode; when the data block of a logical space is swapped out, all write operations in the space are deduplicated, sorted, and merged, and then swapped out to the hard disk.

As shown in FIG. 4, each logical space may be associated with 0 to 1 read cache data blocks and 0 to 2 write cache data blocks, but each logical space has at least one cache data block.
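As a rough illustration of this mapping, the following Python sketch divides a hard disk into chunk-sized logical spaces and looks them up by offset. The names `Disk`, `LogicalSpace`, and `lookup_or_create`, the 4 MiB chunk size, and the use of a plain `dict` in place of the red-black tree of hard disks are all assumptions made for the example; the patent does not fix any of them.

```python
CHUNK_SIZE = 4 * 1024 * 1024    # assumed chunk size; the patent gives no value


class LogicalSpace:
    def __init__(self, disk_id: int, space_offset: int):
        self.disk_id = disk_id            # logical space ID = (disk ID, offset)
        self.space_offset = space_offset
        self.r_chunk = None               # 0 to 1 read cache data blocks
        self.w_chunks = []                # 0 to 2 write cache data blocks


class Disk:
    def __init__(self, disk_id: int):
        self.disk_id = disk_id
        self.spaces = {}                  # Hashtable: space offset -> LogicalSpace

    def lookup_or_create(self, offset: int) -> LogicalSpace:
        # Linear mapping: round the disk offset down to a chunk boundary.
        space_offset = (offset // CHUNK_SIZE) * CHUNK_SIZE
        space = self.spaces.get(space_offset)
        if space is None:                 # spaces are created on demand
            space = LogicalSpace(self.disk_id, space_offset)
            self.spaces[space_offset] = space
        return space


disks = {1: Disk(1)}                      # dict stands in for the red-black tree
a = disks[1].lookup_or_create(CHUNK_SIZE + 10)
b = disks[1].lookup_or_create(2 * CHUNK_SIZE - 1)
```

Both offsets fall inside the second chunk-sized window of the disk, so `a` and `b` refer to the same logical space.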

3) Piecewise linear mapping data loading

If the data request is a read request, a read-only data block is allocated from the cache resource pool, and hard disk data is loaded using linear address mapping.

The read cache data block is divided into a plurality of data segments. A bitmap stored in the metadata area corresponding to the read cache data block marks the valid data on that block. For a read request, the data segments that need to be loaded are calculated from this bitmap by linear mapping.

As shown in FIG. 5, the valid bits required in the logical space (the read bitmap) are calculated from the offset and length of the read request, and the bitmap of data segments to be loaded (the load bitmap) is then calculated by combining them with the bitmap in the read cache data block (the valid bitmap):

load bitmap = read bitmap & (~valid bitmap)

The data at the corresponding positions is then loaded from the hard disk according to the load bitmap, and the valid bitmap is updated.
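The bitmap arithmetic can be illustrated with a short Python sketch. It assumes a chunk split into 64 equal segments (to match the 64-bit valid flag mentioned earlier) and a 4 MiB chunk; the function names and sizes are hypothetical, and the request is assumed to have a positive length and to lie within one logical space.

```python
CHUNK_SIZE = 4 * 1024 * 1024            # assumed chunk size
SEGMENTS = 64                           # one bit per segment in a 64-bit bitmap
SEGMENT_SIZE = CHUNK_SIZE // SEGMENTS


def read_bitmap(offset_in_space: int, length: int) -> int:
    """Set one bit for every segment touched by [offset, offset + length)."""
    first = offset_in_space // SEGMENT_SIZE
    last = (offset_in_space + length - 1) // SEGMENT_SIZE
    bits = 0
    for seg in range(first, last + 1):
        bits |= 1 << seg
    return bits


def load_bitmap(read_bits: int, valid_bits: int) -> int:
    """Segments the request needs that are not yet valid in the cache."""
    return read_bits & ~valid_bits


valid = 0b01000                                            # only segment 3 is cached
need = read_bitmap(2 * SEGMENT_SIZE + 100, 2 * SEGMENT_SIZE)   # touches segments 2..4
to_load = load_bitmap(need, valid)                         # segments 2 and 4
valid |= to_load                                           # update after loading
```

Only the segments that the request touches and that are missing from the cache are fetched from the hard disk, which is what keeps the amount of unnecessary data loaded small.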

4) Log writing mode

If the data request is a write request, 1 to 2 write-only data blocks are allocated from the cache resource pool, and the write request data is stored as a log.

As shown in FIG. 4, the write request is written into the cache data block as a log, and each log is divided into a write request log header (WLH) and write request data (DATA). The WLH includes the data request triplet (ID, offset, length), the globally unique ID of the chunk, the write request status, and the WLH integrity check value.

The log writing process is divided into the following steps:

a) Append a write request log header (WLH) in which the write request status is marked as ongoing;

b) Append the write request data (DATA);

c) Change the write request status flag in the WLH to complete;

d) Update the WLH integrity check value.
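The four steps above can be sketched as follows. The header layout (four 64-bit fields, a one-byte status, and a CRC32 over the header fields) is an assumption made for the example, as are the names `append_log`, `BODY_FMT`, `ONGOING`, and `COMPLETE`; the patent does not specify field sizes or the check algorithm. A `bytearray` stands in for the NVRAM chunk that a client would write through RDMA.

```python
import struct
import zlib

BODY_FMT = "<QQQQB"                      # disk ID, offset, length, chunk ID, status
BODY_SIZE = struct.calcsize(BODY_FMT)    # header fields without the check value
WLH_SIZE = BODY_SIZE + 4                 # plus a 32-bit check value
ONGOING, COMPLETE = 0, 1


def append_log(chunk: bytearray, tail: int, disk_id: int, offset: int,
               data: bytes, chunk_id: int) -> int:
    """Append one log entry at `tail` and return the new tail."""
    data_at = tail + WLH_SIZE
    end = data_at + len(data)
    if end > len(chunk):
        raise ValueError("not enough space left in the chunk")
    # a) append the header with the status marked as ongoing
    chunk[tail:tail + BODY_SIZE] = struct.pack(
        BODY_FMT, disk_id, offset, len(data), chunk_id, ONGOING)
    # b) append the write request data
    chunk[data_at:end] = data
    # c) flip the status flag (last byte of the header fields) to complete
    chunk[tail + BODY_SIZE - 1] = COMPLETE
    # d) update the integrity check value over the header fields
    crc = zlib.crc32(bytes(chunk[tail:tail + BODY_SIZE]))
    chunk[tail + BODY_SIZE:data_at] = struct.pack("<I", crc)
    return end


chunk = bytearray(256)
tail = append_log(chunk, 0, disk_id=3, offset=4096, data=b"hello", chunk_id=11)
```

A log whose status is still ongoing, or whose check value does not match, can be discarded as a whole during recovery or synchronization, which is how the log form protects the atomicity of an entire write request.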

5) Data synchronization

As shown in FIG. 6, the present invention uses a linear address mapping table to merge and sort the updates to hard disk data. The main process is as follows:

a) Create a linear address mapping table for the logical space;

b) Determine the log-written data by retrieving and checking the data in the WLHs on the write cache data block;

c) Map the log data addresses into the linear address mapping table in log order, wherein a later-written log overwrites the address space of an earlier-written log;

d) Merge and update the hard disk data according to the order of valid addresses in the linear address mapping table.
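The synchronization steps can be illustrated with a simplified Python sketch. It assumes the logs have already been parsed and validated into `(offset_in_space, data)` pairs, that every log lies within the logical space, and it tracks dirtiness per byte for clarity (a real implementation would more likely work per sector). The function name `merge_logs` is hypothetical.

```python
def merge_logs(logs, space_size):
    """Replay logs in order onto a linear map, then emit sorted, merged extents.

    `logs` is a list of (offset_in_space, data) pairs in log order;
    `space_size` is the size of the logical space in bytes.
    """
    image = bytearray(space_size)        # latest data per address
    dirty = bytearray(space_size)        # 1 where some log wrote a byte
    for offset, data in logs:            # later logs overwrite earlier ones
        image[offset:offset + len(data)] = data
        dirty[offset:offset + len(data)] = b"\x01" * len(data)

    extents, start = [], None
    for pos in range(space_size + 1):    # one extra step closes a trailing run
        is_dirty = pos < space_size and dirty[pos]
        if is_dirty and start is None:
            start = pos
        elif not is_dirty and start is not None:
            extents.append((start, bytes(image[start:pos])))
            start = None
    return extents


logs = [(8, b"AAAA"), (0, b"BB"), (10, b"CCCC"), (2, b"DD")]
extents = merge_logs(logs, 16)
```

Four random writes collapse into two contiguous, address-ordered writes to the hard disk, and the overlapping part of the first log is replaced by the later one.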

6) Data request access

The maximum length of a client data request is limited: the read request length, the write request length, and the write request log header length must be less than or equal to the length of the cache data block. The logical space can be queried or created through data request access.

Each data request contains at least one triplet (ID, offset, length), giving the hard disk ID, the offset of the accessed position on the hard disk, and the length of the accessed data, respectively. Using the ID, the hard disk to be accessed is found in the red-black tree of hard disk IDs. The logical space in which the data request falls is then found through the Hashtable created for that hard disk. If the logical space does not exist, an empty logical space is created and inserted into the Hashtable.

Based on the logical space obtained, the associated nonvolatile memory data block is queried, the data is mapped onto the cache data block, the memory address of the data request within the cache data block is calculated, an RDMA operation is established, and the client node reads or writes that memory address on the nonvolatile memory data block directly.
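For the linearly mapped read cache, the address calculation might look like the sketch below. The `Request` type, the function names, the 4 MiB chunk size, and the example base address are assumptions for illustration. The sketch also assumes that a request does not cross a logical space boundary, which the text does not state explicitly; for the write cache the target address would instead be the current log tail.

```python
from typing import NamedTuple

CHUNK_SIZE = 4 * 1024 * 1024     # assumed chunk size


class Request(NamedTuple):
    disk_id: int
    offset: int
    length: int


def locate(req: Request):
    """Return (logical space offset, offset of the request inside that space)."""
    if req.length <= 0 or req.length > CHUNK_SIZE:
        raise ValueError("request length must be between 1 and the chunk size")
    space_offset = (req.offset // CHUNK_SIZE) * CHUNK_SIZE
    inner = req.offset - space_offset
    if inner + req.length > CHUNK_SIZE:
        raise ValueError("request crosses a logical space boundary")
    return space_offset, inner


def read_address(chunk_base: int, req: Request) -> int:
    """Linear mapping: address of the request inside a read cache chunk."""
    _, inner = locate(req)
    return chunk_base + inner


req = Request(disk_id=1, offset=CHUNK_SIZE + 512, length=4096)
addr = read_address(0x1000_0000, req)    # hypothetical NVRAM base address
```

The resulting address is what the RDMA operation would target, so the network transfer and the cache access become the same memory operation.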

7) Read request access

When the data request is a read request, the logical space corresponding to the read request is found or created according to the data request access described in 6).

As shown in FIG. 4, if the corresponding read cache data block (R_chunk) does not exist in the logical space, a cache data resource is requested from the cache resource pool, and the hard disk ID, the logical space offset, and an initialization bitmap are recorded in the metadata of the cache data block.

If the corresponding R_chunk exists in the logical space, whether the cache area required by the current read request holds valid data is calculated by linear mapping from the bitmap in the metadata of R_chunk.

If the cache area required by the current read request contains data marked invalid in the bitmap, the piecewise linear mapping data loading method described in 3) is used to load the data segments into R_chunk.

If the cache area required by the current read request is entirely valid in the bitmap, the memory address of the current read request in R_chunk is calculated, and that memory address is read directly using RDMA.
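Putting the read path together, a minimal end-to-end sketch could look like this. The `ReadChunk` class, the `serve_read` function, the dictionary used as a logical space, the 4096-byte segment size, and the in-memory `bytes` object standing in for the hard disk are all hypothetical; the request is assumed to lie within one logical space.

```python
SEGMENT_SIZE = 4096                       # assumed segment size
SEGMENTS = 64                             # one bit per segment in a 64-bit bitmap
CHUNK_SIZE = SEGMENT_SIZE * SEGMENTS


class ReadChunk:
    def __init__(self, disk_id: int, space_offset: int):
        self.disk_id = disk_id
        self.space_offset = space_offset
        self.valid = 0                        # initialization bitmap: nothing loaded
        self.data = bytearray(CHUNK_SIZE)     # stands in for the NVRAM chunk


def serve_read(space: dict, disk: bytes, disk_id: int, offset: int, length: int) -> int:
    """Make the requested range valid in R_chunk; return its offset in the chunk."""
    space_offset = (offset // CHUNK_SIZE) * CHUNK_SIZE
    inner = offset - space_offset
    r_chunk = space.get("r_chunk")
    if r_chunk is None:                       # no R_chunk yet: build one
        r_chunk = space["r_chunk"] = ReadChunk(disk_id, space_offset)
    first = inner // SEGMENT_SIZE
    last = (inner + length - 1) // SEGMENT_SIZE
    for seg in range(first, last + 1):
        if not (r_chunk.valid >> seg) & 1:    # segment not loaded yet
            lo = seg * SEGMENT_SIZE
            src = space_offset + lo
            r_chunk.data[lo:lo + SEGMENT_SIZE] = disk[src:src + SEGMENT_SIZE]
            r_chunk.valid |= 1 << seg
    return inner                              # an RDMA read would target this offset


disk = bytes(i % 251 for i in range(CHUNK_SIZE))    # fake one-chunk hard disk
space = {}
inner = serve_read(space, disk, disk_id=1, offset=5000, length=100)
```

A second read of an already valid range touches neither the hard disk nor the bitmap, and only returns the offset for the RDMA read.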

8) Write request access

When the data request is a write request, the logical space corresponding to the write request is found or created according to the data request access described in 6).

As shown in FIG. 4, if the corresponding write cache data block (Wchunk) does not exist in the logical space, a cache data resource is requested from the cache resource pool, and the hard disk ID, the logical space offset, and a globally unique ID are recorded in the Wchunk metadata. The globally unique ID of the write cache can be implemented with an allocation sequence ID on the cache pool, that is, the sequence ID is incremented each time a cache resource is allocated from the cache resource pool.

If the logical space has a corresponding Wchunk, whether the remaining space of the write cache data block is sufficient to append the write request in log mode is calculated from the length of the current write request. The condition is that the length of the write request log header (WLH) plus the length of the write request data is less than or equal to the remaining space of Wchunk.

If the current Wchunk has insufficient space, it is marked as Schunk, and a new cache data resource is immediately requested from the cache resource pool as Wchunk. At the same time, a background thread is started to synchronize the data in Schunk to the hard disk using the data synchronization method described in 5).

If Wchunk has sufficient space, the log writing mode described in 4) is started immediately: the address at which the data of the write request is to be appended is calculated, and that memory address is then written directly using RDMA.
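The space check and the Wchunk-to-Schunk rotation can be sketched as follows. The names `Pool`, `WriteChunk`, and `reserve`, the 37-byte header size, the 1024-byte chunk, and the list used as a synchronization queue (in place of a background thread) are assumptions for illustration only.

```python
WLH_SIZE = 37        # assumed write request log header size in bytes
CHUNK_SIZE = 1024    # deliberately small so the example rotates quickly


class WriteChunk:
    def __init__(self, chunk_id: int, disk_id: int, space_offset: int):
        self.chunk_id = chunk_id          # globally unique ID from the pool sequence
        self.disk_id = disk_id
        self.space_offset = space_offset
        self.tail = 0                     # where the next log entry starts


class Pool:
    def __init__(self):
        self.seq = 0

    def allocate(self, disk_id: int, space_offset: int) -> WriteChunk:
        self.seq += 1                     # incremented on every allocation
        return WriteChunk(self.seq, disk_id, space_offset)


def reserve(space: dict, pool: Pool, disk_id: int, space_offset: int,
            length: int, sync_queue: list):
    """Return (chunk, start offset) where the log entry should be appended."""
    need = WLH_SIZE + length
    if need > CHUNK_SIZE:
        raise ValueError("write request does not fit in a cache data block")
    w = space.get("w_chunk")
    if w is None:
        w = space["w_chunk"] = pool.allocate(disk_id, space_offset)
    if need > CHUNK_SIZE - w.tail:        # not enough room left
        sync_queue.append(w)              # old Wchunk becomes Schunk
        w = space["w_chunk"] = pool.allocate(disk_id, space_offset)
    start = w.tail
    w.tail += need
    return w, start


pool, space, queue = Pool(), {}, []
c1, s1 = reserve(space, pool, 1, 0, 600, queue)
c2, s2 = reserve(space, pool, 1, 0, 600, queue)
```

The second request does not fit behind the first, so the first chunk is queued for synchronization and the write proceeds immediately in a fresh chunk rather than waiting for the hard disk.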

The above functions, if implemented as software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art or in part of the technical solution, may be embodied as a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

The foregoing describes in detail preferred embodiments of the present invention. It should be understood that numerous modifications and variations can be made in accordance with the concepts of the invention by one of ordinary skill in the art without undue burden. Therefore, all technical solutions which can be obtained by logic analysis, reasoning or limited experiments based on the prior art by the person skilled in the art according to the inventive concept shall be within the scope of protection defined by the claims.

Claims

1. A nonvolatile memory caching method integrating data transmission and storage, characterized in that nonvolatile memory data of a server node is accessed directly during network transmission using RDMA, and the corresponding cache operation is executed based on a data request and a mapping between cache resources and hard disk space;

The nonvolatile memory is provided with a corresponding cache resource pool,

One quadruple (index, meta0, meta1, chunk) forms one cache data resource, and all quadruples in the nonvolatile memory form the cache resource pool;

Each hard disk is allocated a unique hard disk ID; each hard disk is divided into a plurality of logical spaces by linear mapping according to the size of the data block; physical resources are dynamically created and allocated to the logical spaces according to read and write requests; a Hashtable is built for each hard disk, keyed by the offset of the logical space within the hard disk, for querying logical space information; different NVRAM blocks are allocated to each logical space as cache data blocks according to its read and write requests; each logical space has at least one cache data block; and each cache data block uses its NVRAM physical memory address as the RDMA memory access address for data transmission;

The data request contains at least one triplet (ID, offset, length), where ID indicates the hard disk ID, offset indicates the offset of the accessed position on the hard disk, and length indicates the length of the accessed data,

2. The nonvolatile memory caching method integrating data transmission and storage according to claim 1, wherein the metadata of the data area is updated by an atomic update method.

3. The nonvolatile memory caching method integrating data transmission and storage according to claim 1, wherein when the data request is a read request, the following caching steps are performed:

4. The nonvolatile memory caching method integrating data transmission and storage according to claim 3, wherein the piecewise linear mapping data loading mode specifically comprises:

5. The nonvolatile memory caching method integrating data transmission and storage according to claim 1, wherein when the data request is a write request, the following caching steps are performed:

201) Determining whether a corresponding write cache data block Wchunk exists in the required logical space; if yes, executing step 202); if not, requesting a cache data resource from the cache resource pool, constructing a write cache data block Wchunk, and recording the hard disk ID, the logical space offset, and a globally unique ID in the corresponding metadata, wherein the globally unique ID is based on the cache resource pool allocation sequence ID;

6. The nonvolatile memory caching method integrating data transmission and storage according to claim 5, wherein synchronizing the data in Schunk to the hard disk specifically comprises:

1a) Creating a linear address mapping table for the logical space;

7. The nonvolatile memory caching method integrating data transmission and storage according to claim 5, wherein in the log writing mode, the write request is written into the cache data block as a log, and each log is divided into a write request log header (WLH) and write request data (DATA), specifically:

2b) Appending the write request data (DATA);

2c) Changing the write request status flag in the WLH to complete;

2d) Updating the WLH integrity check value.