CN111897784A

CN111897784A - Key value storage-oriented near data computing cluster system

Info

Publication number: CN111897784A
Application number: CN202010668559.7A
Authority: CN
Inventors: 孙辉; 王强
Original assignee: Anhui University
Current assignee: Anhui University
Priority date: 2020-07-13
Filing date: 2020-07-13
Publication date: 2020-11-06
Anticipated expiration: 2040-07-13
Also published as: CN111897784B

Abstract

The invention provides a key-value-storage-oriented near data computing cluster system comprising a host side and a plurality of NDP devices, the host side being connected to each NDP device. The host side includes a cluster device management module, a file distribution module and a file migration module. The file migration module acquires the compression requirements of each compression storage layer of each NDP device and, according to key value range, migrates the files to be compressed to the NDP device with the corresponding storage key value interval for storage, compression and sorting. By arranging the plurality of NDP devices into a computable storage array, the invention secures the storage capacity and computing capacity of the whole system, avoids occupying storage space on the host side, and greatly relieves the host CPU bottleneck of database compression and sorting operations, which helps improve the data processing efficiency of the whole system.

Description

Key value storage-oriented near data computing cluster system

Technical Field

The invention relates to the technical field of data storage, and in particular to a key-value-storage-oriented near data computing cluster system.

Background

The rapid development of computer technology and the internet has driven the emergence of semi-structured and unstructured data, whose share of the total data volume is growing exponentially. As the scale of unstructured data grows, traditional relational databases can no longer meet the requirements of efficient storage, high concurrency and high scalability for massive data. In contrast, key-value storage requires no predefined data structure, and it has been widely applied to unstructured data storage and management because it provides low-latency reads and writes and supports massive data traffic. At present, key-value storage systems widely use the log-structured merge tree (LSM-tree) to store and manage data; it converts random writes into sequential writes and thereby achieves excellent write performance. To manage data efficiently, an LSM-tree-based key-value storage system performs compaction operations (hereinafter, compression operations) during operation to update file tables (SSTables) and migrate them to the next level. However, the compression operation occupies a large amount of I/O bandwidth on both the host side and the storage device side, which degrades performance, and the updating of file tables also causes write amplification.

Disclosure of Invention

Based on the technical problems in the background art, the invention provides a key value storage-oriented near data computing cluster system.

The invention provides a key value storage-oriented near data computing cluster system, which comprises: a host side and a plurality of NDP devices; the host end is respectively connected with each NDP device;

the host end includes:

the cluster device management module is configured to generate NDP device information objects, which includes setting the key value threshold of each NDP device and calculating the storage key value interval of each NDP device from the key value thresholds; to add and delete devices; and to manage the working state of the devices;

the file distribution module is configured to receive a file sent by an upper layer application and, according to the key value range of the file, send the file to the NDP device with the corresponding storage key value interval for storage;

the file migration module is used for acquiring the compression requirements of each compression storage layer of each NDP device, screening files to be compressed from the compression storage layers corresponding to each NDP device according to the compression requirements, and acquiring files which are overlapped with the key value range of the files to be compressed from the next compression storage layer as supplementary files to be compressed;

and the file migration module is used for migrating the file to be compressed and the supplementary file to be compressed to the NDP equipment in the corresponding storage key value interval according to the key value range to store, compress and sort the file and the supplementary file.

Preferably, the file migration module includes: the system comprises a storage monitoring unit, a file copying unit, a file dividing unit and a task sending unit;

the storage monitoring unit is used for acquiring the compression requirements of each compression storage layer of each NDP device and acquiring files to be compressed and compressed supplementary files corresponding to the compression requirements;

the file copying unit is connected with the storage monitoring unit and is used for screening files with key value ranges exceeding the storage key value interval of the currently located NDP equipment from the files to be compressed and the compressed supplementary files as files to be migrated and acquiring the files;

the file dividing unit is connected with the file copying unit, and is used for acquiring a file containing at least one key value threshold value in a key value range from the file to be migrated as a cutting target and dividing the cutting target according to the key value threshold value;

and the task sending unit is connected to the file copying unit and the file dividing unit respectively, and is configured to acquire the divided files together with the files to be migrated other than the cutting targets, and to distribute them according to key value range to the NDP devices with the corresponding storage key value intervals.

Preferably, the task sending unit is further configured to divide all the files to be migrated into two parts and send the two parts to the compression storage layer at the host end and the NDP device corresponding to the key value range for compression, respectively, when the key value ranges of all the files to be migrated are located in the same storage key value interval.

Preferably, the task sending unit is further configured to obtain a file compressed by the host side and send the file to the NDP device corresponding to the key value range for storage.

Preferably, the host side further includes a load balancing module configured to monitor the task process of each compression storage layer of each NDP device and to count the process occupation time and idle time of each NDP device in the same task process; the load balancing module is further configured to generate a key value adjustment instruction according to the result of comparing the process occupation time of each NDP device with a preset process time consumption threshold and the result of comparing the idle time of each NDP device with a preset idle threshold;

the cluster device management module is connected with the load balancing module and used for adjusting the key value threshold of the NDP device according to the key value adjusting instruction.

Preferably, the load balancing module is further connected to the file distribution module and the file migration module, respectively, and is configured to monitor the current data deployment and data flow direction;

the load balancing module is used for counting key value sparse conditions and local key value threshold adjusting times of each NDP equipment compression task, and carrying out balancing processing on compression task distribution and data flow direction according to a counting result.

Preferably, the NDP device whose process occupation time is greater than the process time consumption threshold or whose idle time is greater than the idle threshold is used as the adjustment object, the NDP device whose storage key value range is adjacent to the adjustment object is used as the adjacent object, and the cluster device management module is configured to adjust the key value threshold between the adjustment object and any adjacent object according to the key value adjustment instruction.

Preferably, the key value adjusting instruction includes an adjusting object and the task time consumption of two adjacent objects in the task process; when the process occupation time of the adjustment object is larger than the process time consumption threshold, the cluster equipment management module adjusts the key value threshold between the adjustment object and an adjacent object with less task time consumption, and reduces the key value storage interval of the adjustment object; when the idle time of the adjustment object is larger than the idle threshold, the cluster equipment management module adjusts the key value threshold between the adjustment object and the adjacent object which consumes more time, and enlarges the key value storage interval of the adjustment object.

Preferably, the host side and the NDP device each include a cache management module; the host-side cache management module is configured to extend the host cache function, and the NDP-device-side cache management module is configured to cache the data read during compression and sorting operations.

Preferably, the host side communicates with the NDP device via an ethernet switch.

According to the near data computing cluster system for key value storage, the plurality of NDP devices are arranged to form the storage array, so that the storage capacity of the whole system is guaranteed, meanwhile, the occupied storage space of a host computer end is avoided, and the data processing efficiency of the whole system is improved.

In the invention, the files on the compression storage layer of each NDP device are managed through the file migration module, so that the ordered storage of the files on each NDP device is realized, the unified management efficiency of the files is improved, the unified distribution of the files is facilitated, and the balance of compression tasks obtained by each NDP device is further ensured.

According to the invention, the cluster management module is used for monitoring the state of each NDP device and adjusting the threshold value of the key value, so that the flexible control of each NDP device is realized, the balance adjustment of the task process of each NDP device is ensured through the key value adjustment, and the minimum time consumption of a single compression task is ensured through the balance of the task process of each NDP device, thereby improving the compression efficiency and realizing the maximum utilization of the storage space of each NDP device.

Drawings

Fig. 1 is a block diagram of a near data computing cluster system oriented to key value storage according to the present invention.

Detailed Description

Referring to fig. 1, a near data computing cluster system for key value storage according to the present invention includes: a host side and a plurality of NDP devices. The host end is respectively connected with each NDP device.

The host end includes:

the cluster device management module is configured to generate NDP device information objects, which includes setting the key value threshold of each NDP device and calculating the storage key value interval of each NDP device from the key value thresholds; to add and delete devices; and to manage the working state of the devices.

Specifically, in this embodiment, the NDP devices are sorted by key value threshold in ascending order. The upper limit of the storage key value interval of each NDP device is its own key value threshold, and the lower limit is the key value threshold of the preceding NDP device in that order; for the NDP device with the minimum key value threshold, the lower limit of the storage key value interval is 0.

That is, assume that the key value threshold of NDP device 1 is f1 and the key value threshold of NDP device n is fn, with f1 < f2 < … < f(n-1) < fn. Then the storage key value interval of NDP device 1 is (0, f1], and the storage key value interval of NDP device n is (f(n-1), fn].
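The interval rule above can be sketched as follows. This is a minimal Python illustration rather than the patented implementation: the function names `storage_intervals` and `device_for_key`, the 0-based device indices and the sample thresholds are all assumptions introduced here.

```python
from bisect import bisect_left

def storage_intervals(thresholds):
    """Return the (lower, upper] storage interval of each device.

    thresholds[k] is the key value threshold of device k (0-based here),
    sorted in ascending order; the first lower limit is 0.
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    lowers = [0] + list(thresholds[:-1])
    return list(zip(lowers, thresholds))

def device_for_key(thresholds, key):
    """Return the index of the device whose interval (lower, upper] holds key."""
    if key <= 0 or key > thresholds[-1]:
        raise KeyError(key)
    # bisect_left finds the first threshold >= key, i.e. the owning device.
    return bisect_left(thresholds, key)

thresholds = [100, 250, 400]          # hypothetical f1, f2, f3
intervals = storage_intervals(thresholds)
```

A key equal to a threshold belongs to the device that owns that threshold, because each interval is closed at its upper limit.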

The file distribution module is configured to receive a file sent by the upper layer application and, according to the key value range of the file, send the file to the NDP device with the corresponding storage key value interval for storage.

Specifically, suppose that the upper layer application newly sends a file A to the host side, and the key value range of file A is [a1, a2], with f(i-1) < a1 < a2 < fi. File A is then sent to the L0 compression storage layer of NDP device i for storage.

Suppose that the upper layer application newly sends a file B to the host side, and the key value range of file B is [b1, b2], with f(i-2) < b1 < f(i-1) < b2 < fi. The key value range of file B then overlaps the storage key value intervals of both NDP device i-1 and NDP device i, so file B is sent to the L0 compression storage layer of NDP device i-1 and to the L0 compression storage layer of NDP device i for storage.
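The routing of files A and B can be illustrated with a short Python sketch. It assumes 0-based device indices and a hypothetical helper `target_devices`; the sample thresholds and key values are invented for illustration only.

```python
from bisect import bisect_left

def target_devices(thresholds, key_min, key_max):
    """Indices of the devices whose interval (lower, upper] overlaps
    the file key value range [key_min, key_max]."""
    if not (0 < key_min <= key_max <= thresholds[-1]):
        raise ValueError("key range outside the cluster key space")
    first = bisect_left(thresholds, key_min)   # device holding key_min
    last = bisect_left(thresholds, key_max)    # device holding key_max
    return list(range(first, last + 1))

thresholds = [100, 250, 400]                    # hypothetical thresholds
file_a = target_devices(thresholds, 120, 200)   # range inside one interval
file_b = target_devices(thresholds, 80, 130)    # range crossing a threshold
```

A file like A lands on a single device, while a file like B, whose range crosses a threshold, is sent to the L0 layer of every device it overlaps.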

And the file migration module is used for acquiring the compression requirements of each compression storage layer of each NDP device, screening the file to be compressed from the compression storage layer corresponding to each NDP device according to the compression requirements, and acquiring the file which is overlapped with the key value range of the file to be compressed from the next compression storage layer as the supplementary file to be compressed.
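Selecting the supplementary files to be compressed amounts to a key range overlap test against the next compression storage layer. The sketch below is an assumption-laden illustration: the names `overlaps` and `supplementary_files`, the closed integer ranges and the sample files are hypothetical.

```python
def overlaps(a, b):
    """True when the closed key ranges a=(lo, hi) and b=(lo, hi) share a key."""
    return a[0] <= b[1] and b[0] <= a[1]

def supplementary_files(target_range, next_layer_files):
    """Names of next-layer files whose key range overlaps the file to be
    compressed; next_layer_files maps file name -> (lo, hi)."""
    return [name for name, rng in next_layer_files.items()
            if overlaps(target_range, rng)]

# Hypothetical contents of the next compression storage layer.
next_layer = {"D": (140, 300), "E": (310, 390), "F": (10, 50)}
selected = supplementary_files((120, 200), next_layer)
```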

Specifically, in this embodiment, when the number of files stored in any one of the compression storage layers of any one of the NDP devices reaches the corresponding upper limit value, the compression requirement is generated. In this embodiment, the storage compression layer of each NDP device may be monitored in real time by the cluster device management module, and the compression requirement may be obtained. The file migration module obtains the compression requirements from the cluster management module. Therefore, the monitoring of the working state of the NDP equipment is separated from the file processing, and the file processing efficiency is further improved.
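The trigger condition can be sketched as a simple scan over per-layer file counts. The data layout (a dictionary of per-device lists) and the name `compression_requirements` are assumptions made for this example; the source does not specify how the cluster device management module represents this state.

```python
def compression_requirements(file_counts, upper_limits):
    """Return (device, layer) pairs whose file count has reached the limit.

    file_counts maps device id -> list of file counts per layer (L0, L1, ...);
    upper_limits[layer] is the upper limit value of that layer.
    """
    return [
        (device, layer)
        for device, layers in sorted(file_counts.items())
        for layer, count in enumerate(layers)
        if count >= upper_limits[layer]
    ]

# Hypothetical snapshot: device 1 has a full L0, device 2 a full L1.
counts = {1: [4, 3, 0], 2: [1, 10, 2]}
limits = [4, 10, 100]
pending = compression_requirements(counts, limits)
```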

Specifically, in this embodiment, when the number of files stored in the L0 compression storage layer reaches the upper limit value, the files of the L0 compression storage layer are directly compressed and stored in the L1 compression storage layer of the same NDP device.

For the L1 compression storage layer and higher-numbered compression storage layers, the files to be compressed and the supplementary files to be compressed are sorted according to key value range. Assume that the files on the Lj compression storage layer of a storage array composed of a plurality of NDP devices reach the upper storage limit, so that a compression requirement is generated; a file C on the Lj compression storage layer of NDP device i is selected as the file to be compressed, and a file D on the L(j+1) compression storage layer of NDP device i is selected as the supplementary file to be compressed.

Here, the key value range of file C is [c1, c2], with f(i-2) < c1 < c2 < f(i-1), and the key value range of file D is [d1, d2], with f(i-2) < d1 < f(i-1) < d2 < fi.

It can be seen that the key value range of file C lies entirely within the storage key value interval of NDP device i-1, while the key value range of file D spans the storage key value intervals of NDP device i-1 and NDP device i. Therefore, in this embodiment, the file migration module acquires file C from NDP device i and sends it to the Lj compression storage layer of NDP device i-1; NDP device i-1 compresses file C and stores the compressed file in its local L(j+1) compression storage layer. Meanwhile, the file migration module acquires file D from NDP device i and divides it into file D1, with key value range [d1, f(i-1)], and file D2, with key value range (f(i-1), d2]. File D1 and file D2 are sent to the Lj compression storage layer of NDP device i-1 and the Lj compression storage layer of NDP device i, respectively. NDP device i-1 compresses file D1 and stores the compressed file in its local L(j+1) compression storage layer, and NDP device i compresses file D2 and stores the compressed file in its local L(j+1) compression storage layer.

Specifically, in this embodiment, the file migration module includes: the system comprises a storage monitoring unit, a file copying unit, a file dividing unit and a task sending unit.

And the storage monitoring unit is used for acquiring the compression requirements of each compression storage layer of each NDP device and acquiring the files to be compressed and the compressed supplementary files corresponding to the compression requirements.

Specifically, in the system, each NDP device is monitored by the cluster device management module, which obtains the compression requirement of an NDP device as soon as it arises. The file migration module obtains the compression requirement from the cluster device management module, identifies the corresponding compression storage layer by analyzing the compression requirement, and then performs file sorting processing.

In this embodiment, the storage monitoring unit communicates with the cluster device management module in real time to obtain the compression requirement.

And the file copying unit is connected with the storage monitoring unit and is used for screening files with key value ranges exceeding the storage key value interval of the current NDP equipment from the files to be compressed and the compressed supplementary files as files to be migrated and acquiring the files. In this embodiment, the file copying unit copies the file to be migrated, the cluster management module supervises the copied file of the NDP device, and when the file is copied and sent to the new NDP device, the cluster management module notifies the original NDP device to delete the copied file, so as to avoid redundant storage.
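The screening and copy-then-delete behaviour of the file copying unit can be sketched as below. Dictionaries stand in for the file stores of two NDP devices, and the names `files_to_migrate` and `migrate` are hypothetical; the real system would move data over the network and rely on the cluster management module for the delete notification.

```python
def files_to_migrate(files, lower, upper):
    """Names of files whose key range is not fully inside (lower, upper],
    the storage key value interval of the device that currently holds them."""
    return [name for name, (lo, hi) in files.items()
            if not (lower < lo and hi <= upper)]

def migrate(source, destination, name):
    """Copy the file to the destination first, then delete the original
    only once the copy is present, so no redundant copy remains."""
    destination[name] = source[name]
    if name in destination:
        del source[name]

# Hypothetical files on a device whose interval is (250, 400].
device_i = {"C": (120, 200), "D": (140, 300), "G": (260, 390)}
device_prev = {}
to_move = files_to_migrate(device_i, 250, 400)
migrate(device_i, device_prev, "C")
```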

And the file segmentation unit is connected with the file copying unit, is used for acquiring a file containing at least one key value threshold value in a key value range from the file to be migrated as a cutting target, and is used for segmenting the cutting target according to the key value threshold value.

For example, in the above embodiment, after the file copying unit copies file C and file D from NDP device i, the file dividing unit divides file D, whose key value range is [d1, d2], into a file D1 with key value range [d1, f(i-1)] and a file D2 with key value range (f(i-1), d2].

The task transmission unit acquires the file C from the file copying unit and transmits the file C to the NDP device i-1. The task transmission unit acquires the file D1 and the file D2 from the file splitting unit and transmits them to the NDP device i-1 and the NDP device i, respectively.
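The division of a cutting target at the key value thresholds, together with the choice of destination device for each piece, can be sketched as follows. The function name `split_by_thresholds`, the 0-based device indices and the use of integer keys (so that the half-open piece (f, d2] can be written as [f + 1, d2]) are assumptions of this example.

```python
from bisect import bisect_left

def split_by_thresholds(thresholds, key_min, key_max):
    """Split the key range [key_min, key_max] at every threshold inside it.

    Returns (device_index, piece_min, piece_max) triples with inclusive
    bounds; each piece lies inside exactly one storage interval.
    """
    if not (0 < key_min <= key_max <= thresholds[-1]):
        raise ValueError("key range outside the cluster key space")
    pieces = []
    lo = key_min
    dev = bisect_left(thresholds, key_min)
    while thresholds[dev] < key_max:
        pieces.append((dev, lo, thresholds[dev]))
        lo = thresholds[dev] + 1      # integer keys assumed
        dev += 1
    pieces.append((dev, lo, key_max))
    return pieces

thresholds = [100, 250, 400]                            # hypothetical
pieces_d = split_by_thresholds(thresholds, 140, 300)    # crosses 250
pieces_c = split_by_thresholds(thresholds, 120, 200)    # no threshold inside
```

A file with no threshold inside its range comes back as a single piece, which corresponds to a file that is migrated without being cut.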

In this embodiment, the task sending unit is further configured to, when the key value ranges of all the files to be migrated lie in the same storage key value interval, divide the files to be migrated into two parts and send one part to the compression storage layer on the host side and the other to the NDP device corresponding to the key value range, for compression. For example, taking file C above, after the task sending unit obtains file C, file C is divided into file C1 and file C2; file C1 is compressed by the host side, and file C2 is sent to NDP device i-1, which compresses it and stores it in its local L(j+1) compression storage layer.

In this embodiment, the task sending unit is further configured to obtain a file compressed by the host side and send it to the NDP device corresponding to the key value range for storage. Thus, after file C1 is compressed by the host side, the compressed file is sent through the task sending unit to NDP device i-1, which stores it in its L(j+1) compression storage layer. In this way the host side and the NDP device compress in parallel, which improves compression efficiency, while the storage space of the host side is not occupied.
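The parallel host/NDP compression of the two parts can be sketched as below. Several points are assumptions rather than details from the source: compression is modelled as LSM-style merging (keep the newest entry per key, output sorted by key), the file is divided by a split key so the two parts cover disjoint key ranges, and two threads stand in for the host side and the NDP device. The names `compress_and_sort` and `shared_compression` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def compress_and_sort(entries):
    """Keep the newest (highest sequence number) value per key and
    return the entries sorted by key. Entries are (key, seq, value)."""
    newest = {}
    for key, seq, value in entries:
        if key not in newest or seq > newest[key][0]:
            newest[key] = (seq, value)
    return [(k, s, v) for k, (s, v) in sorted(newest.items())]

def shared_compression(entries, split_key):
    """Divide the entries into two parts and process them concurrently."""
    host_part = [e for e in entries if e[0] <= split_key]   # like file C1
    ndp_part = [e for e in entries if e[0] > split_key]     # like file C2
    with ThreadPoolExecutor(max_workers=2) as pool:
        host_future = pool.submit(compress_and_sort, host_part)
        ndp_future = pool.submit(compress_and_sort, ndp_part)
        # The host result is appended to the NDP result, mirroring the
        # host-compressed file being sent to the NDP device for storage.
        return host_future.result() + ndp_future.result()

entries = [(120, 1, "a"), (150, 1, "b"), (120, 2, "c"), (180, 1, "d")]
merged = shared_compression(entries, split_key=150)
```

Because the two parts cover disjoint key ranges, concatenating their outputs keeps the result sorted without a further merge step.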

In this embodiment, the host further includes a load balancing module, configured to monitor a task process of each compression storage layer of each NDP device, and count a process occupation time and an idle time of each NDP device in the same task process. The load balancing module is further configured to generate a key value adjustment instruction according to a comparison result between the process occupation time of each NDP device and a preset process time consumption threshold and a comparison result between the idle time of each NDP device and a preset idle threshold.

The cluster device management module is connected with the load balancing module and used for adjusting the key value threshold of the NDP device according to the key value adjusting instruction. Therefore, by adjusting the key value threshold, the file distribution module can distribute files according to the processing efficiency of different NDP devices, so that the task process of each NDP device is balanced.
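The comparison that produces key value adjustment instructions can be sketched as follows. The instruction format (a device id paired with "shrink" or "enlarge"), the function name `adjustment_instructions` and the sample timings are assumptions made for illustration; the source only states that the two comparisons drive the instruction.

```python
def adjustment_instructions(stats, busy_threshold, idle_threshold):
    """Build adjustment instructions from per-device timing statistics.

    stats maps device id -> (process occupation time, idle time) measured
    in the same task process. A device that is too busy should have its
    storage interval shrunk; a device that is too idle should have it
    enlarged.
    """
    instructions = []
    for device, (occupied, idle) in sorted(stats.items()):
        if occupied > busy_threshold:
            instructions.append((device, "shrink"))
        elif idle > idle_threshold:
            instructions.append((device, "enlarge"))
    return instructions

# Hypothetical timings in seconds for three devices.
stats = {1: (9.0, 0.5), 2: (4.0, 1.0), 5: (1.0, 6.0)}
plan = adjustment_instructions(stats, busy_threshold=8.0, idle_threshold=5.0)
```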

Specifically, in this embodiment, the load balancing module is configured to count key value sparsity and local key value threshold tuning times of each NDP device compression task, and perform balancing processing on compression task distribution and data flow direction according to a statistical result.

Specifically, the load balancing module is configured to perform compression task distribution balancing by adjusting the priority of the compression queue, and is configured to calculate an optimal position of each key value threshold according to a ratio of the average file of the NDP device to the expected data migration to balance the data flow direction.

In this embodiment, the load balancing module is further connected to the file distribution module and the file migration module, respectively, and is configured to monitor the current data deployment and data flow direction. Specifically, the load balancing module is connected with the other modules to monitor data access optimization, data storage and the like in real time, and to control the distribution of compression tasks and the balance of data volume. By judging the current data deployment and flow direction, the load balancing module dynamically adjusts the priority of the compression task queue and counts the number of key value threshold adjustments, thereby controlling the task selection and the file distribution direction of the file migration module.

In this embodiment, when the balanced distribution of compression tasks is performed, the priority of the compression queue is adjusted: the tasks in the queue are compared by the sparsity of their key values and sorted, which determines the order in which compression tasks are completed. When data flow balancing is performed, the optimal position of each key value threshold is determined by calculating the ratio of the average file of each NDP device to the expected data migration; the priority of the compression queue is then adjusted, and tasks are selected from the adjusted compression queue and distributed to the devices for execution.

Thus, the load balancing module is used for balancing the key value sparsity ratio of each NDP equipment compression task; and analyzing the current data distribution and the data migration cost to calculate the optimal value of each key value threshold. In the working process, the load balancing module assists the system to change the flow direction of data by adjusting the key value threshold, control file migration and distribution and realize load balancing.

Specifically, in this embodiment, the NDP device whose process occupation time is greater than the process time consumption threshold or whose idle time is greater than the idle threshold is used as the adjustment object, the NDP device whose storage key value range is adjacent to the adjustment object is used as the adjacent object, and the cluster device management module is configured to adjust the key value threshold between the adjustment object and any adjacent object according to the key value adjustment instruction.

Thus, suppose a system has 5 NDP devices and, while a certain compression task is executed, the process occupation time of NDP device 1 exceeds the process time consumption threshold. NDP device 1 is then the adjustment object and NDP device 2 the adjacent object, and the key value threshold f1 of NDP device 1 is adjusted to f1', where f1' < f1. The range of key values stored on NDP device 1 is thereby narrowed, so the L0 compression storage layer of NDP device 1 receives fewer files and its compression workload falls; meanwhile, because the storage key value range of NDP device 2 is expanded, the L0 compression storage layer of NDP device 2 receives more files and shares the computing pressure of NDP device 1, achieving task balance between NDP device 1 and NDP device 2.

Suppose a system has 5 NDP devices and, while a certain compression task is executed, the idle time of NDP device 5 exceeds the idle threshold. NDP device 5 is then the adjustment object and NDP device 4 the adjacent object, and the key value threshold f4 of NDP device 4 is adjusted to f4', where f4' < f4. The storage key value interval of NDP device 5, (f4', f5], is thereby expanded, so the L0 compression storage layer of NDP device 5 receives more files and shares the computing pressure of NDP device 4; the range of key values stored on NDP device 4 is narrowed, reducing the files received by its L0 compression storage layer and hence its compression workload, which achieves task balance between NDP device 4 and NDP device 5.

In this way, in the embodiment, by sharing the tasks of the adjacent object and the adjustment object in a balanced manner, compared with the unified adjustment of the key value thresholds of all the NDP devices, the file migration data amount in the subsequent task is reduced, which is beneficial to reducing the calculation pressure caused by file migration.

In this embodiment, the key value adjustment instruction includes an adjustment object and a task time consumption of two adjacent objects in the task process. When the process occupation time of the adjustment object is larger than the process time consumption threshold, the cluster equipment management module adjusts the key value threshold between the adjustment object and the adjacent object with less task time consumption, and the key value storage interval of the adjustment object is reduced. When the idle time of the adjustment object is larger than the idle threshold, the cluster equipment management module adjusts the key value threshold between the adjustment object and the adjacent object which consumes more time, and enlarges the key value storage interval of the adjustment object. Therefore, the key value equalization processing efficiency is further ensured.
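The neighbour selection and threshold shift described above can be sketched as follows. This is an illustration under stated assumptions: devices are indexed from 0, `thresholds[k]` is the upper limit of device k, the fixed `step` size is a hypothetical parameter (the source does not say how far a threshold moves), and the name `adjust_threshold` is invented here.

```python
def adjust_threshold(thresholds, task_time, device, action, step):
    """Return a new threshold list after one adjustment.

    action "shrink": move the threshold shared with the neighbour that
    spent less task time, reducing the interval of `device`.
    action "enlarge": move the threshold shared with the neighbour that
    spent more task time, enlarging the interval of `device`.
    """
    if action not in ("shrink", "enlarge"):
        raise ValueError(action)
    n = len(thresholds)
    neighbours = [d for d in (device - 1, device + 1) if 0 <= d < n]
    pick = min if action == "shrink" else max
    neighbour = pick(neighbours, key=lambda d: task_time[d])
    # The shared threshold is the upper limit of the lower-indexed device.
    boundary = min(device, neighbour)
    # Raising it grows the lower-indexed device; lowering it grows the other.
    raise_it = (action == "enlarge") == (device < neighbour)
    new = list(thresholds)
    new[boundary] += step if raise_it else -step
    lower = new[boundary - 1] if boundary > 0 else 0
    if not lower < new[boundary] < new[boundary + 1]:
        raise ValueError("step too large for the neighbouring intervals")
    return new

thresholds = [100, 250, 400, 550, 700]   # five hypothetical devices
task_time = [9, 4, 5, 6, 1]
busy_first = adjust_threshold(thresholds, task_time, 0, "shrink", 20)
idle_last = adjust_threshold(thresholds, task_time, 4, "enlarge", 20)
busy_mid = adjust_threshold(thresholds, task_time, 2, "shrink", 20)
```

Only one threshold moves per adjustment, so only the adjustment object and one adjacent object see their storage key value intervals change.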

Specifically, in this embodiment, the cluster device management module is configured to generate an information object of each NDP device. The information object is used for managing the number of the corresponding NDP device and the key value threshold, and calculating the key value storage interval of each NDP device by combining the key value threshold. The information object is also used for monitoring the communication between the corresponding NDP equipment and the host side and controlling the file deployment and the file compression of the NDP equipment.

In this embodiment, the host side and the NDP device communicate with each other through an ethernet switch.

In this embodiment, both the host side and the NDP device include a cache management module. The host side comprises a cache management module for expanding the host cache function and bearing the cache work of the read operation data. And the NDP device end cache management module is used for bearing the cache of the read data in the compression sorting operation. Therefore, the read operation is ended at the host end as much as possible, and meanwhile, the NDP equipment end caches the data of the latest compression task, so that the spatial locality and the temporal locality of the compression sorting process are effectively utilized.

In this embodiment, the load balancing module is further configured to sense a behavior rule of the read operation, calculate an access frequency for target data of the read operation, and instruct the cache to perform hot and cold data layering.
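Hot and cold data layering by access frequency can be sketched in a few lines. The read log format, the fixed cut-off `hot_min_reads` and the name `classify_hot_cold` are assumptions of this example; the source does not specify how the access frequency is turned into a hot or cold decision.

```python
from collections import Counter

def classify_hot_cold(read_log, hot_min_reads):
    """Count reads per key and split keys into hot and cold sets.

    read_log is a sequence of keys targeted by read operations; a key
    read at least hot_min_reads times is treated as hot.
    """
    counts = Counter(read_log)
    hot = sorted(k for k, c in counts.items() if c >= hot_min_reads)
    cold = sorted(k for k, c in counts.items() if c < hot_min_reads)
    return hot, cold

# Hypothetical read trace.
read_log = ["k1", "k2", "k1", "k3", "k1", "k2"]
hot, cold = classify_hot_cold(read_log, hot_min_reads=2)
```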

The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to the technical solution and inventive concept of the present invention, shall fall within the scope of protection of the present invention.

Claims

1. A key-value storage oriented near data computing cluster system, comprising: a host side and a plurality of NDP devices; the host end is respectively connected with each NDP device;

the host end includes:

2. The key-value-storage-oriented near data computing cluster system of claim 1, wherein the file migration module comprises: the system comprises a storage monitoring unit, a file copying unit, a file dividing unit and a task sending unit;

3. The key-value-storage-oriented near data computing cluster system of claim 2, wherein the task sending unit is further configured to, when the key value ranges of all the files to be migrated are located in the same storage key value interval, divide all the files to be migrated into two parts, and send the two parts to the compressed storage layer at the host end and the NDP device corresponding to the key value range for compression, respectively.

4. The key-value-storage-oriented near data computing cluster system of claim 3, wherein the task sending unit is further configured to obtain a host-side compressed file and send the host-side compressed file to the NDP device corresponding to the key-value range for storage.

5. The key-value-storage-oriented near data computing cluster system of claim 1, wherein the host side further comprises a load balancing module configured to monitor the task process of each compression storage layer of each NDP device and to count the process occupation time and idle time of each NDP device in the same task process; the load balancing module is further configured to generate a key value adjustment instruction according to the result of comparing the process occupation time of each NDP device with a preset process time consumption threshold and the result of comparing the idle time of each NDP device with a preset idle threshold;

6. The key-value-storage-oriented near data computing cluster system of claim 5, wherein the load balancing module is further connected to the file distribution module and the file migration module, respectively, for monitoring the current data deployment and data flow direction;

7. The key-value-storage-oriented near data computing cluster system of claim 5, wherein NDP devices with process occupation time greater than a process time-consuming threshold or idle time greater than an idle threshold are used as adjustment objects, NDP devices with a key-value-storage range adjacent to the adjustment objects are used as adjacent objects, and the cluster device management module is configured to adjust the key-value threshold between the adjustment objects and any one of the adjacent objects according to the key-value adjustment instruction.

8. The key-value-storage-oriented near data computing cluster system of claim 7, wherein the key-value adjustment instruction comprises an adjustment object and task time consumption of two adjacent objects in the current task process; when the process occupation time of the adjustment object is larger than the process time consumption threshold, the cluster equipment management module adjusts the key value threshold between the adjustment object and an adjacent object with less task time consumption, and reduces the key value storage interval of the adjustment object; when the idle time of the adjustment object is larger than the idle threshold, the cluster equipment management module adjusts the key value threshold between the adjustment object and the adjacent object which consumes more time, and enlarges the key value storage interval of the adjustment object.

9. The key-value-storage-oriented near data computing cluster system of claim 1, wherein the host side and the NDP device each comprise a cache management module, the host-side cache management module being configured to extend the host cache function, and the NDP-device-side cache management module being configured to cache the data read during compression and sorting operations.

10. The key-value-storage-oriented near data computing cluster system of any one of claims 1-9, wherein the host-side and the NDP device communicate via an ethernet switch.