CN111897784A - Key value storage-oriented near data computing cluster system - Google Patents

Publication number
CN111897784A
Authority
CN
China
Prior art keywords
ndp
file
storage
key value
key
Legal status
Granted
Application number
CN202010668559.7A
Other languages
Chinese (zh)
Other versions
CN111897784B (en)
Inventor
孙辉
王强
Current Assignee
Anhui University
Original Assignee
Anhui University
Application filed by Anhui University
Priority to CN202010668559.7A
Publication of CN111897784A
Application granted
Publication of CN111897784B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a key value storage-oriented near data computing cluster system, which comprises a host end and a plurality of NDP devices, the host end being connected to each NDP device. The host end includes a cluster device management module, a file distribution module and a file migration module. The file migration module acquires the compression requirements of each compression storage layer of each NDP device and, according to the key value range, migrates the files to be compressed to the NDP devices of the corresponding storage key value intervals for storage, compression and sorting. By arranging the plurality of NDP devices into a computational storage array, the invention ensures the storage capacity and computing capacity of the whole system, avoids occupying storage space at the host end, and greatly relieves the host CPU bottleneck of database compression and sorting operations, thereby improving the data processing efficiency of the whole system.

Description

Key value storage-oriented near data computing cluster system
Technical Field
The invention relates to a key value storage-oriented near data computing cluster system.
Background
The rapid development of computer technology and the internet has driven the emergence of semi-structured and unstructured data, whose share of the total data volume is growing exponentially. As the scale of unstructured data grows, traditional relational databases cannot meet the requirements of efficient storage, high concurrency and high scalability for massive data. In contrast, key-value storage does not require a predefined data structure, and it has been widely applied to unstructured data storage and management because it provides low-latency reads and writes and supports massive data traffic. At present, key-value storage systems widely use the log-structured merge tree (LSM-tree) to store and manage data; the LSM-tree converts random writes into sequential writes and thereby obtains excellent write performance. To manage data efficiently, an LSM-tree based key-value storage system performs compaction operations (referred to below as compression operations) at run time to update sorted string table (SSTable) files and migrate them to the next level. However, the compression operation occupies a large amount of I/O bandwidth at the host end and the storage device end, which degrades performance, and updating the SSTable files also causes write amplification.
Disclosure of Invention
Based on the technical problems in the background art, the invention provides a key value storage-oriented near data computing cluster system.
The invention provides a key value storage-oriented near data computing cluster system, which comprises: a host side and a plurality of NDP devices; the host end is respectively connected with each NDP device;
the host end includes:
the cluster device management module is used for generating an NDP device information object, which includes setting the key value threshold of each NDP device and calculating the storage key value interval of each NDP device from the key value thresholds; adding and deleting devices; and managing the working state of the devices;
the file distribution module is used for receiving a file sent by an upper layer application and, according to the key value range of the file, sending the file to the NDP device of the corresponding storage key value interval for storage;
the file migration module is used for acquiring the compression requirements of each compression storage layer of each NDP device, screening the files to be compressed from the corresponding compression storage layer of each NDP device according to the compression requirements, and acquiring, from the next compression storage layer, the files whose key value ranges overlap that of a file to be compressed as supplementary files to be compressed;
and the file migration module is further used for migrating the files to be compressed and the supplementary files to be compressed, according to their key value ranges, to the NDP devices of the corresponding storage key value intervals for storage, compression and sorting.
Preferably, the file migration module includes: a storage monitoring unit, a file copying unit, a file dividing unit and a task sending unit;
the storage monitoring unit is used for acquiring the compression requirements of each compression storage layer of each NDP device and acquiring the files to be compressed and the supplementary files to be compressed corresponding to the compression requirements;
the file copying unit is connected with the storage monitoring unit and is used for screening, from the files to be compressed and the supplementary files to be compressed, the files whose key value ranges exceed the storage key value interval of the NDP device on which they are currently located as files to be migrated, and for acquiring those files;
the file dividing unit is connected with the file copying unit and is used for acquiring, from the files to be migrated, each file whose key value range contains at least one key value threshold as a cutting target, and for dividing the cutting target at the key value threshold;
and the task sending unit is connected with the file copying unit and the file dividing unit respectively, and is used for acquiring the files to be migrated other than the cutting targets, together with the divided files, and distributing them according to their key value ranges to the NDP devices of the corresponding storage key value intervals.
Preferably, the task sending unit is further configured to divide all the files to be migrated into two parts and send the two parts to the compression storage layer at the host end and the NDP device corresponding to the key value range for compression, respectively, when the key value ranges of all the files to be migrated are located in the same storage key value interval.
Preferably, the task sending unit is further configured to obtain a file compressed by the host side and send the file to the NDP device corresponding to the key value range for storage.
Preferably, the host end further includes a load balancing module, configured to monitor the task process of each compression storage layer of each NDP device and to count the process occupation time and idle time of each NDP device in the same task process; the load balancing module is further configured to generate a key value adjustment instruction according to the result of comparing the process occupation time of each NDP device with a preset process time consumption threshold and the result of comparing the idle time of each NDP device with a preset idle threshold;
the cluster device management module is connected with the load balancing module and used for adjusting the key value threshold of the NDP device according to the key value adjusting instruction.
Preferably, the load balancing module is further connected to the file distribution module and the file migration module, respectively, and is configured to monitor the current data deployment and data flow direction;
the load balancing module is used for counting key value sparse conditions and local key value threshold adjusting times of each NDP equipment compression task, and carrying out balancing processing on compression task distribution and data flow direction according to a counting result.
Preferably, the NDP device whose process occupation time is greater than the process time consumption threshold or whose idle time is greater than the idle threshold is used as the adjustment object, the NDP device whose storage key value range is adjacent to the adjustment object is used as the adjacent object, and the cluster device management module is configured to adjust the key value threshold between the adjustment object and any adjacent object according to the key value adjustment instruction.
Preferably, the key value adjusting instruction includes an adjusting object and the task time consumption of two adjacent objects in the task process; when the process occupation time of the adjustment object is larger than the process time consumption threshold, the cluster equipment management module adjusts the key value threshold between the adjustment object and an adjacent object with less task time consumption, and reduces the key value storage interval of the adjustment object; when the idle time of the adjustment object is larger than the idle threshold, the cluster equipment management module adjusts the key value threshold between the adjustment object and the adjacent object which consumes more time, and enlarges the key value storage interval of the adjustment object.
Preferably, both the host end and the NDP device include a cache management module; the cache management module of the host end is used for expanding the host cache function, and the cache management module of the NDP device is used for caching the data read in compression and sorting operations.
Preferably, the host side communicates with the NDP device via an ethernet switch.
According to the near data computing cluster system for key value storage, the plurality of NDP devices are arranged to form the storage array, so that the storage capacity of the whole system is guaranteed, meanwhile, the occupied storage space of a host computer end is avoided, and the data processing efficiency of the whole system is improved.
In the invention, the files on the compression storage layer of each NDP device are managed through the file migration module, so that the ordered storage of the files on each NDP device is realized, the unified management efficiency of the files is improved, the unified distribution of the files is facilitated, and the balance of compression tasks obtained by each NDP device is further ensured.
According to the invention, the cluster device management module monitors the state of each NDP device and adjusts the key value thresholds, so that each NDP device can be controlled flexibly. Adjusting the key value thresholds balances the task processes of the NDP devices, and this balance minimizes the time consumed by a single compression task, thereby improving the compression efficiency and maximizing the utilization of the storage space of each NDP device.
Drawings
Fig. 1 is a block diagram of a near data computing cluster system oriented to key value storage according to the present invention.
Detailed Description
Referring to fig. 1, a near data computing cluster system for key value storage according to the present invention includes: a host side and a plurality of NDP devices. The host end is respectively connected with each NDP device.
The host end includes:
the cluster device management module is used for generating an NDP device information object, which includes setting the key value threshold of each NDP device and calculating the storage key value interval of each NDP device from the key value thresholds; adding and deleting devices; and managing the working state of the devices.
Specifically, in this embodiment, the NDP devices are sorted by key value threshold from small to large. The upper limit of the storage key value interval of each NDP device is its own key value threshold, and the lower limit is the key value threshold of the preceding adjacent NDP device; the lower limit of the storage key value interval of the NDP device with the minimum key value threshold is 0.
That is, assume that the key value threshold of NDP device 1 is f1 and the key value threshold of NDP device n is fn, with f1 < f2 < … < f(n-1) < fn. Then the storage key value interval of NDP device 1 is (0, f1], and the storage key value interval of NDP device n is (f(n-1), fn].
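The interval rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `storage_intervals` and the example threshold values are hypothetical and do not appear in the patent.

```python
from typing import List, Tuple


def storage_intervals(thresholds: List[int]) -> List[Tuple[int, int]]:
    """Return the (lower, upper] storage key value interval of each NDP device.

    thresholds[k] is the key value threshold of NDP device k+1 (hypothetical
    layout). The lower bound is exclusive and the upper bound is inclusive.
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("key value thresholds must be strictly increasing")
    # Device 1 starts at 0; every other device starts at its predecessor's threshold.
    lowers = [0] + thresholds[:-1]
    return list(zip(lowers, thresholds))


# Three devices with assumed thresholds f1=100, f2=250, f3=400.
print(storage_intervals([100, 250, 400]))  # [(0, 100), (100, 250), (250, 400)]
```

Each tuple is read as a half-open interval, so a key equal to 100 belongs to device 1 and a key equal to 101 belongs to device 2.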
The file distribution module is used for receiving a file sent by an upper layer application and, according to the key value range of the file, sending the file to the NDP device of the corresponding storage key value interval for storage.
Specifically, suppose that the upper layer application sends a new file A to the host end, the key value range of file A is [a1, a2], and f(i-1) < a1 < a2 < fi. File A is then sent to the L0 compression storage layer of NDP device i for storage.
Suppose that the upper layer application sends a new file B to the host end, the key value range of file B is [b1, b2], and f(i-2) < b1 < f(i-1) < b2 < fi. The key value range of file B overlaps the storage key value intervals of both NDP device i-1 and NDP device i, so file B is sent to the L0 compression storage layer of NDP device i-1 and to the L0 compression storage layer of NDP device i for storage.
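The routing of files A and B can be illustrated with a binary search over the sorted key value thresholds. The sketch below is an assumption-laden illustration, not the patented implementation: `target_devices` and the numeric values are hypothetical, and devices are numbered from 1.

```python
from bisect import bisect_left
from typing import List


def target_devices(key_lo: int, key_hi: int, thresholds: List[int]) -> List[int]:
    """Return the numbers of the NDP devices whose (lower, upper] storage
    key value interval overlaps the file's key value range [key_lo, key_hi]."""
    if not 0 < key_lo <= key_hi <= thresholds[-1]:
        raise ValueError("key value range lies outside the cluster's key space")
    # bisect_left finds the first threshold >= key, i.e. the device whose
    # half-open interval (f(k-1), fk] contains the key.
    first = bisect_left(thresholds, key_lo)
    last = bisect_left(thresholds, key_hi)
    return list(range(first + 1, last + 2))


thresholds = [100, 250, 400]                 # assumed f1, f2, f3
print(target_devices(120, 200, thresholds))  # file A: [2]
print(target_devices(80, 150, thresholds))   # file B: [1, 2]
```

A file such as B that straddles a threshold yields two device numbers, which matches the case where the file is stored on the L0 layer of both devices.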
And the file migration module is used for acquiring the compression requirements of each compression storage layer of each NDP device, screening the file to be compressed from the compression storage layer corresponding to each NDP device according to the compression requirements, and acquiring the file which is overlapped with the key value range of the file to be compressed from the next compression storage layer as the supplementary file to be compressed.
Specifically, in this embodiment, a compression requirement is generated when the number of files stored in any compression storage layer of any NDP device reaches the corresponding upper limit value. The compression storage layers of each NDP device may be monitored in real time by the cluster device management module, which obtains the compression requirements; the file migration module then obtains the compression requirements from the cluster device management module. In this way, monitoring the working state of the NDP devices is separated from file processing, which further improves file processing efficiency.
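A minimal sketch of these two steps, detecting which layers have reached their file-count limit and picking the overlapping files of the next layer as supplementary files, might look as follows. The `FileMeta` record, the function names, and the limit values are all hypothetical; the patent does not specify data structures.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FileMeta:
    name: str
    key_lo: int  # smallest key in the file
    key_hi: int  # largest key in the file


def layers_needing_compression(layers: Dict[int, List[FileMeta]],
                               limits: Dict[int, int]) -> List[int]:
    """Layers whose file count has reached the layer's upper limit value."""
    return [lvl for lvl, files in sorted(layers.items())
            if len(files) >= limits[lvl]]


def supplementary_files(target: FileMeta,
                        next_layer: List[FileMeta]) -> List[FileMeta]:
    """Files of the next layer whose key value range overlaps the target's."""
    return [f for f in next_layer
            if f.key_lo <= target.key_hi and target.key_lo <= f.key_hi]


c = FileMeta("C", 110, 180)
d = FileMeta("D", 150, 300)
e = FileMeta("E", 320, 390)
layers = {1: [c, FileMeta("X", 200, 240)], 2: [d, e]}
limits = {1: 2, 2: 10}  # assumed per-layer upper limits

print(layers_needing_compression(layers, limits))            # [1]
print([f.name for f in supplementary_files(c, layers[2])])   # ['D']
```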
And the file migration module is used for migrating the file to be compressed and the supplementary file to be compressed to the NDP equipment in the corresponding storage key value interval according to the key value range to store, compress and sort the file and the supplementary file.
Specifically, in this embodiment, when the number of files stored in the L0 compression storage layer reaches the upper limit value, the files of the L0 compression storage layer are directly compressed and stored in the L1 compression storage layer of the NDP device.
For the L1 compression storage layer and the compression storage layers above it, the files to be compressed and the supplementary files to be compressed are sorted according to their key value ranges. Assume that the files on the Lj compression storage layer of a storage array composed of a plurality of NDP devices reach the upper storage limit, so that a compression requirement is generated; file C on the Lj compression storage layer of NDP device i is selected as the file to be compressed, and file D on the L(j+1) compression storage layer of NDP device i is selected as the supplementary file to be compressed.
Here the key value range of file C is [c1, c2] with f(i-2) < c1 < c2 < f(i-1), and the key value range of file D is [d1, d2] with f(i-2) < d1 < f(i-1) < d2 < fi.
It can be seen that the key value range of file C lies entirely within the storage key value interval of NDP device i-1, while the key value range of file D is split between the storage key value intervals of NDP device i-1 and NDP device i. Therefore, in this embodiment, the file migration module acquires file C from NDP device i and sends it to the Lj compression storage layer of NDP device i-1; NDP device i-1 compresses file C and stores the compressed file in its local L(j+1) compression storage layer. Meanwhile, the file migration module acquires file D from NDP device i and divides it into file D1 with key value range [d1, f(i-1)] and file D2 with key value range (f(i-1), d2]. File D1 and file D2 are sent to the Lj compression storage layer of NDP device i-1 and the Lj compression storage layer of NDP device i, respectively. NDP device i-1 compresses file D1 and stores the compressed file in its local L(j+1) compression storage layer, and NDP device i compresses file D2 and stores the compressed file in its local L(j+1) compression storage layer.
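The division of file D at the key value threshold f(i-1) can be sketched as grouping a file's sorted keys by owning device. The function `split_file` and the sample keys are hypothetical, and integer keys are assumed for simplicity.

```python
from bisect import bisect_left
from typing import Dict, List


def split_file(keys: List[int], thresholds: List[int]) -> Dict[int, List[int]]:
    """Group a file's sorted keys by the NDP device (numbered from 1) whose
    (lower, upper] storage key value interval contains each key."""
    parts: Dict[int, List[int]] = {}
    for key in keys:
        device = bisect_left(thresholds, key) + 1
        parts.setdefault(device, []).append(key)
    return parts


thresholds = [100, 250, 400]           # assumed f1, f2, f3; here i = 3
file_d = [150, 220, 250, 260, 300]     # d1 = 150, d2 = 300, straddles f2 = 250
print(split_file(file_d, thresholds))  # {2: [150, 220, 250], 3: [260, 300]}
```

The key equal to the threshold (250) stays with device 2, which corresponds to file D1 covering [d1, f(i-1)] and file D2 covering (f(i-1), d2].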
Specifically, in this embodiment, the file migration module includes: the system comprises a storage monitoring unit, a file copying unit, a file dividing unit and a task sending unit.
The storage monitoring unit is used for acquiring the compression requirements of each compression storage layer of each NDP device and acquiring the files to be compressed and the supplementary files to be compressed corresponding to the compression requirements.
Specifically, in the system, each NDP device is monitored by the cluster device management module, which obtains a compression requirement of an NDP device as soon as it is generated. The file migration module obtains the compression requirement from the cluster device management module, determines the corresponding compression storage layer by parsing the compression requirement, and then sorts and processes the files.
In this embodiment, the storage monitoring unit communicates with the cluster device management module in real time to obtain the compression requirement.
The file copying unit is connected with the storage monitoring unit and is used for screening, from the files to be compressed and the supplementary files to be compressed, the files whose key value ranges exceed the storage key value interval of the NDP device on which they are currently located as files to be migrated, and for acquiring those files. In this embodiment, the file copying unit copies the files to be migrated, and the cluster device management module supervises the copied files of the NDP device; once a file has been copied and sent to the new NDP device, the cluster device management module notifies the original NDP device to delete its copy, so as to avoid redundant storage.
The file dividing unit is connected with the file copying unit and is used for acquiring, from the files to be migrated, each file whose key value range contains at least one key value threshold as a cutting target, and for dividing the cutting target at the key value threshold.
The task sending unit is connected with the file copying unit and the file dividing unit respectively, and is used for acquiring the files to be migrated other than the cutting targets, together with the divided files, and distributing them according to their key value ranges to the NDP devices of the corresponding storage key value intervals.
For example, in the above embodiment, after the file copying unit copies file C and file D from NDP device i, the file dividing unit divides file D into file D1 with key value range [d1, f(i-1)] and file D2 with key value range (f(i-1), d2].
The task sending unit acquires file C from the file copying unit and sends it to NDP device i-1. The task sending unit acquires file D1 and file D2 from the file dividing unit and sends them to NDP device i-1 and NDP device i, respectively.
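The cooperation of the file copying, file dividing and task sending units can be sketched as a small planning function that reproduces the file C and file D example. Everything here is an illustrative assumption: the `SSTable` record, the helper names, the naming of divided parts (`D1`, `D2`), and the sample keys are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SSTable:
    name: str
    keys: List[int]  # sorted keys held by the file


def device_of(key: int, thresholds: List[int]) -> int:
    """Number (from 1) of the NDP device whose interval contains the key."""
    return bisect_left(thresholds, key) + 1


def files_to_migrate(files: List[SSTable], current: int,
                     thresholds: List[int]) -> List[SSTable]:
    """File copying unit: files whose range leaves the current device's interval."""
    return [f for f in files
            if device_of(f.keys[0], thresholds) != current
            or device_of(f.keys[-1], thresholds) != current]


def is_cutting_target(f: SSTable, thresholds: List[int]) -> bool:
    """File dividing unit: the range spans a key value threshold."""
    return device_of(f.keys[0], thresholds) != device_of(f.keys[-1], thresholds)


def dispatch_plan(files: List[SSTable], current: int,
                  thresholds: List[int]) -> Dict[int, List[str]]:
    """Task sending unit: map each target device to the file parts it receives."""
    plan: Dict[int, List[str]] = {}
    for f in files_to_migrate(files, current, thresholds):
        if is_cutting_target(f, thresholds):
            devices = sorted({device_of(k, thresholds) for k in f.keys})
            for part_no, dev in enumerate(devices, start=1):
                plan.setdefault(dev, []).append(f"{f.name}{part_no}")
        else:
            plan.setdefault(device_of(f.keys[0], thresholds), []).append(f.name)
    return plan


thresholds = [100, 250, 400]  # assumed f1, f2, f3; files currently on device i = 3
files = [SSTable("C", [110, 150, 180]), SSTable("D", [150, 220, 260, 300]),
         SSTable("E", [260, 390])]
print(dispatch_plan(files, 3, thresholds))  # {2: ['C', 'D1'], 3: ['D2']}
```

File E stays where it is because its whole range already lies in the interval of device 3, so it never enters the plan.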
In this embodiment, the task sending unit is further configured to, when the key value ranges of all the files to be migrated lie in the same storage key value interval, divide the files to be migrated into two parts and send one part to the compression storage layer of the host end and the other part to the NDP device corresponding to the key value range for compression. For example, for file C above, after the task sending unit obtains file C, it divides file C into file C1 and file C2; file C1 is compressed by the host end, and file C2 is sent to NDP device i-1, where it is compressed and stored in the local L(j+1) compression storage layer.
In this embodiment, the task sending unit is further configured to obtain a file compressed by the host and send the file to the NDP device corresponding to the key value range for storage. Thus, after the file C1 is compressed by the host, the compressed file is sent to the NDP device i-1 through the task sending unit and is stored in the L (j +1) compression storage layer by the NDP device i-1. Therefore, the host end and the NDP equipment are synchronously compressed, and the compression efficiency is improved. Meanwhile, the occupied storage space of the host end is also avoided.
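The idea of compressing one part on the host end while the NDP device compresses the other can be sketched with two worker threads. This is a rough illustration under several assumptions that the patent does not state: compression is modelled as LSM-style compaction (sort by key, newest value wins), the file is divided at a hypothetical `pivot` key, and threads stand in for the host end and NDP device i-1.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Entry = Tuple[int, str]  # (key, value); later entries override earlier ones


def compact(entries: List[Entry]) -> List[Entry]:
    """Sort by key and keep only the most recent value of each key."""
    latest: Dict[int, str] = {}
    for key, value in entries:
        latest[key] = value
    return sorted(latest.items())


def co_compact(entries: List[Entry], pivot: int) -> List[Entry]:
    """Divide the entries at `pivot`, compact both parts concurrently,
    and return the combined result as stored on the NDP device."""
    host_part = [e for e in entries if e[0] <= pivot]  # file C1 (assumed split)
    ndp_part = [e for e in entries if e[0] > pivot]    # file C2 (assumed split)
    with ThreadPoolExecutor(max_workers=2) as pool:
        host_job = pool.submit(compact, host_part)  # stands in for the host end
        ndp_job = pool.submit(compact, ndp_part)    # stands in for NDP device i-1
        # The parts cover disjoint, ordered key ranges, so concatenation is sorted.
        return host_job.result() + ndp_job.result()


file_c = [(150, "a"), (120, "b"), (170, "c"), (120, "d")]
print(co_compact(file_c, pivot=140))  # [(120, 'd'), (150, 'a'), (170, 'c')]
```

Because the host-side result is sent on to the NDP device, nothing remains stored at the host end once the task completes.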
In this embodiment, the host further includes a load balancing module, configured to monitor a task process of each compression storage layer of each NDP device, and count a process occupation time and an idle time of each NDP device in the same task process. The load balancing module is further configured to generate a key value adjustment instruction according to a comparison result between the process occupation time of each NDP device and a preset process time consumption threshold and a comparison result between the idle time of each NDP device and a preset idle threshold.
The cluster device management module is connected with the load balancing module and used for adjusting the key value threshold of the NDP device according to the key value adjusting instruction. Therefore, by adjusting the key value threshold, the file distribution module can distribute files according to the processing efficiency of different NDP devices, so that the task process of each NDP device is balanced.
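The comparison step that produces key value adjustment instructions can be sketched as follows. The `AdjustInstruction` record, the `"shrink"`/`"enlarge"` labels, and the sample timings are hypothetical; the mapping of an overloaded device to a smaller interval and an idle device to a larger one follows the adjustment rule stated elsewhere in this description.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AdjustInstruction:
    device: int   # number of the adjustment object
    action: str   # "shrink" or "enlarge" its storage key value interval


def make_instructions(busy: Dict[int, float], idle: Dict[int, float],
                      busy_limit: float, idle_limit: float) -> List[AdjustInstruction]:
    """Compare each device's process occupation time and idle time with the
    preset thresholds and emit one instruction per device that exceeds one."""
    instructions = []
    for device in sorted(busy):
        if busy[device] > busy_limit:
            instructions.append(AdjustInstruction(device, "shrink"))
        elif idle[device] > idle_limit:
            instructions.append(AdjustInstruction(device, "enlarge"))
    return instructions


busy = {1: 9.0, 2: 4.0, 3: 1.0}   # assumed occupation times in seconds
idle = {1: 0.5, 2: 1.0, 3: 6.0}   # assumed idle times in seconds
for instruction in make_instructions(busy, idle, busy_limit=8.0, idle_limit=5.0):
    print(instruction)
```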
Specifically, in this embodiment, the load balancing module is configured to count key value sparsity and local key value threshold tuning times of each NDP device compression task, and perform balancing processing on compression task distribution and data flow direction according to a statistical result.
Specifically, the load balancing module is configured to perform compression task distribution balancing by adjusting the priority of the compression queue, and is configured to calculate an optimal position of each key value threshold according to a ratio of the average file of the NDP device to the expected data migration to balance the data flow direction.
In this embodiment, the load balancing module is further connected to the file distribution module and the file migration module, respectively, and is configured to monitor the current data deployment and data flow direction. Specifically, the load balancing module is connected with the other modules to monitor data access optimization, data storage and the like in real time, and to control the distribution of compression tasks and the balance of data volume. By judging the current data deployment and flow direction, the load balancing module dynamically adjusts the priority of the compression task queue and counts the number of key value threshold adjustments, thereby controlling the task selection and the file distribution direction of the file migration module.
In this embodiment, when balancing the distribution of compression tasks, the load balancing module adjusts the priority of the compression queue: the tasks in the queue are compared by key value sparsity and sorted, which determines the compression tasks to be executed. When balancing the data flow direction, the optimal position of each key value threshold is determined by calculating the ratio of the average file of each NDP device to the expected data migration; the priority of the compression queue is then adjusted, and tasks are selected from the adjusted compression queue and distributed to the devices for execution.
Thus, the load balancing module is used for balancing the key value sparsity ratio of each NDP equipment compression task; and analyzing the current data distribution and the data migration cost to calculate the optimal value of each key value threshold. In the working process, the load balancing module assists the system to change the flow direction of data by adjusting the key value threshold, control file migration and distribution and realize load balancing.
Specifically, in this embodiment, the NDP device whose process occupation time is greater than the process time consumption threshold or whose idle time is greater than the idle threshold is used as the adjustment object, the NDP device whose storage key value range is adjacent to the adjustment object is used as the adjacent object, and the cluster device management module is configured to adjust the key value threshold between the adjustment object and any adjacent object according to the key value adjustment instruction.
Thus, assume that a system contains 5 NDP devices and that, while a certain compression task is executed, the process occupation time of NDP device 1 is greater than the process time consumption threshold. NDP device 1 is then the adjustment object, NDP device 2 is the adjacent object, and the key value threshold f1 of NDP device 1 is adjusted to f1', where f1' < f1. The storage key value interval of NDP device 1 is thereby narrowed, so that its L0 compression storage layer receives fewer files and its compression workload decreases; meanwhile, the storage key value interval of NDP device 2 is expanded, so that its L0 compression storage layer receives more files and shares the computing pressure of NDP device 1, which balances the tasks of NDP device 1 and NDP device 2.
Similarly, assume that a system contains 5 NDP devices and that, while a certain compression task is executed, the idle time of NDP device 5 is greater than the idle threshold. NDP device 5 is then the adjustment object, NDP device 4 is the adjacent object, and the key value threshold f4 of NDP device 4 is adjusted to f4', where f4' < f4. The storage key value interval of NDP device 5 is thereby expanded, so that its L0 compression storage layer receives more files and shares the computing pressure of NDP device 4; meanwhile, the storage key value interval of NDP device 4 is narrowed, so that its L0 compression storage layer receives fewer files and its compression workload decreases, which balances the tasks of NDP device 4 and NDP device 5.
In this way, in the embodiment, by sharing the tasks of the adjacent object and the adjustment object in a balanced manner, compared with the unified adjustment of the key value thresholds of all the NDP devices, the file migration data amount in the subsequent task is reduced, which is beneficial to reducing the calculation pressure caused by file migration.
In this embodiment, the key value adjustment instruction includes an adjustment object and a task time consumption of two adjacent objects in the task process. When the process occupation time of the adjustment object is larger than the process time consumption threshold, the cluster equipment management module adjusts the key value threshold between the adjustment object and the adjacent object with less task time consumption, and the key value storage interval of the adjustment object is reduced. When the idle time of the adjustment object is larger than the idle threshold, the cluster equipment management module adjusts the key value threshold between the adjustment object and the adjacent object which consumes more time, and enlarges the key value storage interval of the adjustment object. Therefore, the key value equalization processing efficiency is further ensured.
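The neighbour selection and threshold shift described in this paragraph can be sketched as below. The function `adjust_threshold`, the fixed `step` size, and the sample task times are hypothetical; the patent does not say by how much a threshold moves.

```python
from typing import Dict, List


def adjust_threshold(thresholds: List[int], device: int, overloaded: bool,
                     task_time: Dict[int, float], step: int) -> List[int]:
    """Return new thresholds after shifting the boundary between `device`
    (the adjustment object, numbered from 1) and one adjacent object.

    Overloaded: pick the neighbour with less task time and shrink the interval.
    Idle: pick the neighbour with more task time and enlarge the interval.
    """
    n = len(thresholds)
    neighbours = [d for d in (device - 1, device + 1) if 1 <= d <= n]
    pick = min if overloaded else max
    neighbour = pick(neighbours, key=lambda d: task_time[d])
    boundary = min(device, neighbour) - 1   # index of the shared threshold
    is_upper_limit = neighbour > device     # boundary is the device's own threshold
    # Shrinking lowers an upper limit or raises a lower limit; enlarging does the opposite.
    delta = step if (is_upper_limit != overloaded) else -step
    new = list(thresholds)
    new[boundary] += delta
    if any(a >= b for a, b in zip([0] + new, new)):
        raise ValueError("step is too large: thresholds would no longer increase")
    return new


thresholds = [100, 200, 300, 400, 500]  # assumed f1..f5
times = {2: 5.0, 4: 2.0}                # assumed task times of devices 2 and 4

# Device 3 is overloaded: device 4 took less time, so f3 is lowered.
print(adjust_threshold(thresholds, 3, True, times, step=10))   # [100, 200, 290, 400, 500]
# Device 3 is idle: device 2 took more time, so f2 is lowered.
print(adjust_threshold(thresholds, 3, False, times, step=10))  # [100, 190, 300, 400, 500]
```

Only one threshold changes per instruction, which keeps the amount of data that must later migrate between devices small.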
Specifically, in this embodiment, the cluster device management module is configured to generate an information object for each NDP device. The information object manages the number and the key value threshold of the corresponding NDP device and calculates the key value storage interval of each NDP device from the key value thresholds. The information object is also used for monitoring the communication between the corresponding NDP device and the host side and for controlling file deployment and file compression on the NDP device.
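The derivation of key value storage intervals from thresholds can be sketched as follows. This is a minimal illustration under stated assumptions, not the data structure of the embodiment: the class `NdpInfo`, the helper names, the one-based device numbering, and the choice of half-open intervals (lower bound exclusive, upper bound inclusive) are all hypothetical.

```python
import math
from bisect import bisect_left
from dataclasses import dataclass
from typing import List


@dataclass
class NdpInfo:
    """Hypothetical information object for one NDP device."""
    device_id: int   # one-based device number
    low: float       # exclusive lower bound of the stored key range
    high: float      # inclusive upper bound of the stored key range


def build_info_objects(thresholds: List[int]) -> List[NdpInfo]:
    """Turn n - 1 ascending thresholds into n contiguous key intervals."""
    bounds = [-math.inf, *thresholds, math.inf]
    return [NdpInfo(i + 1, bounds[i], bounds[i + 1])
            for i in range(len(bounds) - 1)]


def device_for_key(thresholds: List[int], key: int) -> int:
    """Return the one-based number of the device whose interval holds `key`."""
    # bisect_left counts the thresholds strictly below `key`, so a key equal
    # to a threshold stays with the lower device (inclusive upper bound).
    return bisect_left(thresholds, key) + 1
```

For example, with thresholds `[100, 200]` the sketch yields three devices covering (-inf, 100], (100, 200], and (200, +inf), and a lookup is a binary search over the threshold list.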
In this embodiment, the host side and the NDP device communicate with each other through an ethernet switch.
In this embodiment, both the host side and the NDP device include a cache management module. The host-side cache management module extends the host cache function and caches the data of read operations. The NDP-device-side cache management module caches the data read during the compression sorting operation. In this way, read operations are completed at the host side as far as possible, while the NDP device side caches the data of the most recent compression task, so that the spatial locality and temporal locality of the compression sorting process are effectively exploited.
In this embodiment, the load balancing module is further configured to detect the access pattern of read operations, calculate the access frequency of the target data of the read operations, and instruct the cache to tier the data into hot and cold data.
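The access-frequency bookkeeping described above could look roughly like the following sketch. It is only one simple way to realise hot and cold tiering; the class name `AccessTracker`, the fixed count threshold `hot_threshold`, and the absence of any ageing of old counts are assumptions of the example rather than details given in the embodiment.

```python
from collections import Counter
from typing import List


class AccessTracker:
    """Counts reads per key and labels keys as hot or cold (illustrative)."""

    def __init__(self, hot_threshold: int) -> None:
        self.hot_threshold = hot_threshold
        self.counts: Counter = Counter()

    def record_read(self, key: str) -> None:
        """Register one read operation on `key`."""
        self.counts[key] += 1

    def tier(self, key: str) -> str:
        """Return 'hot' once a key has been read at least hot_threshold times."""
        # Counter returns 0 for keys that were never read.
        return "hot" if self.counts[key] >= self.hot_threshold else "cold"

    def hot_keys(self) -> List[str]:
        """List all keys currently classified as hot, in sorted order."""
        return sorted(k for k, c in self.counts.items()
                      if c >= self.hot_threshold)
```

A cache manager could consult `tier` when deciding whether a block read from an NDP device should be retained in the host cache; a practical system would likely also decay the counts over time.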
The above description covers only the preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solutions and inventive concept, shall fall within the scope of protection of the present invention.

Claims (10)

1. A key-value storage oriented near data computing cluster system, comprising: a host side and a plurality of NDP devices; the host end is respectively connected with each NDP device;
the host end includes:
the cluster device management module is used for generating an NDP device information object, which includes setting the key value threshold of each NDP device and calculating the storage key value interval of each NDP device by combining the key value thresholds; for adding and deleting devices; and for managing the working state of the devices;
the file distribution module is used for receiving a file sent by an upper-layer application and, according to the key value range of the file, sending the file to the NDP device with the corresponding storage key value interval for storage;
the file migration module is used for acquiring the compression requirements of each compression storage layer of each NDP device, screening files to be compressed from the corresponding compression storage layer of each NDP device according to the compression requirements, and acquiring, from the next compression storage layer, files whose key value ranges overlap the key value range of the files to be compressed as supplementary files to be compressed;
and the file migration module is further used for migrating the files to be compressed and the supplementary files to be compressed, according to their key value ranges, to the NDP devices with the corresponding storage key value intervals for storage and compression sorting.
2. The key-value-storage-oriented near data computing cluster system of claim 1, wherein the file migration module comprises: the system comprises a storage monitoring unit, a file copying unit, a file dividing unit and a task sending unit;
the storage monitoring unit is used for acquiring the compression requirements of each compression storage layer of each NDP device and acquiring the files to be compressed and the supplementary files to be compressed that correspond to the compression requirements;
the file copying unit is connected with the storage monitoring unit and is used for screening, from the files to be compressed and the supplementary files to be compressed, files whose key value ranges exceed the storage key value interval of the NDP device on which they are currently located as files to be migrated, and for acquiring those files;
the file dividing unit is connected with the file copying unit and is used for selecting, from the files to be migrated, each file whose key value range contains at least one key value threshold as a cutting target, and dividing the cutting target at the key value threshold;
and the task sending unit is connected with the file copying unit and the file dividing unit respectively and is used for acquiring the files to be migrated other than the cutting targets, together with the divided files, and distributing them according to their key value ranges to the NDP devices with the corresponding storage key value intervals.
3. The key-value-storage-oriented near data computing cluster system of claim 2, wherein the task sending unit is further configured to, when the key value ranges of all the files to be migrated are located in the same storage key value interval, divide all the files to be migrated into two parts, and send the two parts to the compressed storage layer at the host end and the NDP device corresponding to the key value range for compression, respectively.
4. The key-value-storage-oriented near data computing cluster system of claim 3, wherein the task sending unit is further configured to obtain a host-side compressed file and send the host-side compressed file to the NDP device corresponding to the key-value range for storage.
5. The key-value-storage-oriented near data computing cluster system of claim 1, wherein the host side further comprises a load balancing module for monitoring the task processes of each compression storage layer of each NDP device and for counting the process occupation time and idle time of each NDP device in the same task process; the load balancing module is also used for generating a key value adjustment instruction according to the result of comparing the process occupation time of each NDP device with a preset process time consumption threshold and the result of comparing the idle time of each NDP device with a preset idle threshold;
the cluster device management module is connected with the load balancing module and used for adjusting the key value threshold of the NDP device according to the key value adjusting instruction.
6. The key-value-storage-oriented near data computing cluster system of claim 5, wherein the load balancing module is further connected to the file distribution module and the file migration module, respectively, for monitoring the current data deployment and data flow direction;
the load balancing module is used for counting the key value sparsity and the number of local key value threshold adjustments of the compression tasks of each NDP device, and for balancing compression task distribution and data flow direction according to the counting result.
7. The key-value-storage-oriented near data computing cluster system of claim 5, wherein NDP devices with process occupation time greater than a process time-consuming threshold or idle time greater than an idle threshold are used as adjustment objects, NDP devices with a key-value-storage range adjacent to the adjustment objects are used as adjacent objects, and the cluster device management module is configured to adjust the key-value threshold between the adjustment objects and any one of the adjacent objects according to the key-value adjustment instruction.
8. The key-value-storage-oriented near data computing cluster system of claim 7, wherein the key value adjustment instruction comprises the adjustment object and the task time consumption of its two adjacent objects in the current task process; when the process occupation time of the adjustment object is greater than the process time consumption threshold, the cluster device management module adjusts the key value threshold between the adjustment object and the adjacent object with the smaller task time consumption, and reduces the key value storage interval of the adjustment object; when the idle time of the adjustment object is greater than the idle threshold, the cluster device management module adjusts the key value threshold between the adjustment object and the adjacent object with the larger task time consumption, and enlarges the key value storage interval of the adjustment object.
9. The key-value-storage-oriented near data computing cluster system of claim 1, wherein the host side and the NDP device each comprise a cache management module; the host-side cache management module is used for extending the host cache function, and the NDP-device-side cache management module is used for caching the data read in the compression sorting operation.
10. The key-value-storage-oriented near data computing cluster system of any one of claims 1-9, wherein the host-side and the NDP device communicate via an ethernet switch.
CN202010668559.7A 2020-07-13 2020-07-13 Key value storage-oriented near data computing cluster system Active CN111897784B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010668559.7A CN111897784B (en) 2020-07-13 2020-07-13 Key value storage-oriented near data computing cluster system

Publications (2)

Publication Number Publication Date
CN111897784A true CN111897784A (en) 2020-11-06
CN111897784B CN111897784B (en) 2022-12-06

Family

ID=73192481

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010668559.7A Active CN111897784B (en) 2020-07-13 2020-07-13 Key value storage-oriented near data computing cluster system

Country Status (1)

Country Link
CN (1) CN111897784B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050198074A1 (en) * 2004-03-08 2005-09-08 Transreplicator, Inc. Apparatus, systems and methods for relational database replication and proprietary data transformation
CN105159915A (en) * 2015-07-16 2015-12-16 中国科学院计算技术研究所 Dynamically adaptive LSM (Log-structured merge) tree combination method and system
CN107479833A (en) * 2017-08-21 2017-12-15 中国人民解放军国防科技大学 Key value storage-oriented remote nonvolatile memory access and management method
US9870168B1 (en) * 2014-12-22 2018-01-16 Emc Corporation Key-value store with internal key-value storage interface
CN110928483A (en) * 2018-09-19 2020-03-27 华为技术有限公司 Data storage method, data acquisition method and equipment
CN110995871A (en) * 2019-12-24 2020-04-10 浪潮云信息技术有限公司 Method for realizing high availability of KV storage service
CN111400312A (en) * 2020-02-25 2020-07-10 华南理工大学 Edge storage database based on improved LSM tree

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
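HUI SUN et al.: "《DStore: A Holistic Key-Value Store Exploring Near-Data Processing and On-Demand Scheduling for Compaction Optimization》", 《IEEE》 *
HUI SUN et al.: "《Co-KV: A Collaborative Key-Value Store Using Near-Data Processing to Improve Compaction for the LSM-tree》", 《ARXIV》 *
刘伟 (Liu Wei): "《Compaction Optimization Method for LSM-tree Key-Value Storage Systems Based on Near-Data Computing》", 《China Master's Theses Full-text Database (Information Science and Technology)》 *
王洋洋 (Wang Yangyang) et al.: "Performance Optimization of an LSM-tree Key-Value Storage System Based on SSD-SMR Hybrid Storage", 《Computer Science》 *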

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113779024A (en) * 2021-08-05 2021-12-10 安徽大学 Asynchronous parallel optimization method for key value storage system under near data processing architecture
CN113779024B (en) * 2021-08-05 2024-02-09 安徽大学 Asynchronous parallel optimization method for key value storage system under near data processing architecture
CN118051643A (en) * 2024-02-23 2024-05-17 中国科学院信息工程研究所 Metadata sparse distribution-oriented LSM data organization method and device
CN118051643B (en) * 2024-02-23 2024-11-05 中国科学院信息工程研究所 Metadata sparse distribution-oriented LSM data organization method and device

Also Published As

Publication number Publication date
CN111897784B (en) 2022-12-06


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant