
CN111176584B - Data processing method and device based on hybrid memory - Google Patents

Info

Publication number
CN111176584B
CN111176584B
Authority
CN
China
Prior art keywords
data
memory
node
ram
heat
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911424993.4A
Other languages
Chinese (zh)
Other versions
CN111176584A (en)
Inventor
郭庆
谢莹莹
于宏亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Sugon Information Industry Chengdu Co ltd
Dawning Information Industry Beijing Co Ltd
Original Assignee
Zhongke Sugon Information Industry Chengdu Co ltd
Dawning Information Industry Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Sugon Information Industry Chengdu Co ltd and Dawning Information Industry Beijing Co Ltd
Priority to CN201911424993.4A
Publication of CN111176584A
Application granted
Publication of CN111176584B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G06F3/0607 Improving or facilitating administration, e.g. storage management by facilitating the process of upgrading existing storage systems, e.g. for improving compatibility between host and storage device
    • G06F3/0614 Improving the reliability of storage systems
    • G06F3/062 Securing storage systems
    • G06F3/0623 Securing storage systems in relation to content
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G06F3/0671 In-line storage system
    • G06F3/0683 Plurality of storage devices
    • G06F3/0688 Non-volatile semiconductor memory arrays
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

The application provides a data processing method and device based on a hybrid memory. The method is applied to one node in a distributed storage system comprising a plurality of communicatively connected nodes. Each node comprises an HFDD and an external memory; the HFDD comprises a memory and a solid state disk (SSD), and the memory comprises a random access memory (RAM) and an NVDIMM. The method comprises: calculating the heat of each piece of data, where the heat represents how frequently the data is accessed; and storing each piece of data according to its heat and the storage capacities of the memory, the SSD and the external memory, respectively. In the embodiments of the application, the HFDD is a fault-tolerant distributed data abstraction built on a hybrid RAM and NVM memory, and data is placed according to its heat, which both increases the effective memory capacity and improves data access efficiency.

Description

Data processing method and device based on hybrid memory
Technical Field
The application relates to the technical field of computers, in particular to a data processing method and device based on a hybrid memory.
Background
Existing big data technologies can process petabyte-scale data, which is what motivates in-memory computing. In the in-memory computing model, all data is loaded into memory during an initialization stage, data and query operations are executed in high-speed memory, and the CPU reads data directly from memory for real-time computation and analysis. This reduces disk access and the impact of network and disk I/O, greatly increases data throughput and processing speed, and lowers the I/O cost that would otherwise consume a large share of computing resources. By avoiding the I/O bottleneck, in-memory computing can complete in seconds computations that would otherwise take hours or days.
Currently, the mainstream in-memory computing technology runs on a single node. A single-node in-memory computing system operates on one physical node with one or more processors and shared memory, where the memory structure may be centralized shared memory or non-coherent shared memory. Single-node in-memory computing uses a multi-core CPU, large memory and multithreaded parallelism to make full use of the machine's computing power. However, because single-node hardware resources are limited, data processing efficiency suffers once the data no longer fits in memory.
Disclosure of Invention
The embodiments of the application aim to provide a data processing method and device based on a hybrid memory, to address the low data processing efficiency of the prior art.
In a first aspect, an embodiment of the present application provides a data processing method based on a hybrid memory, applied to a node in a distributed storage system. The distributed storage system includes a plurality of communicatively connected nodes; each node includes a hybrid-memory and fault-tolerant distributed data set (HFDD) and an external memory; the HFDD includes a memory and a solid state disk (SSD); and the memory includes a random access memory (RAM) and a non-volatile dual in-line memory module (NVDIMM). The method includes: calculating the heat of each piece of data, where the heat represents how frequently the data is accessed; and storing each piece of data according to its heat and the storage capacities of the memory, the SSD and the external memory, respectively.
In the embodiments of the application, the HFDD is a fault-tolerant distributed data abstraction built on a hybrid RAM and NVM memory. Because data is placed according to its heat, the effective memory capacity is increased and data access efficiency is improved.
Further, storing each piece of data according to its heat and the storage capacities of the memory, the SSD and the external memory includes: starting from the data with the highest heat, storing data into the RAM until the RAM's storage capacity is reached; then, starting from the hottest of the remaining data, storing data into the SSD until the SSD's storage capacity is reached; and storing all remaining data in the external memory. By distributing data according to its heat and the capacities of the memory, the SSD and the external memory, the embodiment increases the effective memory capacity while keeping data access on the node efficient.
Further, the method includes: receiving an access request; and, if the RAM is full and the requested data is stored on the SSD, evicting volatile data from the RAM to free at least as much space as the requested data needs, and then loading the requested data into the RAM. Through this swap-in and swap-out of data between the RAM and the SSD, the embodiment preserves performance while offering a large hybrid memory capacity.
Further, the method includes: if the node suffers an abnormal power loss, saving the node's working-state data from the NVDIMM to the external memory before the node goes down; and, after the node recovers from the failure, writing the working-state data back from the external memory into the NVDIMM and resuming operation from that state. By using the NVDIMM, the embodiment ensures data reliability after an unexpected power failure.
Further, the method further comprises: determining a time node for creating a copy; and if the time node is reached, sending a copy creation request to other nodes so that the other nodes create copies of the corresponding data after receiving the copy creation request.
Further, the method includes: if data on one node is lost, obtaining from other nodes the most recently created copy of that data; and sending the data in that copy to the node that lost it. In a distributed environment, this multi-copy fault tolerance ensures data reliability across nodes.
Further, the method includes: during operation of the distributed storage system, generating snapshots of the HFDD in the node at a preset period; and determining where each snapshot is stored according to the time at which it was generated. Generating snapshots periodically helps ensure the reliability of the data in the node.
In a second aspect, an embodiment of the present application provides a hybrid memory-based data processing apparatus, applied to a node in a distributed storage system. The distributed storage system includes a plurality of communicatively connected nodes; each node includes a hybrid-memory and fault-tolerant distributed data set (HFDD) and an external memory; the HFDD includes a memory and a solid state disk (SSD); and the memory includes a random access memory (RAM) and a non-volatile dual in-line memory module (NVDIMM). The apparatus includes:
a heat calculation module, configured to calculate the heat of each piece of data, where the heat represents how frequently the data is accessed; and
a data storage module, configured to store each piece of data according to its heat and the storage capacities of the memory, the SSD and the external memory, respectively.
In a third aspect, an embodiment of the present application provides a distributed storage system, including a plurality of the devices according to the second aspect.
In a fourth aspect, an embodiment of the present application provides an electronic device, including: a processor, a memory, and a bus, wherein,
the processor and the memory communicate with each other via the bus;
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of the first aspect.
In a fifth aspect, embodiments of the present application provide a non-transitory computer readable storage medium comprising:
the non-transitory computer-readable storage medium stores computer instructions that cause the computer to perform the method of the first aspect.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the embodiments of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and should not be considered as limiting the scope, and other related drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a distributed storage system according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a distributed system according to an embodiment of the present application;
FIG. 3 is a flowchart of a data processing method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a device structure according to an embodiment of the present application;
FIG. 5 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.
Because single-node in-memory computing is limited by hardware resources, it faces hardware scalability problems when processing larger data sets. With the rapid development of large-scale distributed data processing technologies such as MapReduce, in-memory computing has also begun to be implemented on distributed systems. Distributed in-memory computing uses a cluster of computers to build a large distributed memory and, through unified resource scheduling, keeps the data to be processed in that distributed memory, enabling fast access to and processing of large-scale data.
FIG. 1 is a schematic structural diagram of a distributed storage system according to an embodiment of the present application. As shown in FIG. 1, the system includes a plurality of nodes, each of which may be a server, and the nodes communicate over network I/O, for example through high-speed Ethernet. Each node includes a Hybrid-memory and Fault-tolerant Distributed Data set (HFDD) and an external memory. The HFDD includes a random access memory (RAM), a non-volatile dual in-line memory module (NVDIMM) and a solid state disk (SSD). The RAM is the internal memory that exchanges data directly with the CPU, also called main memory. It can be read and written at any time at high speed and usually serves as temporary storage for the operating system and running programs. By the working principle of its memory cells, RAM is further divided into static RAM (SRAM) and dynamic RAM (DRAM). An NVDIMM is a random access memory module for computers that is non-volatile, i.e. it retains its contents when power is removed, whether through an unexpected power loss, a system crash or a normal shutdown. In the embodiments of the application, the RAM and the NVDIMM are together referred to as the memory. Non-volatile memory (NVM) retains data after power loss and offers byte-addressable access, high storage density, low energy consumption and read/write performance similar to DRAM, but it has asymmetric read and write speeds and limited endurance.
The RAM and the NVDIMM provide GB-scale storage, and the RAM can transfer data at 40 GB/s. The SSD provides TB-scale storage with a transfer rate of 1 GB/s. Thus the SSD offers larger capacity but slower transfers, while the memory transfers data much faster than the SSD. Each node also has a hard disk, referred to in the embodiments of the application as the external memory, i.e. storage other than main memory and the CPU cache; such storage typically retains data after power-off. The external memory has a capacity on the order of 10 TB, but its transfer rate is only about 0.2 GB/s.
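To make the gap between tiers concrete, the sketch below divides a data-set size by the nominal rates quoted above. It is an idealized back-of-the-envelope calculation, assuming sustained sequential transfer at exactly those rates and ignoring latency, queueing and network effects; the names `TIER_BANDWIDTH_GBPS` and `transfer_seconds` are illustrative only.

```python
# Nominal transfer rates quoted in the description, in GB/s.
TIER_BANDWIDTH_GBPS = {
    "RAM": 40.0,
    "SSD": 1.0,
    "external memory": 0.2,
}


def transfer_seconds(size_gb: float, tier: str) -> float:
    """Idealized time to move size_gb at the tier's nominal rate.

    Ignores latency, queueing and contention, so real times will be higher.
    """
    return size_gb / TIER_BANDWIDTH_GBPS[tier]


if __name__ == "__main__":
    for tier in TIER_BANDWIDTH_GBPS:
        print(f"{tier}: {transfer_seconds(100, tier):.1f} s for 100 GB")
```

For a 100 GB data set this works out to 2.5 s from RAM, 100 s from the SSD and 500 s from the external memory, which is why keeping frequently accessed data in the upper tiers matters.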
In a distributed storage system, HFDDs in each node are connected through network I/O communication, and the HFDDs of all nodes jointly form the HFDD of the system.
It should be noted that, in fig. 1, a system formed by only three nodes is shown, and in practical application, the number of nodes in the system may be set according to practical situations, which is not limited in particular by the embodiment of the present application.
The external memories of all nodes together form a file storage cluster that holds all data in the system. In a conventional environment, every piece of data needed for a computation must be read from this bottom layer, which suits applications in which most data is not reused. However, an increasing number of application scenarios access some data sets many times, including iterative computations in machine learning and image processing (where every iteration accesses the same data) and interactive data mining (where a user issues multiple queries against the same data set). FIG. 2 is a schematic storage diagram of a distributed system according to an embodiment of the present application. As shown in FIG. 2, the data of the hybrid storage system is divided into three tiers by performance: the memory (comprising the RAM and the NVDIMM), the SSD, and the external memory, with storage capacity increasing from tier to tier. All data required to execute each subtask is stored in the external memory, and when submitting a task the user can choose to cache some of that data in memory. To support this, the system organizes the data of all operations into data sets, which a user can cache for repeated, frequent access. Because each cache resides on a single node and is not visible to other nodes, cached data is read-only; if it could be modified, a change made on one node would not be seen by the other nodes, leading to inconsistent data.
Users should not cache very large data sets in memory; instead they should cache data sets that are likely to be accessed many times, weighing the size of the data set against the available memory when deciding what to cache. To support more application scenarios and larger data sets, the hybrid storage system adds an SSD tier between the memory and the external memory. This tier is not used to store user data directly but serves as an extension of the memory: when a data set the user asked to cache does not fit in memory, it is automatically placed on the SSD, so the next access does not need to fetch it from the slow external memory, which greatly improves the performance of user tasks.
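The caching behaviour described above (memory first, SSD as an overflow extension, external memory as the fallback) can be sketched as follows. This is a single-process illustration under stated assumptions, not the patented implementation: `TieredCache`, its byte-count capacities and the `load_from_external` callback are hypothetical, and cached values are stored as immutable `bytes` to mirror the read-only rule for cached data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class TieredCache:
    """Hypothetical sketch: user-requested caching with an SSD tier extending memory."""
    memory_capacity: int  # bytes available in the memory tier (RAM + NVDIMM)
    ssd_capacity: int     # bytes available in the SSD tier
    memory: Dict[str, bytes] = field(default_factory=dict)
    ssd: Dict[str, bytes] = field(default_factory=dict)

    @staticmethod
    def _used(tier: Dict[str, bytes]) -> int:
        return sum(len(v) for v in tier.values())

    def cache(self, name: str, data: bytes) -> str:
        """Cache a data set; spill to the SSD when it does not fit in memory."""
        if self._used(self.memory) + len(data) <= self.memory_capacity:
            self.memory[name] = data
            return "memory"
        if self._used(self.ssd) + len(data) <= self.ssd_capacity:
            self.ssd[name] = data
            return "ssd"
        return "external"  # not cached; remains only in external memory

    def read(self, name: str, load_from_external: Callable[[str], bytes]) -> bytes:
        """Look in memory, then on the SSD, and only then hit the slow external memory."""
        if name in self.memory:
            return self.memory[name]
        if name in self.ssd:
            return self.ssd[name]
        return load_from_external(name)
```

For example, with `TieredCache(memory_capacity=10, ssd_capacity=100)`, caching an 8-byte data set lands in memory, and a following 6-byte data set no longer fits there and lands on the SSD instead of being dropped.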
As shown in FIG. 1, the NVDIMM-backed memory ensures data reliability after an unexpected power loss, and the SSD, as an NVM device, effectively expands the capacity of the HFDD.
As a hybrid memory-based data set, the HFDD provides basic management primitives, including creating, destroying and updating an HFDD. It also supports a variety of basic computing operations, which fall into two classes: 1) basic calculations, which compute over the data set and return a value; for example, Reduce aggregates all elements of the data set with a function and returns the final result to the program; and 2) conversion calculations, which create a new data set from an existing one and return a new HFDD; for example, Map applies a function to each element of the data set and returns a new HFDD.
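The two classes of operation can be illustrated with a toy, single-process data set. The sketch assumes nothing about how the HFDD is actually distributed or stored; `ToyHFDD` is a hypothetical stand-in that only shows the contract: `map` (a conversion calculation) returns a new data set and leaves the original untouched, while `reduce` (a basic calculation) returns a single value.

```python
import functools
from dataclasses import dataclass
from typing import Callable, Generic, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


@dataclass(frozen=True)
class ToyHFDD(Generic[T]):
    """Hypothetical single-process stand-in for an HFDD: an immutable data set."""
    elements: Tuple[T, ...]

    def map(self, fn: Callable[[T], U]) -> "ToyHFDD[U]":
        # Conversion calculation: builds and returns a NEW data set; self is unchanged.
        return ToyHFDD(tuple(fn(x) for x in self.elements))

    def reduce(self, fn: Callable[[T, T], T]) -> T:
        # Basic calculation: folds all elements into one value (raises on an empty set).
        return functools.reduce(fn, self.elements)
```

For example, `ToyHFDD((1, 2, 3, 4)).map(lambda x: x * x).reduce(lambda a, b: a + b)` evaluates to 30. Keeping the data set immutable matches the read-only treatment of cached data described earlier.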
Based on the system provided in the foregoing embodiment, an embodiment of the present application provides a data processing method, shown in FIG. 3. The method is applied to a node in the distributed storage system, which may be any node in the system, and includes:
Step 301: calculating the heat of each piece of data, where the heat represents how frequently the data is accessed;
Step 302: storing each piece of data according to its heat and the storage capacities of the memory, the SSD and the external memory, respectively.
In a specific implementation, the node calculates the heat of each piece of data in real time; for example, a heat calculation program may be configured on the node in advance and used to compute the heat. Put simply, the node can count how many times each piece of data is accessed within a preset time period: the more accesses, the higher the heat, and the fewer accesses, the lower the heat.
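A minimal sketch of this access-count heat, assuming a sliding time window as the "preset time period" (the patent does not say whether the window slides or is reset at fixed intervals). `HeatTracker` and its parameters are hypothetical; timestamps are assumed to be recorded in non-decreasing order, and the optional `now` argument exists only so the behaviour can be tested deterministically.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class HeatTracker:
    """Hypothetical sketch: heat = number of accesses within a sliding time window."""

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self._accesses: Dict[str, Deque[float]] = defaultdict(deque)

    def record_access(self, key: str, now: Optional[float] = None) -> None:
        # Assumes timestamps arrive in non-decreasing order.
        self._accesses[key].append(time.monotonic() if now is None else now)

    def heat(self, key: str, now: Optional[float] = None) -> int:
        """Return how many accesses to `key` fall within the last `window` seconds."""
        now = time.monotonic() if now is None else now
        q = self._accesses.get(key)  # .get() does not create an entry in a defaultdict
        if q is None:
            return 0
        while q and q[0] <= now - self.window:
            q.popleft()  # discard accesses that have fallen out of the window
        return len(q)
```

With a 60-second window and accesses at t = 0, 10 and 20, the heat is 3 at t = 30 and drops to 2 at t = 65 once the first access has aged out.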
To reduce the amount of computation, heat may be computed only for data stored in the memory or the SSD and for data accessed within the preset time period. Heat need not be calculated for data that resides in the external memory and has not been accessed within that period.
Because the memory, the SSD and the external memory have limited capacities and different data transfer speeds, data can be distributed among them according to its heat and their capacities: hot data is preferentially stored in memory, and cold data in the external memory. Specifically, starting from the data with the highest heat, data is stored into the RAM until the RAM's storage capacity is reached; then, starting from the hottest of the remaining data, data is stored into the SSD until the SSD's storage capacity is reached; finally, all remaining data is stored in the external memory.
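The placement rule above is essentially a greedy fill in descending order of heat. The sketch below is one possible reading, assuming each tier is filled with a contiguous run of the hottest remaining items and stops at the first item that would overflow it (the text does not say whether smaller, colder items may skip ahead to fill leftover space). `place_by_heat` and the `(name, size, heat)` tuples are hypothetical.

```python
from typing import Dict, List, Tuple


def place_by_heat(
    items: List[Tuple[str, int, int]],  # (name, size, heat)
    ram_capacity: int,
    ssd_capacity: int,
) -> Dict[str, str]:
    """Assign each item to 'RAM', 'SSD' or 'external', hottest first."""
    ranked = sorted(items, key=lambda it: it[2], reverse=True)
    placement: Dict[str, str] = {}
    i = 0
    for tier, capacity in (("RAM", ram_capacity), ("SSD", ssd_capacity)):
        used = 0
        # Fill this tier with the hottest remaining items until the next one would not fit.
        while i < len(ranked) and used + ranked[i][1] <= capacity:
            name, size, _ = ranked[i]
            placement[name] = tier
            used += size
            i += 1
    # Whatever is left (the coldest data) goes to external memory.
    for name, _, _ in ranked[i:]:
        placement[name] = "external"
    return placement
```

For items `a` (size 4, heat 10), `b` (3, 50), `c` (5, 20) and `d` (2, 1) with a RAM capacity of 8 and an SSD capacity of 5, `b` and `c` go to the RAM, `a` goes to the SSD, and `d` goes to the external memory.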
It should be noted that, besides the RAM, the NVDIMM and the SSD, other storage media with the same functions may be used.
In the embodiments of the application, the HFDD is a fault-tolerant distributed data abstraction built on a hybrid RAM and NVM memory. Because data is placed according to its heat, the effective memory capacity is increased and data access efficiency is improved.
On the basis of the above embodiment, the method further includes:
receiving an access request; and, if the RAM is full and the requested data is stored on the SSD, evicting volatile data from the RAM to free at least as much space as the requested data needs, and then loading the requested data into the RAM.
In a specific implementation, because of the bandwidth gap between the RAM and the NVM, swapping data in and out between volatile and non-volatile storage is key to increasing the access speed of the HFDD on a single node while preserving capacity. Swapping is implemented by adding a heterogeneous storage medium management module (an application program) to the node's operating system. This module manages volatile and non-volatile storage with different characteristics and performance in a unified way and exposes a compatible block interface. At the same time, a data exchange management mechanism uses monitoring information to place frequently used data on the faster volatile storage, i.e. the RAM, and data that has not been accessed for a relatively long time on non-volatile storage, i.e. the SSD. When the node receives an access request for data stored on the SSD and the RAM is full, data in the RAM is swapped out according to a flexible, extensible policy, and the requested data is moved from the SSD into the RAM. The amount of data swapped out of the RAM should be no smaller than the size of the requested data being swapped in. Data may also be stored in the NVDIMM: when the RAM is full, data can be placed in the NVDIMM, and data can be swapped in and out between the RAM and the NVDIMM.
The extensible policy can be understood as the policy that decides which volatile memory blocks in the RAM are swapped out; for example, the volatile memory block with the lowest heat in the RAM may be swapped out.
It should be noted that if the requested data is stored on the SSD and the remaining capacity of the RAM is larger than the size of the requested data, the data can be loaded into the RAM directly.
When swapping volatile memory blocks in and out, the limited erase/write endurance of non-volatile media must also be considered: if the current heat of a swapped-out block is below a preset value, the block is written to the external memory instead of the SSD.
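The swap-in rules above (evict the coldest RAM blocks only when space is short, free at least as much as the incoming block needs, and send blocks below a heat threshold to the external memory to spare the SSD) can be combined into one function. This is a minimal sketch assuming coldest-first as the extensible policy; `swap_in`, the dict-based tiers and `demote_threshold` are hypothetical names, and real block management would sit behind the block interface described above.

```python
from typing import Dict, List, Tuple


def swap_in(
    key: str,
    ram: Dict[str, int],       # block name -> size currently in RAM
    ssd: Dict[str, int],       # block name -> size currently on the SSD
    external: Dict[str, int],  # block name -> size in external memory
    heat: Dict[str, int],      # block name -> current heat
    ram_capacity: int,
    demote_threshold: int,
) -> List[Tuple[str, str]]:
    """Move `key` from the SSD into RAM, evicting the coldest RAM blocks if needed.

    Returns (evicted_block, destination) pairs; mutates the tier dicts in place.
    """
    size = ssd[key]
    if size > ram_capacity:
        raise ValueError(f"{key} ({size}) is larger than the whole RAM")
    del ssd[key]
    free = ram_capacity - sum(ram.values())
    moves: List[Tuple[str, str]] = []
    # Evict coldest-first until the freed space covers the incoming block;
    # if there is already enough room, nothing is evicted.
    for victim in sorted(ram, key=lambda k: heat.get(k, 0)):
        if free >= size:
            break
        victim_size = ram.pop(victim)
        free += victim_size
        # Cold blocks bypass the SSD to spare its limited write endurance.
        if heat.get(victim, 0) < demote_threshold:
            external[victim] = victim_size
            moves.append((victim, "external"))
        else:
            ssd[victim] = victim_size
            moves.append((victim, "SSD"))
    ram[key] = size
    return moves
```

Iterating over `sorted(ram, ...)` takes a snapshot of the keys first, so popping victims from `ram` inside the loop is safe.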
According to the embodiment of the application, the performance guarantee under the premise of high-capacity hybrid memory is realized through the data swap-in and swap-out technology between the RAM and the SSD.
On the basis of the above embodiment, the method further includes: if the node suffers an abnormal power loss, saving the node's working-state data from the NVDIMM to the external memory before the node goes down; and, after the node recovers from the failure, writing the working-state data back from the external memory into the NVDIMM and resuming operation from that state.
In a specific implementation, when the system loses power, the information held in the RAM of the affected node is lost. To solve this problem, an NVDIMM is introduced into the node. The NVDIMM is a memory module specification that combines DRAM with non-volatile memory chips; it preserves the complete memory contents even after a total power loss, which solves the problem of preserving in-memory data during an abnormal power failure and lets the system continue its previous work once it returns to normal operation.
While the storage system is running, working-state data is kept in the NVDIMM. Before power is lost, the NVDIMM can quickly save the working-state data of the whole node (covering hardware devices such as the CPU, bridge chip and network card, as well as all processes on the node) to the external memory, which may be a magnetic disk or the like. After the node is powered on again, the working-state data is read from the external memory and written back into the NVDIMM, so the system can return to its previous running state.
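The save-and-restore flow can be simulated at the software level to show the order of operations. This is only an illustration under loud assumptions: a real NVDIMM performs the save in hardware with its own backup power, whereas here a Python dict stands in for the NVDIMM contents, a JSON file stands in for the external memory, and `NodeStateKeeper`, `on_power_loss` and `on_recovery` are hypothetical names.

```python
import json
from pathlib import Path
from typing import Any, Dict


class NodeStateKeeper:
    """Hypothetical simulation: a dict stands in for NVDIMM, a file for external memory."""

    def __init__(self, external_path: Path) -> None:
        self.nvdimm: Dict[str, Any] = {}     # working-state data held in the NVDIMM
        self.external_path = external_path   # file standing in for the external memory

    def on_power_loss(self) -> None:
        # Flush the working-state data from the NVDIMM to external memory.
        self.external_path.write_text(json.dumps(self.nvdimm))

    def on_recovery(self) -> Dict[str, Any]:
        # Write the saved state back into the NVDIMM and return it so work can resume.
        self.nvdimm = json.loads(self.external_path.read_text())
        return self.nvdimm
```

Only JSON-serializable state survives this round trip, which is one of the ways the simulation differs from a hardware NVDIMM that saves raw memory contents.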
Beyond ensuring the reliability of volatile data on a single node via the NVDIMM, node failures can be detected promptly through heartbeat messages exchanged between nodes. In addition, copies of a node's data can be stored on other nodes: if data on one node is lost, it can be retrieved from the copies held on the other nodes, and the node can resume normal operation.
By using NVDIMM-backed RAM, this embodiment ensures data reliability when the system experiences an external power failure.
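Heartbeat-based failure detection can be sketched as a monitor that records when each peer was last heard from. The patent does not specify message formats or timeouts, so `HeartbeatMonitor` and its `timeout` parameter are hypothetical; the injectable clock only exists to make the sketch testable.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Hypothetical sketch: flags peers whose last heartbeat is older than `timeout` seconds."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock
        self.last_seen: dict[str, float] = {}

    def on_heartbeat(self, node_id: str) -> None:
        # Record the arrival time of a heartbeat message from a peer node.
        self.last_seen[node_id] = self.clock()

    def suspected_failures(self) -> list[str]:
        # Peers that have been silent for longer than the timeout.
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)
```

A node flagged by `suspected_failures()` would then be a candidate for recovery from the data copies held on other nodes.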
On the basis of the above embodiment, the method further includes: determining a time node (checkpoint) at which to create copies; and, when that time node is reached, sending a copy-creation request to other nodes so that they create copies of the corresponding data upon receiving the request.
In a specific implementation, data reliability can be ensured by storing multiple copies of the data on multiple nodes; however, naive redundant storage wastes considerable space and incurs high network transfer costs. To minimize redundant space and network transfer cost while still ensuring the reliability of data in volatile storage, this embodiment adopts a multi-dimensional fault-tolerance technique that combines multiple copies with checkpoints. Specifically, a complete set of basic operations is defined, the execution plan of a workload is converted into an evolution process of operations on the HFDD, and the evolution of each workload's operation set is recorded in a log. The basic operation set is the instruction set for basic data operations such as reading, writing, querying and analysis. The execution plan of a workload is the step-by-step process a task follows during execution, together with the corresponding operation plan. Instead of replicating the HFDD data at every stage of workload execution, key checkpoints (the time nodes at which copies are created) are selected based on the execution plan and the data distribution, and only the HFDD data at those checkpoints is stored redundantly.
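One way to picture checkpoint selection is to log every stage of the execution plan and replicate only where replaying the log would cost more than copying the stage's output. This trade-off rule, the `Stage` fields and the `copy_cost_per_byte` parameter are all assumptions made for illustration; the patent only says checkpoints are chosen by combining the execution plan with the data distribution.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    op: str                # basic operation applied to the HFDD, e.g. "filter"
    recompute_cost: float  # assumed estimate of the cost to replay this stage
    output_size: int       # size of the HFDD produced by this stage


def choose_checkpoints(plan: list[Stage],
                       copy_cost_per_byte: float) -> tuple[list[str], list[int]]:
    """Hypothetical sketch: returns (operation log, indices of checkpoint stages)."""
    log: list[str] = []
    checkpoints: list[int] = []
    pending = 0.0  # replay cost accumulated since the last checkpoint
    for i, stage in enumerate(plan):
        log.append(f"{i}:{stage.op}")  # every stage is logged (lineage)
        pending += stage.recompute_cost
        # Replicate only if replaying the log would cost more than copying
        # this stage's output to another node.
        if pending > copy_cost_per_byte * stage.output_size:
            checkpoints.append(i)
            pending = 0.0
    return log, checkpoints
```

Stages between checkpoints are not replicated; after a failure they would be rebuilt by replaying the logged operations from the last checkpoint copy.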
If data on a node is lost, the most recent copy of that data can be retrieved from the other nodes that store copies of the node's data.
This embodiment thus ensures data reliability across multiple nodes through multi-copy fault tolerance in a distributed environment.
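Choosing "the most recent copy" among the peers can be sketched in a few lines. `Replica` and `latest_replica` are hypothetical names, and representing each copy's time node as a float timestamp is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Replica:
    peer: str          # node holding the copy
    created_at: float  # time node at which the copy was created


def latest_replica(replicas: list[Replica], now: float) -> Optional[Replica]:
    """Hypothetical sketch: the newest copy created at or before `now`, if any."""
    candidates = [r for r in replicas if r.created_at <= now]
    return max(candidates, key=lambda r: r.created_at, default=None)
```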
On the basis of the above embodiment, the method further includes: generating snapshots of the HFDD in the node at a preset interval while the distributed storage system is running; and determining where each snapshot is stored according to the time at which it was generated.
In a specific implementation, the reliability of the data sets in the HFDD is guaranteed by snapshots. Generating HFDD snapshots at a preset interval provides multiple recovery points for the data while avoiding the performance impact of frequent backups. The snapshot-based backup technique addresses the following four aspects. First, an efficient snapshot generation mechanism allows backups to be created quickly while consuming few system resources; generated snapshots are stored on an external disk, and snapshots that are about to be accessed are loaded into memory according to their usage, so that data can be recovered quickly. Second, a snapshot mechanism with multiple recovery points (each recovery point being a snapshot) helps users perform recovery more reliably and minimizes losses caused by system downtime or other accidents. Third, to manage large numbers of snapshots, an integrated snapshot backup management scheme with a convenient GUI helps users quickly locate the backups they need. Fourth, to address the storage space consumed by many snapshot backups, snapshot data is stored across multiple tiers: a user's recent backups are kept in fast storage that supports quick recovery, such as RAM, while older snapshots are kept in cheaper, slower storage such as an SSD or external storage. A backup elimination mechanism is also established, applying policy-based pruning to older snapshot backups (for example, deleting 50% of the backups made more than a year ago).
By generating snapshots at a preset interval, this embodiment ensures the reliability of the data within a node.
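The age-based placement and pruning described above can be sketched as below. The three age bands (RAM, SSD, external), the 7-day and 365-day boundaries, and keeping every other expired snapshot are all illustrative assumptions; only the "delete 50% of backups older than a year" example comes from the text.

```python
from dataclasses import dataclass

DAY = 86_400.0  # seconds per day


@dataclass
class Snapshot:
    snap_id: int
    created_at: float  # creation time, in seconds


def place_snapshots(snaps: list[Snapshot], now: float,
                    recent_window: float = 7 * DAY,
                    archive_age: float = 365 * DAY,
                    keep_every: int = 2) -> dict[str, list[int]]:
    """Hypothetical sketch: assign each snapshot to a tier by its age."""
    tiers: dict[str, list[int]] = {"ram": [], "ssd": [], "external": [], "deleted": []}
    expired_rank = 0
    for snap in sorted(snaps, key=lambda s: s.created_at, reverse=True):
        age = now - snap.created_at
        if age <= recent_window:
            tiers["ram"].append(snap.snap_id)        # recent: fast recovery
        elif age <= archive_age:
            tiers["ssd"].append(snap.snap_id)        # older: cheaper, slower tier
        else:
            # keep_every=2 retains every other expired snapshot (newest first),
            # i.e. deletes about 50% of the backups older than archive_age.
            tier = "external" if expired_rank % keep_every == 0 else "deleted"
            tiers[tier].append(snap.snap_id)
            expired_rank += 1
    return tiers
```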
On the basis of the above embodiment, the HFDD in the node may also be sharded. The way data is sharded has a decisive influence on query plans and query performance, so this embodiment provides several data sharding strategies for distributed environments to improve the efficiency of different queries. These strategies fall into two types: static sharding and dynamic sharding.
Static sharding strategies are specified explicitly by an application or user before data is imported. They include horizontal sharding, hash sharding by column, and user-defined sharding. Horizontal sharding divides the input data evenly according to the data volume and the number of nodes, balancing the data across nodes. It works well for operations such as selection and aggregation, which do not require transferring data between nodes, and it keeps tasks load-balanced. For operations such as grouping and joins, however, simple horizontal sharding can cause heavy network data transfer, so the HFDD also provides hash sharding on one or more columns. Hash sharding places rows that satisfy the same grouping or join condition into shards on the same node, reducing data transfer cost. When the data is unevenly distributed over the hash key values, however, a fixed hash algorithm produces skewed shards, and the cluster's computing resources cannot be used effectively. This embodiment therefore also supports user-defined sharding, which lets users define their own hash algorithm to shard data according to its distribution.
To reduce the burden on users and choose the best sharding method for different workload types, this embodiment provides a dynamic data partitioning strategy. It does not require users to devise a partitioning method themselves and is tightly integrated with query optimization. Nodes construct candidate partitioning strategies and corresponding query plans based on the characteristics of the user workload and statistics on the data distribution; for example, some data may use horizontal sharding while other data uses column hash sharding. The RRS (Recursive Random Search) black-box optimization technique is then used to select, from the space of partitioning strategies and query plans, the combination of query plan and sharding strategy with the lowest cost, i.e. the optimal one.
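The three static strategies can be sketched as follows. The function names are hypothetical, and CRC32 over the key's `repr` is simply a deterministic stand-in for whatever fixed hash the HFDD actually uses; passing `hash_fn` models the user-defined strategy.

```python
import zlib
from typing import Callable, Optional, Sequence

Row = tuple


def horizontal_shards(rows: Sequence[Row], n_nodes: int) -> list[list[Row]]:
    """Split rows into n_nodes contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(rows), n_nodes)
    shards, start = [], 0
    for i in range(n_nodes):
        end = start + base + (1 if i < extra else 0)
        shards.append(list(rows[start:end]))
        start = end
    return shards


def hash_shards(rows: Sequence[Row], n_nodes: int, key_cols: Sequence[int],
                hash_fn: Optional[Callable[[tuple], int]] = None) -> list[list[Row]]:
    """Send each row to node hash(key) % n_nodes, so rows sharing a group/join
    key land on the same node. `hash_fn` allows a user-defined partitioner."""
    if hash_fn is None:
        # Deterministic default (Python's built-in str hash is salted per process).
        hash_fn = lambda key: zlib.crc32(repr(key).encode())
    shards: list[list[Row]] = [[] for _ in range(n_nodes)]
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        shards[hash_fn(key) % n_nodes].append(row)
    return shards
```

Horizontal sharding balances row counts but scatters equal keys; hash sharding co-locates equal keys but may be skewed, which is where a user-supplied `hash_fn` tuned to the data distribution helps.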
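A simplified Recursive Random Search loop is sketched below: alternate uniform exploration of the whole space with exploitation around the best point, shrinking the sampling box after repeated failures. This is a simplified variant, not the full published RRS algorithm, and it searches a continuous unit box. Encoding partitioning strategies and query plans as points in that box, and using an estimated plan cost as `cost`, are assumptions; all parameter names are hypothetical.

```python
import random
from typing import Callable


def rrs_minimize(cost: Callable[[list[float]], float], dim: int, budget: int = 2000,
                 n_explore: int = 20, init_radius: float = 0.25, shrink: float = 0.5,
                 max_fail: int = 5, min_radius: float = 1e-3,
                 seed: int = 0) -> tuple[list[float], float]:
    """Simplified RRS sketch over [0, 1]^dim; returns (best point, best cost)."""
    rng = random.Random(seed)
    best_x: list[float] = []
    best_c = float("inf")
    evals = 0

    def evaluate(x: list[float]) -> bool:
        # Evaluate one candidate; return True if it improved the incumbent.
        nonlocal evals, best_x, best_c
        evals += 1
        c = cost(x)
        if c < best_c:
            best_x, best_c = x, c
            return True
        return False

    while evals < budget:
        # Exploration: uniform samples over the whole unit box.
        for _ in range(min(n_explore, budget - evals)):
            evaluate([rng.random() for _ in range(dim)])
        # Exploitation: sample near the incumbent (re-centred on every
        # improvement) and shrink the box after max_fail straight failures.
        radius, fails = init_radius, 0
        while radius > min_radius and evals < budget:
            x = [min(1.0, max(0.0, xi + rng.uniform(-radius, radius))) for xi in best_x]
            if evaluate(x):
                fails = 0
            else:
                fails += 1
                if fails >= max_fail:
                    radius *= shrink
                    fails = 0
    return best_x, best_c
```

Because RRS treats `cost` as a black box, the same loop works whether the cost comes from an analytical model or from a query optimizer's estimate.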
Fig. 4 is a schematic structural diagram of an apparatus provided in an embodiment of the present application. The apparatus may be a module, a program segment, or code on an electronic device. It corresponds to the method embodiment of Fig. 3 and can execute the steps of that embodiment; for its specific functions, refer to the description above, which is not repeated here. The apparatus is applied to a node in a distributed storage system comprising a plurality of communicatively connected nodes, each node including a hybrid-memory fault-tolerant distributed data set (HFDD) and external storage, the HFDD including memory and a solid state disk (SSD), and the memory including random access memory (RAM) and a non-volatile dual inline memory module (NVDIMM). The apparatus comprises a heat calculation module 401 and a data storage module 402, wherein:
the heat calculation module 401 is configured to calculate the heat of each data item, where heat represents how frequently the item is accessed; and the data storage module 402 is configured to store each data item according to its heat and the storage capacities of the RAM, the NVDIMM, the SSD and the external storage, respectively.
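The text defines heat only as access frequency. One plausible reading, assumed here, is the number of accesses within a sliding time window; `HeatTracker` and `window` are hypothetical names.

```python
from collections import defaultdict, deque


class HeatTracker:
    """Hypothetical sketch: heat = accesses to an item within the last `window` seconds."""

    def __init__(self, window: float) -> None:
        self.window = window
        self._hits = defaultdict(deque)  # key -> access times, oldest first

    def record(self, key: str, t: float) -> None:
        """Record one access to `key` at time `t` (times must be non-decreasing)."""
        self._hits[key].append(t)

    def heat(self, key: str, now: float) -> int:
        hits = self._hits.get(key)
        if not hits:
            return 0
        # Drop accesses that have fallen out of the window.
        while hits and now - hits[0] > self.window:
            hits.popleft()
        return len(hits)
```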
On the basis of the above embodiment, the data storage module is specifically configured to:
according to the storage capacity of the RAM in the memory, store data into the RAM starting from the data with the highest heat, such that the total size stored does not exceed the RAM capacity;
according to the storage capacity of the SSD, store the remaining data into the SSD starting from the data with the highest heat among it, such that the total size stored does not exceed the SSD capacity;
store all remaining data in the external storage.
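The three placement steps above amount to a greedy fill of each tier in descending order of heat. In this sketch `place_by_heat` and the `(name, size, heat)` tuple layout are hypothetical, and it assumes each tier takes a contiguous prefix of the heat ranking (filling stops at the first item that does not fit) rather than skipping ahead to smaller items.

```python
def place_by_heat(items: list[tuple[str, int, float]], ram_capacity: int,
                  ssd_capacity: int) -> dict[str, list[str]]:
    """Hypothetical sketch; items are (name, size, heat) tuples."""
    ordered = sorted(items, key=lambda it: it[2], reverse=True)  # hottest first
    placement: dict[str, list[str]] = {"ram": [], "ssd": [], "external": []}
    i = 0
    for tier, capacity in (("ram", ram_capacity), ("ssd", ssd_capacity)):
        used = 0
        # Fill this tier with the hottest remaining items while they fit.
        while i < len(ordered) and used + ordered[i][1] <= capacity:
            placement[tier].append(ordered[i][0])
            used += ordered[i][1]
            i += 1
    # Everything that fit in neither tier goes to external storage.
    placement["external"] = [name for name, _, _ in ordered[i:]]
    return placement
```

For instance, with a RAM and an SSD of capacity 8 each, items of sizes 3, 5, 4, 2 and 6 (already in descending heat order) end up as RAM: first two, SSD: next two, external: the last one.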
On the basis of the above embodiment, the apparatus further includes a swap module configured to:
receive an access request; and, if the RAM is full and the data to be accessed is stored in the SSD, evict volatile data from the RAM according to the amount of memory required by the data to be accessed, and load the data to be accessed into the RAM.
On the basis of the above embodiment, the apparatus further includes a power-loss protection module configured to:
if the node suffers an abnormal power loss, save the node's working-state data from the NVDIMM to external storage before power is lost;
and, after the node recovers from the failure, write the working-state data from external storage back into the NVDIMM and resume operation from that state.
On the basis of the above embodiment, the apparatus further includes a copy creation module configured to:
determine a time node for creating copies;
and, when the time node is reached, send a copy-creation request to other nodes so that they create copies of the corresponding data upon receiving the request.
On the basis of the above embodiment, the apparatus further includes a data recovery module configured to:
if data on a node is lost, retrieve the most recent copy of that data from the other nodes.
On the basis of the above embodiment, the apparatus further includes a snapshot generating module, configured to:
generate snapshots of the HFDD in the node at a preset interval while the distributed storage system is running;
and determine where each snapshot is stored according to the time at which it was generated.
In summary, through the HFDD this embodiment implements data swapping between volatile and non-volatile storage, ensuring single-node access performance while providing a high-capacity hybrid memory; through data sharding strategies in a distributed environment, it achieves global management of data sets and cluster-level HFDD access performance; through reliability guarantees for single-node volatile data and snapshot-based data set reliability, it ensures node-level data set reliability; and through multi-copy fault tolerance in a distributed environment, it ensures cluster-level data set reliability.
Fig. 5 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present application. As shown in Fig. 5, the electronic device includes a processor 501, a memory 502 and a bus 503, wherein:
the processor 501 and the memory 502 communicate with each other via the bus 503;
the processor 501 is configured to invoke program instructions in the memory 502 to perform the methods provided in the above method embodiments, for example: calculating the heat of each data item, where heat represents how frequently the item is accessed; and storing each data item according to its heat and the storage capacities of the memory, the SSD and the external storage, respectively.
The processor 501 may be an integrated circuit chip with signal processing capabilities. It may be a general-purpose processor, such as a central processing unit (CPU) or a network processor (NP); or it may be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. It can implement or perform the methods, steps and logical blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 502 may include, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and the like.
This embodiment discloses a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, perform the methods provided by the above method embodiments, for example: calculating the heat of each data item, where heat represents how frequently the item is accessed; and storing each data item according to its heat and the storage capacities of the memory, the SSD and the external storage, respectively.
This embodiment provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the methods provided by the above method embodiments, for example: calculating the heat of each data item, where heat represents how frequently the item is accessed; and storing each data item according to its heat and the storage capacities of the memory, the SSD and the external storage, respectively.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division into units is only a logical functional division, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices or units, and may be electrical, mechanical or in other forms.
Further, units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated into a single part, each module may exist separately, or two or more modules may be integrated into a single part.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and variations will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (9)

1. A hybrid memory-based data processing method, applied to a node in a distributed storage system, the distributed storage system including a plurality of nodes communicatively connected to each other, each node including a hybrid memory and a fault tolerant distributed data set HFDD and an external storage, the HFDD including a memory and a solid state disk SSD, the memory including a random access memory RAM and a non-volatile dual inline memory module NVDIMM, the method comprising:
calculating the heat of each data item, wherein the heat represents how frequently the corresponding data is accessed;
storing each data item according to its heat and the storage capacities of the memory, the SSD and the external storage, respectively;
the method further comprises the steps of:
determining a time node for creating a copy;
if the time node is reached, sending a copy creation request to other nodes so that the other nodes create copies of corresponding data after receiving the copy creation request;
wherein determining the time node for creating the copy comprises:
acquiring a basic operation set, converting an execution plan of a load into an operation evolution process of the HFDD, and recording the evolution process of each load operation set as log information; wherein the basic operation set refers to an instruction set for performing basic operations such as reading, writing, querying and analyzing data, and the execution plan of the load refers to the step-by-step process of a task during system execution and the corresponding operation plan;
and, during execution of the load, selecting the time node for creating the copy by combining the execution plan and the data distribution.
2. The method of claim 1, wherein storing each data according to the heat of each data and the storage capacities respectively corresponding to the memory, the SSD, and the external memory comprises:
according to the storage capacity of the RAM in the memory, storing into the RAM, starting from the data with the highest heat, data whose total size is less than or equal to the storage capacity of the RAM;
according to the storage capacity of the SSD, storing into the SSD, starting from the data with the highest heat among the remaining data, data whose total size is less than or equal to the storage capacity of the SSD;
storing the remaining data in the external storage.
3. The method according to claim 1, wherein the method further comprises:
receiving an access request, and, if the RAM is full and the data to be accessed is stored in the SSD, evicting volatile data from the RAM according to the memory size required by the data to be accessed, and storing the data to be accessed in the RAM.
4. The method according to claim 1, wherein the method further comprises:
if the node is abnormally powered down, storing the working-state data of the node from the NVDIMM into the external storage before the power loss;
and, after the node recovers from the failure, writing the working-state data from the external storage back into the NVDIMM and continuing to operate according to the working-state data.
5. The method according to claim 1, wherein the method further comprises:
if data in a node is lost, acquiring the most recently created copy from other nodes.
6. The method according to claim 1, wherein the method further comprises:
generating snapshots of the HFDD in the node according to a preset period while the distributed storage system is running;
and determining the storage location of each snapshot according to the time at which it was generated.
7. A hybrid memory-based data processing apparatus for use in a node in a distributed storage system, the distributed storage system comprising a plurality of nodes communicatively coupled to each other, each node comprising a hybrid memory and a fault tolerant distributed data set, HFDD, and an external storage, the HFDD comprising a memory and a solid state disk, SSD, the memory comprising a random access memory, RAM, and a non-volatile dual inline memory module, NVDIMM, the apparatus comprising:
the heat calculation module is configured to calculate the heat of each data item, wherein the heat represents how frequently the corresponding data is accessed;
the data storage module is configured to store each data item according to its heat and the storage capacities of the memory, the SSD and the external storage, respectively;
the apparatus further comprises a copy creation module for:
determining a time node for creating a copy;
if the time node is reached, sending a copy creation request to other nodes so that the other nodes create copies of corresponding data after receiving the copy creation request;
wherein determining the time node for creating the copy comprises:
acquiring a basic operation set, converting an execution plan of a load into an operation evolution process of the HFDD, and recording the evolution process of each load operation set as log information; wherein the basic operation set refers to an instruction set for performing basic operations such as reading, writing, querying and analyzing data, and the execution plan of the load refers to the step-by-step process of a task during system execution and the corresponding operation plan;
and, during execution of the load, selecting the time node for creating the copy by combining the execution plan and the data distribution.
8. An electronic device, comprising: a processor, a memory, and a bus, wherein,
the processor and the memory communicate with each other through the bus;
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1-6.
9. A non-transitory computer readable storage medium storing computer instructions which, when executed by a computer, cause the computer to perform the method of any of claims 1-6.
CN201911424993.4A 2019-12-31 2019-12-31 Data processing method and device based on hybrid memory Active CN111176584B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911424993.4A CN111176584B (en) 2019-12-31 2019-12-31 Data processing method and device based on hybrid memory

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911424993.4A CN111176584B (en) 2019-12-31 2019-12-31 Data processing method and device based on hybrid memory

Publications (2)

Publication Number Publication Date
CN111176584A 2020-05-19
CN111176584B 2023-10-31

Family

ID=70656050

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911424993.4A Active CN111176584B (en) 2019-12-31 2019-12-31 Data processing method and device based on hybrid memory

Country Status (1)

Country Link
CN (1) CN111176584B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114063888A (en) * 2020-07-31 2022-02-18 China Mobile (Suzhou) Software Technology Co., Ltd. Data storage system, data processing method, terminal and storage medium
CN111984201B (en) * 2020-09-01 2023-01-31 Yunnan University of Finance and Economics Astronomical observation data high-reliability acquisition method and system based on persistent memory
CN115129230A (en) * 2021-03-26 2022-09-30 Huawei Technologies Co., Ltd. Cache management method and storage device
CN113076135B (en) * 2021-04-06 2023-12-26 Guxin (Guangzhou) Technology Co., Ltd. Logic resource sharing method for special instruction set processor

Citations (13)

Publication number Priority date Publication date Assignee Title
CN102694863A * 2012-05-30 2012-09-26 University of Electronic Science and Technology of China Realization method of distributed storage system on basis of load adjustment and system fault tolerance
CN103810113A * 2014-01-28 2014-05-21 Huazhong University of Science and Technology Fusion memory system of nonvolatile memory and dynamic random access memory
US8751725B1 * 2012-01-27 2014-06-10 Netapp, Inc. Hybrid storage aggregate
CN104408106A * 2014-11-20 2015-03-11 Zhejiang University Scheduling method for big data inquiry in distributed file system
CN105183379A * 2015-09-01 2015-12-23 Shanghai Xinchu Integrated Circuit Co., Ltd. Mixed memory data backup system and method
CN106446126A * 2016-09-19 2017-02-22 Harbin Aerospace Star Data System Technology Co., Ltd. Massive space information data storage management method and storage management device
CN107168657A * 2017-06-15 2017-09-15 Shenzhen Yunshu Network Technology Co., Ltd. Hierarchical cache design method for virtual disks based on distributed block storage
CN107797944A * 2017-10-24 2018-03-13 Zhengzhou Yunhai Information Technology Co., Ltd. Hierarchical heterogeneous hybrid memory system
CN107943867A * 2017-11-10 2018-04-20 The 32nd Research Institute of China Electronics Technology Group Corporation High-performance hierarchical storage system supporting heterogeneous storage
CN108255712A * 2017-12-29 2018-07-06 Dawning Information Industry (Beijing) Co., Ltd. Test system and test method for a data system
CN108762671A * 2018-05-23 2018-11-06 Army Engineering University of PLA Hybrid memory system based on PCM and DRAM and management method thereof
CN109901800A * 2019-03-14 2019-06-18 Chongqing University Hybrid memory system and operating method thereof
CN110347336A * 2019-06-10 2019-10-18 Huazhong University of Science and Technology Key-value storage system based on NVM and SSD hybrid storage structure

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US6785767B2 (en) * 2000-12-26 2004-08-31 Intel Corporation Hybrid mass storage system and method with two different types of storage medium
WO2005008524A1 (en) * 2003-07-16 2005-01-27 Joltid Ltd. Distributed database system
US20110179219A1 (en) * 2004-04-05 2011-07-21 Super Talent Electronics, Inc. Hybrid storage device
US10698628B2 (en) * 2015-06-09 2020-06-30 Ultrata, Llc Infinite memory fabric hardware implementation with memory

Non-Patent Citations (1)

Title
van Renen et al. Managing non-volatile memory in database systems. In: Proceedings of the 2018 International Conference on Management of Data (SIGMOD '18), 2018, pp. 1541-1555. *

Also Published As

Publication number Publication date
CN111176584A (en) 2020-05-19

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211015

Address after: 100089 building 36, courtyard 8, Dongbeiwang West Road, Haidian District, Beijing

Applicant after: Dawning Information Industry (Beijing) Co.,Ltd.

Applicant after: ZHONGKE SUGON INFORMATION INDUSTRY CHENGDU Co.,Ltd.

Address before: Building 36, courtyard 8, Dongbeiwang West Road, Haidian District, Beijing

Applicant before: Dawning Information Industry (Beijing) Co.,Ltd.

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant