
CN111176584A - Data processing method and device based on hybrid memory - Google Patents

Data processing method and device based on hybrid memory

Info

Publication number
CN111176584A
Authority
CN
China
Prior art keywords
data
memory
node
storage
ram
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911424993.4A
Other languages
Chinese (zh)
Other versions
CN111176584B (en)
Inventor
郭庆
谢莹莹
于宏亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Sugon Information Industry Chengdu Co ltd
Dawning Information Industry Beijing Co Ltd
Original Assignee
Dawning Information Industry Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dawning Information Industry Beijing Co Ltd
Priority to CN201911424993.4A
Publication of CN111176584A
Application granted
Publication of CN111176584B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604Improving or facilitating administration, e.g. storage management
    • G06F3/0607Improving or facilitating administration, e.g. storage management by facilitating the process of upgrading existing storage systems, e.g. for improving compatibility between host and storage device
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/062Securing storage systems
    • G06F3/0623Securing storage systems in relation to content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0683Plurality of storage devices
    • G06F3/0688Non-volatile semiconductor memory arrays
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

The application provides a data processing method and device based on a hybrid memory. The method is applied to a node in a distributed storage system. The distributed storage system comprises a plurality of nodes communicatively connected to one another; each node comprises a hybrid-memory and fault-tolerant distributed data set (HFDD) and an external storage; the HFDD comprises a memory and a solid-state disk (SSD); and the memory comprises a random access memory (RAM) and an NVDIMM. The method comprises: calculating the heat of each piece of data, where the heat represents how frequently the corresponding data is accessed; and storing each piece of data according to its heat and the respective storage capacities of the memory, the SSD, and the external storage. In the embodiments of the application, the HFDD is a fault-tolerant distributed data abstraction based on a RAM + NVM hybrid memory, and data is stored according to its heat, which both increases the effective memory capacity and improves data access efficiency.

Description

Data processing method and device based on hybrid memory
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method and apparatus based on a hybrid memory.
Background
At present, big data computing technology can process PB-level data, and the concept of in-memory computing has emerged as a natural consequence. In the in-memory computing mode, all data is loaded into memory during an initialization stage, and data and query operations are executed in high-speed memory. The CPU (central processing unit) reads data directly from memory to perform real-time computation and analysis, which reduces disk accesses and the impact of network and disk I/O (input/output), greatly improves data throughput and processing speed, and reduces the I/O overhead that would otherwise occupy a large share of computing resources. By avoiding the I/O bottleneck, computations that previously took hours or days can be completed in seconds in an in-memory computing environment.
The mainstream in-memory computing technology at present is based on a single node. A single-node in-memory computing system operates on a single physical node having one or more processors and shared memory, where the memory structure may be centralized shared memory or non-coherent shared memory. In-memory computing on a single node uses a multi-core CPU, a large memory, and multiple parallel threads to make full use of the computing power of a single machine. However, single-node in-memory computing is limited by hardware resources, and data processing efficiency suffers when memory is insufficient.
Disclosure of Invention
An embodiment of the present application provides a data processing method and device based on a hybrid memory, so as to solve the problem of low data processing efficiency in the prior art.
In a first aspect, an embodiment of the present application provides a hybrid memory-based data processing method applied to a node in a distributed storage system, where the distributed storage system includes multiple nodes communicatively connected to each other, each node includes a hybrid-memory and fault-tolerant distributed data set (HFDD) and an external storage, the HFDD includes a memory and a solid state disk (SSD), and the memory includes a random access memory (RAM) and a non-volatile dual in-line memory module (NVDIMM). The method includes: calculating the heat of each piece of data, where the heat represents how frequently the corresponding data is accessed; and storing each piece of data according to its heat and the respective storage capacities of the memory, the SSD, and the external storage.
In the embodiment of the application, the HFDD is a fault-tolerant distributed data abstraction based on the RAM + NVM hybrid memory, and the data is stored according to the heat of the data, so that on one hand, the storage capacity of the memory is improved, and on the other hand, the data access efficiency is improved.
Further, storing each piece of data according to its heat and the respective storage capacities of the memory, the SSD, and the external storage includes: starting from the data with the highest heat, storing into the RAM an amount of data less than or equal to the storage capacity of the RAM in the memory; starting from the data with the highest heat among the remaining data, storing into the SSD an amount of data less than or equal to the storage capacity of the SSD; and storing the remaining data in the external storage. According to the embodiment of the application, data is distributed across the tiers according to its heat and the capacities of the memory, the SSD, and the external storage, which preserves the efficiency of a node's data access while increasing memory capacity.
Further, the method further comprises: receiving an access request and, if the RAM is full and the data to be accessed is stored in the SSD, swapping volatile storage data out of the RAM according to the amount of memory required by the data to be accessed, and storing the data to be accessed in the RAM. According to the embodiment of the application, swapping data in and out between the RAM and the SSD guarantees performance while providing a high-capacity hybrid memory.
Further, the method further comprises: if the node is abnormally powered down, storing the working state data corresponding to the node before the abnormal power down into an external memory from an NVDIMM; and after the state of the node is recovered to be normal from the fault, writing the working state into the NVDIMM again from the external memory, and continuing to operate according to the working state data. According to the embodiment of the application, the data reliability of the system after external power failure can be ensured by using the NVDIMM.
Further, the method further comprises: determining a time node for creating a copy; and if the time node is reached, sending a copy creating request to other nodes so that the other nodes create copies of the corresponding data after receiving the copy creating request.
Further, the method further comprises: if the data in one node is lost, acquiring the most recently created copy from other nodes; and sending the data in the copy to the node that lost the data. According to the embodiment of the application, data reliability across multiple nodes is guaranteed through a multi-copy fault-tolerance technique in a distributed environment.
Further, the method further comprises: generating a snapshot of HFDD in a node according to a preset period in the operation process of the distributed storage system; and determining the storage position of the corresponding snapshot according to the time of generating each snapshot. According to the method and the device, the snapshot is generated according to the preset period, and the reliability of the data inside the node is guaranteed.
In a second aspect, an embodiment of the present application provides a hybrid memory-based data processing apparatus, applied to a node in a distributed storage system, where the distributed storage system includes a plurality of nodes communicatively connected to each other, each node includes a hybrid-memory and fault-tolerant distributed data set (HFDD) and an external storage, the HFDD includes a memory and a solid state disk (SSD), and the memory includes a random access memory (RAM) and a non-volatile dual in-line memory module (NVDIMM), and the apparatus includes:
the heat degree calculation module is used for calculating the heat degree of each datum, wherein the heat degree represents the frequency of accessing the corresponding datum;
and the data storage module is used for storing each data according to the heat degree of each data and the storage capacity corresponding to the memory, the SSD and the external storage respectively.
In a third aspect, an embodiment of the present application provides a distributed storage system, including a plurality of apparatuses described in the second aspect.
In a fourth aspect, an embodiment of the present application provides an electronic device, including: a processor, a memory, and a bus, wherein,
the processor and the memory are communicated with each other through the bus;
the memory stores program instructions executable by the processor, the processor being capable of performing the method of the first aspect when invoked by the program instructions.
In a fifth aspect, an embodiment of the present application provides a non-transitory computer-readable storage medium, including:
the non-transitory computer readable storage medium stores computer instructions that cause the computer to perform the method of the first aspect.
Additional features and advantages of the present application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the present application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a schematic structural diagram of a distributed storage system according to an embodiment of the present application;
Fig. 2 is a schematic storage diagram of a distributed system according to an embodiment of the present application;
Fig. 3 is a flowchart of a data processing method according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an apparatus according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
Because single-node in-memory computing is limited by hardware resources, it runs into hardware scalability problems when processing larger-scale data. Against the background of the rapid development of large-scale distributed data processing technologies represented by MapReduce, in-memory computing has also begun to be implemented on distributed systems. Distributed in-memory computing uses a cluster of computers to construct a large distributed memory, and the data to be processed is stored in the distributed memory through unified resource scheduling, enabling fast access to and processing of large-scale data.
Fig. 1 is a schematic structural diagram of a distributed storage system according to an embodiment of the present application. As shown in Fig. 1, the system includes a plurality of nodes, each of which may be a server. The nodes are connected through network I/O and may specifically be interconnected by high-speed Ethernet. Each node comprises a Hybrid-memory and Fault-tolerant Distributed data set (HFDD) and an external storage, and the HFDD comprises a random access memory (RAM), a non-volatile dual in-line memory module (NVDIMM), and a solid state disk (SSD). The RAM is internal memory that exchanges data directly with the CPU and is also called main memory. It can be read and written at any time, is fast, and is usually used as a temporary data storage medium for the operating system or other running programs. According to the operating principle of the memory cell, random access memory is further classified into static random access memory (SRAM) and dynamic random access memory (DRAM). An NVDIMM is a type of random access memory for computers that is non-volatile: it retains its contents even when power is removed, whether through an unexpected power loss, a system crash, or a normal shutdown. In the embodiments of the present application, the RAM and the NVDIMM are together referred to as the memory. A non-volatile memory (NVM) is a memory that still holds data after power is off; it offers non-volatility, byte addressability, high storage density, low energy consumption, and read-write performance close to that of DRAM, but its read and write speeds are asymmetric and its lifetime is limited.
The RAM and the NVDIMM provide GB-level data storage, and the RAM achieves a data transfer rate of 40 GB/s. The SSD provides TB-level data storage, with a data transfer rate of 1 GB/s. Thus, although the SSD has a large storage capacity, its data transfer is slower than that of the memory. The external storage (hard disk) in each node refers to storage other than the computer memory and CPU cache; in the embodiments of the present application, storage other than the HFDD is referred to as external storage. Such storage typically retains data after a power outage. The capacity of the external storage is at the 10 TB level, but its data transfer rate is only about 0.2 GB/s.
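As a rough illustration of the bandwidth gap between the tiers, the following sketch computes the idealized time needed to move a data set through each tier at the rates quoted above. The 10 GB data-set size, the tier labels, and the function name are assumptions chosen for illustration, and latency is ignored.

```python
# Transfer rates quoted in the description, in GB/s.
TIER_BANDWIDTH_GB_PER_S = {"RAM": 40.0, "SSD": 1.0, "external": 0.2}


def transfer_seconds(size_gb: float, tier: str) -> float:
    """Idealized time to move size_gb through a tier (latency ignored)."""
    return size_gb / TIER_BANDWIDTH_GB_PER_S[tier]


if __name__ == "__main__":
    # Assumed example size of 10 GB; not a figure from the description.
    for tier in TIER_BANDWIDTH_GB_PER_S:
        print(f"{tier}: {transfer_seconds(10.0, tier):.2f} s")
```

Under these assumptions the same data set takes a fraction of a second from RAM, about ten seconds from the SSD, and close to a minute from external storage.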
In a distributed storage system, the HFDDs in each node are communicatively connected to each other via a network I/O, and the HFDDs of all nodes together form the HFDD of the system.
It should be noted that fig. 1 only shows a system formed by three nodes, and in practical applications, the number of nodes in the system may be set according to practical situations, which is not specifically limited in this embodiment of the present application.
The external storage of all nodes in the system constitutes a file storage cluster in which all data in the system is stored. In a conventional environment, the data required for each computation has to be read from this underlying layer, which suits applications in which most data is not reused. However, a growing number of application scenarios access a given data set repeatedly, including machine learning and iterative computation in image processing (each iteration accesses the same data) and interactive data mining (users issue multiple queries against the same data set). Fig. 2 is a schematic storage diagram of a distributed system according to an embodiment of the present application. As shown in Fig. 2, the data of the hybrid storage system is divided into three layers according to performance level: the memory (comprising the RAM and the NVDIMM), the SSD, and the external storage, with storage capacity increasing in that order. All data required to execute each subtask is stored in the external storage, and a user can choose to cache some of the data in the memory when submitting tasks. To this end, the system defines the data for all operations as data sets, which a user may cache for frequent access. Because a cache resides in a single node and is not visible to other nodes, cached data can only be read, not modified; otherwise, a modification made on one node would not be visible to the other nodes, which could cause data inconsistency.
When deciding which data to cache, the user needs to consider the sizes of the data sets and of the memory: a very large data set should not be cached in the memory, whereas data sets that are likely to be accessed many times should be. To suit more application scenarios and serve larger data sets, the hybrid storage system adds an SSD layer between the memory and the external storage. This layer is not used directly for storing user data but serves as an extension of the memory. When a data set that the user has designated for caching cannot be held in the memory, it is automatically moved to the SSD, so that it does not have to be fetched from the slow external storage the next time it is accessed, which greatly improves the performance of the user's task.
As shown in Fig. 1, the NVDIMM, used as RAM, ensures data reliability after the system loses power unexpectedly, and the SSD, as an NVM device, effectively expands the capacity of the HFDD.
As a hybrid memory-based data set, the HFDD provides basic management primitives, including creation, destruction, and updating of an HFDD. The HFDD also supports basic computing operations, which fall into two types: 1) basic calculation, which returns a value computed over the data set; for example, Reduce aggregates all elements of the data set with a given function and returns the final result to the program; 2) conversion calculation, which creates a new data set from the existing one and returns a new HFDD; for example, Map applies a given function to each element and returns a new HFDD.
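The two kinds of operation can be illustrated with a minimal sketch. `ToyDataset` and its methods are hypothetical names invented for this example and are not the HFDD interface; the class only shows that a conversion such as Map yields a new data set while a basic calculation such as Reduce yields a single value.

```python
from functools import reduce as _reduce
from typing import Callable, Generic, Iterable, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class ToyDataset(Generic[T]):
    """Hypothetical read-only data set; an illustration, not the HFDD API."""

    def __init__(self, items: Iterable[T]) -> None:
        # Stored as a tuple so the data set cannot be modified in place.
        self._items: Tuple[T, ...] = tuple(items)

    def map(self, fn: Callable[[T], U]) -> "ToyDataset[U]":
        # Conversion calculation: apply fn to each element, return a new data set.
        return ToyDataset(fn(x) for x in self._items)

    def reduce(self, fn: Callable[[T, T], T]) -> T:
        # Basic calculation: aggregate all elements into one value.
        # Raises TypeError on an empty data set, like functools.reduce.
        return _reduce(fn, self._items)

    def collect(self) -> Tuple[T, ...]:
        # Return the elements for inspection.
        return self._items
```

For example, `ToyDataset([1, 2, 3]).map(lambda x: x * x)` produces a new data set holding 1, 4, and 9, and calling `.reduce(lambda a, b: a + b)` on it returns 14, while the original data set is left unchanged.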
Based on the system provided by the foregoing embodiment, an embodiment of the present application provides a data processing method, as shown in fig. 3, where the method is applied to one node in a distributed storage system, and it can be understood that the node is any node in the system, and the method includes:
step 301: calculating the heat of each data, wherein the heat represents the frequency of accessing the corresponding data;
step 302: and storing each data according to the heat of each data and the storage capacity corresponding to the internal memory, the SSD and the external memory respectively.
In a specific implementation process, the node may calculate the heat of each piece of data in real time. A heat calculation program may be configured in the node in advance and used to calculate the heat of each piece of data. In short, the number of times each piece of data is accessed within a preset time period is counted: the more accesses, the higher the heat, and the fewer accesses, the lower the heat.
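One possible way to count accesses within a preset time period is a sliding window per data item, sketched below. The class name `HeatTracker`, its methods, and the use of caller-supplied timestamps are assumptions made for illustration; the description does not specify how the heat calculation program is implemented.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class HeatTracker:
    """Hypothetical heat counter: accesses per key within a time window."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._accesses: Dict[str, Deque[float]] = defaultdict(deque)

    def record_access(self, key: str, now: float) -> None:
        # Timestamps are assumed to be supplied in non-decreasing order.
        self._accesses[key].append(now)

    def heat(self, key: str, now: float) -> int:
        times = self._accesses.get(key)
        if not times:
            return 0
        # Drop accesses that have fallen out of the window, then count the rest.
        cutoff = now - self.window_seconds
        while times and times[0] <= cutoff:
            times.popleft()
        return len(times)
```

Passing the current time explicitly keeps the sketch deterministic; a real node would more likely read a monotonic clock.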
To reduce the amount of computation, the heat may be calculated only for data stored in the memory and the SSD and for data accessed within the preset time period. For data that is stored in the external storage and has not been accessed within the preset time period, the heat need not be calculated.
Because the capacities of the memory, the SSD, and the external storage are limited and their data transfer rates differ, data can be distributed among them according to its heat and their capacities. Data with high heat is preferentially stored in the memory, and data with low heat is stored in the external storage. That is, starting from the data with the highest heat, an amount of data less than or equal to the storage capacity of the RAM in the memory is stored in the RAM; starting from the data with the highest heat among the remaining data, an amount of data less than or equal to the storage capacity of the SSD is stored in the SSD; finally, the remaining data is stored in the external storage.
It should be noted that, in addition to the RAM, NVDIMM, and SSD, other memories having the same functions may be used instead.
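The placement rule can be sketched as a greedy pass over the data in descending heat order. The names `Item` and `place_by_heat` are hypothetical, sizes are in arbitrary units, and one detail is an assumption not stated in the description: an item that does not fit in the remaining RAM falls through to the next tier, while later, smaller items may still use the leftover RAM.

```python
from typing import Dict, List, NamedTuple


class Item(NamedTuple):
    key: str
    size: int  # arbitrary capacity units
    heat: int  # access count within the preset time period


def place_by_heat(items: List[Item], ram_capacity: int, ssd_capacity: int) -> Dict[str, str]:
    """Assign each item to 'RAM', 'SSD', or 'external', hottest first."""
    placement: Dict[str, str] = {}
    ram_free, ssd_free = ram_capacity, ssd_capacity
    for item in sorted(items, key=lambda i: i.heat, reverse=True):
        if item.size <= ram_free:
            placement[item.key] = "RAM"
            ram_free -= item.size
        elif item.size <= ssd_free:
            placement[item.key] = "SSD"
            ssd_free -= item.size
        else:
            # Whatever fits in neither tier stays in external storage.
            placement[item.key] = "external"
    return placement
```

With four items of size 4 and heats 90, 70, 50, and 10, a RAM capacity of 8, and an SSD capacity of 4, the two hottest items land in the RAM, the third in the SSD, and the coldest in external storage.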
In the embodiment of the application, the HFDD is a fault-tolerant distributed data abstraction based on the RAM + NVM hybrid memory, and the data is stored according to the heat of the data, so that on one hand, the storage capacity of the memory is improved, and on the other hand, the data access efficiency is improved.
On the basis of the above embodiment, the method further includes:
and receiving an access request, if the storage capacity of the RAM is full and the data to be accessed is stored in the SSD, taking out volatile storage data from the RAM according to the size of the memory required by the data to be accessed, and storing the data to be accessed in the RAM.
In a specific implementation process, because of the bandwidth difference between the RAM and the NVM, swapping data in and out between volatile and nonvolatile storage is the key to increasing the access speed of the HFDD on a single node while preserving capacity. The swap-in and swap-out technique is implemented by adding a heterogeneous storage medium management module (an application program) to the operating system of the node, so that volatile and nonvolatile storage with different characteristics and performance are managed uniformly and presented externally through a compatible block interface. Using monitoring information, the swap management mechanism places frequently used data in fast volatile storage, namely the RAM, and places data that has not been accessed for a relatively long time in nonvolatile storage, namely the SSD. When the node receives an access request for data stored on the SSD and the RAM is full, part of the data in the RAM is swapped out according to a flexible and extensible policy, and the requested data is moved from the SSD into the RAM. The size of the data swapped out of the RAM should be no less than the size of the data to be swapped in. In addition, data can be stored in the NVDIMM: when the RAM is full, data can be stored in the NVDIMM, and data can be exchanged between the RAM and the NVDIMM.
The extensible policy may be understood as the policy that determines which volatile memory block in the RAM needs to be swapped out; specifically, it may swap out the volatile memory block with the lowest heat in the RAM.
It should be noted that if data to be accessed is stored in the SSD and the size of the remaining capacity in the RAM is larger than the size of the data to be accessed, the data to be accessed may be directly stored in the RAM.
When a volatile memory block is swapped out, the limited number of erase cycles of storage media such as nonvolatile memory needs to be considered: if the current heat of the swapped-out block is lower than a preset value, the block is stored in the external storage instead of the SSD.
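The swap-in path described above can be sketched as follows. The function `swap_in`, the dictionary-based tiers, and the `cold_threshold` parameter are hypothetical; the sketch uses the lowest-heat-first policy mentioned above, sends very cold evicted blocks to external storage to spare the SSD, and for simplicity ignores SSD capacity and the NVDIMM.

```python
from typing import Dict, List, Tuple

Block = Tuple[int, int]  # (size, heat)


def swap_in(
    key: str,
    ram: Dict[str, Block],
    ssd: Dict[str, Block],
    external: Dict[str, Block],
    ram_capacity: int,
    cold_threshold: int,
) -> List[str]:
    """Move `key` from SSD to RAM, evicting the coldest RAM blocks if needed.

    Returns the keys that were swapped out, in eviction order.
    """
    size, _heat = ssd[key]
    if size > ram_capacity:
        raise ValueError("block is larger than the whole RAM tier")
    free = ram_capacity - sum(s for s, _ in ram.values())
    evicted: List[str] = []
    # Evict lowest-heat blocks first until the requested block fits.
    for victim in sorted(ram, key=lambda k: ram[k][1]):
        if free >= size:
            break
        v_size, v_heat = ram.pop(victim)
        free += v_size
        # Blocks colder than the threshold skip the SSD (limited erase cycles).
        target = external if v_heat < cold_threshold else ssd
        target[victim] = (v_size, v_heat)
        evicted.append(victim)
    ram[key] = ssd.pop(key)
    return evicted
```

If the RAM already has enough free space, the loop exits immediately and the block is moved without any eviction, which corresponds to the case in which the remaining RAM capacity exceeds the size of the data to be accessed.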
According to the embodiment of the application, the performance guarantee under the premise of a high-capacity hybrid memory is realized through a data exchange-in and exchange-out technology between the RAM and the SSD.
On the basis of the above embodiment, the method further includes: if the node is abnormally powered down, storing the working state data corresponding to the node before the abnormal power down into an external memory from an NVDIMM; and after the state of the node is recovered to be normal from the fault, writing the working state into the NVDIMM again from the external memory, and continuing to operate according to the working state data.
In a specific implementation process, after a system is powered down, information stored in a RAM is lost at a power-down node, and in order to solve the problem, an NVDIMM is introduced into the node. The NVDIMM is a memory bank specification integrating DRAM and nonvolatile memory chips, can still store complete memory data when the power is completely cut off, can solve the problem of memory data storage work under the condition of abnormal power failure of a system, and can continue previous work after the system recovers to normal operation.
In the operation process of the storage system, the working state data is stored in the NVDIMM, and the NVDIMM can store the working state data (including hardware devices such as a CPU, a bridge chip, and a network card and all processes in the node) of the whole node before power failure in a short time to the external storage, and it can be understood that the external storage may be a disk and the like. After the node is powered on again to run, the working state data is obtained from the external memory and stored in the NVDIMM, so that the system can be restored to the previous running state.
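The save-and-restore cycle can be illustrated in software, with the caveat that a real NVDIMM performs the save in hardware. In this sketch a dictionary stands in for the working state held in the NVDIMM and a JSON file stands in for external storage; the function names and the file format are assumptions made for illustration.

```python
import json
import os
import tempfile
from typing import Any, Dict


def flush_state_to_external(nvdimm_state: Dict[str, Any], path: str) -> None:
    """Persist the working state: write a temporary file, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(nvdimm_state, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes reach the device
    # The rename is atomic, so a reader never sees a half-written state file.
    os.replace(tmp_path, path)


def restore_state_from_external(path: str) -> Dict[str, Any]:
    """Load the saved working state after the node is powered on again."""
    with open(path) as f:
        return json.load(f)
```

Writing to a temporary file and renaming it is one common way to avoid leaving a partially written state file if power is lost during the save.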
On the basis that the reliability of the volatile storage data of the single node is guaranteed on the NVDIMM, the fault of the node can be timely discovered through heartbeat information mutually sent among the nodes. In addition, the data copies of the node can be stored in other nodes, and if the data in one node is lost, the lost data can be retrieved by the node through the data copies in other nodes, and the node can operate normally.
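As an illustration of heartbeat-based fault discovery, the sketch below marks a node as failed when no heartbeat has been recorded from it within a timeout. The class name `HeartbeatMonitor`, the `timeout_s` parameter, and the injectable `clock` are assumptions introduced for the example; the source does not specify how heartbeats are evaluated.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat time of each peer node."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self.timeout_s = timeout_s  # hypothetical failure-detection timeout
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen = {}

    def record(self, node_id: str) -> None:
        # Called whenever a heartbeat message from node_id arrives.
        self.last_seen[node_id] = self.clock()

    def failed_nodes(self) -> list:
        # A node is suspected to have failed once its last heartbeat
        # is older than the timeout.
        now = self.clock()
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )
```

A node reported by `failed_nodes()` would then be a candidate for recovery from the data copies held on other nodes.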
According to the embodiment of the application, the RAM of the NVDIMM is used, so that the data reliability of the system after external power failure can be ensured.
On the basis of the above embodiment, the method further includes: determining a time node for creating a copy; and if the time node is reached, sending a copy creating request to other nodes so that the other nodes create copies of the corresponding data after receiving the copy creating request.
In a specific implementation process, data reliability can be ensured by performing multi-copy storage on data on a plurality of nodes, however, the direct redundant storage method brings great space waste and data network transmission cost. In order to reduce data redundancy space and data network transmission cost to the maximum extent while ensuring data reliability in volatile storage, the embodiment of the application adopts a multi-dimensional fault-tolerant technology combining multiple copies and check points. That is, a complete basic set of operations is defined, the execution plan of the load is translated into an operation evolution process for the HFDD, and the evolution process for each set of load operations is logged. The basic operation set refers to an instruction set for performing basic operations such as read-write query analysis on data. The execution plan of the load refers to a step process and a corresponding operation plan of the task in the system execution process. During load execution, instead of performing multiple copies of data in the HFDD at each stage, only the data in the HFDD at the checkpoint is redundantly stored, with appropriate selection of the critical checkpoint, i.e., the time node at which the copy was created, in conjunction with execution planning and data distribution.
If the data in the node is lost, the copy created most recently from the current time can be obtained from other nodes storing the data copy of the node.
According to the embodiment of the application, the reliability of data among multiple nodes is guaranteed through a data multi-copy fault-tolerant technology in a distributed environment.
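The combination of checkpoints and copies can be sketched as below: data is replicated to peer nodes only at selected checkpoints, and recovery picks the most recently created copy. The names `ReplicaNode`, `run_stages`, and `recover`, and the use of stage indices as time nodes, are assumptions made for illustration; how checkpoints are actually chosen from the execution plan is not modelled here.

```python
class ReplicaNode:
    """A peer node that stores copies on behalf of other nodes."""

    def __init__(self, name):
        self.name = name
        self.copies = {}  # owner id -> list of (checkpoint_time, data)

    def handle_copy_request(self, owner, checkpoint_time, data):
        # Store an independent copy of the owner's data at this checkpoint.
        self.copies.setdefault(owner, []).append((checkpoint_time, dict(data)))

    def latest_copy(self, owner):
        entries = self.copies.get(owner, [])
        return max(entries, key=lambda e: e[0]) if entries else None


def run_stages(owner, stages, checkpoints, peers):
    """Apply each stage in order; replicate only at checkpoint stages."""
    state = {}
    for index, stage in enumerate(stages):
        stage(state)
        if index in checkpoints:
            for peer in peers:
                peer.handle_copy_request(owner, index, state)
    return state


def recover(owner, peers):
    """Return the (checkpoint_time, data) copy created most recently."""
    candidates = [c for c in (p.latest_copy(owner) for p in peers) if c is not None]
    return max(candidates, key=lambda e: e[0]) if candidates else None
```

With four stages and checkpoints at stages 1 and 2, each peer holds two copies instead of four, and `recover` returns the state as of stage 2.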
On the basis of the above embodiment, the method further includes: generating a snapshot of HFDD in a node according to a preset period in the operation process of the distributed storage system; and determining the storage position of the corresponding snapshot according to the time of generating each snapshot.
In a specific implementation process, the reliability of a data set in the HFDD is ensured based on snapshots. Generating snapshots of the HFDD in the node according to a preset period both provides multiple time points to which the data can be restored and avoids the large impact on system performance that frequent data backups would cause. Therefore, based on the snapshot data backup technology, a scheme covering the following four aspects is provided. Firstly, an efficient snapshot generation mechanism is adopted, so that a backup can be established quickly while consuming few system resources. Meanwhile, the generated snapshots are stored on an external storage disk, and the snapshots to be accessed are kept in memory according to how the snapshots are used, so that data can be recovered rapidly. Secondly, a snapshot mechanism with multiple recovery points is established; multiple recovery points help the user recover more reliably and minimize the loss caused by system downtime or other accidents. It is understood that a recovery point is a snapshot. Thirdly, to address the management of a large number of snapshots, an integrated snapshot backup management scheme is established and a convenient GUI is provided to help the user quickly find the needed backup. Fourthly, to address the storage space consumed by a large number of snapshot backups, multi-level storage of the snapshot backup data is implemented. The user's most recent backups are kept in storage that facilitates fast recovery, such as the RAM, and older snapshots are stored in lower-cost, slower storage, such as the SSD or the external memory. Meanwhile, a backup elimination mechanism is also established, which applies policy-based elimination to old snapshot backups (for example, deleting 50% of the backups that are more than one year old).
According to the method and the device, the snapshot is generated according to the preset period, and the reliability of the data inside the node is guaranteed.
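The multi-level snapshot storage and the policy-based elimination can be sketched as follows. The age windows (`ram_window`, `ssd_window`), the function names, and the "keep every second old snapshot" rule used to approximate deleting 50% of backups older than one year are illustrative assumptions rather than values given in the source.

```python
from datetime import datetime, timedelta


def snapshot_tier(created, now,
                  ram_window=timedelta(days=1),
                  ssd_window=timedelta(days=30)):
    """Choose a storage location from the snapshot's creation time."""
    age = now - created
    if age <= ram_window:
        return "RAM"       # most recent snapshots: fastest recovery
    if age <= ssd_window:
        return "SSD"       # older snapshots: cheaper, slower
    return "external"      # oldest snapshots: cheapest tier


def eliminate_old(snapshots, now, max_age=timedelta(days=365), keep_every=2):
    """Thin out snapshots older than max_age.

    With keep_every=2, every second old snapshot is kept, so roughly
    half of the old snapshots are deleted. Recent snapshots are untouched.
    """
    recent = [s for s in snapshots if now - s <= max_age]
    old = sorted(s for s in snapshots if now - s > max_age)
    return sorted(old[::keep_every] + recent)
```

Here `snapshots` is simply a list of creation times; a real system would carry snapshot identifiers and payload locations alongside them.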
On the basis of the above embodiment, the HFDD in the node may also be fragmented. The data fragmentation mode has a decisive influence on the query plan and the query performance. The embodiment of the application provides data fragmentation strategies for various distributed environments to improve the efficiency of different queries, and the specific fragmentation strategies can be divided into two types: static fragmentation strategies and dynamic fragmentation strategies.
The static fragmentation strategy is explicitly specified by the application or the user before the data is imported. It includes horizontal fragmentation, hash fragmentation by column, and user-defined fragmentation. Horizontal fragmentation divides the input data evenly according to the data volume and the number of nodes, so as to balance the data load across the nodes. Horizontal fragmentation works well for data operations such as selection and aggregation: these operations do not involve transmitting data among multiple nodes, and horizontal fragmentation ensures load balance among tasks. For data operations such as grouping and joins, however, simple horizontal fragmentation can cause a large amount of data to be transmitted over the network, so the HFDD provides a strategy of hash fragmentation on one or more columns. Hash fragmentation places data that satisfies the same grouping or join condition into data fragments on the same node, thereby reducing the data transmission cost. When the data is unevenly distributed over the hash key values, a fixed hash algorithm distributes the data unevenly, so that the computing resources of the cluster cannot be used effectively. Therefore, the embodiment of the application also provides a user-defined fragmentation algorithm, so that the user can define a hash algorithm that fragments the data according to its distribution.
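The three static strategies can be illustrated with a short sketch. The function names and the CRC32-based default hash are assumptions chosen for the example; a user-defined strategy is modelled simply by passing a different `hash_fn`.

```python
import zlib


def horizontal_shards(rows, num_nodes):
    """Split rows into num_nodes contiguous, nearly equal-sized shards."""
    base, extra = divmod(len(rows), num_nodes)
    shards, start = [], 0
    for i in range(num_nodes):
        end = start + base + (1 if i < extra else 0)
        shards.append(rows[start:end])
        start = end
    return shards


def default_hash(key):
    # Deterministic hash of the key tuple (illustrative choice).
    return zlib.crc32(repr(key).encode("utf-8"))


def hash_shards(rows, columns, num_nodes, hash_fn=default_hash):
    """Place rows with equal values in `columns` on the same node.

    Passing a custom hash_fn models the user-defined strategy, e.g. to
    compensate for skewed key distributions.
    """
    shards = [[] for _ in range(num_nodes)]
    for row in rows:
        key = tuple(row[c] for c in columns)
        shards[hash_fn(key) % num_nodes].append(row)
    return shards
```

Because all rows sharing a grouping or join key land on one node under `hash_shards`, a grouping on those columns can run locally without moving rows between nodes.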
In order to reduce the burden on the user and provide an optimal fragmentation method for different load types, the embodiment of the application provides a dynamic data partitioning strategy. This strategy does not require the user to define a partitioning method and is tightly combined with query optimization. The node constructs different partition strategies and corresponding query plans based on the characteristics of the user load and statistical information about the data distribution; for example, a horizontal fragmentation strategy is employed for some data, while a column hash strategy is employed for another portion of the data. The query plan and fragmentation strategy with the minimum cost, that is, the optimal point in the space of partition strategies and query plans, are then selected by using the RRS (Recursive Random Search) black-box optimization technique.
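The idea of treating the cost of a (partition strategy, query plan) pair as a black box and searching for the cheapest pair can be sketched as below. Note that this sketch uses plain uniform random sampling as a simplified stand-in; it does not implement the recursive region-shrinking of RRS itself. The function name `choose_plan`, the `samples` and `seed` parameters, and the cost function are hypothetical.

```python
import random


def choose_plan(strategies, plans, cost_fn, samples=50, seed=0):
    """Sample (strategy, plan) pairs and keep the cheapest one seen.

    cost_fn is treated as a black box: it is only evaluated, never
    inspected. The result is the best sampled pair, which is not
    guaranteed to be the global optimum.
    """
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(samples):
        candidate = (rng.choice(strategies), rng.choice(plans))
        cost = cost_fn(*candidate)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost
```

In practice the cost function would be an estimate derived from load characteristics and data distribution statistics, which this sketch leaves abstract.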
Fig. 4 is a schematic structural diagram of an apparatus provided in an embodiment of the present application, where the apparatus may be a module, a program segment, or code on an electronic device. It should be understood that the apparatus corresponds to the above-mentioned embodiment of the method of fig. 3, and can perform various steps related to the embodiment of the method of fig. 3, and the specific functions of the apparatus can be referred to the above description, and the detailed description is appropriately omitted here to avoid redundancy. The apparatus is applied to a node in a distributed storage system, the distributed storage system comprising a plurality of nodes communicatively connected to each other, each node comprising a hybrid memory and a fault tolerant distributed data set HFDD comprising a memory and a solid state disk SSD, and an external memory, the memory comprising a random access memory RAM and a non-volatile dual in-line memory module NVDIMM, the apparatus comprising: a heat calculation module 401 and a data storage module 402, wherein:
the heat calculation module 401 is configured to calculate a heat of each data, where the heat represents a frequency of accessing the corresponding data; the data storage module 402 is configured to store each data according to the heat of each data and the storage capacity corresponding to the RAM, the NVDIMM, the SSD, and the external storage.
On the basis of the above embodiment, the data storage module is specifically configured to:
storing, starting from the data with the highest heat, data whose total size is less than or equal to the storage capacity of the RAM into the RAM;
storing data smaller than or equal to the storage capacity into the SSD from the data with the highest heat in the residual data according to the storage capacity of the SSD;
storing the remaining data in the external memory.
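The three storage steps above amount to a greedy placement by descending heat, which can be sketched as follows. The function name `place_by_heat` and the `(key, size, heat)` tuple layout are assumptions; the sketch also assumes that each tier is filled with a contiguous run of the heat-ordered data and stops at the first item that no longer fits.

```python
def place_by_heat(items, ram_capacity, ssd_capacity):
    """Assign (key, size, heat) items to RAM, SSD, or external storage."""
    # Hottest data first.
    ordered = sorted(items, key=lambda it: it[2], reverse=True)
    placement = {"RAM": [], "SSD": [], "external": []}
    idx = 0
    for tier, free in (("RAM", ram_capacity), ("SSD", ssd_capacity)):
        # Fill this tier with the hottest remaining data until the
        # next item would exceed the tier's remaining capacity.
        while idx < len(ordered) and ordered[idx][1] <= free:
            placement[tier].append(ordered[idx][0])
            free -= ordered[idx][1]
            idx += 1
    # Whatever is left goes to external storage.
    placement["external"] = [it[0] for it in ordered[idx:]]
    return placement
```

With a RAM capacity of 8 and an SSD capacity of 7, five items of sizes 4, 3, 5, 2, and 6 (in descending heat) are split as two in RAM, two on the SSD, and one in external storage.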
On the basis of the above embodiment, the apparatus further includes a swap-in swap-out module, configured to:
and receiving an access request, if the storage capacity of the RAM is full and the data to be accessed is stored in the SSD, taking out volatile storage data from the RAM according to the size of the memory required by the data to be accessed, and storing the data to be accessed in the RAM.
On the basis of the above embodiment, the apparatus further includes a power down protection module, configured to:
if the node is abnormally powered down, storing the working state data corresponding to the node before the abnormal power down into an external memory from an NVDIMM;
and after the state of the node is recovered to be normal from the fault, writing the working state into the NVDIMM again from the external memory, and continuing to operate according to the working state data.
On the basis of the above embodiment, the apparatus further includes a copy creation module configured to:
determining a time node for creating a copy;
and if the time node is reached, sending a copy creating request to other nodes so that the other nodes create copies of the corresponding data after receiving the copy creating request.
On the basis of the above embodiment, the apparatus further includes a data recovery module, configured to:
and if the data in one node is lost, acquiring the copy created most recently from the current time from other nodes.
On the basis of the above embodiment, the apparatus further includes a snapshot generating module, configured to:
generating a snapshot of HFDD in a node according to a preset period in the operation process of the distributed storage system;
and determining the storage position of the corresponding snapshot according to the time of generating each snapshot.
In summary, in the embodiments of the present application, the HFDD is used to implement a data swap-in and swap-out technology between volatile and nonvolatile memories, so as to ensure a single-machine access performance on the premise of a high-capacity hybrid memory; realizing data set global management and cluster level HFDD access performance through a data fragmentation strategy in a distributed environment; the reliability guarantee of single-machine volatile storage data and a snapshot-based data set reliability guarantee technology are realized, and the reliability of a node-level data set is guaranteed; the data multi-copy fault-tolerant technology under the distributed environment is realized, and the reliability of a data set of a cluster level is ensured.
Fig. 5 is a schematic structural diagram of an electronic device provided in an embodiment of the present application. As shown in Fig. 5, the electronic device includes: a processor 501, a memory 502, and a bus 503; wherein,
the processor 501 and the memory 502 are communicated with each other through the bus 503;
the processor 501 is configured to call program instructions in the memory 502 to perform the methods provided by the above-mentioned method embodiments, for example, including: calculating the heat of each data, wherein the heat represents the frequency of accessing the corresponding data; and storing each data according to the heat of each data and the storage capacity corresponding to the internal memory, the SSD and the external memory respectively.
The processor 501 may be an integrated circuit chip having signal processing capabilities. The processor 501 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. It may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 502 may include, but is not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The present embodiment discloses a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the method provided by the above-mentioned method embodiments, for example, comprising: calculating the heat of each data, wherein the heat represents the frequency of accessing the corresponding data; and storing each data according to the heat of each data and the storage capacity corresponding to the internal memory, the SSD and the external memory respectively.
The present embodiments provide a non-transitory computer-readable storage medium storing computer instructions that cause the computer to perform the methods provided by the above method embodiments, for example, including: calculating the heat of each data, wherein the heat represents the frequency of accessing the corresponding data; and storing each data according to the heat of each data and the storage capacity corresponding to the internal memory, the SSD and the external memory respectively.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A hybrid memory based data processing method applied to a node in a distributed storage system, the distributed storage system comprising a plurality of nodes communicatively connected to each other, each node comprising a hybrid memory and a fault tolerant distributed data set HFDD comprising a memory and a solid state disk SSD, and an external storage, the memory comprising a random access memory RAM and a non-volatile dual in-line memory module NVDIMM, the method comprising:
calculating the heat of each data, wherein the heat represents the frequency of accessing the corresponding data;
and storing each data according to the heat of each data and the storage capacity corresponding to the internal memory, the SSD and the external memory respectively.
2. The method according to claim 1, wherein storing each data according to the heat of each data and the storage capacity corresponding to each of the memory, the SSD, and the external storage comprises:
storing data which is less than or equal to the storage capacity of the RAM into the RAM according to the storage capacity of the RAM in the RAM, starting from the data with the highest heat;
storing data smaller than or equal to the storage capacity into the SSD from the data with the highest heat in the residual data according to the storage capacity of the SSD;
storing the remaining data in the external memory.
3. The method of claim 1, further comprising:
and receiving an access request, if the storage capacity of the RAM is full and the data to be accessed is stored in the SSD, taking out volatile storage data from the RAM according to the size of the memory required by the data to be accessed, and storing the data to be accessed in the RAM.
4. The method of claim 1, further comprising:
if the node is abnormally powered down, storing the working state data corresponding to the node before the abnormal power down into an external memory from an NVDIMM;
and after the state of the node is recovered to be normal from the fault, writing the working state into the NVDIMM again from the external memory, and continuing to operate according to the working state data.
5. The method of claim 1, further comprising:
determining a time node for creating a copy;
and if the time node is reached, sending a copy creating request to other nodes so that the other nodes create copies of the corresponding data after receiving the copy creating request.
6. The method of claim 5, further comprising:
and if the data in one node is lost, acquiring the copy created most recently from the current time from other nodes.
7. The method of claim 1, further comprising:
generating a snapshot of HFDD in a node according to a preset period in the operation process of the distributed storage system;
and determining the storage position of the corresponding snapshot according to the time of generating each snapshot.
8. A hybrid memory based data processing apparatus for use in a node in a distributed storage system, the distributed storage system comprising a plurality of nodes communicatively coupled to each other, each node comprising a hybrid memory and a fault tolerant distributed data set HFDD comprising a memory and a solid state disk SSD, and an external storage, the memory comprising a random access memory RAM and a non-volatile dual in-line memory module NVDIMM, the apparatus comprising:
the heat degree calculation module is used for calculating the heat degree of each datum, wherein the heat degree represents the frequency of accessing the corresponding datum;
and the data storage module is used for storing each data according to the heat degree of each data and the storage capacity corresponding to the memory, the SSD and the external storage respectively.
9. An electronic device, comprising: a processor, a memory, and a bus, wherein,
the processor and the memory are communicated with each other through the bus;
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any one of claims 1-7.
10. A non-transitory computer-readable storage medium storing computer instructions which, when executed by a computer, cause the computer to perform the method of any one of claims 1-7.
CN201911424993.4A 2019-12-31 2019-12-31 Data processing method and device based on hybrid memory Active CN111176584B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911424993.4A CN111176584B (en) 2019-12-31 2019-12-31 Data processing method and device based on hybrid memory

Publications (2)

Publication Number Publication Date
CN111176584A true CN111176584A (en) 2020-05-19
CN111176584B CN111176584B (en) 2023-10-31

Family

ID=70656050

Country Status (1)

Country Link
CN (1) CN111176584B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20211015
Address after: 100089 building 36, courtyard 8, Dongbeiwang West Road, Haidian District, Beijing
Applicant after: Dawning Information Industry (Beijing) Co.,Ltd.
Applicant after: ZHONGKE SUGON INFORMATION INDUSTRY CHENGDU Co.,Ltd.
Address before: Building 36, yard 8, Dongbei Wangxi Road, Haidian District, Beijing
Applicant before: Dawning Information Industry (Beijing) Co.,Ltd.
GR01 Patent grant