
WO2024001304A1 - Data processing method and related device - Google Patents

Data processing method and related device

Info

Publication number
WO2024001304A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
target data
management device
data management
storage
Prior art date
Application number
PCT/CN2023/081418
Other languages
English (en)
French (fr)
Inventor
张子怡
曲强
杨锐捷
杜明晓
Original Assignee
华为云计算技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202210983123.6A (CN117376364A)
Application filed by 华为云计算技术有限公司
Publication of WO2024001304A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/18: File system types
    • G06F16/182: Distributed file systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H04L67/104: Peer-to-peer [P2P] networks

Definitions

  • This application relates to the field of blockchain technology, and in particular to a data processing method, system, device, computing device cluster, computer-readable storage medium, and computer program product.
  • Blockchain technology is a decentralized architecture and computing paradigm that uses a chained block data structure to verify and store data, distributed node consensus algorithms to generate and update data, cryptography to secure data transmission and access, and smart contracts composed of automated script code to program and operate on data.
  • the network built based on blockchain technology is called a blockchain network.
  • the nodes in the blockchain network jointly maintain a distributed ledger.
  • the distributed ledger serves as a storage carrier and generally stores a series of simple data structures such as key values or relational data.
  • Industry-related data, such as rich media data (video, audio, images) and big data (for example, modeling files), creates a growing demand for highly reliable on-chain storage.
  • the industry has proposed a storage method that combines on-chain storage and off-chain storage. Specifically, rich media data or big data is stored in an off-chain storage system, and the hash value of the above data is uploaded to the chain. In this way, users can obtain the hash value on the chain and obtain data from the off-chain storage system, calculate the hash value of the data, and compare the hash value on the chain with the hash value calculated off the chain. This ensures data consistency.
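The hash-comparison scheme described above can be sketched as follows. This is a minimal illustration only, assuming SHA-256 as the hash algorithm and using two in-memory dictionaries, `on_chain_ledger` and `off_chain_store`, as hypothetical stand-ins for the blockchain ledger and the off-chain storage system; none of these names come from the application.

```python
import hashlib

# Hypothetical stand-ins: a real deployment would use a blockchain ledger
# and an off-chain storage system instead of in-memory dictionaries.
on_chain_ledger = {}   # data key -> hex digest recorded on the chain
off_chain_store = {}   # data key -> raw bytes kept off the chain


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string (assumed algorithm)."""
    return hashlib.sha256(data).hexdigest()


def upload(key: str, data: bytes) -> None:
    """Store the data off-chain and record only its hash on-chain."""
    off_chain_store[key] = data
    on_chain_ledger[key] = sha256_hex(data)


def download_verified(key: str) -> bytes:
    """Fetch off-chain data and check it against the on-chain hash."""
    data = off_chain_store[key]
    if sha256_hex(data) != on_chain_ledger[key]:
        raise ValueError(f"off-chain data for {key!r} does not match the on-chain hash")
    return data
```

In this sketch a mismatch can only be detected, not repaired, because the chain holds nothing but the hash.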
  • This application provides a data processing method, which manages the uploading, downloading, etc. of data by introducing a distributed data management system.
  • The data management devices in the distributed data management system interact with the storage resource pool of the blockchain network, which is formed by the storage mounted on each data management device, to perform input and output operations such as data upload and download, and record the storage addresses of data fragments and other related information in the blockchain network. Even if the client, transmission network, or storage network causes data inconsistency due to stability or security issues, the data can be restored based on the storage addresses of the data copies stored on the chain, which ensures data consistency and improves data security, availability, and accessibility.
  • This application also provides a distributed data management system, a data management device, a computing device cluster, a computer-readable storage medium and a computer program product corresponding to the above method.
  • this application provides a data processing method.
  • the method is applied to a distributed data management system, which includes multiple data management devices.
  • the first data management device among the plurality of data management devices corresponds to the first blockchain node of the blockchain network
  • the second data management device among the plurality of data management devices corresponds to the second blockchain node of the blockchain network.
  • the storage mounted by the first data management device and the storage mounted by the second data management device are used to form a storage resource pool of the blockchain network.
  • the target data management device among the multiple data management devices can receive a data operation request.
  • the data operation request is used to perform input and output IO operations on the target data.
  • The target data management device obtains the storage addresses of multiple data fragments of the target data from the blockchain network according to the data operation request, and performs IO on the target data in the storage resource pool based on the storage addresses of the multiple data fragments.
  • The storage resource pool is managed by the distributed data management system, and all interactions with the storage resource pool (such as IO operations on target data) must be handled by a data management device in the distributed data management system. The data management device also uploads the storage address of the target data of the IO operation to the chain. Even if the client, transmission network, or storage network causes data inconsistency due to stability or security issues, the data can be restored based on the storage addresses of the data copies stored on the chain, which ensures data consistency and improves data security, availability, and accessibility.
  • this method uploads relevant information of IO operations to the chain, and can also achieve operation traceability.
  • the data operation request is a write request
  • the write request is used to write target data, that is, to upload target data.
  • The target data management device can obtain the allocation strategy based on the smart contract of the blockchain network according to the data operation request. The target data management device then allocates storage resources from the storage resource pool to the multiple data fragments of the target data according to the allocation strategy, and obtains the storage addresses of the multiple data fragments.
  • the target data management device can write multiple data fragments into the storage resource pool according to the storage address of at least one data fragment, and store the storage addresses of the multiple data fragments in the distributed ledger of the blockchain network.
  • This method provides a distributed data management system for the blockchain network. The data management device in the distributed data management system determines the allocation strategy, and the multiple data fragments of the target data are stored in a dispersed manner according to that strategy. This meets the need for distributed management, builds a trustworthy system, and avoids the risk of an administrator acting maliciously under centralized management.
  • The target data management device can determine the weights of different storage resources based on the allocation strategy, combined with at least one of the capacity, bandwidth, and historical fault records of each storage, and allocate storage resources to the data fragments based on the weights, thereby obtaining the storage address of each data fragment.
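One way to picture weight-based allocation is the sketch below. The application does not give a weight formula, so the one used here (free capacity times bandwidth, divided by one plus the number of past faults) is purely an assumption, as are the `Storage` fields and the `storage_id/fragment_id` address format.

```python
from dataclasses import dataclass


@dataclass
class Storage:
    storage_id: str
    free_bytes: int        # remaining capacity
    bandwidth_mbps: float  # available bandwidth
    fault_count: int       # number of historical fault records


def weight(storage: Storage) -> float:
    """Assumed formula: capacity and bandwidth raise the weight, past faults lower it."""
    return storage.free_bytes * storage.bandwidth_mbps / (1 + storage.fault_count)


def allocate(fragment_sizes: dict, pool: list) -> dict:
    """Assign each fragment to the highest-weight storage that can still hold it."""
    addresses = {}
    for fragment_id, size in fragment_sizes.items():
        candidates = [s for s in pool if s.free_bytes >= size]
        if not candidates:
            raise RuntimeError(f"no storage has room for fragment {fragment_id}")
        best = max(candidates, key=weight)
        best.free_bytes -= size  # weights change as capacity is consumed
        addresses[fragment_id] = f"{best.storage_id}/{fragment_id}"
    return addresses
```

Because free capacity feeds into the weight, consecutive fragments tend to spread across storages rather than pile onto a single one.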
  • the target data management device can also obtain the sharding strategy based on the smart contract of the blockchain network according to the data operation request. Then the target data management device can obtain the sharding algorithm, the number of shards, and the number of copies of each data shard according to the sharding policy.
  • The target data management device can fragment the target data according to the sharding algorithm and the number of shards to obtain multiple data fragments of the target data. It then writes each copy of each data fragment into the storage resource pool according to the storage address of each copy, and stores the storage address of each copy of each data fragment in the distributed ledger of the blockchain network.
  • This method divides the target data according to the sharding strategy obtained from the blockchain network to obtain multiple data fragments, and then stores the data fragments in a distributed manner in the storage resource pool, which can improve the storage (upload) or read (download) efficiency of the target data.
  • each data fragment includes multiple copies. Even if several copies of the data fragment are lost, deleted, or tampered with, the data can be restored based on other copies.
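A rough sketch of the write path described above, assuming a fixed-size split as the sharding algorithm and round-robin placement of copies on distinct storages. The policy keys (`shard_count`, `replica_count`), the `storage/frag-N` address format, and the use of SHA-256 are illustrative assumptions, not details from the application.

```python
import hashlib
import math


def shard(data: bytes, shard_count: int) -> list:
    """Split data into at most shard_count contiguous fragments (assumed algorithm)."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    size = max(1, math.ceil(len(data) / shard_count))
    return [data[i:i + size] for i in range(0, len(data), size)]


def plan_write(data: bytes, policy: dict, storage_ids: list) -> list:
    """Build the per-fragment records that would be stored in the ledger."""
    replica_count = policy["replica_count"]
    if replica_count > len(storage_ids):
        raise ValueError("not enough storages for distinct copies")
    records = []
    for index, fragment in enumerate(shard(data, policy["shard_count"])):
        # Place each copy on a different storage, rotating the start per fragment.
        replicas = [
            f"{storage_ids[(index + r) % len(storage_ids)]}/frag-{index}"
            for r in range(replica_count)
        ]
        records.append({
            "index": index,
            "sha256": hashlib.sha256(fragment).hexdigest(),
            "replicas": replicas,
        })
    return records
```

Each record carries the fragment hash together with the address of every copy, which is the information a later read or recovery step would need.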
  • the target data management device may write multiple copies of each data fragment into different types of storage media in the storage resource pool. In this way, even if one or more types of storage media fail, the data can be restored through copies stored in other types of storage media, which improves storage reliability and ensures data security.
  • The number of copies of a data fragment may be equal to the number of blockchain nodes. That is, for each data fragment of the target data, the target data management device can store a copy in the storage mounted on the data management device corresponding to each blockchain node in the blockchain network. This achieves the same effect as storing the data fragments on the blockchain network itself without occupying a large amount of on-chain storage resources, so storage reliability is ensured at a lower storage cost.
  • the target data management device may also determine at least one of a hash value of the target data, a hash value of each data fragment in the plurality of data fragments, and a data attribute of the target data.
  • the data attributes may include one or more of the creator, creation time, and subject. Then the target data management device may store at least one of the hash value of the target data, the hash value of each data fragment in the plurality of data fragments, and the data attributes of the target data to the distributed ledger of the blockchain network.
  • In this way, data can be queried by the hash value of the target data, the hash value of a data fragment, or the data attributes of the target data. This both speeds up queries and ensures their accuracy.
  • the data operation request may be a read request, and the read request is used to read the target data, that is, to download the target data.
  • The target data management device can obtain the storage addresses of multiple data fragments of the target data from the distributed ledger of the blockchain network according to the read request, obtain the multiple data fragments from the storage resource pool according to those storage addresses, and then aggregate the multiple data fragments to obtain the target data.
  • the target data management device uses the blockchain network to concurrently read multiple data fragments from the storage resource pool, and obtains the target data based on the multiple data fragments, thereby improving the data reading (downloading) efficiency. Moreover, this method ensures the consistency of read data through the blockchain network.
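The concurrent read path can be illustrated with a small sketch. It assumes the ledger maps a data identifier to an ordered list of fragment addresses, that aggregation is plain concatenation in that order, and that `fetch_fragment` is a caller-supplied function that reads one fragment from the storage resource pool; all three are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def read_target(data_id, ledger, fetch_fragment, max_workers=4):
    """Look up fragment addresses, fetch the fragments concurrently, and join them."""
    addresses = ledger[data_id]  # ordered fragment addresses taken from the ledger
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the order of the input addresses,
        # so the fragments come back in the right sequence for aggregation.
        fragments = list(executor.map(fetch_fragment, addresses))
    return b"".join(fragments)
```

For example, with a dictionary `pool` mapping addresses to bytes, `read_target("doc", ledger, pool.__getitem__)` would return the reassembled data.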
  • a storage resource pool can store multiple copies of data shards.
  • When the target data management device reads data fragments from the storage resource pool, it can obtain the allocation strategy from the blockchain network based on the smart contract and, according to the allocation strategy combined with at least one of the capacity, bandwidth, and historical fault records of each storage, determine the weights of different storage resources. Based on the weights, a target path can be determined from multiple paths. The target path may be the path with the lowest cost or the lowest latency among the multiple paths for accessing the data fragment.
  • the target data management device can access the target path and obtain each data fragment. This can further shorten the reading delay of target data and reduce the cost of reading target data.
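As a small illustration of target path selection, the sketch below assumes that each candidate path to a copy of a data fragment already carries an estimated latency and cost (for instance, derived from the storage weights). The field names `latency_ms` and `cost`, and the rule of preferring latency and breaking ties by cost, are hypothetical.

```python
def pick_target_path(paths):
    """Return the candidate path with the lowest latency, breaking ties by cost."""
    if not paths:
        raise ValueError("no candidate paths")
    return min(paths, key=lambda p: (p["latency_ms"], p["cost"]))
```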
  • the target data management device can obtain the aggregation strategy based on the smart contract of the blockchain network according to the data operation request.
  • the target data management device can aggregate the multiple data fragments according to the aggregation strategy to obtain the target data.
  • This method uses the aggregation strategy stored on the chain to aggregate data fragments to obtain target data. If some data shards in the storage resource pool are tampered with, deleted, or lost, copies of the data shards can be obtained in time and aggregated to ensure data consistency.
  • The target data management device may obtain the local hash values and the on-chain hash values of the multiple data fragments.
  • The local hash value can be obtained through a hash algorithm; for example, the data management device calculates it from the contents of the locally stored data fragment.
  • The on-chain hash value is the hash value stored in the blockchain network. The target data management device can first verify the data fragments by comparing the local hash values with the on-chain hash values, thereby detecting tampered, deleted, or lost data fragments in advance.
  • When the target data management device determines that the local hash values match the on-chain hash values, it starts aggregating the multiple data fragments to obtain aggregated data.
  • The target data management device can then determine the hash value of the aggregated data and obtain the hash value of the target data from the blockchain network.
  • The target data management device may perform verification by comparing the hash value of the aggregated data with the hash value of the target data. When the two hash values match, the aggregated data is determined to be the target data.
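The two-stage verification above (per-fragment check before aggregation, whole-data check after) might look like the following sketch. SHA-256 and concatenation as the aggregation step are assumptions; the function and parameter names are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def aggregate_verified(fragments, on_chain_fragment_hashes, on_chain_target_hash):
    """Verify each fragment, aggregate, then verify the aggregated data."""
    if len(fragments) != len(on_chain_fragment_hashes):
        raise ValueError("fragment count does not match the on-chain record")
    # Stage 1: compare each locally computed hash with its on-chain hash.
    for index, (fragment, expected) in enumerate(zip(fragments, on_chain_fragment_hashes)):
        if sha256_hex(fragment) != expected:
            raise ValueError(f"fragment {index} does not match its on-chain hash")
    # Stage 2: aggregate and compare with the on-chain hash of the target data.
    aggregated = b"".join(fragments)
    if sha256_hex(aggregated) != on_chain_target_hash:
        raise ValueError("aggregated data does not match the on-chain target hash")
    return aggregated
```

Checking fragments first means a damaged fragment is identified by index before any aggregation work is done, so only that fragment needs to be re-read from another copy.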
  • The target data management device may obtain first meta-information of the data fragments in the storage mounted by the target data management device from the blockchain node corresponding to the target data management device, and obtain second meta-information of the data fragments from the storage mounted by the target data management device. When the first meta-information does not match the second meta-information, the target data management device determines that a fault has occurred and stores the fault information in the distributed ledger of the blockchain network.
  • The target data management device can periodically scan its corresponding blockchain node and the local storage mounted on the device, and verify the meta-information of the data fragments recorded in the blockchain node against the meta-information of the locally stored data fragments. This speeds up fault checking, improves checking efficiency, and assists fault recovery.
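The fault check can be sketched as a comparison of two meta-information maps. What the meta-information contains is not specified in the text, so the example treats it as an opaque dictionary per fragment (for instance size and hash); the fault record layout is likewise hypothetical.

```python
def check_faults(device_id, chain_meta, local_meta):
    """Compare on-chain meta-information with local meta-information.

    chain_meta: fragment id -> meta-information recorded in the blockchain node
    local_meta: fragment id -> meta-information read from the mounted storage
    Returns a list of fault records that could then be stored in the ledger.
    """
    faults = []
    for fragment_id, expected in chain_meta.items():
        actual = local_meta.get(fragment_id)
        if actual is None:
            reason = "missing"
        elif actual != expected:
            reason = "mismatch"
        else:
            continue  # meta-information agrees, no fault
        faults.append({"device": device_id, "fragment": fragment_id, "reason": reason})
    return faults
```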
  • the target data management device can read fault information from the blockchain network.
  • The target data management device can obtain the data fragments from the storage mounted by other data management devices and store them locally, and then store the updated storage address in the distributed ledger of the blockchain network.
  • the target data management device reads the fault information stored on the chain and performs fault recovery based on the fault information related to the current device, ensuring data consistency.
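A minimal sketch of the recovery step, under several assumptions: fault records carry `device` and `fragment` fields, ledger addresses have the form `device/fragment`, and each device's mounted storage is modeled as a dictionary. A fuller version would also verify the hash of the fetched copy before storing it.

```python
def recover(device_id, fault_records, ledger, storages):
    """Restore faulty local fragments from copies held by other devices.

    ledger: fragment id -> list of "device/fragment" addresses
    storages: device id -> {fragment id: bytes}
    Returns the ids of the fragments that were restored.
    """
    restored = []
    for fault in fault_records:
        if fault["device"] != device_id:
            continue  # only handle faults related to this device
        fragment_id = fault["fragment"]
        for address in ledger[fragment_id]:
            owner, _, key = address.partition("/")
            if owner != device_id and key in storages.get(owner, {}):
                storages[device_id][fragment_id] = storages[owner][key]
                break
        else:
            raise LookupError(f"no other copy of {fragment_id} is available")
        local_address = f"{device_id}/{fragment_id}"
        if local_address not in ledger[fragment_id]:
            ledger[fragment_id].append(local_address)  # record the updated address
        restored.append(fragment_id)
    return restored
```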
  • this application provides a distributed data management system.
  • The distributed data management system includes a plurality of data management devices; the first data management device among the plurality of data management devices corresponds to the first blockchain node of the blockchain network, and the second data management device among the plurality of data management devices corresponds to the second blockchain node of the blockchain network; the storage mounted by the first data management device and the storage mounted by the second data management device are used to form the storage resource pool of the blockchain network;
  • the target data management device among the plurality of data management devices is used to receive a data operation request, and the data operation request is used to perform input and output IO operations on the target data;
  • the target data management device is also configured to obtain the storage addresses of multiple data fragments of the target data from the blockchain network according to the data operation request, and to perform IO on the target data in the storage resource pool according to the storage addresses of the multiple data fragments.
  • the data operation request is a write request
  • the target data management device is specifically used to:
  • according to the allocation policy, allocate storage resources from the storage resource pool to multiple data fragments of the target data, and obtain storage addresses of the multiple data fragments;
  • the target data management device is also used to:
  • according to the sharding strategy, obtain the sharding algorithm, the number of shards, and the number of copies of each data shard;
  • the target data management device is specifically used for:
  • each data shard includes multiple copies
  • the target data management device is specifically used for:
  • the target data management device is also used to:
  • the data operation request is a read request
  • the target data management device is specifically used to:
  • the target data management device is specifically used for:
  • the target data management device is also used to:
  • the target data management device is specifically used for:
  • the target data management device is specifically used to:
  • the local hash value is obtained through a hash algorithm, and the on-chain hash value is the hash value stored in the blockchain network;
  • the aggregated data is the target data.
  • the target data management device is also used to:
  • the fault information is stored in the distributed ledger of the blockchain network.
  • the target data management device is also used to:
  • the present application provides a data management device.
  • the data management device corresponds to the blockchain node in the blockchain network.
  • The storage mounted by the data management device and the storage mounted by other data management devices in the distributed data management system are used to form the storage resource pool of the blockchain network. The data management device includes:
  • a communication module used to receive data operation requests, which are used to perform input and output IO operations on target data
  • a management module configured to obtain the storage addresses of multiple data fragments of the target data from the blockchain network according to the data operation request, and to perform IO on the target data in the storage resource pool according to the storage addresses of the multiple data fragments.
  • the data operation request is a write request
  • the management module is specifically used to:
  • according to the allocation policy, allocate storage resources from the storage resource pool to multiple data fragments of the target data, and obtain storage addresses of the multiple data fragments;
  • the management module is also used to:
  • according to the sharding strategy, obtain the sharding algorithm, the number of shards, and the number of copies of each data shard;
  • the management module is specifically used for:
  • each data shard includes multiple copies
  • the management module is specifically used for:
  • the management module is also used to:
  • the data operation request is a read request
  • the management module is specifically used to:
  • the management module is also used to:
  • the management module is specifically used for:
  • the management module is specifically used to:
  • the local hash value is obtained through a hash algorithm, and the on-chain hash value is the hash value stored in the blockchain network;
  • the aggregated data is the target data.
  • the data management device further includes:
  • a fault checking module configured to obtain first meta-information of the data fragments in the storage mounted by the target data management device from the blockchain node corresponding to the target data management device, and obtain second meta-information of the data fragments from the storage mounted by the target data management device; when the first meta-information does not match the second meta-information, it is determined that a fault has occurred, and the fault information is stored in the distributed ledger of the blockchain network.
  • the data management device further includes:
  • a fault recovery module configured to read fault information from the blockchain network, obtain the data fragments from the storage mounted by other data management devices, store them locally, and store the updated storage addresses in the distributed ledger of the blockchain network.
  • this application provides a computing device cluster.
  • the cluster of computing devices includes at least one computing device including at least one processor and at least one memory.
  • the at least one processor and the at least one memory communicate with each other.
  • the at least one processor is configured to execute instructions stored in the at least one memory, so that the computing device or a cluster of computing devices executes the data processing method as described in the first aspect or any implementation of the first aspect.
  • The present application provides a computer-readable storage medium in which instructions are stored, and the instructions instruct a computing device or a cluster of computing devices to execute the data processing method described in the first aspect or any implementation of the first aspect.
  • The present application provides a computer program product containing instructions that, when run on a computing device or a cluster of computing devices, cause the computing device or the cluster of computing devices to execute the data processing method described in the first aspect or any implementation of the first aspect.
  • Figure 1 is a schematic architectural diagram of a distributed data management system provided by an embodiment of the present application.
  • Figure 2 is a schematic architectural diagram of a distributed data management system provided by an embodiment of the present application.
  • Figure 3 is a schematic architectural diagram of a distributed data management system in a multi-scenario alliance provided by an embodiment of the present application.
  • Figure 4 is a flow chart of a data processing method provided by an embodiment of the present application.
  • Figure 5 is a schematic flow chart of data upload provided by an embodiment of the present application.
  • Figure 6 is a schematic flow chart of data downloading provided by an embodiment of the present application.
  • Figure 7 is a schematic flow chart of a fault check provided by an embodiment of the present application.
  • Figure 8 is a schematic flowchart of a fault recovery provided by an embodiment of the present application.
  • Figure 9 is a schematic structural diagram of a distributed data management system provided by an embodiment of the present application.
  • Figure 10 is a schematic structural diagram of a computing device provided by an embodiment of the present application.
  • Figure 11 is a schematic structural diagram of a computing device cluster provided by an embodiment of the present application.
  • Figure 12 is a schematic structural diagram of a computing device cluster provided by an embodiment of the present application.
  • Figure 13 is a schematic structural diagram of a computing device cluster provided by an embodiment of the present application.
  • The terms "first" and "second" in the embodiments of this application are used only for descriptive purposes and cannot be understood as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Therefore, features defined as "first" and "second" may explicitly or implicitly include one or more of these features.
  • Blockchain network which can also be referred to as blockchain for short, refers to a peer-to-peer (P2P) network built based on blockchain technology.
  • a blockchain network includes multiple blockchain nodes, each of which is a peer node.
  • multiple blockchain nodes jointly maintain a continuously growing, chained list ledger constructed of ordered data blocks.
  • Each blockchain node stores a copy of the above-mentioned chained list ledger and maintains consistency between copies. Therefore, the chained list ledger is also called the distributed ledger of the blockchain network.
  • Blockchain networks can be divided into public blockchain, private blockchain or consortium blockchain based on the degree of openness of read and write permissions.
  • A public chain is a public blockchain network, with read and write permissions open to all nodes;
  • a private chain is a private blockchain network, with read and write permissions open to a certain node;
  • a consortium chain (alliance chain) is a consortium blockchain network, with read and write permissions open to nodes that join the consortium (members within the consortium).
  • the distributed ledger of the blockchain network is usually used to store simple data structures such as key-value data and relational data.
  • Industry-related data, such as rich media data (video, audio, images) and big data (for example, modeling files), creates a growing demand for highly reliable on-chain storage.
  • Large-scale data such as rich media data (such as videos, audios, images) or big data (such as modeling files) can be stored in off-chain storage systems, and at the same time the hash value of the above data is uploaded to the chain.
  • Users can ensure data consistency by comparing the hash value stored on the chain with the hash value calculated from the data stored off-chain.
  • there may be stability and security risks in the client, transmission network, storage network, etc. which may lead to problems such as data inconsistency and data tampering, making it difficult to meet business needs.
  • The distributed data management system includes multiple data management devices, each of which is part of the distributed data management system.
  • The distributed data management system is essentially a distributed storage engine, mainly used to manage the storage of rich media data. Therefore, the distributed data management system can also be called a distributed rich media engine, and each data management device in the distributed data management system is part of that distributed rich media engine.
  • the first data management device among the plurality of data management devices corresponds to the first blockchain node of the blockchain network
  • the second data management device among the plurality of data management devices corresponds to the second blockchain node of the blockchain network.
  • the storage mounted by the first data management device and the storage mounted by the second data management device are used to form a storage resource pool of the blockchain network.
  • The target data management device among the multiple data management devices can receive a data operation request, which is used to perform an input/output (IO) operation on the target data. The target data management device can then, according to the data operation request, obtain the storage addresses of multiple data shards (which can also be referred to as shards in some cases) of the target data from the blockchain network, and perform IO on the target data in the storage resource pool based on the storage addresses of the multiple data shards.
  • The storage resource pool is managed by the distributed data management system, and all interactions with the storage resource pool (such as IO operations on target data) must be handled by a data management device in the distributed data management system. The data management device also uploads the storage address of the target data of the IO operation to the chain. Even if the client, transmission network, or storage network causes data inconsistency due to stability or security issues, the data can be restored based on the storage addresses of the data copies stored on the chain, which ensures data consistency and improves data security, availability, and accessibility.
  • this method uploads relevant information of IO operations to the chain, and can also achieve operation traceability.
  • The distributed data management system 100 includes multiple data management devices 10, and each of the multiple data management devices 10 corresponds to a blockchain node 20 of the blockchain network 200. Each data management device 10 is mounted with a storage 30.
  • the data management device 10 in the embodiment of the present application supports the management and adaptation of different storage media.
  • The data management device 10 can mount different storage media, including but not limited to a mechanical hard disk drive (HDD) or a solid state drive (SSD).
  • the storage 30 mounted on multiple data management devices 10 can be used to form the storage resource pool 300 of the blockchain network.
  • The data management device 10 can also interface with the blockchain client 40.
  • Blockchain participants, such as tenants on the cloud, can write large-scale data such as rich media data or big data into the storage resource pool 300 through the blockchain client 40, or read such data from the storage resource pool 300 through the blockchain client 40.
  • the data management device 10 is used to receive a data operation request, for example, a data operation request sent by the tenant through the blockchain client 40.
  • the data operation request is used to perform input/output (IO) operations on the target data.
  • the data management device 10, according to the data operation request, obtains the storage addresses of multiple data fragments of the target data from the blockchain network, and performs IO on the target data according to the storage addresses of the multiple data fragments. For example, when the data operation request is a write request, the data management device 10 can fragment the target data, then determine the storage address of each data fragment, and store each data fragment according to its storage address.
  • the data management device 10 In addition to uploading the hash values of the target data and the hash values of the data fragments, the data management device 10 also uploads the storage addresses of the data fragments. For another example, when the data operation request is a read request, the data management device 10 can obtain the storage address of the data fragments from the blockchain network, obtain the data fragments according to the storage address, and then aggregate the data fragments to obtain the target data. It should be noted that the data management device 10 can separately verify the hash values of the data fragments before aggregation. Specifically, the hash value is calculated based on the data fragments, and the hash value is compared with the hash value on the chain. This enables verification. Similarly, the data management device 10 can also verify the hash value of the aggregated data after aggregation, thereby determining whether the aggregated data is target data.
  • the data management device 10 in the embodiment of the present application also proposes corresponding customized contracts for the storage sharding strategy, the storage allocation strategy (which can also be referred to as storage shard routing), and the storage aggregation strategy (the strategy for aggregating data shards), and provides interfaces, such as application programming interfaces (APIs), for use by each distributed storage engine.
  • the data management device 10 of the distributed data management system 100 can use the smart contract of the blockchain network to reach consensus on the storage sharding strategy, storage allocation strategy, and storage aggregation strategy.
  • when the data management device performs data IO, it can calculate the shard storage location (the storage location identified by the storage address) based on the allocation strategy, combined with the remaining storage capacity, shard type, number of shards, bandwidth, number of historical failures, and so on, which reduces data IO time (storage or reading time) and reduces waste of storage space.
  • the data management device 10 traces the IO operation through the smart contract, and submits the storage sharding strategy, storage allocation strategy, storage aggregation strategy, and their execution logic to the contract consensus process, so that the current storage write or read action is approved by multi-party endorsement. This ensures data security and prevents data tampering from causing storage inconsistencies or failures. Furthermore, the data management device 10 defines different data fragmentation algorithms through smart contracts and divides the data into unreadable data fragments, so that no complete data can be obtained from any single storage medium. The data management device 10 reads the data fragments from different storage media, aggregates them, and returns the result to the blockchain client. On the one hand, the sharding method can be extended and the data shards can be aggregated automatically, which simplifies user operations. On the other hand, because the data is divided into unreadable data shards and stored dispersedly in storage media managed by different data management devices 10, no data management device 10 can obtain the data alone, thereby ensuring data privacy and security.
  • the data management device 10 shown in Figure 1 may be a software device, and the software device may be deployed on other computing devices independent of the blockchain node.
  • the data management device shown in Figure 1 may also be a hardware device.
  • the hardware device may be a computing device that is independent of the blockchain node and has large-scale data management functions such as rich media data.
  • each data management device 10 of the distributed data management system 100 can also be deployed on the blockchain node 20, that is, the blockchain node 20 includes a blockchain kernel and a data management device 10.
  • the data management device 10 may be a middleware or component, and the middleware or component may be integrated into the blockchain node 20 .
  • the distributed data management system 100 in the embodiment of this application can be applied to industries such as finance, energy, government affairs, aviation, agriculture, people's livelihood, logistics, etc.
  • the distributed data management system 100 can be applied to scenarios such as rich media data storage, file certificate storage, digital asset certificates, and non-fungible token (NFT) transactions.
  • the distributed data management system 100 can be used as a distributed storage bottom layer to support the metaverse or web3.0.
  • Public cloud refers to the cloud services provided by cloud service providers to users through the public Internet. Users can access the cloud through the Internet and use various services, including but not limited to computing, storage, and network services.
  • Private cloud is a cloud computing method built by an enterprise to provide services within the enterprise. The private cloud is built for a single enterprise to use alone. It can be deployed in the enterprise's data center or uniformly deployed in the computer room of the cloud service provider.
  • Hybrid cloud is a cloud computing usage that combines private cloud and public cloud.
  • Edge nodes are relative to cloud computing data centers and refer to network nodes with fewer intermediate links between them and the final access users. The edge node can be a computer room or a physical device. Compared with directly accessing the origin site, users have better response capabilities and connection speeds when accessing the edge node.
  • the distributed data management system 100 can also be deployed in different environments in a distributed manner. Referring to the schematic architectural diagram of the distributed data management system 100 shown in Figure 3, multiple data management devices 10 of the distributed data management system 100 can be deployed in public clouds, hybrid clouds, and edge nodes respectively, thereby providing multi-scenario alliance data management services.
  • Based on the distributed data management system 100 provided by the embodiment of the present application, the embodiment of the present application also provides a corresponding data processing method.
  • the method includes:
  • the target data management device receives the data operation request sent by the blockchain client.
  • the target data management device may be any data management device 10 in the distributed data management system 100, for example, it may be the above-mentioned first data management device or the second data management device.
  • Data operation requests are used to perform input/output (IO) operations on target data.
  • the data operation request may be a write request, and the write request is used to write (store) target data.
  • the data operation request can also be a read request, which is used to read the target data. Based on this, the data operation request can include the operation type, such as read or write, to indicate writing or reading the target data.
  • the data operation request also includes meta-information of the target data.
  • the meta-information may be, for example, the name of the target data. Taking the target data as a rich media file as an example, the data operation request includes the operation type and the file name of the rich media file.
  • the target data management device obtains the storage address of the data fragment of the target data from the blockchain network 200 according to the data operation request.
  • the target data is stored in a distributed manner in the form of data shards. Specifically, the target data is stored in a storage resource pool using a distributed storage method. Based on this, the target data management device can first obtain the storage address of the data fragment of the target data from the blockchain network 200 based on the smart contract of the blockchain network 200 according to the data operation request.
  • the following are examples of writing target data and reading target data respectively.
  • the target data management device can obtain the allocation strategy based on the smart contract of the blockchain network according to the data operation request, and then, according to the allocation strategy, allocate storage resources from the storage resource pool 300 to multiple data fragments of the target data, thereby obtaining the storage addresses of the multiple data fragments.
  • the allocation strategy may be a weight-based allocation strategy.
  • the weight-based allocation strategy may specifically determine the weight of each storage 30 based on the remaining reserves of each storage 30, shard type, number of shards, bandwidth, and the number of historical failures.
  • the shard storage location may be determined based on the weight of each storage 30.
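The weight-based allocation described above can be sketched as follows. This is a minimal illustration only: the source names the inputs (remaining capacity, bandwidth, number of historical failures, and others) but gives no formula, so the `weight` function, the `StorageStats` fields, and the storage names are all assumptions. Shard type and number of shards are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class StorageStats:
    """Per-storage inputs named in the text; fields and units are assumptions."""
    name: str
    free_bytes: int        # remaining capacity
    bandwidth_mbps: float  # available bandwidth
    failures: int          # number of historical failures


def weight(s: StorageStats) -> float:
    # Hypothetical formula: free space and bandwidth raise the weight,
    # past failures lower it. The source does not specify a formula.
    return (s.free_bytes / 2**30) * s.bandwidth_mbps / (1 + s.failures)


def allocate(stores, shard_size: int, copies: int):
    """Return the names of the `copies` highest-weight storages that can hold the shard."""
    eligible = [s for s in stores if s.free_bytes >= shard_size]
    if len(eligible) < copies:
        raise ValueError("not enough storages with free capacity")
    ranked = sorted(eligible, key=weight, reverse=True)
    return [s.name for s in ranked[:copies]]


stores = [
    StorageStats("storage-a", 500 * 2**30, 100.0, 0),
    StorageStats("storage-b", 200 * 2**30, 1000.0, 4),
    StorageStats("storage-c", 50 * 2**30, 100.0, 9),
]
placement = allocate(stores, shard_size=64 * 2**20, copies=2)
```

Placing each copy on a different storage matches the stated goal that a shard should survive the failure of an individual storage medium.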
  • the sharding strategy can ensure that each data shard exists on two to three storage media, thereby avoiding data loss caused by individual storage media failures and ensuring data security.
  • the storage address of the data fragments can be generated into an index table and recorded in the blockchain ledger.
  • the storage addresses of multiple data fragments of the target data are also stored in the blockchain network (that is, uploaded to the chain), and the target data management device may obtain the storage addresses of multiple data fragments of the target data from the distributed ledger of the blockchain network 200 according to the read request.
  • the target data management device performs IO on the target data according to the storage addresses of multiple data fragments of the target data.
  • the target data management device may write the multiple data fragments into the storage resource pool 300 according to the storage addresses of the multiple data fragments, and store the storage addresses of the multiple data fragments in the distributed ledger of the blockchain network 200.
  • the target data management device can also obtain the sharding strategy based on the smart contract of the blockchain network 200 according to the data operation request.
  • the target data management device can obtain the sharding algorithm, the number of shards, and the number of copies of each data shard according to the sharding policy.
  • the sharding algorithm can be different.
  • the fragmentation algorithm may include one or more of free fragmentation, average fragmentation, fragmentation by duration, and fragmentation by file size.
  • the number of shards can be determined based on the size of the target data and the number of storage nodes in the storage resource pool 300 .
  • the number of copies of each data fragment can be determined based on the reliability requirements of the target data. For example, when the reliability requirements of the target data are high, each data fragment can be stored with three copies, that is, the number of copies of each data fragment can be 3.
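As an illustration of the sharding policy above, the following sketch implements one of the named algorithms, average fragmentation, and enumerates the copies that would need storage addresses. The function names and the byte-level splitting rule are assumptions; the source does not define how an average split treats a remainder.

```python
def split_evenly(data: bytes, shard_count: int):
    """Split data into shard_count contiguous pieces whose sizes differ by at most one byte."""
    if shard_count < 1:
        raise ValueError("shard_count must be positive")
    base, extra = divmod(len(data), shard_count)
    shards, start = [], 0
    for i in range(shard_count):
        # The first `extra` shards take one additional byte each.
        size = base + (1 if i < extra else 0)
        shards.append(data[start:start + size])
        start += size
    return shards


def plan_copies(shard_count: int, copies: int):
    """List every (shard index, copy index) pair that needs a storage address."""
    return [(shard, copy) for shard in range(shard_count) for copy in range(copies)]


shards = split_evenly(b"0123456789", 3)
plan = plan_copies(len(shards), copies=3)
```

With three shards and three copies each, nine storage addresses would be allocated and later recorded on the chain.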
  • the target data management device may write each copy of each data fragment into the storage resource pool 300 according to the storage address of each copy of each data fragment in the plurality of data fragments.
  • the target data management device may also store the storage address of each copy of each data fragment to the blockchain network 200 .
  • the target data management device may record the storage address of each copy of each data fragment to the distributed ledger of the blockchain network 200 based on the smart contract.
  • the target data management device may write multiple copies of each data fragment into different types of storage media in the storage resource pool 300 . In this way, even if a certain storage medium fails, failure recovery can be performed based on copies in other storage media.
  • the target data management device can also determine at least one of the hash value of the target data, the hash value of each data fragment in the multiple data fragments, and a data attribute of the target data, wherein the data attributes of the target data may include one or more of the creator, creation time, and subject of the target data.
  • the target data management device may store at least one of the hash value of the target data, the hash value of each data fragment in the plurality of data fragments, and the data attribute of the target data to the blockchain network. Similar to storing the storage addresses of the data fragments, the target data management device can record at least one of the hash value of the target data, the hash value of each data fragment, and the data attribute of the target data to the distributed ledger.
  • the target data management device can obtain the storage addresses of multiple data fragments of the target data from the blockchain network 200 (for example, the distributed ledger of the blockchain network 200) according to the read request. The target data management device can then obtain the multiple data fragments from the storage resource pool 300 according to their storage addresses, and aggregate the multiple data fragments to obtain the target data.
  • the target data management device can also obtain the aggregation strategy based on the smart contract of the blockchain network 200 according to the data operation request, and then the target data management device can aggregate multiple data fragments according to the aggregation strategy to obtain the target data.
  • the aggregation strategy corresponds to the sharding strategy.
  • the fragmentation strategy is a strategy of dividing by duration
  • the aggregation strategy can be a strategy of aggregating by duration
  • the target data management device can sort the data fragments in order of start time or end time, based on the start time and end time of each data fragment, and then splice the sorted data fragments to achieve aggregation of the data fragments.
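Aggregation by duration can be sketched as a sort followed by a splice. The `TimedShard` type, the millisecond unit, and the continuity check are assumptions added for illustration; the source only states that fragments are ordered by start or end time and then spliced.

```python
from dataclasses import dataclass


@dataclass
class TimedShard:
    start_ms: int   # start time of the fragment (unit is an assumption)
    end_ms: int     # end time of the fragment
    payload: bytes


def aggregate_by_duration(shards):
    """Sort fragments by start time and splice their payloads together."""
    ordered = sorted(shards, key=lambda s: s.start_ms)
    # Assumed sanity check: consecutive fragments must be contiguous in time.
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.start_ms != prev.end_ms:
            raise ValueError("gap or overlap between fragments")
    return b"".join(s.payload for s in ordered)


fragments = [
    TimedShard(2000, 3000, b"C"),
    TimedShard(0, 1000, b"A"),
    TimedShard(1000, 2000, b"B"),
]
restored = aggregate_by_duration(fragments)
```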
  • the target data management device can determine the target path from the multiple paths used to access the multiple copies. The target path can be the path with the smallest delay or the lowest cost. Data fragments are pulled based on this target path, and then the data fragments are aggregated.
  • the target data management device can calculate the weight of each path based on at least one of the remaining capacity of the storage 30 mounted by each data management device, the shard type, the number of shards, the bandwidth, and the number of historical failures, and determine the target path from the multiple paths based on the weight.
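Target path selection can be sketched in a few lines, assuming that each copy's storage address has a measured delay. The function name, the address format, and the choice of delay as the only criterion are assumptions; the text also allows cost or a computed weight as the criterion.

```python
def pick_target_path(delays_ms):
    """Return the storage address with the smallest measured delay.

    delays_ms maps a copy's storage address (hypothetical format) to a delay in ms.
    """
    if not delays_ms:
        raise ValueError("no paths available")
    return min(delays_ms, key=delays_ms.get)


target = pick_target_path({"node-a:/disk0/s1": 12.5, "node-b:/disk0/s1": 3.0})
```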
  • the target data management device can also obtain local hash values and on-chain hash values of multiple data fragments.
  • the local hash value is obtained through a hash algorithm. Specifically, after acquiring the data fragments, the target data management device can use a hash algorithm to perform a hash operation on the contents of the data fragments, thereby obtaining the local hash value.
  • the on-chain hash value is the hash value stored in the blockchain network 200. Specifically, the target data management device triggers a read operation on the blockchain network 200 to read the hash values of the multiple data shards of the target data stored in the blockchain network 200. The target data management device can then compare the above-mentioned local hash value with the on-chain hash value. When the target data management device determines that the local hash value matches the on-chain hash value, for example, when the two are identical, it starts aggregating the multiple data shards to obtain the aggregated data.
  • the target data management device can also determine the hash value of the aggregated data, and obtain the hash value of the target data from the blockchain network 200 . Similarly, the target data management device may compare the hash value of the aggregated data with the hash value of the target data stored on-chain. When the hash value of the aggregated data matches the hash value of the target data, the target data management device determines the aggregated data as the target data.
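The two-stage verification above (per-fragment hashes before aggregation, then the hash of the aggregated data) can be sketched as follows. SHA-256 and simple concatenation are assumptions; the source names neither a hash algorithm nor an aggregation rule for this step, and the on-chain values are passed in as plain arguments instead of being read from a ledger.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_and_aggregate(shards, onchain_shard_hashes, onchain_target_hash):
    """Check each fragment against its on-chain hash, aggregate, then check the result."""
    if len(shards) != len(onchain_shard_hashes):
        raise ValueError("fragment count does not match the on-chain record")
    for index, (shard, expected) in enumerate(zip(shards, onchain_shard_hashes)):
        if sha256_hex(shard) != expected:
            raise ValueError(f"fragment {index} does not match its on-chain hash")
    aggregated = b"".join(shards)  # assumed aggregation: plain concatenation
    if sha256_hex(aggregated) != onchain_target_hash:
        raise ValueError("aggregated data does not match the on-chain hash")
    return aggregated


parts = [b"rich ", b"media"]
part_hashes = [sha256_hex(p) for p in parts]
target_hash = sha256_hex(b"rich media")
data = verify_and_aggregate(parts, part_hashes, target_hash)
```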
  • the data operation result can be writing success or writing failure. If the data operation result is successful writing, the target data management device can execute other data operation requests. If the data operation result is a write failure, the blockchain client can be instructed to resend the write request to rewrite the target data.
  • the data operation result can be read success or read failure.
  • the data operation result may also include the target data read by the target data management device.
  • the target data management device can instruct the blockchain client to resend the read request to re-read the target data.
  • the target data management device may not return data operation results.
  • the target data management device obtains the first metainformation of the data fragments in the storage 30 mounted by the target data management device from the blockchain network 200.
  • the target data management device can periodically scan the blockchain network 200, specifically, periodically scan the blockchain node 20 corresponding to the target data management device, thereby obtaining the first meta-information of the data fragments in the storage 30 mounted on the target data management device.
  • the first meta-information refers to the meta-information stored on the chain.
  • the first meta-information may include one or more of the name, size and hash value of the data fragments stored on the chain.
  • the period in which the target data management device scans the blockchain network 200 can be set based on experience values. For example, the period can be set to 5 minutes (min).
  • the target data management device obtains the second meta-information of the data fragments from the storage 30 mounted on the target data management device.
  • the target data management device may periodically scan the storage 30 (which may also be called local storage) mounted on the target data management device to obtain the second meta-information of the data fragments in the storage 30 mounted on the target data management device.
  • the second meta-information refers to the meta-information stored off-chain, and the second meta-information may include one or more of the name, size, and hash value of the data fragments stored off-chain.
  • S414 The target data management device determines whether the first meta-information and the second meta-information match. If not, execute S416.
  • the target data management device may compare the first meta-information and the second meta-information, thereby determining whether the first meta-information and the second meta-information match.
  • the target data management device may execute S416.
  • S416 The target data management device determines that a fault has occurred and stores the fault information in the blockchain network.
  • the fault information may include the node identifier of the faulty node or the fragment identifier of the faulty data fragment.
  • the node identifier may be one or more of the node name and the IP address of the node, and the fragment identifier of the data fragment may be the fragment name.
  • the fault information may also include meta-information about the data to which the data fragment belongs.
  • the target data management device can record the fault information to the distributed ledger of the blockchain network 200 based on the smart contract of the blockchain network 200 . On the one hand, it can lay the foundation for subsequent fault recovery, and on the other hand, it can realize data operation traceability.
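The fault check in S410 to S416 amounts to comparing the on-chain meta-information with the meta-information read from local storage. The sketch below assumes that both are available as dictionaries keyed by fragment name; `ShardMeta`, the `reason` field, and the node identifier are hypothetical names, and writing the result to the ledger is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardMeta:
    name: str
    size: int
    digest: str  # hash value of the fragment


def check_faults(on_chain, local, node_id):
    """Return one fault record per fragment whose local meta-information is missing or differs."""
    faults = []
    for name, expected in on_chain.items():
        actual = local.get(name)
        if actual is None:
            reason = "missing"    # fragment deleted or storage medium lost
        elif actual != expected:
            reason = "mismatch"   # fragment tampered with or corrupted
        else:
            continue
        faults.append({"node": node_id, "shard": name, "reason": reason})
    return faults


first_meta = {"s1": ShardMeta("s1", 4, "aa"), "s2": ShardMeta("s2", 3, "bb")}
second_meta = {"s1": ShardMeta("s1", 4, "aa"), "s2": ShardMeta("s2", 3, "ff")}
fault_info = check_faults(first_meta, second_meta, "node-1")
```

In the described method, a record like `fault_info` would then be stored in the distributed ledger through the smart contract.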
  • S418 The target data management device reads the fault information.
  • the target data management device may periodically read fault information. Among them, the target data management device can access the blockchain node 20 to periodically read the fault information.
  • the period in which the target data management device scans the blockchain network 200 or local storage for fault checking and the period in which fault information is read for fault recovery may be consistent or different.
  • the period in which the target data management device reads fault information and performs fault recovery may be greater than the period in which the fault is checked.
  • the period for the target data management device to scan the blockchain network 200 or local storage for fault checking may be 5 minutes, and the period for the data management device to read fault information for fault recovery may be 5 minutes, or 10 minutes.
  • the target data management device obtains a copy of the data fragment from the storage mounted by other data management devices in the storage resource pool 300 based on the fault information.
  • the target data management device can obtain copies of all data fragments stored in the lost storage medium from storage mounted by other data management devices in the storage resource pool 300.
  • the target data management device can obtain a copy of the deleted or tampered data shard from the storage mounted by other data management devices in the storage resource pool 300.
  • S420 The target data management device locally stores copies of the data fragments.
  • the target data management device writes copies of the data fragments into the storage mounted on the target data management device, thereby realizing local storage of the data fragments. It should be noted that when the fault information indicates that a storage medium in the storage 30 mounted by the target data management device is lost, the target data management device can first mount a new storage medium and then write a copy of the data fragment into the storage 30 mounted on the target data management device.
  • the target data management device stores the updated storage address in the blockchain network.
  • the target data management device can store the updated storage address in the blockchain network 200 , for example, by storing the updated storage address in the distributed ledger of the blockchain network 200 through a smart contract of the blockchain network 200 .
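The recovery steps above (fetch a copy from another storage, store it locally, record the updated address) can be sketched with plain dictionaries standing in for the storage resource pool and the on-chain address index. The address format and all names are assumptions, and the sketch omits the hash check that would normally precede the local write.

```python
def recover_shard(shard, faulty_addr, new_addr, storage, ledger):
    """Copy `shard` from a healthy replica to `new_addr` and update the address index.

    storage: address -> bytes (stands in for the storage resource pool)
    ledger:  shard name -> list of replica addresses (stands in for the on-chain index)
    """
    for addr in ledger[shard]:
        if addr != faulty_addr and addr in storage:
            storage[new_addr] = storage[addr]  # local write of the copy
            # Replace the faulty address with the updated one.
            ledger[shard] = [new_addr if a == faulty_addr else a for a in ledger[shard]]
            return True
    return False  # no healthy replica was found


storage = {"node-b:/disk0/s1": b"fragment"}
ledger = {"s1": ["node-a:/disk0/s1", "node-b:/disk0/s1"]}
recovered = recover_shard("s1", "node-a:/disk0/s1", "node-a:/disk1/s1", storage, ledger)
```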
  • S410 to S416 is a specific implementation method of fault check
  • S418 to S424 is a specific implementation method of fault recovery.
  • when the data processing method in the embodiment of the present application is executed, the above-mentioned S410 to S416 or S418 to S424 may not be performed.
  • the embodiment of the present application provides a data processing method.
  • the blockchain node 20 in the blockchain network 200 supports mounting different types of external storage, thereby providing storage capabilities for storing large-scale data such as rich media data and modeling data based on the blockchain network 200 .
  • the blockchain client provides a calling interface that supports users to upload or download large-scale data such as rich media data through the calling interface. Because in the process of uploading or downloading data, the target data management device of the distributed data management system 100 needs to participate and record the relevant information in the upload or download process. Even if the data is tampered with or deleted, the data can be restored in time based on the relevant information stored on the chain, improving the security, availability, accessibility and operational traceability of the data.
  • the storage 30 mounted on each blockchain node 20 in this method can use distributed storage to provide a set of adapted distributed storage resource pools for decentralized systems such as the blockchain network 200, to meet the needs of decentralization.
  • FIG. 4 introduces the data processing method of the embodiment of the present application. Next, the process of data upload, data download, fault inspection, and fault recovery will be introduced in detail with reference to the accompanying drawing.
  • Step 1 The blockchain client receives the write request and sends the write request to the target data management device.
  • Write requests are used to write target data, which can be large-scale data such as rich media data or big data.
  • the write request is specifically used to write the target data into the storage resource pool formed by the storage mounted on each blockchain node in the blockchain network, that is, to upload the target data to the storage resource pool of the blockchain network. Therefore, the write request is also called a data upload request.
  • the write request may include the above target data.
  • the target data can be encapsulated in the payload of a write request and transmitted to the target data management device through the write request, so that the target data management device uploads the target data to the storage resource pool of the blockchain network.
  • Step 2 The target data management device in the distributed data management system verifies the target data and the signature according to the write request. If the verification passes, go to step 3; if the verification fails, go to step 8.
  • the target data management device may perform an integrity check on the target data.
  • the write request can carry an integrity check code.
  • after the target data management device receives the write request, it can calculate an integrity check code based on the target data in the write request, and then compare the integrity check code carried in the write request with the locally calculated integrity check code. When the check codes are consistent, the integrity check has passed. Otherwise, the integrity check has failed and the target data may have been tampered with.
  • the write request may also include the user's signature.
  • the target data management device may also verify the signature.
  • step 2 is an optional step in the embodiment of the present application, and the above step 2 may not be performed when performing the method of the embodiment of the present application. For example, when the entire system is deployed in a trusted environment, step 2 above does not need to be performed.
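The integrity check in step 2 can be sketched as recomputing the check code over the payload and comparing it with the code carried in the write request. Using a SHA-256 digest as the integrity check code is an assumption, since the source does not name the algorithm. Signature verification is not shown.

```python
import hashlib
import hmac


def integrity_code(data: bytes) -> str:
    # Assumed check code: hex-encoded SHA-256 digest of the payload.
    return hashlib.sha256(data).hexdigest()


def check_write_request(payload: bytes, carried_code: str) -> bool:
    """Return True when the carried check code matches the locally computed one."""
    # compare_digest runs in constant time for equal-length inputs.
    return hmac.compare_digest(integrity_code(payload), carried_code)


code = integrity_code(b"target data")
ok = check_write_request(b"target data", code)
tampered = check_write_request(b"target dat4", code)
```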
  • Step 3 The target data management device obtains the sharding strategy from the blockchain network, and segments the target data to obtain multiple data shards.
  • Step 4 The target data management device obtains the allocation strategy from the blockchain network, and determines the storage address of each data fragment in the multiple data fragments according to the allocation policy.
  • Step 5 The target data management device writes the data fragments into the storage resource pool according to the storage address of each data fragment.
  • Step 6 The target data management device determines the hash value of each data fragment and the hash value of the target data.
  • steps 3 to 6 please refer to the relevant description of the embodiment shown in Figure 4, and will not be described again here.
  • Step 7 The target data management device generates a transaction based on the hash value of the data fragment, the hash value of the target data, and the storage address of the data fragment, and uploads the transaction to the chain.
  • the target data management device can generate a transaction block based on the hash value of the data fragment, the hash value of the target data, and the storage address of the data fragment.
  • Each blockchain node reaches a consensus on the transaction block, and the transaction block can be added to the distributed ledger to put the transaction on the chain.
  • the target data management device can also add data attributes to the transaction block, thereby realizing that the data attributes are also stored on the chain.
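The content of the transaction in step 7 can be sketched as a serialized record holding the hash of the target data, the hash and storage addresses of each fragment, and optional data attributes. The JSON layout, field names, and SHA-256 are assumptions; consensus and block creation are outside the scope of this sketch.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_transaction(target, shards, addresses, attributes=None) -> str:
    """Serialize the hashes, storage addresses, and attributes to be recorded on the chain.

    addresses[i] lists the storage addresses of every copy of shards[i].
    """
    if len(shards) != len(addresses):
        raise ValueError("each fragment needs its list of storage addresses")
    record = {
        "target_hash": sha256_hex(target),
        "shards": [
            {"hash": sha256_hex(shard), "addresses": addrs}
            for shard, addrs in zip(shards, addresses)
        ],
        "attributes": attributes or {},  # e.g. creator, creation time, subject
    }
    return json.dumps(record, sort_keys=True)


tx = build_transaction(
    b"abcdef",
    [b"abc", b"def"],
    [["node-a:/disk0/s0"], ["node-b:/disk0/s1"]],
    {"creator": "alice"},
)
```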
  • Step 8 The target data management device returns a verification failure notification.
  • the verification failure notification is used to indicate verification failure, and the blockchain client can resend the data upload request to re-upload the target data.
  • Step 1 The blockchain client receives the read request and sends the read request to the target data management device.
  • Read requests are used to read target data, which can be large-scale data such as rich media data or big data.
  • the read request is specifically used to read the target data from the storage resource pool formed by the storage mounted by each blockchain node in the blockchain network, that is, to download the target data from the storage resource pool of the blockchain network 200. Therefore, This read request is also called a data download request.
  • Step 2 The target data management device in the distributed data management system verifies the signature based on the read request, obtains access rights from the blockchain network, and verifies the access rights. If the verification passes, go to step 3; if the verification fails, go to step 7.
  • the access rights of different data can be stored in the distributed ledger of the blockchain network.
  • the target data management device can obtain the access rights of the target data read by the read request and determine whether the current user has the access rights. If it is present, the access permission verification passes; if not, the access permission verification fails.
  • step 3 When the signature verification passes and the access permission verification passes, step 3 can be performed.
  • When the signature verification fails or the access permission verification fails, step 7 is performed.
  • Step 3 The target data management device obtains the storage address of the data fragment from the blockchain network.
  • Step 4 The target data management device determines the target path from the storage address of each copy of the data fragment, obtains the data fragment from the storage resource pool based on the target path, and then obtains the on-chain hash value and the local hash value of each data fragment. Verification is performed based on the on-chain hash value and the local hash value to ensure the accuracy of the data fragments.
  • Step 5 The target data management device aggregates the data shards and then compares the hash value of the aggregated data with the hash value of the target data stored on the chain. When the hash value of the aggregated data is consistent with the hash value of the target data stored on the chain, step 6 is performed.
  • Step 6 The target data management device returns the target data and corresponding transaction data.
  • Step 7 The target data management device returns a verification failure notification.
  • the verification failure notification is used to indicate verification failure, and the blockchain client can resend the data download request to re-download the target data.
  • the blockchain client can use the data acquisition interface (also called the data download interface) to query transaction data through hash values and data attributes.
  • The target data management device of the distributed data management system verifies the signature based on the blockchain system, obtains the shard addresses from the chain, pulls and aggregates the data shards, and compares the hash value of the aggregated data with the hash value of the target data stored on the chain. If the verification passes, the target data is returned to the blockchain client. If the verification fails, other data shards are read for aggregation and verification continues.
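The verification part of the download flow (steps 4 to 6) can be sketched as follows. This is a minimal illustration, not the implementation described in the application: it assumes SHA-256 as the hash algorithm and plain concatenation as the aggregation strategy, and the function names `sha256_hex` and `read_target_data` are hypothetical.

```python
import hashlib
from typing import List, Optional


def sha256_hex(data: bytes) -> str:
    """Local hash of a byte string (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def read_target_data(
    shard_copies: List[List[bytes]],
    onchain_shard_hashes: List[str],
    onchain_target_hash: str,
) -> Optional[bytes]:
    """Return the reassembled target data, or None if verification fails.

    shard_copies[i] holds the copies of shard i pulled from the storage
    resource pool; onchain_shard_hashes[i] is the hash recorded on chain.
    """
    verified = []
    for copies, expected in zip(shard_copies, onchain_shard_hashes):
        # Step 4: accept the first copy whose local hash matches the
        # on-chain hash; fall back to the other copies otherwise.
        good = next((c for c in copies if sha256_hex(c) == expected), None)
        if good is None:
            return None  # step 7: verification failure
        verified.append(good)
    # Step 5: aggregate (concatenation is an assumed aggregation strategy)
    # and compare against the on-chain hash of the whole target data.
    aggregated = b"".join(verified)
    if sha256_hex(aggregated) != onchain_target_hash:
        return None
    return aggregated  # step 6
```

Checking each shard before aggregation lets a tampered copy be replaced by another copy without re-reading the whole object.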
  • Step 1 The target data management device periodically accesses the blockchain node to obtain meta-information of data fragments in local storage.
  • the meta-information obtained by the target data management device from the blockchain node is also called first meta-information.
  • the first meta-information includes one or more of the name, size, or hash value of the data fragment.
  • Step 2 The target data management device periodically accesses the local storage to obtain meta-information of the data fragments in the local storage.
  • the meta-information obtained by the target data management device from the local storage is also called second meta-information.
  • the second meta-information includes one or more of the name, size or hash value of the data fragment.
  • Step 3 The target data management device compares the meta-information obtained from the blockchain node and the meta-information obtained from the local storage. When the meta-information is consistent, proceed to step 4; when the meta-information is inconsistent, proceed to step 5.
  • Step 4 The target data management device determines that the storage is normal and records the event log.
  • Step 5 The target data management device determines the storage fault and writes the fault information to the blockchain node of the blockchain network.
  • The target data management device of the distributed data management system can ensure storage reliability by regularly checking the locally mounted storage medium. Specifically, the target data management device can query the hash value or data attributes of the locally stored data fragments and obtain the corresponding hash values and data attributes from the chain for comparison. If the local storage medium is lost, or a certain number of stored data fragments have been deleted or tampered with, the target data management device can determine a new storage address for the data fragments and notify the other data management devices of the fault information and the new storage address through the chain. If the hash values and data attributes are consistent, the device records a local log event and waits for the next polling time to check the data fragments in storage again.
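The comparison in step 3 of the fault check can be sketched as below. The sketch assumes the meta-information consists of name, size, and a SHA-256 hash, and it models the on-chain records and local storage as plain dictionaries; `ShardMeta`, `local_meta`, and `check_storage` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ShardMeta:
    name: str
    size: int
    sha256: str


def local_meta(name: str, content: bytes) -> ShardMeta:
    """Second meta-information, computed from the locally stored shard."""
    return ShardMeta(name, len(content), hashlib.sha256(content).hexdigest())


def check_storage(onchain: Dict[str, ShardMeta], local: Dict[str, bytes]) -> List[dict]:
    """Compare first (on-chain) and second (local) meta-information.

    Returns a list of fault records; an empty list means storage is normal.
    """
    faults = []
    for name, first_meta in onchain.items():
        content = local.get(name)
        if content is None:
            faults.append({"shard": name, "reason": "missing"})
        elif local_meta(name, content) != first_meta:
            faults.append({"shard": name, "reason": "mismatch"})
    return faults
```

In a real deployment the returned fault records would be written to the blockchain node (step 5) and the check would be repeated at each polling interval.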
  • Step 1 The target data management device periodically accesses the blockchain node to obtain fault information.
  • Step 2 The target data management device determines whether the local storage of the target data management device is involved based on the fault information. If yes, perform step 3. If not, return to step 1 and wait for the next polling.
  • The fault information includes a recommended storage address.
  • If the recommended storage address belongs to the storage (local storage) mounted on the target data management device, the fault information involves the local storage of the target data management device.
  • Step 3 The target data management device determines the storage address based on the allocation policy. If the storage address is consistent with the storage address recommended in the fault information, perform step 4. If the storage address is inconsistent with the storage address recommended in the fault information, return to step 1 and wait for the next polling.
  • Step 4 The target data management device accesses the storage resource pool to pull a copy of the data fragment and verifies the hash value. When the verification passes, it writes the copy of the data fragment into local storage according to the recommended storage address.
  • Step 5 The target data management device updates the storage location to the blockchain node.
  • The distributed data management system can thus propagate fault information over the blockchain network and periodically repair data fragments through a multi-copy backup mechanism, which provides highly available distributed storage. Each data management device regularly polls the fault information on the chain and checks whether its local storage is recommended. If so, it calculates the weights and verifies the storage location. If the storage addresses are consistent, it pulls the data fragment from a backup storage node, verifies the hash value and data attributes, and stores the fragment in its local storage medium. It then clears the fault information and uploads the updated storage address to the chain.
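The decision logic of the recovery flow (steps 2 to 5) can be sketched as follows. All names are hypothetical: the fault record is assumed to be a dictionary with `shard` and `recommended_address` keys, `allocate` stands in for the allocation policy, `fetch_backup` stands in for pulling a copy from the storage resource pool, and SHA-256 is an assumed hash algorithm.

```python
import hashlib
from typing import Callable, Dict, Optional, Set


def recover_shard(
    fault: dict,
    local_addresses: Set[str],
    allocate: Callable[[str], str],
    fetch_backup: Callable[[str], bytes],
    onchain_hash: str,
    local_store: Dict[str, bytes],
) -> Optional[str]:
    """Try to repair one shard; return the new storage address or None."""
    recommended = fault["recommended_address"]
    # Step 2: ignore faults that do not involve this device's local storage.
    if recommended not in local_addresses:
        return None
    # Step 3: the address computed from the allocation policy must agree
    # with the address recommended in the fault information.
    if allocate(fault["shard"]) != recommended:
        return None
    # Step 4: pull a copy from another storage and verify its hash.
    copy = fetch_backup(fault["shard"])
    if hashlib.sha256(copy).hexdigest() != onchain_hash:
        return None
    local_store[recommended] = copy
    # Step 5: the caller records the returned address on the blockchain node.
    return recommended
```

Returning `None` in every non-matching case mirrors the flow's "return to step 1 and wait for the next polling" branches.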
  • the embodiment of the present application also provides a distributed data management system 100 as described above.
  • the distributed data management system 100 is introduced below with reference to the accompanying drawings.
  • the distributed data management system 100 includes multiple data management devices 10 .
  • The first data management device among the plurality of data management devices 10 corresponds to the first blockchain node of the blockchain network, and the second data management device among the plurality of data management devices 10 corresponds to the second blockchain node of the blockchain network.
  • the storage mounted by the first data management device and the storage mounted by the second data management device are used to form a storage resource pool of the blockchain network.
  • The target data management device among the plurality of data management devices 10 is used to receive a data operation request, where the data operation request is used to perform an input/output (IO) operation on the target data;
  • The target data management device is also configured to obtain, according to the data operation request, the storage addresses of multiple data fragments of the target data from the blockchain network, and to perform IO on the target data in the storage resource pool according to the storage addresses of the multiple data fragments.
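  • In some possible implementations, the data operation request is a write request, and the target data management device is specifically used to:
  • obtain an allocation policy based on a smart contract of the blockchain network according to the data operation request;
  • allocate, according to the allocation policy, storage resources from the storage resource pool to the multiple data shards of the target data, and obtain the storage addresses of the multiple data shards;
  • write the multiple data shards into the storage resource pool according to the storage address of at least one data shard, and store the storage addresses of the multiple data shards in the distributed ledger of the blockchain network.
  • In some possible implementations, the target data management device is also used to:
  • obtain a sharding strategy based on a smart contract of the blockchain network according to the data operation request;
  • obtain, according to the sharding strategy, the sharding algorithm, the number of shards, and the number of copies of each data shard;
  • The target data management device is specifically used to:
  • shard the target data according to the sharding algorithm and the number of shards to obtain the multiple data shards of the target data;
  • write each copy of each data shard into the storage resource pool according to the storage address of each copy of each of the multiple data shards, and store the storage address of each copy of each data shard in the distributed ledger of the blockchain network.
  • In some possible implementations, each data shard includes multiple copies;
  • The target data management device is specifically used to:
  • write the multiple copies of each data shard to different types of storage media in the storage resource pool.
  • In some possible implementations, the target data management device is also used to:
  • determine at least one of the hash value of the target data, the hash value of each of the multiple data shards, and the data attributes of the target data;
  • store at least one of the hash value of the target data, the hash value of each of the multiple data shards, and the data attributes of the target data in the distributed ledger of the blockchain network.
  • In some possible implementations, the data operation request is a read request, and the target data management device is specifically used to:
  • obtain, according to the read request, the storage addresses of the multiple data shards of the target data from the distributed ledger of the blockchain network;
  • The target data management device is specifically used to:
  • obtain the multiple data shards from the storage resource pool according to the storage addresses of the multiple data shards;
  • aggregate the multiple data shards to obtain the target data.
  • In some possible implementations, the target data management device is also used to:
  • obtain an aggregation strategy based on a smart contract of the blockchain network according to the data operation request;
  • The target data management device is specifically used to:
  • aggregate the multiple data shards according to the aggregation strategy to obtain the target data.
  • In some possible implementations, the target data management device is specifically used to:
  • obtain the local hash values and the on-chain hash values of the multiple data shards, where a local hash value is obtained through a hash algorithm and an on-chain hash value is a hash value stored in the blockchain network;
  • determine that the local hash values match the on-chain hash values, and start aggregating the multiple data shards to obtain aggregated data;
  • determine the hash value of the aggregated data and obtain the hash value of the target data from the blockchain network; when the hash value of the aggregated data matches the hash value of the target data, determine that the aggregated data is the target data.
  • In some possible implementations, the target data management device is also used to:
  • obtain, from the blockchain node corresponding to the target data management device, first meta-information of the data shards in the storage mounted on the target data management device, and obtain second meta-information of the data shards from the storage mounted on the target data management device;
  • when the first meta-information does not match the second meta-information, determine that a fault has occurred and store fault information in the distributed ledger of the blockchain network.
  • In some possible implementations, the target data management device is also used to:
  • read fault information from the blockchain network;
  • when the fault information indicates that data shards in the storage mounted on the target data management device have been tampered with, deleted, or lost, obtain the data shards from the storage mounted on other data management devices and store them locally;
  • store the updated storage addresses in the distributed ledger of the blockchain network.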
  • the target data management device may be any one of the plurality of data management devices 10 , for example, it may be the above-mentioned first data management device or the second data management device.
  • the structure of the data management device is introduced below. As shown in Figure 9, the data management device 10 includes:
  • the communication module 102 is used to receive data operation requests, which are used to perform input and output IO operations on target data;
  • The management module 104 is configured to obtain, according to the data operation request, the storage addresses of multiple data fragments of the target data from the blockchain network, and to perform IO on the target data in the storage resource pool according to the storage addresses of the multiple data fragments.
  • The management module 104 is used to implement the storage allocation strategy function shown in Figure 1, Figure 2, and Figure 3. Based on the allocation strategy, the storage address of each data fragment can be determined, and based on the storage address, IO is performed on the target data in the storage resource pool.
  • the above-mentioned communication module 102 and management module 104 can be implemented by hardware modules or software modules.
  • the communication module 102 and the management module 104 may be application programs or application program modules running on a computing device or a cluster of computing devices.
  • the communication module 102 can be implemented by a transceiver module such as a network interface card or a transceiver.
  • the management module 104 may be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD).
  • The above-mentioned PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
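  • In some possible implementations, the data operation request is a write request, and the management module 104 is specifically used to:
  • obtain an allocation policy based on a smart contract of the blockchain network according to the data operation request;
  • allocate, according to the allocation policy, storage resources from the storage resource pool to the multiple data shards of the target data, and obtain the storage addresses of the multiple data shards;
  • write the multiple data shards into the storage resource pool according to the storage address of at least one data shard, and store the storage addresses of the multiple data shards in the distributed ledger of the blockchain network.
  • In some possible implementations, the management module 104 is also used to:
  • obtain a sharding strategy based on a smart contract of the blockchain network according to the data operation request;
  • obtain, according to the sharding strategy, the sharding algorithm, the number of shards, and the number of copies of each data shard;
  • The management module 104 is specifically used to:
  • shard the target data according to the sharding algorithm and the number of shards to obtain the multiple data shards of the target data;
  • write each copy of each data shard into the storage resource pool according to the storage address of each copy of each of the multiple data shards, and store the storage address of each copy of each data shard in the distributed ledger of the blockchain network.
  • The management module 104 is also used to implement the storage sharding strategy function shown in Figure 1, Figure 2, or Figure 3. Based on the sharding strategy, the target data can be sharded, and based on the storage address of each data shard of the target data, IO is performed on the target data in the storage resource pool.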
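The sharding step on the write path can be illustrated with a short sketch. The application leaves the sharding algorithm to the sharding strategy obtained from the smart contract, so the equal-size split and SHA-256 hashing used here are assumptions, and `shard_data` is a hypothetical name.

```python
import hashlib
from typing import List, Tuple


def shard_data(data: bytes, num_shards: int) -> List[Tuple[bytes, str]]:
    """Split data into exactly num_shards pieces and hash each piece.

    Returns (shard_bytes, shard_hash) pairs; trailing shards may be shorter
    (or empty) when the data does not divide evenly.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    size = -(-len(data) // num_shards)  # ceiling division
    shards = [data[i * size:(i + 1) * size] for i in range(num_shards)]
    # The per-shard hash is what would be recorded in the distributed ledger
    # alongside each shard's storage address.
    return [(s, hashlib.sha256(s).hexdigest()) for s in shards]
```

Concatenating the shards in order restores the original bytes, which is what makes the later aggregation and whole-object hash check possible.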
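  • In some possible implementations, each data shard includes multiple copies;
  • The management module 104 is specifically used to:
  • write the multiple copies of each data shard to different types of storage media in the storage resource pool.
  • In some possible implementations, the management module 104 is also used to:
  • determine at least one of the hash value of the target data, the hash value of each of the multiple data shards, and the data attributes of the target data;
  • store at least one of the hash value of the target data, the hash value of each of the multiple data shards, and the data attributes of the target data in the distributed ledger of the blockchain network.
  • In some possible implementations, the data operation request is a read request, and the management module 104 is specifically used to:
  • obtain, according to the read request, the storage addresses of the multiple data shards of the target data from the distributed ledger of the blockchain network;
  • obtain the multiple data shards from the storage resource pool according to the storage addresses of the multiple data shards;
  • aggregate the multiple data shards to obtain the target data.
  • In some possible implementations, the management module 104 is also used to:
  • obtain an aggregation strategy based on a smart contract of the blockchain network according to the data operation request;
  • The management module 104 is specifically used to:
  • aggregate the multiple data shards according to the aggregation strategy to obtain the target data.
  • The management module 104 is also used to implement the storage aggregation strategy function shown in Figure 1, Figure 2, or Figure 3. Based on the aggregation strategy, the data shards of the target data can be aggregated to restore the target data, which enables IO to be performed on the target data.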
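  • In some possible implementations, the management module 104 is specifically used to:
  • obtain the local hash values and the on-chain hash values of the multiple data shards, where a local hash value is obtained through a hash algorithm and an on-chain hash value is a hash value stored in the blockchain network;
  • determine that the local hash values match the on-chain hash values, and start aggregating the multiple data shards to obtain aggregated data;
  • determine the hash value of the aggregated data and obtain the hash value of the target data from the blockchain network; when the hash value of the aggregated data matches the hash value of the target data, determine that the aggregated data is the target data.
  • The management module 104 is also used to implement the data calculation and verification function shown in Figure 1, Figure 2, or Figure 3. Specifically, the management module 104 can calculate the local hash value and then compare it with the on-chain hash value to verify the data, thereby ensuring the accuracy of IO on the target data.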
  • the data management device 10 further includes:
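  • The fault checking module 106 is used to obtain, from the blockchain node corresponding to the target data management device, first meta-information of the data fragments in the storage mounted on the target data management device, and to obtain second meta-information of the data fragments from the storage mounted on the target data management device; when the first meta-information does not match the second meta-information, it determines that a fault has occurred and stores fault information in the distributed ledger of the blockchain network.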
  • the above-mentioned fault checking module 106 can be implemented by a hardware module or a software module.
  • the fault checking module 106 may be an application or application module running on a computing device or cluster of computing devices.
  • The fault checking module 106 may be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD), or the like.
  • The above-mentioned PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • the data management device 10 further includes:
  • The fault recovery module 108 is used to read fault information from the blockchain network. When the fault information indicates that data fragments in the storage mounted on the target data management device have been tampered with, deleted, or lost, it obtains the data fragments from the storage mounted on other data management devices, stores them locally, and stores the updated storage addresses in the distributed ledger of the blockchain network.
  • The above-mentioned fault recovery module 108 can be implemented by a hardware module or a software module.
  • the fault recovery module 108 may be an application or application module running on a computing device or cluster of computing devices.
  • The fault recovery module 108 may be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD), or the like.
  • The above-mentioned PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • computing device 1000 includes: bus 1002, processor 1004, memory 1006, and communication interface 1008.
  • the processor 1004, the memory 1006 and the communication interface 1008 communicate through the bus 1002.
  • the computing device 1000 may be a computing device in a central cloud, such as a central server, or a computing device in an edge cloud, such as an edge server.
  • the computing device 1000 may also be a lightweight device, such as a smart phone, a smart wearable device and other terminal devices. It should be understood that this application does not limit the number of processors and memories in the computing device 1000.
  • the bus 1002 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
  • the bus can be divided into address bus, data bus, control bus, etc. For ease of presentation, only one line is used in Figure 10, but it does not mean that there is only one bus or one type of bus.
  • Bus 1002 may include a path that carries information between various components of computing device 1000 (e.g., memory 1006, processor 1004, communication interface 1008).
  • The processor 1004 may include any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).
  • Memory 1006 may include volatile memory, such as random access memory (RAM).
  • The memory 1006 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
  • Executable program code is stored in the memory 1006, and the processor 1004 executes the executable program code to implement the aforementioned data processing method.
  • the memory 1006 stores instructions for the distributed data management system 100 or the data management device 10 to execute the data processing method.
  • the communication interface 1008 uses transceiver modules such as, but not limited to, network interface cards and transceivers to implement communication between the computing device 1000 and other devices or communication networks.
  • An embodiment of the present application also provides a computing device cluster.
  • the computing device cluster includes at least one computing device 1000.
  • the computing device 1000 may be a server, such as a central server or an edge server. In some embodiments, computing device 1000 may also be a terminal device.
  • The memories 1006 of one or more computing devices 1000 in the computing device cluster may store the same instructions of the distributed data management system 100 for executing the data processing method.
  • one or more computing devices 1000 in the computing device cluster may also be used to execute part of the instructions of the distributed data management system 100 for executing the data processing method.
  • a combination of one or more computing devices 1000 may collectively execute instructions of the distributed data management system 100 for performing the data processing method.
  • the memory 1006 in different computing devices 1000 in the computing device cluster can store different instructions for executing part of the functions of the distributed data management system 100 .
  • Figure 12 shows a possible implementation.
  • two computing devices 1000A and 1000B are connected through a communication interface 1008 .
  • Instructions for executing the functions of the communication module 102 and the management module 104 are stored on the memory in the computing device 1000A.
  • Instructions for executing the functions of the fault checking module 106 and the fault recovery module 108 are stored on the memory in the computing device 1000B.
  • the memories 1006 of the computing devices 1000A and 1000B jointly store instructions for the distributed data management system 100 to perform the data processing method.
  • The connection between the computing devices in the cluster shown in Figure 12 takes into account that the data processing method provided by this application needs to scan the distributed ledger maintained by the blockchain nodes in the blockchain network during fault checking and needs to read the fault information stored in the blockchain nodes during fault recovery. Therefore, the functions implemented by the communication module 102 and the management module 104 are assigned to the computing device 1000A, and the functions implemented by the fault checking module 106 and the fault recovery module 108 are assigned to the computing device 1000B.
  • It should be understood that the functions of the computing device 1000A shown in Figure 12 may also be performed by multiple computing devices 1000.
  • Likewise, the functions of the computing device 1000B may also be performed by multiple computing devices 1000.
  • one or more computing devices in a cluster of computing devices may be connected through a network.
  • the network may be a wide area network or a local area network, etc.
  • Figure 13 shows a possible implementation. As shown in Figure 13, two computing devices 1000C and 1000D are connected through a network. Specifically, the connection to the network is made through a communication interface in each computing device.
  • instructions for executing the functions of the communication module 102 and the management module 104 are stored in the memory 1006 of the computing device 1000C.
  • instructions for performing the functions of the fault checking module 106 and the fault recovery module 108 are stored in the memory 1006 in the computing device 1000D.
  • The connection between the computing devices in the cluster shown in Figure 13 takes into account that the data processing method provided by this application needs to scan the distributed ledger maintained by the blockchain nodes in the blockchain network or read information stored in the blockchain nodes. Therefore, the functions implemented by the communication module 102 and the management module 104 are assigned to the computing device 1000C, and the functions implemented by the fault checking module 106 and the fault recovery module 108 are assigned to the computing device 1000D. It should be understood that the functions of the computing device 1000C shown in Figure 13 may also be performed by multiple computing devices 1000. Likewise, the functions of the computing device 1000D may also be performed by multiple computing devices 1000.
  • An embodiment of the present application also provides a computer-readable storage medium.
  • The computer-readable storage medium may be any available medium on which a computing device can store data, or a data storage device, such as a data center, that contains one or more available media.
  • The available media may be magnetic media (e.g., floppy disk, hard disk, tape), optical media (e.g., DVD), or semiconductor media (e.g., solid state drive), etc.
  • the computer-readable storage medium includes instructions that instruct the computing device to perform the above-mentioned data processing method applied to the distributed data management system 100.
  • An embodiment of the present application also provides a computer program product containing instructions.
  • the computer program product may be a software or program product containing instructions capable of running on a computing device or cluster of computing devices or stored in any available medium.
  • When the computer program product is run on at least one computing device (a computing device or a computing device cluster), the at least one computing device is caused to execute the above data processing method.

Abstract

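This application provides a data processing method applied to a distributed data management system that includes multiple data management devices. A first data management device among the multiple data management devices corresponds to a first blockchain node of a blockchain network, and a second data management device corresponds to a second blockchain node of the blockchain network. The storage mounted by the first data management device and the storage mounted by the second data management device are used to form a storage resource pool of the blockchain network. The method includes: a target data management device among the multiple data management devices receives a data operation request; according to the data operation request, the target data management device obtains the storage addresses of multiple data shards of the target data from the blockchain network and performs IO on the target data in the storage resource pool according to those storage addresses. All interactions with the storage resource pool must be handled by a data management device, which ensures data consistency and improves data security, availability, and accessibility.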

Description

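A data processing method and related device
This application claims priority to Chinese patent application No. 202210770817.1, filed with the China National Intellectual Property Administration on June 30, 2022 and entitled "Method, apparatus, server and storage medium for rich media storage", and to Chinese patent application No. 202210983123.6, filed with the China National Intellectual Property Administration on August 16, 2022 and entitled "A data processing method and related device", both of which are incorporated herein by reference in their entirety.
Technical Field
This application relates to the field of blockchain technology, and in particular to a data processing method, system, device, computing device cluster, computer-readable storage medium, and computer program product.
Background
Blockchain technology is a decentralized architecture and computing paradigm that uses a chained block data structure to verify and store data, uses a distributed node consensus algorithm to generate and update data, uses cryptography to secure data transmission and access, and uses smart contracts composed of automated script code to program and operate on data.
A network built on blockchain technology is called a blockchain network. The nodes in a blockchain network jointly maintain a distributed ledger, which serves as the storage carrier and generally stores simple data structures such as key-value or relational data. As blockchain is widely applied in industries such as finance, energy, government affairs, aviation, agriculture, public welfare, and logistics, industry-related data, such as rich media data (video, audio, images) or big data (modeling files), creates a growing demand for highly reliable on-chain storage.
If such rich media data or big data were put directly on the chain, it would occupy a large amount of on-chain resources. The industry has therefore proposed combining on-chain storage with off-chain storage. Specifically, rich media data or big data is stored in an off-chain storage system, and the hash value of the data is put on the chain. A user can then obtain the hash value from the chain, obtain the data from the off-chain storage system, compute the hash value of the data, and compare the on-chain hash value with the hash value computed off-chain, thereby ensuring data consistency.
However, the client, the transmission network, the storage network, and so on may carry stability and security risks, which can lead to problems such as data inconsistency and data tampering and make it difficult to meet business requirements.
Summary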
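This application provides a data processing method. The method introduces a distributed data management system to manage operations such as data upload and download. Specifically, the data management devices in the distributed data management system interact with the storage resource pool of the blockchain network, which is formed by the storage mounted by each data management device, to implement input/output operations such as upload and download, and they record related information, such as the storage addresses of the data shards, in the blockchain network. Even if the client, transmission network, or storage network causes data inconsistency because of stability or security problems, the data can be recovered based on the storage addresses of the data copies stored on the chain, which ensures data consistency and improves data security, availability, and accessibility. This application also provides a distributed data management system, data management device, computing device cluster, computer-readable storage medium, and computer program product corresponding to the method.
In a first aspect, this application provides a data processing method. The method is applied to a distributed data management system that includes multiple data management devices. A first data management device among the multiple data management devices corresponds to a first blockchain node of a blockchain network, and a second data management device among the multiple data management devices corresponds to a second blockchain node of the blockchain network. The storage mounted by the first data management device and the storage mounted by the second data management device are used to form a storage resource pool of the blockchain network.
A target data management device among the multiple data management devices can receive a data operation request, which is used to perform an input/output (IO) operation on target data. The target data management device then obtains, according to the data operation request, the storage addresses of multiple data shards of the target data from the blockchain network and performs IO on the target data in the storage resource pool according to the storage addresses of the multiple data shards.
In this method, the storage resource pool is managed by the distributed data management system. All interactions with the storage resource pool (such as IO operations on the target data) must be handled by a data management device in the distributed data management system, and the data management device also records on the chain the storage addresses of the target data involved in the IO operation. Even if the client, transmission network, or storage network causes data inconsistency because of stability or security problems, the data can be recovered based on the storage addresses of the data copies stored on the chain, which ensures data consistency and improves data security, availability, and accessibility. In addition, because information about the IO operations is recorded on the chain, operations are traceable.
In some possible implementations, the data operation request is a write request, which is used to write, that is, upload, the target data. Accordingly, the target data management device can obtain an allocation policy based on a smart contract of the blockchain network according to the data operation request and then, according to the allocation policy, allocate storage resources from the storage resource pool to the multiple data shards of the target data to obtain their storage addresses. The target data management device can write the multiple data shards into the storage resource pool according to the storage address of at least one data shard and store the storage addresses of the multiple data shards in the distributed ledger of the blockchain network.
The method provides a distributed data management system for the blockchain network. The data management devices in the system determine the allocation policy and store the multiple data shards of the target data in a distributed manner according to it, which meets the need for distributed management, avoids the risk of a malicious administrator that arises in centralized management, and builds a trusted system.
In some possible implementations, the target data management device can determine the weights of different storage resources according to the allocation policy, combined with at least one of the capacity, bandwidth, and historical fault records of each storage, and allocate storage resources to the data shards based on the weights, thereby obtaining the storage address of each data shard. Storing the data shards of the target data at storage addresses determined in this way can reduce data storage and read time and reduce wasted storage space.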
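A weight-based allocation of this kind can be sketched as follows. The application does not give a weighting formula, so the one below (free capacity times bandwidth, discounted by past faults) is purely an illustrative assumption, as are the `Storage` fields and the function names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Storage:
    address: str
    free_capacity_gb: float
    bandwidth_mbps: float
    past_faults: int


def weight(s: Storage) -> float:
    """Hypothetical weight: more capacity and bandwidth raise it, faults lower it."""
    return s.free_capacity_gb * s.bandwidth_mbps / (1 + s.past_faults)


def allocate(storages: List[Storage], num_copies: int) -> List[str]:
    """Return the addresses of the num_copies highest-weighted storages."""
    ranked = sorted(storages, key=weight, reverse=True)
    return [s.address for s in ranked[:num_copies]]
```

For example, with three storages weighted 10000, 5000, and 4000, a request for two copies returns the addresses of the first two.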
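In some possible implementations, while the distributed data management system uploads data, the target data management device can also obtain a sharding strategy based on a smart contract of the blockchain network according to the data operation request. The target data management device can then obtain, according to the sharding strategy, the sharding algorithm, the number of shards, and the number of copies of each data shard. Accordingly, when performing IO, the target data management device can shard the target data according to the sharding algorithm and the number of shards to obtain the multiple data shards of the target data, then write each copy of each data shard into the storage resource pool according to the storage address of each copy of each data shard, and store the storage address of each copy of each data shard in the distributed ledger of the blockchain network.
The method splits the target data according to the sharding strategy obtained from the blockchain network to obtain multiple data shards and then stores the data shards in the storage resource pool in a distributed manner, which can improve the efficiency of storing (uploading) or reading (downloading) the target data.
In some possible implementations, each data shard includes multiple copies, so that even if several copies of a data shard are lost, deleted, or tampered with, the data can be recovered from the other copies. When writing the copies of a data shard into the storage resource pool, the target data management device can write the multiple copies of each data shard to different types of storage media in the storage resource pool. In this way, even if one or more types of storage media fail, the data can be recovered from the copies stored on other types of storage media, which improves storage reliability and protects data security.
In some possible implementations, the number of copies of a data shard equals the number of blockchain nodes. That is, for each data shard of the target data, the target data management device can store one copy in the storage mounted by the data management device corresponding to each blockchain node of the blockchain network. This achieves an effect similar to storing the data shard on the blockchain network without consuming a large amount of on-chain storage resources, which ensures storage reliability at a relatively low storage cost.
In some possible implementations, the target data management device can also determine at least one of the hash value of the target data, the hash value of each of the multiple data shards, and the data attributes of the target data. The data attributes can include one or more of creator, creation time, and subject. The target data management device can then store at least one of the hash value of the target data, the hash value of each of the multiple data shards, and the data attributes of the target data in the distributed ledger of the blockchain network.
In this way, data queries can be performed by the hash value of the target data, the hash value of a data shard, or the data attributes of the target data, which both speeds up queries and ensures query accuracy.
In some possible implementations, the data operation request can be a read request, which is used to read, that is, download, the target data. Specifically, the target data management device can obtain, according to the read request, the storage addresses of the multiple data shards of the target data from the distributed ledger of the blockchain network, then obtain the multiple data shards from the storage resource pool according to those storage addresses, and then aggregate the multiple data shards to obtain the target data.
In this method, the target data management device relies on the blockchain network to read multiple data shards from the storage resource pool concurrently and obtains the target data from them, which improves data read (download) efficiency. The method also ensures the consistency of the read data through the blockchain network.
In some possible implementations, the storage resource pool can store multiple copies of a data shard. Accordingly, when reading a data shard from the storage resource pool, the target data management device can obtain the allocation policy from the blockchain network based on a smart contract and, according to the allocation policy combined with at least one of the capacity, bandwidth, and historical fault records of each storage, determine the weights of different storage resources. Based on the weights, it can determine a target path from multiple paths. The target path can be the path with the lowest cost or the lowest latency among the multiple paths for accessing the data shard. The target data management device can access the target path to obtain each data shard. This can further shorten the read latency of the target data and reduce its read cost.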
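Target path selection can be sketched as choosing, for each shard, the copy with the lowest measured latency. The latency table, path strings, and the name `pick_target_paths` are hypothetical; the application derives the choice from policy weights rather than from a fixed table.

```python
from typing import Dict, List


def pick_target_paths(
    replica_paths: Dict[str, List[str]],
    latency_ms: Dict[str, float],
) -> Dict[str, str]:
    """For each shard, choose the replica path with the lowest latency."""
    return {
        shard: min(paths, key=lambda p: latency_ms[p])
        for shard, paths in replica_paths.items()
    }
```

A cost-based variant would only change the key function passed to `min`.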
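In some possible implementations, the target data management device can obtain an aggregation strategy based on a smart contract of the blockchain network according to the data operation request. Accordingly, the target data management device can aggregate the multiple data shards according to the aggregation strategy to obtain the target data.
The method aggregates the data shards using the aggregation strategy stored on the chain to obtain the target data. If some data shards in the storage resource pool have been tampered with, deleted, or lost, copies of those data shards can be obtained promptly and aggregated, which ensures data consistency.
In some possible implementations, the target data management device can obtain the local hash values and the on-chain hash values of the multiple data shards. A local hash value can be obtained through a hash algorithm, for example computed by the data management device from the content of the locally stored data shard. An on-chain hash value is a hash value stored in the blockchain network. The target data management device can first check the local hash values against the on-chain hash values, which detects tampered, deleted, or lost data shards in advance.
When the target data management device determines that the local hash values match the on-chain hash values, it starts aggregating the multiple data shards to obtain aggregated data. The target data management device can then determine the hash value of the aggregated data and obtain the hash value of the target data from the blockchain network. It can check the hash value of the aggregated data against the hash value of the target data. When the two match, the aggregated data is determined to be the target data.
In this way, hash verification ensures the accuracy of the target data that is read.
In some possible implementations, the target data management device can obtain, from the blockchain node corresponding to the target data management device, first meta-information of the data shards in the storage mounted on the target data management device, and obtain second meta-information of the data shards from the storage mounted on the target data management device. When the first meta-information does not match the second meta-information, the target data management device determines that a fault has occurred and stores fault information in the distributed ledger of the blockchain network.
The target data management device can periodically scan the blockchain node and the local storage mounted on the device and check the meta-information of the data shards stored by the blockchain node against the meta-information of the data shards in local storage, which speeds up fault checking, improves checking efficiency, and in turn supports fault recovery.
In some possible implementations, the target data management device can read fault information from the blockchain network. When the fault information indicates that data shards in the storage mounted on the target data management device have been tampered with, deleted, or lost, the target data management device can obtain the data shards from the storage mounted on other data management devices and store them locally, and then store the updated storage addresses in the distributed ledger of the blockchain network.
By reading the fault information stored on the chain and performing fault recovery based on the fault information that concerns the current device, the target data management device ensures data consistency.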
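In a second aspect, this application provides a distributed data management system. The distributed data management system includes multiple data management devices; a first data management device among the multiple data management devices corresponds to a first blockchain node of a blockchain network, and a second data management device among the multiple data management devices corresponds to a second blockchain node of the blockchain network; the storage mounted by the first data management device and the storage mounted by the second data management device are used to form a storage resource pool of the blockchain network;
a target data management device among the multiple data management devices is used to receive a data operation request, where the data operation request is used to perform an input/output (IO) operation on target data;
the target data management device is also used to obtain, according to the data operation request, the storage addresses of multiple data shards of the target data from the blockchain network, and to perform IO on the target data in the storage resource pool according to the storage addresses of the multiple data shards.
In some possible implementations, the data operation request is a write request, and the target data management device is specifically used to:
obtain an allocation policy based on a smart contract of the blockchain network according to the data operation request;
allocate, according to the allocation policy, storage resources from the storage resource pool to the multiple data shards of the target data, and obtain the storage addresses of the multiple data shards;
write the multiple data shards into the storage resource pool according to the storage address of at least one data shard, and store the storage addresses of the multiple data shards in the distributed ledger of the blockchain network.
In some possible implementations, the target data management device is also used to:
obtain a sharding strategy based on a smart contract of the blockchain network according to the data operation request;
obtain, according to the sharding strategy, the sharding algorithm, the number of shards, and the number of copies of each data shard;
the target data management device is specifically used to:
shard the target data according to the sharding algorithm and the number of shards to obtain the multiple data shards of the target data;
write each copy of each data shard into the storage resource pool according to the storage address of each copy of each of the multiple data shards, and store the storage address of each copy of each data shard in the distributed ledger of the blockchain network.
In some possible implementations, each data shard includes multiple copies;
the target data management device is specifically used to:
write the multiple copies of each data shard to different types of storage media in the storage resource pool.
In some possible implementations, the target data management device is also used to:
determine at least one of the hash value of the target data, the hash value of each of the multiple data shards, and the data attributes of the target data;
store at least one of the hash value of the target data, the hash value of each of the multiple data shards, and the data attributes of the target data in the distributed ledger of the blockchain network.
In some possible implementations, the data operation request is a read request, and the target data management device is specifically used to:
obtain, according to the read request, the storage addresses of the multiple data shards of the target data from the distributed ledger of the blockchain network;
the target data management device is specifically used to:
obtain the multiple data shards from the storage resource pool according to the storage addresses of the multiple data shards;
aggregate the multiple data shards to obtain the target data.
In some possible implementations, the target data management device is also used to:
obtain an aggregation strategy based on a smart contract of the blockchain network according to the data operation request;
the target data management device is specifically used to:
aggregate the multiple data shards according to the aggregation strategy to obtain the target data.
In some possible implementations, the target data management device is specifically used to:
obtain the local hash values and the on-chain hash values of the multiple data shards, where a local hash value is obtained through a hash algorithm and an on-chain hash value is a hash value stored in the blockchain network;
determine that the local hash values match the on-chain hash values, and start aggregating the multiple data shards to obtain aggregated data;
determine the hash value of the aggregated data and obtain the hash value of the target data from the blockchain network; when the hash value of the aggregated data matches the hash value of the target data, determine that the aggregated data is the target data.
In some possible implementations, the target data management device is also used to:
obtain, from the blockchain node corresponding to the target data management device, first meta-information of the data shards in the storage mounted on the target data management device, and obtain second meta-information of the data shards from the storage mounted on the target data management device;
when the first meta-information does not match the second meta-information, determine that a fault has occurred and store fault information in the distributed ledger of the blockchain network.
In some possible implementations, the target data management device is also used to:
read fault information from the blockchain network;
when the fault information indicates that data shards in the storage mounted on the target data management device have been tampered with, deleted, or lost, obtain the data shards from the storage mounted on other data management devices and store them locally;
store the updated storage addresses in the distributed ledger of the blockchain network.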
第三方面,本申请提供一种数据管理装置。所述数据管理装置对应区块链网络中的区块链节点,所述数据管理装置挂载的存储和分布式数据管理系统中其他数据管理装置挂载的存储,用于形成所述区块链网络的存储资源池,所述数据管理装置包括:
通信模块,用于接收数据操作请求,所述数据操作请求用于对目标数据进行输入输出IO操作;
管理模块,还用于根据所述数据操作请求,从所述区块链网络获取所述目标数据的多个数据分片的存储地址,根据所述多个数据分片的存储地址在所述存储资源池对所述目标数据进行IO。
在一些可能的实现方式中,数据操作请求为写请求,管理模块具体用于:
根据所述数据操作请求,基于所述区块链网络的智能合约获取分配策略;
根据所述分配策略,从所述存储资源池为所述目标数据的多个数据分片分配存储资源,获得所述多个数据分片的存储地址;
根据所述至少一个数据分片的存储地址,将所述多个数据分片写入所述存储资源池,并将所述多个数据分片的存储地址存储至所述区块链网络的分布式账本。
在一些可能的实现方式中,所述管理模块还用于:
根据所述数据操作请求,基于所述区块链网络的智能合约获取分片策略;
根据所述分片策略,获得分片算法、分片数量和每个数据分片的副本数量;
所述管理模块具体用于:
根据所述分片算法、分片数量,对所述目标数据进行分片,获得所述目标数据的多个数据分片;
根据多个数据分片中每个数据分片的各个副本的存储地址,将每个数据分片的各个副本写入所述存储资源池,并将每个数据分片的各个副本的存储地址存储至所述区块链网络的分布式账本。
在一些可能的实现方式中,每个数据分片包括多个副本;
所述管理模块具体用于:
将每个数据分片的多个副本写入所述存储资源池的不同类型存储介质。
在一些可能的实现方式中,所述管理模块还用于:
确定所述目标数据的哈希值、所述多个数据分片中每个数据分片的哈希值、所述目标数据的数据属性中的至少一个;
将所述目标数据的哈希值、所述多个数据分片中每个数据分片的哈希值、所述目标数据的数据属性中的至少一个存储至所述区块链网络的分布式账本。
在一些可能的实现方式中,所述数据操作请求为读请求,所述管理模块具体用于:
根据所述读请求,从所述区块链网络的分布式账本获取所述目标数据的多个数据分片的存储地址;
根据所述多个数据分片的存储地址,从所述存储资源池获取所述多个数据分片;
对所述多个数据分片进行聚合,获得所述目标数据。
在一些可能的实现方式中,所述管理模块还用于:
根据所述数据操作请求,基于所述区块链网络的智能合约获取聚合策略;
所述管理模块具体用于:
根据所述聚合策略对所述多个数据分片进行聚合,获得所述目标数据。
在一些可能的实现方式中,所述管理模块具体用于:
获取所述多个数据分片的本地哈希值和链上哈希值,所述本地哈希值通过哈希算法得到,所述链上哈希值为存储在区块链网络中的哈希值;
确定所述本地哈希值与所述链上哈希值匹配,启动对所述多个数据分片的聚合,获得聚合数据;
确定所述聚合数据的哈希值,以及从所述区块链网络获取所述目标数据的哈希值,当所述聚合数据的哈希值与所述目标数据的哈希值匹配,确定所述聚合数据为所述目标数据。
在一些可能的实现方式中,所述数据管理装置还包括:
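A fault checking module, configured to obtain, from the blockchain node corresponding to the target data management apparatus, first metadata of a data shard in the storage mounted by the target data management apparatus, and to obtain second metadata of the data shard from the storage mounted by the target data management apparatus; and, when the first metadata does not match the second metadata, to determine that a fault has occurred and store fault information in the distributed ledger of the blockchain network.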
在一些可能的实现方式中,所述数据管理装置还包括:
故障恢复模块,用于从所述区块链网络读取故障信息,当所述故障信息表征所述目标数据管理装置挂载的存储中数据分片被篡改、被删除或丢失,从其他数据管理装置挂载的存储中获取所述数据分片,并进行本地存储,将更新后的存储地址存储至所述区块链网络的分布式账本。
第四方面,本申请提供一种计算设备集群。所述计算设备集群包括至少一台计算设备,所述至少一台计算设备包括至少一个处理器和至少一个存储器。所述至少一个处理器、所述至少一个存储器进行相互的通信。所述至少一个处理器用于执行所述至少一个存储器中存储的指令,以使得计算设备或计算设备集群执行如第一方面或第一方面的任一种实现方式所述的数据处理方法。
第五方面,本申请提供一种计算机可读存储介质,所述计算机可读存储介质中存储有指令,所述指令指示计算设备或计算设备集群执行上述第一方面或第一方面的任一种实现方式所述的数据处理方法。
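In a sixth aspect, this application provides a computer program product containing instructions which, when run on a computing device or a computing device cluster, cause the computing device or computing device cluster to perform the data processing method according to the first aspect or any implementation of the first aspect.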
本申请在上述各方面提供的实现方式的基础上,还可以进行进一步组合以提供更多实现方式。
附图说明
为了更清楚地说明本申请实施例的技术方法,下面将对实施例中所需使用的附图作以简单地介绍。
图1为本申请实施例提供的一种分布式数据管理系统的架构示意图;
图2为本申请实施例提供的一种分布式数据管理系统的架构示意图;
图3为本申请实施例提供的一种多场景联盟中分布式数据管理系统的架构示意图;
图4为本申请实施例提供的一种数据处理方法的流程图;
图5为本申请实施例提供的一种数据上传的流程示意图;
图6为本申请实施例提供的一种数据下载的流程示意图;
图7为本申请实施例提供的一种故障检查的流程示意图;
图8为本申请实施例提供的一种故障恢复的流程示意图;
图9为本申请实施例提供的一种分布式数据管理系统的结构示意图;
图10为本申请实施例提供的一种计算设备的结构示意图;
图11为本申请实施例提供的一种计算设备集群的结构示意图;
图12为本申请实施例提供的一种计算设备集群的结构示意图;
图13为本申请实施例提供的一种计算设备集群的结构示意图。
具体实施方式
本申请实施例中的术语“第一”、“第二”仅用于描述目的,而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”的特征可以明示或者隐含地包括一个或者更多个该特征。
首先对本申请实施例中所涉及到的一些技术术语进行介绍。
区块链网络,也可以简称为区块链,是指基于区块链技术构建的对等(peer to peer,P2P)网络。区块链网络包括多个区块链节点,每个区块链节点为对等节点。在区块链网络中,多个区块链节点共同维护一个持续增长,由有序数据块所构建的链式列表账本。每个区块链节点存储上述链式列表账本的副本,并保持副本之间的一致性,因此,链式列表账本也称作区块链网络的分布式账本。
区块链网络可以根据读写权限的开放程度不同,分为公有链(public blockchain)、私有链(Private Blockchain)或联盟链(Consortium Blockchain)。公有链即为公有的区块链网络,读写权限对所有节点开放;私有链即为私有的区块链网络,读写权限对某个节点开放;联盟链即为联盟区块链,读写权限对加入联盟的节点(联盟内成员)开放。
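The distributed ledger of a blockchain network is typically used to store simple data structures such as key-value data and relational data. As blockchain technology is widely applied in industries such as finance, energy, government affairs, aviation, agriculture, public welfare, and logistics, there is growing demand for highly reliable on-chain storage of industry data, for example rich media data such as video, audio, and images, or big data such as modeling files.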
考虑到链上存储需要占用大量的资源,可以将富媒体数据(如视频、音频、图像)或大数据(如建模文件)等大规模数据存储到链下的存储系统,同时将上述数据的哈希值上链。如此,用户可以通过比较链上的哈希值以及链下存储的数据计算得到的哈希值,保证数据一致性。然而,客户端、传输网络、存储网络等可能存在稳定性和安全性风险,由此可以导致数据不一致、数据被篡改等问题,难以满足业务需求。
有鉴于此,本申请实施例提供了一种数据处理方法。该方法可以应用于分布式数据管理系统。管理系统包括多个数据管理装置。每个数据管理装置为分布式数据管理系统的一部分。其中,分布式数据管理系统实质是一种分布式存储引擎,主要用于对富媒体数据的存储进行管理,因此,分布式数据管理系统也可以称作分布式富媒体引擎,分布式数据管理系统中的数据管理装置为上述分布式富媒体引擎的一部分。多个数据管理装置中的第一数据管理装置对应区块链网络的第一区块链节点,多个数据管理装置中的第二数据管理装置对应区块链网络中的第二区块链节点。第一数据管理装置挂载的存储和第二数据管理装置挂载的存储用于形成区块链网络的存储资源池。
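Specifically, a target data management apparatus among the multiple data management apparatuses may receive a data operation request, which is used to perform an input/output (IO) operation on target data. The target data management apparatus may then obtain, from the blockchain network and according to the data operation request, the storage addresses of multiple data shards of the target data (in some cases simply called shards), and perform IO on the target data in the storage resource pool according to those storage addresses.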
在该方法中,存储资源池由分布式数据管理系统进行管理,所有与存储资源池的交互(如对目标数据进行IO操作)均需分布式数据管理系统中的数据管理装置进行处理,并由数据管理装置将IO操作的目标数据的存储地址也进行上链。即使客户端、传输网络、存储网络因稳定性或安全性问题导致数据不一致,也可以基于链上存储的数据副本的存储地址进行数据恢复,保障数据一致性,提升了数据的安全性、可用性、可访问性。此外,该方法将IO操作的相关信息上链,也可以实现操作可追溯。
为了使得本申请的技术方案更加清楚、易于理解,下面结合附图对本申请实施例的系统架构进行介绍。
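Referring to the architecture diagram of the distributed data management system shown in Figure 1, the distributed data management system 100 includes multiple data management apparatuses 10. Each data management apparatus 10 corresponds to one blockchain node 20 of the blockchain network 200 and mounts storage 30. Note that the data management apparatus 10 in this embodiment supports managing and adapting to different storage media; for example, it can mount storage media including but not limited to hard disk drives (HDD) and solid state drives (SSD). The storage 30 mounted by the multiple data management apparatuses 10 can be used to form the storage resource pool 300 of the blockchain network.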
在图1的示例中,数据管理装置10还可以对接区块链客户端40。区块链的参与方如云上的租户可以将富媒体数据或大数据等大规模数据通过区块链客户端40写入存储资源池,或者通过区块链客户端40,从存储资源池300读取富媒体数据或大数据等大规模数据。
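Specifically, the data management apparatus 10 is configured to receive a data operation request, for example one sent by a tenant through the blockchain client 40. The data operation request is used to perform an input/output (IO) operation on target data. According to the request, the data management apparatus 10 obtains the storage addresses of multiple data shards of the target data from the blockchain network and performs IO on the target data according to those addresses. For example, when the data operation request is a write request, the data management apparatus 10 may shard the target data, determine a storage address for each data shard, and store each shard at its address. In addition to putting the hash of the target data and the hashes of the data shards on chain, the data management apparatus 10 also puts the storage addresses of the data shards on chain. As another example, when the data operation request is a read request, the data management apparatus 10 may obtain the storage addresses of the data shards from the blockchain network, fetch the shards from those addresses, and aggregate them to obtain the target data. Note that the data management apparatus 10 may verify the hash of each data shard before aggregation: it computes a hash from the shard and compares it with the on-chain hash. Similarly, it may verify the hash of the aggregated data after aggregation to determine whether the aggregated data is the target data.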
针对存储资源分散、各自管理导致的数据易丢失、易篡改等问题,本申请实施例的数据管理装置10还提出了相应的定制化合约对存储分片策略、存储分配策略(也可以称作存储分片路由)、存储聚合策略(聚合策略是指聚合数据分片的策略)提供接口,例如是应用程序编程接口(Application Programming Interface,API)供各个分布式存储引擎使用。分布式数据管理系统100的数据管理装置10可以利用区块链网络的智能合约,对存储分片策略、存储分配策略、存储聚合策略进行共识。如此数据管理装置在进行数据IO时,可以基于分配策略,结合存储的剩余储量,分片类型,分片数量,带宽,历史故障次数等计算出分片存储位置(通过存储地址标识的存储位置),减少数据IO时间(存储或读取时间)和减少存储空间浪费。
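In addition, the data management apparatus 10 traces IO operations through smart contracts and hands the storage sharding policy, storage allocation policy, storage aggregation policy, and their execution logic to contract consensus. Each storage write or read is accepted on the basis of results endorsed by multiple parties, which safeguards data security and prevents tampering from causing inconsistent storage or faults. Further, the data management apparatus 10 defines different data sharding algorithms through smart contracts and splits data into shards that cannot be read on their own, so no usable data can be obtained from a single storage medium. The data management apparatus 10 reads the shards from the different storage media, aggregates them, and returns the result to the blockchain client. On the one hand, this makes the sharding method extensible and aggregates shards automatically, which simplifies user operations. On the other hand, because the data is split into unreadable shards scattered across storage media managed by different data management apparatuses 10, no single data management apparatus 10 can obtain the data alone, which protects data privacy.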
图1所示的数据管理装置10可以是软件装置,该软件装置可以部署在独立于区块链节点的其他计算设备上。图1所示的数据管理装置也可以是硬件装置,例如该硬件装置可以为独立于区块链节点的、具有富媒体数据等大规模数据管理功能的计算设备。
在一些可能的实现方式中,参见图2所示的分布式数据管理系统100的架构示意图,分布式数据管理系统100的各个数据管理装置10也可以部署在区块链节点20,也即区块链节点20中包括区块链内核和数据管理装置10。其中,数据管理装置10可以是中间件或组件,该中间件或组件可以集成到区块链节点20中。
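The distributed data management system 100 of this embodiment can be applied in industries such as finance, energy, government affairs, aviation, agriculture, public welfare, and logistics. For example, it can be applied to scenarios such as rich media data attestation, file attestation, digital asset attestation, and non-fungible token (NFT) trading. The distributed data management system 100 can also serve as a distributed storage substrate supporting the metaverse or Web 3.0.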
分布式数据管理系统100应用于上述场景中时,支持部署在私有云、公有云、混合云或边缘节点中。其中,公有云是指云服务提供商通过公共互联网(Internet)为用户提供的云服务,用户可以通过Internet访问云并享受各类服务,包括并不限于计算、存储、网络等。私有云是企业自己建设的为企业内部提供服务的一种云计算使用方式,私有云为一个企业单独使用而构建,可部署在企业的数据中心中,也可统一部署在云服务提供商的机房。混合云是将私有云和公有云结合的一种云计算使用方式。边缘节点是相对于云计算数据中心的,指与最终接入的用户之间具有较少中间环节的网络节点。边缘节点可以是某个机房或者某个物理设备,相对于直接访问源站而言,用户访问边缘节点时有更好的响应能力和连接速度。
在一些可能的实现方式中,分布式数据管理系统100也可以分布式地部署在不同环境中。参见图3所示的分布式数据管理系统100的架构示意图,分布式数据管理系统100的多个数据管理装置10可以分别部署在公有云、混合云、边缘节点中,从而实现为多场景联盟提供数据管理服务。
基于本申请实施例提供的分布式数据管理系统100,本申请实施例还提供了相应的数据处理方法。
为了使得本申请的技术方案更加清楚、易于理解,下面结合附图对本申请实施例的数据处理方法进行介绍。
参见图4所示的数据处理方法的流程图,该方法包括:
S402:目标数据管理装置接收区块链客户端发送的数据操作请求。
目标数据管理装置可以是分布式数据管理系统100中的任意一个数据管理装置10,例如可以是上述第一数据管理装置,或第二数据管理装置。
数据操作请求用于对目标数据进行输入输出IO操作。其中,数据操作请求可以是写请求,写请求用于写入(存储)目标数据。数据操作请求也可以是读请求,读请求用于读取目标数据。基于此,数据操作请求中可以包括操作类型,如读或写,用于指示写入或读取目标数据。数据操作请求中还包括目标数据的元信息,该元信息例如可以是目标数据的名称。以目标数据为富媒体文件示例说明,数据操作请求中包括操作类型和富媒体文件的文件名。
S404:目标数据管理装置根据数据操作请求,从区块链网络200获取目标数据的数据分片的存储地址。
在本实施例中,目标数据以数据分片形式分散存储,具体是采用分布式存储方式存储在存储资源池。基于此,目标数据管理装置可以根据数据操作请求,基于区块链网络200的智能合约,先从区块链网络200获取目标数据的数据分片的存储地址。下面分别对写入目标数据和读取目标数据的情况进行示例说明。
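When the data operation request is a write request, the target data management apparatus may obtain an allocation policy based on a smart contract of the blockchain network according to the data operation request. The target data management apparatus then allocates storage resources from the storage resource pool 300 to the multiple data shards of the target data according to the allocation policy, obtaining the storage addresses of the multiple data shards.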
其中,分配策略可以是基于权重的分配策略。基于权重的分配策略具体可以是基于各存储30的剩余储量,分片类型,分片数量,带宽,历史故障次数确定各存储30的权重,基于各存储30的权重可以确定分片存储位置。其中,分片策略可以保障每个数据分片在两到三个存储介质上存在,由此避免个别存储介质故障导致数据丢失,保障数据安全。其中,数据分片的存储地址可以被生成索引表记录进区块链账本。
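The weight-based allocation described above can be sketched as follows. This is a minimal illustration rather than the method of the application: the `StorageInfo` fields, the weight formula (free capacity times bandwidth, discounted by historical failures), and the function names are assumptions chosen for the example. The source only lists the factors that feed into the weight.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StorageInfo:
    """Hypothetical view of one mounted storage 30."""
    address: str
    free_bytes: int
    bandwidth_mbps: float
    failure_count: int


def storage_weight(s: StorageInfo, shard_size: int) -> float:
    """Assumed formula: more free space and bandwidth raise the weight,
    historical failures lower it. Storage that cannot hold the shard gets 0."""
    if s.free_bytes < shard_size:
        return 0.0
    free_gib = s.free_bytes / 2**30
    return free_gib * s.bandwidth_mbps / (1 + s.failure_count)


def allocate(storages: List[StorageInfo], shard_size: int, replicas: int) -> List[str]:
    """Return the addresses of the `replicas` highest-weighted storages."""
    ranked = sorted(storages, key=lambda s: storage_weight(s, shard_size), reverse=True)
    chosen = [s.address for s in ranked if storage_weight(s, shard_size) > 0][:replicas]
    if len(chosen) < replicas:
        raise RuntimeError("not enough usable storage for the requested replicas")
    return chosen
```

In the application the policy itself is agreed through a smart contract, so every data management apparatus would evaluate the same formula; that consensus step is outside this sketch.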
当数据操作请求为读请求,由于读请求所要读取的目标数据在写入过程中,该目标数据的多个数据分片的存储地址也被存储至区块链网络(上链),目标数据管理装置可以根据所述读请求,从所述区块链网络200的分布式账本获取所述目标数据的多个数据分片的存储地址。
S406:目标数据管理装置根据目标数据的多个数据分片的存储地址,对目标数据进行IO。
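When the data operation request is a write request, the target data management apparatus may write the multiple data shards to the storage resource pool 300 according to the storage addresses of the multiple data shards, and store those storage addresses in the distributed ledger of the blockchain network 200.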
进一步地,在基于所述区块链网络200的智能合约获取分配策略之前,目标数据管理装置还可以根据数据操作请求,基于所述区块链网络200的智能合约获取分片策略,相应地,目标数据管理装置可以根据分片策略,获得分片算法、分片数量和每个数据分片的副本数量。根据数据类型不同,分片算法可以是不同的。例如,数据类型为视频文件时,分片算法可以包括自由分割,平均分割,按时长分割,按文件大小分割中的一种或多种。分片数量可以根据目标数据的大小、存储资源池300中存储节点的数量确定。每个数据分片的副本数量可以根据目标数据的可靠性需求确定,例如,目标数据的可靠性需求较高时,每个数据分片可以采用三副本存储,也即每个数据分片的副本数量可以为3。
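Two of the sharding algorithms named above, even splitting and splitting by size, can be sketched on raw bytes as follows. The function names are hypothetical, and splitting a real video by duration would need container-aware tooling that this sketch does not attempt.

```python
from typing import List


def split_evenly(data: bytes, shard_count: int) -> List[bytes]:
    """Split data into shard_count shards whose sizes differ by at most one byte."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    base, extra = divmod(len(data), shard_count)
    shards, pos = [], 0
    for i in range(shard_count):
        size = base + (1 if i < extra else 0)  # the first `extra` shards get one more byte
        shards.append(data[pos:pos + size])
        pos += size
    return shards


def split_by_size(data: bytes, shard_size: int) -> List[bytes]:
    """Split data into shards of at most shard_size bytes each."""
    if shard_size < 1:
        raise ValueError("shard_size must be at least 1")
    return [data[i:i + shard_size] for i in range(0, len(data), shard_size)]
```

Both functions preserve byte order, so concatenating the shards in order restores the original data.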
相应地,目标数据管理装置可以根据多个数据分片中每个数据分片的各个副本的存储地址,将每个数据分片的各个副本写入存储资源池300。为了便于后续读取,目标数据管理装置还可以将每个数据分片的各个副本的存储地址存储至区块链网络200。其中,目标数据管理装置可以基于智能合约将每个数据分片的各个副本的存储地址记录至区块链网络200的分布式账本。
进一步地,针对数据分片的多个副本,目标数据管理装置可以将每个数据分片的多个副本写入存储资源池300的不同类型存储介质。如此,即使某种存储介质发生故障,也可以基于其他存储介质中的副本进行故障恢复。
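Placing the replicas of one shard on different types of storage media might look like the sketch below. The mapping from storage address to media type and the function name are assumptions for illustration only.

```python
from typing import Dict, List


def place_on_distinct_media(storages: Dict[str, str], replicas: int) -> List[str]:
    """Pick `replicas` storage addresses, each on a different media type.

    `storages` maps a storage address to its media type, e.g. "HDD" or "SSD".
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    chosen: List[str] = []
    used_media = set()
    for address, media in storages.items():
        if media in used_media:
            continue  # this media type already holds a replica
        chosen.append(address)
        used_media.add(media)
        if len(chosen) == replicas:
            return chosen
    raise RuntimeError("fewer distinct media types than replicas requested")
```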
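In this embodiment, to make it easier to verify data during later reads or queries, the target data management apparatus may also determine at least one of the hash of the target data, the hash of each of the multiple data shards, and the data attributes of the target data. The data attributes of the target data may include one or more of its creator, creation time, and subject. The target data management apparatus may store at least one of the hash of the target data, the hash of each data shard, and the data attributes of the target data in the blockchain network. As with the storage addresses of the data shards, the target data management apparatus may use a smart contract to record at least one of the hash of the target data, the hashes of the data shards, and the data attributes of the target data in the distributed ledger.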
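As a rough sketch of the information prepared for the ledger, the function below computes the hash of the whole data and of each shard and bundles them with the data attributes. SHA-256 and the dictionary layout are assumptions; the application does not name a hash algorithm or a record format.

```python
import hashlib
from typing import Dict, List


def build_manifest(data: bytes, shards: List[bytes], attributes: Dict[str, str]) -> dict:
    """Collect the hashes and attributes that would be recorded on chain."""
    return {
        "data_hash": hashlib.sha256(data).hexdigest(),
        "shard_hashes": [hashlib.sha256(s).hexdigest() for s in shards],
        "attributes": dict(attributes),  # e.g. creator, creation time, subject
    }
```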
当数据操作请求为读请求,目标数据管理装置可以根据读请求,从区块链网络200(例如是区块链网络200的分布式账本)获取目标数据的多个数据分片的存储地址,然后目标数据管理装置可以根据多个数据分片的存储地址,从存储资源池300获取多个数据分片,接着目标数据管理装置可以对多个数据分片进行聚合,获得目标数据。
其中,目标数据管理装置还可以根据数据操作请求,基于区块链网络200的智能合约获取聚合策略,然后目标数据管理装置可以根据聚合策略对多个数据分片进行聚合,获得目标数据。其中,聚合策略是与分片策略相对应。以目标数据为视频示例说明,分片策略为按时长分割的策略时,聚合策略可以为按时长聚合的策略,目标数据管理装置可以基于数据分片的起始时间和结束时间,将各数据分片按照起始时间或结束时间顺序排序,然后将排序后的各数据分片进行拼接,从而实现数据分片的聚合。
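The time-based aggregation described for video can be sketched as sorting shards by start time and concatenating them. The `TimedShard` type is hypothetical, the contiguity check is an extra safeguard not stated in the source, and plain byte concatenation stands in for whatever container-aware joining a real video format would need.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimedShard:
    start_s: int   # start time of the segment, in seconds
    end_s: int     # end time of the segment, in seconds
    content: bytes


def aggregate_by_time(shards: List[TimedShard]) -> bytes:
    """Order shards by start time and join their contents."""
    ordered = sorted(shards, key=lambda s: s.start_s)
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.start_s != prev.end_s:
            raise ValueError("shards do not form a contiguous timeline")
    return b"".join(s.content for s in ordered)
```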
需要说明的是,当目标数据的多个数据分片中每个数据分片具有多个副本时,目标数据管理装置可以从用于访问多个副本的多条路径中确定目标路径,该目标路径可以是时延最小或成本最低的路径,根据该目标路径拉取数据分片,进而实现对数据分片进行聚合。其中,目标数据管理装置在确定目标路径时,可以基于各数据管理装置挂载的存储30的剩余储量、分片类型,分片数量,带宽,历史故障次数中的至少一个计算出各路径的权重,基于该权重从多条路径中确定目标路径。
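Choosing the target path among the replicas of a shard could be sketched as a minimum over latency or cost. The `ReplicaPath` record and the `prefer` switch are assumptions; in the application the per-path weight may combine several of the listed factors.

```python
from typing import NamedTuple, Sequence


class ReplicaPath(NamedTuple):
    address: str
    latency_ms: float
    cost: float


def pick_target_path(paths: Sequence[ReplicaPath], prefer: str = "latency") -> ReplicaPath:
    """Return the path with the lowest latency (default) or the lowest cost."""
    if not paths:
        raise ValueError("no replica paths available")
    if prefer == "latency":
        return min(paths, key=lambda p: p.latency_ms)
    return min(paths, key=lambda p: p.cost)
```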
在一些可能的实现方式中,目标数据管理装置还可以获取多个数据分片的本地哈希值和链上哈希值。其中,本地哈希值通过哈希算法得到,具体地,目标数据管理装置可以在获取到数据分片后,利用哈希算法对数据分片的内容进行哈希运算,从而得到本地哈希值。链上哈希值为存储在区块链网络200中的哈希值,具体地,目标数据管理装置触发对区块链网络200的读操作,以读取区块链网络200中存储的目标数据的多个数据分片的哈希值。然后目标数据管理装置可以比较上述本地哈希值和链上哈希值,当目标数据管理装置确定本地哈希值与链上哈希值匹配,例如是本地哈希值与链上哈希值一致,则启动对多个数据分片的聚合,获得聚合数据。
进一步地,目标数据管理装置还可以确定聚合数据的哈希值,以及从区块链网络200获取目标数据的哈希值。类似地,目标数据管理装置可以比较聚合数据的哈希值和链上存储的目标数据的哈希值。当聚合数据的哈希值与目标数据的哈希值匹配,目标数据管理装置确定聚合数据为目标数据。
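The two-stage check described above, per-shard hashes before aggregation and the whole-data hash after it, can be sketched as follows. SHA-256 and simple in-order concatenation are assumptions; the on-chain values are passed in as plain arguments because the ledger query itself is outside this sketch.

```python
import hashlib
from typing import List


def sha256_hex(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def verify_and_aggregate(
    shards: List[bytes],
    onchain_shard_hashes: List[str],
    onchain_data_hash: str,
) -> bytes:
    """Aggregate shards only if every local hash matches its on-chain hash,
    then confirm the aggregated data against the on-chain hash of the target data."""
    if len(shards) != len(onchain_shard_hashes):
        raise ValueError("shard count does not match the on-chain record")
    for index, (shard, expected) in enumerate(zip(shards, onchain_shard_hashes)):
        if sha256_hex(shard) != expected:
            raise ValueError(f"shard {index} does not match its on-chain hash")
    aggregated = b"".join(shards)
    if sha256_hex(aggregated) != onchain_data_hash:
        raise ValueError("aggregated data does not match the on-chain hash")
    return aggregated
```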
S408:目标数据管理装置返回数据操作结果。
当数据操作请求为写请求时,数据操作结果可以是写入成功或写入失败。数据操作结果为写入成功,则目标数据管理装置可以执行其他数据操作请求。数据操作结果为写入失败,则可以指示区块链客户端重新发送写请求,以重新写入目标数据。
当数据操作请求为读请求时,数据操作结果可以是读取成功或读取失败。数据操作结果为读取成功时,数据操作结果中还可以包括目标数据管理装置读取到的目标数据。数据操作结果为读取失败时,目标数据管理装置可以指示区块链客户端重新发送读请求,以重新读取目标数据。
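Note that S408 is an optional step in this embodiment. For example, when the data operation request is a write request, the target data management apparatus may also not return a data operation result.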
S410:目标数据管理装置从区块链网络200获取该目标数据管理装置挂载的存储30中数据分片的第一元信息。
具体地,目标数据管理装置可以周期性地扫描区块链网络200,具体是周期性地扫描该目标数据管理装置对应的区块链节点20,从而获取该目标数据管理装置挂载的存储30中数据分片的第一元信息。其中,第一元信息是指链上存储的元信息,第一元信息可以包括链上存储的数据分片的名称、大小、哈希值中的一种或多种。
其中,目标数据管理装置扫描区块链网络200的周期可以根据经验值设置,例如该周期可以设置为5分钟(minute,min)。
S412:目标数据管理装置从该目标数据管理装置挂载的存储30中获取数据分片的第二元信息。
目标数据管理装置可以周期性扫描该目标数据管理装置挂载的存储30(也可以称作本地存储),从而获取该目标数据管理装置挂载的存储30中数据分片的第二元信息。其中,第二元信息是指链下存储的元信息,第二元信息可以包括链下存储的数据分片的名称、大小、哈希值中的一种或多种。
S414:目标数据管理装置确定第一元信息和第二元信息是否匹配。若否,则执行S416。
具体地,目标数据管理装置可以比较第一元信息和第二元信息,从而确定第一元信息和第二元信息是否匹配。当第一元信息和第二元信息不匹配,则表征发生故障,例如是本地存储的存储介质丢失或数据分片被删除、篡改,目标数据管理装置可以执行S416。
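The comparison in S414 can be sketched as a per-shard diff between the on-chain (first) metadata and the locally scanned (second) metadata. Representing each side as a dictionary keyed by shard name, with size and hash fields, is an assumption for the example.

```python
from typing import Dict, List, Tuple

Metadata = Dict[str, object]  # e.g. {"size": 1024, "hash": "ab12..."}


def find_faulty_shards(
    first_meta: Dict[str, Metadata],
    second_meta: Dict[str, Metadata],
) -> List[Tuple[str, str]]:
    """Return (shard_name, reason) for every shard whose local metadata
    is missing or differs from the metadata recorded on chain."""
    faults = []
    for name, expected in first_meta.items():
        actual = second_meta.get(name)
        if actual is None:
            faults.append((name, "missing"))    # deleted shard or lost medium
        elif actual != expected:
            faults.append((name, "mismatch"))   # possibly tampered
    return faults
```

An empty result corresponds to the case where the first and second metadata match; a non-empty result is what would be written to the ledger as fault information in S416.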
S416:目标数据管理装置确定发生故障,存储故障信息至区块链网络。
故障信息可以包括故障节点的节点标识或者故障的数据分片的分片标识。其中,节点标识可以是节点名称、节点的IP地址中的一个或多个,数据分片的分片标识可以是分片名称。其中,当数据分片被删除或篡改时,故障信息还可以包括数据分片所归属数据的元信息。
目标数据管理装置可以基于区块链网络200的智能合约,将故障信息记录至区块链网络200的分布式账本。一方面可以为后续故障恢复奠定基础,另一方面可以实现数据操作追溯。
S418:目标数据管理装置读取故障信息。
目标数据管理装置可以周期性地读取故障信息。其中,目标数据管理装置可以访问区块链节点20,从而周期性地读取故障信息。
需要说明的是,目标数据管理装置进行扫描区块链网络200或本地存储进行故障检查的周期与读取故障信息进行故障恢复的周期可以是一致的,也可以是不同的。例如,目标数据管理装置读取故障信息进行故障恢复的周期可以大于故障检查的周期。
在一些示例中,目标数据管理装置扫描区块链网络200或本地存储进行故障检查的周期可以为5min,数据管理装置读取故障信息进行故障恢复的周期可以为5min,或者是10min。
S420:目标数据管理装置根据故障信息,从存储资源池300中其他数据管理装置挂载的存储获取数据分片的副本。
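When the fault information indicates that a storage medium in the storage 30 mounted by the target data management apparatus has been lost, the target data management apparatus may obtain, from the storage mounted by other data management apparatuses in the storage resource pool 300, replicas of all data shards that were stored on the lost medium.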
当故障信息表征目标数据管理装置挂载的存储30中某个数据分片被删除或篡改时,目标数据管理装置可以从存储资源池300中其他数据管理装置挂载的存储,获取被删除或篡改的数据分片的副本。
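S422: The target data management apparatus stores the replica of the data shard locally.
The target data management apparatus writes the replica of the data shard to the storage it mounts, thereby storing the data shard locally. Note that when the fault information indicates that a storage medium in the storage 30 mounted by the target data management apparatus has been lost, the target data management apparatus may first mount a new storage medium and then write the replica of the data shard to its mounted storage 30.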
S424:目标数据管理装置将更新后的存储地址存储至区块链网络。
在进行故障恢复的过程中,数据分片的存储地址会相应更新。目标数据管理装置可以将更新后的存储地址存储至区块链网络200,例如是通过区块链网络200的智能合约将更新后的存储地址存储至区块链网络200的分布式账本。
需要说明的是,上述S410至S416为故障检查的一种具体实现方式,S418至S424为故障恢复的一种实现方式,执行本申请实施例的数据处理方法也可以不执行上述S410至S416或S418至S424。
基于上述内容描述,本申请实施例提供了一种数据处理方法。在该方法中,区块链网络200中的区块链节点20支持挂载不同类型的外部存储,从而提供基于区块链网络200存储富媒体数据、建模数据等大规模数据的存储能力。区块链客户端提供调用接口,支持用户通过该调用接口上传或者下载富媒体数据等大规模数据。由于上传或下载数据过程中,均需分布式数据管理系统100的目标数据管理装置参与,并对上传或下载过程中的相关信息上链记录。即使数据被篡改或删除,也可以基于链上存储的相关信息及时恢复数据,提升数据的安全性、可用性、可访问性与操作可追溯性。而且,该方法中各区块链节点20挂载的存储30可以采用分布式存储,为区块链网络200这种去中心化系统提供一套适配的分布式存储资源池完成去中心化的需求。
图4对本申请实施例的数据处理方法进行介绍,接下来将结合附图对数据上传、数据下载、故障检查、故障恢复的流程进行详细介绍。
首先,参见图5所示的数据上传的流程示意图,具体包括如下步骤:
步骤1:区块链客户端接收写请求,向目标数据管理装置发送该写请求。
写请求用于写入目标数据,目标数据可以是富媒体数据或大数据等大规模数据。其中,写请求具体用于将目标数据写入区块链网络中各区块链节点挂载的存储形成的存储资源池,也即将目标数据上传至区块链网络的存储资源池,因此,该写请求也称作数据上传请求。
需要说明的是,写请求中可以包括上述目标数据。例如,目标数据可以封装在写请求的负载中,通过写请求传输至目标数据管理装置,以便目标数据管理装置上传目标数据至区块链网络的存储资源池。
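Step 2: The target data management apparatus in the distributed data management system verifies the target data and the signature according to the write request. If verification passes, go to step 3; if it fails, go to step 8.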
具体地,考虑到目标数据在传输过程中可能被篡改,目标数据管理装置可以对目标数据进行完整性校验。例如,写请求中可以携带完整性校验码,目标数据管理装置接收到写请求,可以根据写请求中的目标数据计算完整性校验码,然后比较写请求中携带的完整性校验码和本地计算的完整性校验码,当校验码一致,则表征完整性校验通过,否则表征完整性校验不通过,目标数据可能被篡改。
此外,目标数据在传输过程中,还可以存在中间人攻击(Man-in-the-middle attack,MITM),基于此,写请求中还可以包括用户的签名,相应地,目标数据管理装置还可以校验签名。
需要说明的是,上述步骤2为本申请实施例的可选步骤,执行本申请实施例的方法也可以不执行上述步骤2。例如,整个系统部署在可信环境中时,也可以不执行上述步骤2。
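The integrity check in step 2 can be sketched as recomputing a check code over the received payload and comparing it with the code carried in the write request. Using a SHA-256 digest as the check code is an assumption; signature verification is a separate mechanism that depends on the blockchain's key scheme and is not shown.

```python
import hashlib
import hmac


def integrity_ok(payload: bytes, carried_code: str) -> bool:
    """Recompute the check code locally and compare it with the carried one."""
    local_code = hashlib.sha256(payload).hexdigest()
    # compare_digest avoids leaking the position of the first mismatch via timing
    return hmac.compare_digest(local_code, carried_code)
```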
步骤3:目标数据管理装置从区块链网络获取分片策略,对目标数据进行切分获得多个数据分片。
步骤4:目标数据管理装置从区块链网络获取分配策略,根据分配策略确定多个数据分片中每个数据分片的存储地址。
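Step 5: The target data management apparatus writes each data shard to the storage resource pool according to the shard's storage address.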
步骤6:目标数据管理装置确定各数据分片的哈希值、目标数据的哈希值。
步骤3至步骤6的具体实现可以参见图4所示实施例相关内容描述,在此不再赘述。
步骤7:目标数据管理装置根据数据分片的哈希值、目标数据的哈希值、数据分片的存储地址生成交易,将该交易上链。
目标数据管理装置可以根据数据分片的哈希值、目标数据的哈希值、数据分片的存储地址生成交易区块,各区块链节点针对该交易区块达成共识,可以将该交易区块添加至分布式账本,从而实现交易上链。需要说明的是,目标数据管理装置还可以将数据属性也加入交易区块,从而实现将数据属性也进行上链存储。
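Assembling the transaction in step 7 might look like the sketch below. The field names, the JSON encoding, and deriving a transaction identifier from a SHA-256 digest of the canonical body are all assumptions; a real blockchain platform defines its own transaction format and consensus submission.

```python
import hashlib
import json
from typing import Dict, List, Optional


def build_transaction(
    data_hash: str,
    shard_hashes: List[str],
    shard_addresses: List[List[str]],
    attributes: Optional[Dict[str, str]] = None,
) -> dict:
    """Bundle hashes, replica addresses, and optional attributes into one record."""
    body = {
        "data_hash": data_hash,
        "shard_hashes": shard_hashes,
        "shard_addresses": shard_addresses,  # one list of replica addresses per shard
        "attributes": attributes or {},
    }
    # Canonical encoding so that identical bodies always yield the same identifier
    encoded = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {"tx_id": hashlib.sha256(encoded).hexdigest(), "body": body}
```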
步骤8:目标数据管理装置返回验证失败通知。
验证失败通知用于指示验证失败,区块链客户端可以重新发送数据上传请求,以重新上传目标数据。
然后,参见图6所示的数据下载的流程示意图,具体包括如下步骤:
步骤1:区块链客户端接收读请求,向目标数据管理装置发送该读请求。
读请求用于读取目标数据,目标数据可以是富媒体数据或大数据等大规模数据。其中,读请求具体用于从区块链网络中各区块链节点挂载的存储形成的存储资源池中读取目标数据,也即从区块链网络200的存储资源池下载目标数据,因此,该读请求也称作数据下载请求。
步骤2:分布式数据管理系统中的目标数据管理装置根据读请求,校验签名并从区块链网络获取访问权限,对访问权限进行校验。验证通过,执行步骤3;验证不通过,执行步骤7。
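For how the target data management apparatus verifies the signature, see the description of the embodiment shown in Figure 5; it is not repeated here. The distributed ledger of the blockchain network may store access permissions for different data. The target data management apparatus may obtain the access permission for the target data requested by the read request and determine whether the current user holds that permission. If so, access permission verification passes; if not, it fails.
When both signature verification and access permission verification pass, step 3 is performed. When either signature verification or access permission verification fails, step 7 is performed.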
步骤3:目标数据管理装置从区块链网络获取数据分片的存储地址。
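Step 4: The target data management apparatus determines a target path from the storage addresses of the replicas of each data shard and fetches the data shards from the storage resource pool over that path. It then obtains the on-chain hash and the local hash of each data shard and verifies the shard by comparing the two, ensuring the shards are correct.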
步骤5:目标数据管理装置聚合数据分片,然后将聚合数据的哈希值与链上存储的目标数据的哈希值进行比较。当聚合数据的哈希值与链上存储的目标数据的哈希值一致,则执行步骤6。
步骤6:目标数据管理装置返回目标数据以及对应的交易数据。
步骤7:目标数据管理装置返回验证失败通知。
验证失败通知用于指示验证失败,区块链客户端可以重新发送数据下载请求,以重新下载目标数据。
在该方法中,区块链客户端可使用数据获取接口(也称作数据下载接口),通过哈希值、数据属性查询交易数据。当发起数据查询后,分布式数据管理系统的目标数据管理装置基于区块链系统校验签名,并从链上获取分片地址,拉取数据分片并对数据分片进行聚合,将聚合数据的哈希值与链上存储的目标数据的哈希值进行对比验证,验证通过则返回目标数据至区块链客户端,验证不通过则读取其他数据分片进行聚合,继续进行验证。
接着,参见图7所示的故障检查的流程示意图,具体包括如下步骤:
步骤1:目标数据管理装置周期性地访问区块链节点,获取本地存储中数据分片的元信息。
目标数据管理装置从区块链节点获取的元信息也称作第一元信息。其中,第一元信息包括数据分片的名称、大小或哈希值中的一种或多种。
步骤2:目标数据管理装置周期性地访问本地存储,获取本地存储中数据分片的元信息。
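The metadata that the target data management apparatus obtains from local storage is also called the second metadata. The second metadata includes one or more of the name, size, or hash of the data shard.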
步骤3:目标数据管理装置比较从区块链节点获取的元信息和从本地存储获取的元信息。当元信息一致,执行步骤4,当元信息不一致,执行步骤5。
步骤4:目标数据管理装置确定存储正常,记录事件日志。
步骤5:目标数据管理装置确定存储故障,向区块链网络的区块链节点写入故障信息。
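The target data management apparatus of the distributed data management system can ensure storage reliability by checking its locally mounted storage media at regular intervals. Specifically, the target data management apparatus may query the hash or data attributes of the data shards in local storage and compare them with the hash and data attributes obtained from the chain. If a local storage medium is lost, or a stored data shard has been deleted or tampered with, the target data management apparatus may determine a new storage address for the data shard and put the fault information and the new storage address on chain to notify the other data management apparatuses. If the hash and data attributes match, it records a local log event and waits for the next polling interval to check the data shards in storage again.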
然后,参见图8所示的故障恢复的流程示意图,具体包括如下步骤:
步骤1:目标数据管理装置周期性地访问区块链节点,获取故障信息。
步骤2:目标数据管理装置基于故障信息确定是否涉及该目标数据管理装置的本地存储。若是,则执行步骤3,若否,则返回步骤1等待下次轮询。
故障信息中包括推荐的存储地址。该存储地址归属于该目标数据管理装置所挂载的存储(本地存储),则表明故障信息涉及该目标数据管理装置的本地存储。
步骤3:目标数据管理装置基于分配策略确定存储地址。若存储地址与故障信息中推荐的存储地址一致,则执行步骤4,若存储地址与故障信息中推荐的存储地址不一致,则返回步骤1等待下次轮询。
步骤4:目标数据管理装置访问存储资源池拉取数据分片的副本,并校验哈希值,当校验通过,根据推荐的存储地址将该数据分片的副本写入本地存储。
步骤5:目标数据管理装置更新存储位置至区块链节点。
在该方法中,分布式数据管理系统可以基于区块链网络传递故障信息,通过多备份机制,定时修复数据分片,达成高可用分布式存储能力,通过定时轮询链上故障信息,并检查是否推荐使用本地存储,如果推荐使用本地存储,则计算权重验证存储位置,如果存储地址一致,就去有备份的存储节点拉取数据分片,验证哈希值、数据属性后,存入本地存储介质,然后消除故障信息,将更新的存储地址上链。
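The core of the recovery flow, pulling a replica from another storage, verifying its hash, and writing it locally, can be sketched with in-memory dictionaries standing in for storage. The store representation, the address format, and SHA-256 are assumptions; polling the ledger and writing the updated address back on chain are outside the sketch.

```python
import hashlib
from typing import Dict, Iterable


def recover_shard(
    shard_name: str,
    expected_hash: str,
    replica_stores: Iterable[Dict[str, bytes]],
    local_store: Dict[str, bytes],
    local_prefix: str,
) -> str:
    """Copy the first intact replica into local storage and return its new address."""
    for store in replica_stores:
        content = store.get(shard_name)
        if content is None:
            continue  # this storage holds no replica of the shard
        if hashlib.sha256(content).hexdigest() != expected_hash:
            continue  # this replica is itself damaged; try the next one
        local_store[shard_name] = content
        return f"{local_prefix}/{shard_name}"
    raise LookupError(f"no intact replica found for {shard_name}")
```

The returned address is the value that would then be recorded in the distributed ledger as the updated storage address.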
基于本申请实施例提供的数据处理方法,本申请实施例还提供了一种如前述的分布式数据管理系统100。下面结合附图对分布式数据管理系统100进行介绍。
参见图9所示的分布式数据管理系统100的结构示意图,分布式数据管理系统100包括多个数据管理装置10。多个数据管理装置10中的第一数据管理装置对应区块链网络的第一区块链节点,多个数据管理装置10中的第二数据管理装置对应区块链网络的第二区块链节点。第一数据管理装置挂载的存储和第二数据管理装置挂载的存储,用于形成区块链网络的存储资源池。
多个数据管理装置10中的目标数据管理装置,用于接收数据操作请求,所述数据操作请求用于对目标数据进行输入输出IO操作;
目标数据管理装置,还用于根据所述数据操作请求,从所述区块链网络获取所述目标数据的多个数据分片的存储地址,根据所述多个数据分片的存储地址在所述存储资源池对所述目标数据进行IO。
在一些可能的实现方式中,数据操作请求为写请求,目标数据管理装置具体用于:
根据数据操作请求,基于区块链网络的智能合约获取分配策略;
根据分配策略,从存储资源池为目标数据的多个数据分片分配存储资源,获得多个数据分片的存储地址;
根据至少一个数据分片的存储地址,将多个数据分片写入存储资源池,并将多个数据分片的存储地址存储至区块链网络的分布式账本。
在一些可能的实现方式中,目标数据管理装置还用于:
根据数据操作请求,基于区块链网络的智能合约获取分片策略;
根据分片策略,获得分片算法、分片数量和每个数据分片的副本数量;
目标数据管理装置具体用于:
根据分片算法、分片数量,对目标数据进行分片,获得目标数据的多个数据分片;
根据多个数据分片中每个数据分片的各个副本的存储地址,将每个数据分片的各个副本写入存储资源池,并将每个数据分片的各个副本的存储地址存储至区块链网络的分布式账本。
在一些可能的实现方式中,每个数据分片包括多个副本;
目标数据管理装置具体用于:
将每个数据分片的多个副本写入存储资源池的不同类型存储介质。
在一些可能的实现方式中,目标数据管理装置还用于:
确定目标数据的哈希值、多个数据分片中每个数据分片的哈希值、目标数据的数据属性中的至少一个;
将目标数据的哈希值、多个数据分片中每个数据分片的哈希值、目标数据的数据属性中的至少一个存储至区块链网络的分布式账本。
在一些可能的实现方式中,数据操作请求为读请求,目标数据管理装置具体用于:
根据读请求,从区块链网络的分布式账本获取目标数据的多个数据分片的存储地址;
目标数据管理装置具体用于:
根据多个数据分片的存储地址,从存储资源池获取多个数据分片;
对多个数据分片进行聚合,获得目标数据。
在一些可能的实现方式中,目标数据管理装置还用于:
根据数据操作请求,基于区块链网络的智能合约获取聚合策略;
目标数据管理装置具体用于:
根据聚合策略对多个数据分片进行聚合,获得目标数据。
在一些可能的实现方式中,目标数据管理装置具体用于:
获取多个数据分片的本地哈希值和链上哈希值,本地哈希值通过哈希算法得到,链上哈希值为存储在区块链网络中的哈希值;
确定本地哈希值与链上哈希值匹配,启动对多个数据分片的聚合,获得聚合数据;
确定聚合数据的哈希值,以及从区块链网络获取目标数据的哈希值,当聚合数据的哈希值与目标数据的哈希值匹配,确定聚合数据为目标数据。
在一些可能的实现方式中,目标数据管理装置还用于:
从目标数据管理装置对应的区块链节点获取目标数据管理装置挂载的存储中数据分片的第一元信息,以及从目标数据管理装置挂载的存储中获取数据分片的第二元信息;
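When the first metadata does not match the second metadata, determine that a fault has occurred and store fault information in the distributed ledger of the blockchain network.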
在一些可能的实现方式中,目标数据管理装置还用于:
从区块链网络读取故障信息;
当故障信息表征目标数据管理装置挂载的存储中数据分片被篡改、被删除或丢失,从其他数据管理装置挂载的存储中获取数据分片,并进行本地存储;
将更新后的存储地址存储至区块链网络的分布式账本。
其中,目标数据管理装置可以为多个数据管理装置10中的任意一个装置,例如可以是上述第一数据管理装置,或者是第二数据管理装置。下面对数据管理装置的结构进行介绍。如图9所示,数据管理装置10包括:
通信模块102,用于接收数据操作请求,所述数据操作请求用于对目标数据进行输入输出IO操作;
管理模块104,还用于根据所述数据操作请求,从所述区块链网络获取所述目标数据的多个数据分片的存储地址,根据所述多个数据分片的存储地址在所述存储资源池对所述目标数据进行IO。
需要说明的是,管理模块104用于实现图1或图2、图3所示的存储分配策略功能,基于该分配策略,可以确定数据分片的存储地址,基于该存储地址在存储资源池对目标数据进行IO。
上述通信模块102、管理模块104可以通过硬件模块实现或通过软件模块实现。
当通过软件实现时,通信模块102、管理模块104可以是运行在计算设备或计算设备集群上的应用程序或者应用程序模块。
当通过硬件实现时,通信模块102可以通过网络接口卡、收发器一类的收发模块实现。管理模块104可以是利用专用集成电路(application-specific integrated circuit,ASIC)实现、或可编程逻辑器件(programmable logic device,PLD)实现的设备等。其中,上述PLD可以是复杂程序逻辑器件(complex programmable logical device,CPLD)、现场可编程门阵列(field-programmable gate array,FPGA)、通用阵列逻辑(generic array logic,GAL)或其任意组合实现。
在一些可能的实现方式中,数据操作请求为写请求,管理模块104具体用于:
根据所述数据操作请求,基于所述区块链网络的智能合约获取分配策略;
根据所述分配策略,从所述存储资源池为所述目标数据的多个数据分片分配存储资源,获得所述多个数据分片的存储地址;
根据所述至少一个数据分片的存储地址,将所述多个数据分片写入所述存储资源池,并将所述多个数据分片的存储地址存储至所述区块链网络的分布式账本。
在一些可能的实现方式中,所述管理模块104还用于:
根据所述数据操作请求,基于所述区块链网络的智能合约获取分片策略;
根据所述分片策略,获得分片算法、分片数量和每个数据分片的副本数量;
所述管理模块具体用于:
根据所述分片算法、分片数量,对所述目标数据进行分片,获得所述目标数据的多个数据分片;
根据多个数据分片中每个数据分片的各个副本的存储地址,将每个数据分片的各个副本写入所述存储资源池,并将每个数据分片的各个副本的存储地址存储至所述区块链网络的分布式账本。
在该方法中,管理模块104还用于实现图1或图2、图3所示的存储分片策略功能,基于该分片策略,可以对目标数据进行分片,基于目标数据的各个数据分片的存储地址在存储资源池对目标数据进行IO。
在一些可能的实现方式中,每个数据分片包括多个副本;
所述管理模块104具体用于:
将每个数据分片的多个副本写入所述存储资源池的不同类型存储介质。
在一些可能的实现方式中,所述管理模块104还用于:
确定所述目标数据的哈希值、所述多个数据分片中每个数据分片的哈希值、所述目标数据的数据属性中的至少一个;
将所述目标数据的哈希值、所述多个数据分片中每个数据分片的哈希值、所述目标数据的数据属性中的至少一个存储至所述区块链网络的分布式账本。
在一些可能的实现方式中,所述数据操作请求为读请求,所述管理模块104具体用于:
根据所述读请求,从所述区块链网络的分布式账本获取所述目标数据的多个数据分片的存储地址;
根据所述多个数据分片的存储地址,从所述存储资源池获取所述多个数据分片;
对所述多个数据分片进行聚合,获得所述目标数据。
在一些可能的实现方式中,所述管理模块104还用于:
根据所述数据操作请求,基于所述区块链网络的智能合约获取聚合策略;
所述管理模块104具体用于:
根据所述聚合策略对所述多个数据分片进行聚合,获得所述目标数据。
在该方法中,管理模块104还用于实现图1或图2、图3所示的存储聚合策略功能,基于该聚合策略,可以对目标数据的各个数据分片进行聚合,以还原目标数据,从而实现对目标数据进行IO。
在一些可能的实现方式中,所述管理模块104具体用于:
获取所述多个数据分片的本地哈希值和链上哈希值,所述本地哈希值通过哈希算法得到,所述链上哈希值为存储在区块链网络中的哈希值;
确定所述本地哈希值与所述链上哈希值匹配,启动对所述多个数据分片的聚合,获得聚合数据;
确定所述聚合数据的哈希值,以及从所述区块链网络获取所述目标数据的哈希值,当所述聚合数据的哈希值与所述目标数据的哈希值匹配,确定所述聚合数据为所述目标数据。
在该方法中,管理模块104还用于实现图1或图2、图3所示的数据计算校验功能。具体地,管理模块104可以计算本地哈希值,然后比较本地哈希值和链上哈希值,从而实现数据计算校验,由此保障目标数据IO的准确性。
在一些可能的实现方式中,所述数据管理装置10还包括:
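A fault checking module 106, configured to obtain, from the blockchain node corresponding to the target data management apparatus, first metadata of a data shard in the storage mounted by the target data management apparatus, and to obtain second metadata of the data shard from the storage mounted by the target data management apparatus; and, when the first metadata does not match the second metadata, to determine that a fault has occurred and store fault information in the distributed ledger of the blockchain network.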
上述故障检查模块106可以通过硬件模块实现或通过软件模块实现。
当通过软件实现时,故障检查模块106可以是运行在计算设备或计算设备集群上的应用程序或者应用程序模块。
当通过硬件实现时,故障检查模块106可以是利用专用集成电路ASIC实现、或可编程逻辑器件PLD实现的设备等。其中,上述PLD可以是复杂程序逻辑器件CPLD、现场可编程门阵列FPGA、通用阵列逻辑GAL或其任意组合实现。
在一些可能的实现方式中,所述数据管理装置10还包括:
故障恢复模块108,用于从所述区块链网络读取故障信息,当所述故障信息表征所述目标数据管理装置挂载的存储中数据分片被篡改、被删除或丢失,从其他数据管理装置挂载的存储中获取所述数据分片,并进行本地存储,将更新后的存储地址存储至所述区块链网络的分布式账本。
类似地,上述故障恢复模块108可以通过硬件模块实现或通过软件模块实现。
当通过软件实现时,故障恢复模块108可以是运行在计算设备或计算设备集群上的应用程序或者应用程序模块。
当通过硬件实现时,故障恢复模块108可以是利用专用集成电路ASIC实现、或可编程逻辑器件PLD实现的设备等。其中,上述PLD可以是复杂程序逻辑器件CPLD、现场可编程门阵列FPGA、通用阵列逻辑GAL或其任意组合实现。
本申请还提供一种计算设备1000。如图10所示,计算设备1000包括:总线1002、处理器1004、存储器1006和通信接口1008。处理器1004、存储器1006和通信接口1008之间通过总线1002通信。计算设备1000可以是中心云中的计算设备,如中心服务器,或者是边缘云中的计算设备,如边缘服务器。计算设备1000也可以是轻量级设备,如智能手机、智能穿戴设备等终端设备。应理解,本申请不限定计算设备1000中的处理器、存储器的个数。
总线1002可以是外设部件互连标准(peripheral component interconnect,PCI)总线或扩展工业标准结构(extended industry standard architecture,EISA)总线等。总线可以分为地址总线、数据总线、控制总线等。为便于表示,图10中仅用一条线表示,但并不表示仅有一根总线或一种类型的总线。总线1002可包括在计算设备1000各个部件(例如,存储器1006、处理器1004、通信接口1008)之间传送信息的通路。
处理器1004可以包括中央处理器(central processing unit,CPU)、图形处理器(graphics processing unit,GPU)、微处理器(micro processor,MP)或者数字信号处理器(digital signal processor,DSP)等处理器中的任意一种或多种。
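The memory 1006 may include volatile memory, for example random access memory (RAM). The memory 1006 may also include non-volatile memory, for example read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD). The memory 1006 stores executable program code, and the processor 1004 executes that code to implement the foregoing data processing method. Specifically, the memory 1006 stores the instructions used by the distributed data management system 100 or the data management apparatus 10 to perform the data processing method.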
通信接口1008使用例如但不限于网络接口卡、收发器一类的收发模块,来实现计算设备1000与其他设备或通信网络之间的通信。
本申请实施例还提供了一种计算设备集群。该计算设备集群包括至少一台计算设备1000。该计算设备1000可以是服务器,例如是中心服务器、边缘服务器。在一些实施例中,计算设备1000也可以是终端设备。
如图11所示,所述计算设备集群包括至少一个计算设备1000。计算设备集群中的一个或多个计算设备1000中的存储器1006中可以存有相同的分布式数据管理系统100用于执行数据处理方法的指令。
在一些可能的实现方式中,该计算设备集群中的一个或多个计算设备1000也可以用于执行分布式数据管理系统100用于执行数据处理方法的部分指令。换言之,一个或多个计算设备1000的组合可以共同执行分布式数据管理系统100用于执行数据处理方法的指令。
需要说明的是,计算设备集群中的不同的计算设备1000中的存储器1006可以存储不同的指令,用于执行分布式数据管理系统100的部分功能。
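Figure 12 shows one possible implementation. As shown in Figure 12, two computing devices 1000A and 1000B are connected through the communication interface 1008. The memory of computing device 1000A stores instructions for performing the functions of the communication module 102 and the management module 104. The memory of computing device 1000B stores instructions for performing the functions of the fault checking module 106 and the fault recovery module 108. In other words, the memories 1006 of computing devices 1000A and 1000B together store the instructions used by the distributed data management system 100 to perform the data processing method.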
图12所示的计算设备集群之间的连接方式可以是考虑到本申请提供的数据处理方法在进行故障检查时需要扫描区块链网络中区块链节点维护的分布式账本,在故障恢复时,需要读取区块链节点中存储的故障信息。因此,考虑将通信模块102、管理模块104实现的功能交由计算设备1000A执行,故障检查模块106和故障恢复模块108实现的功能由计算设备1000B执行。
应理解,图12中示出的计算设备1000A的功能也可以由多个计算设备1000完成。同样,计算设备1000B的功能也可以由多个计算设备1000完成。
在一些可能的实现方式中,计算设备集群中的一个或多个计算设备可以通过网络连接。其中,所述网络可以是广域网或局域网等等。图13示出了一种可能的实现方式。如图13所示,两个计算设备1000C和1000D之间通过网络进行连接。具体地,通过各个计算设备中的通信接口与所述网络进行连接。在这一类可能的实现方式中,计算设备1000C中的存储器1006中存有执行通信模块102、管理模块104的功能的指令。同时,计算设备1000D中的存储器1006中存有执行故障检查模块106和故障恢复模块108的功能的指令。
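The connection arrangement of the computing device cluster shown in Figure 13 takes into account that the data processing method provided in this application needs to scan the distributed ledger maintained by blockchain nodes in the blockchain network, or to read the fault information stored in blockchain nodes. The functions of the communication module 102 and the management module 104 are therefore assigned to computing device 1000C, and the functions of the fault checking module 106 and the fault recovery module 108 are performed by computing device 1000D. It should be understood that the functions of computing device 1000C shown in Figure 13 may also be performed by multiple computing devices 1000. Likewise, the functions of computing device 1000D may also be performed by multiple computing devices 1000.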
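An embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium may be any usable medium that a computing device can store data on, or a data storage device such as a data center that contains one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state drive). The computer-readable storage medium includes instructions that instruct a computing device to perform the foregoing data processing method applied to the distributed data management system 100.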
本申请实施例还提供了一种包含指令的计算机程序产品。所述计算机程序产品可以是包含指令的,能够运行在计算设备或计算设备集群上或被储存在任何可用介质中的软件或程序产品。当所述计算机程序产品在至少一个计算设备(计算设备或计算设备集群)上运行时,使得至少一个计算设备执行上述数据处理方法。
最后应说明的是:以上实施例仅用以说明本发明的技术方案,而非对其限制;尽管参照前述实施例对本发明进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本发明各实施例技术方案的保护范围。

Claims (24)

  1. 一种数据处理方法,其特征在于,应用于分布式数据管理系统,所述分布式数据管理系统包括多个数据管理装置;所述多个数据管理装置中的第一数据管理装置对应区块链网络的第一区块链节点,所述多个数据管理装置中的第二数据管理装置对应区块链网络的第二区块链节点;所述第一数据管理装置挂载的存储和所述第二数据管理装置挂载的存储,用于形成所述区块链网络的存储资源池;所述方法包括:
    所述多个数据管理装置中的目标数据管理装置接收数据操作请求,所述数据操作请求用于对目标数据进行输入输出IO操作;
    所述目标数据管理装置根据所述数据操作请求,从所述区块链网络获取所述目标数据的多个数据分片的存储地址,根据所述多个数据分片的存储地址在所述存储资源池对所述目标数据进行IO。
  2. 根据权利要求1所述的方法,其特征在于,所述数据操作请求为写请求,所述目标数据管理装置根据所述数据操作请求,从所述区块链网络获取所述目标数据的多个数据分片的存储地址,包括:
    所述目标数据管理装置根据所述数据操作请求,基于所述区块链网络的智能合约获取分配策略;
    所述目标数据管理装置根据所述分配策略,从所述存储资源池为所述目标数据的多个数据分片分配存储资源,获得所述多个数据分片的存储地址;
    所述目标数据管理装置根据所述多个数据分片的存储地址在所述存储资源池对所述目标数据进行IO,包括:
    所述目标数据管理装置根据所述至少一个数据分片的存储地址,将所述多个数据分片写入所述存储资源池,并将所述多个数据分片的存储地址存储至所述区块链网络的分布式账本。
  3. 根据权利要求2所述的方法,其特征在于,所述方法还包括:
    所述目标数据管理装置根据所述数据操作请求,基于所述区块链网络的智能合约获取分片策略;
    所述目标数据管理装置根据所述分片策略,获得分片算法、分片数量和每个数据分片的副本数量;
    所述目标数据管理装置根据所述多个数据分片的存储地址对所述目标数据进行IO,包括:
    所述目标数据管理装置根据所述分片算法、分片数量,对所述目标数据进行分片,获得所述目标数据的多个数据分片;
    所述目标数据管理装置根据多个数据分片中每个数据分片的各个副本的存储地址,将每个数据分片的各个副本写入所述存储资源池,并将每个数据分片的各个副本的存储地址存储至所述区块链网络的分布式账本。
  4. 根据权利要求3所述的方法,其特征在于,每个数据分片包括多个副本;
    所述目标数据管理装置将每个数据分片的各个副本写入所述存储资源池,包括:
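    The target data management apparatus writes the multiple replicas of each data shard to different types of storage media in the storage resource pool.
  5. The method according to any one of claims 2 to 4, wherein the method further comprises: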
    所述目标数据管理装置确定所述目标数据的哈希值、所述多个数据分片中每个数据分片的哈希值、所述目标数据的数据属性中的至少一个;
    所述目标数据管理装置将所述目标数据的哈希值、所述多个数据分片中每个数据分片的哈希值、所述目标数据的数据属性中的至少一个存储至所述区块链网络的分布式账本。
  6. 根据权利要求1所述的方法,其特征在于,所述数据操作请求为读请求,所述目标数据管理装置根据所述数据操作请求,从所述区块链网络获取所述目标数据的多个数据分片的存储地址,包括:
    所述目标数据管理装置根据所述读请求,从所述区块链网络的分布式账本获取所述目标数据的多个数据分片的存储地址;
    所述目标数据管理装置根据所述多个数据分片的存储地址对所述目标数据进行IO,包括:
    所述目标数据管理装置根据所述多个数据分片的存储地址,从所述存储资源池获取所述多个数据分片;
    所述目标数据管理装置对所述多个数据分片进行聚合,获得所述目标数据。
  7. 根据权利要求6所述的方法,其特征在于,所述方法还包括:
    所述目标数据管理装置根据所述数据操作请求,基于所述区块链网络的智能合约获取聚合策略;
    所述目标数据管理装置对所述多个数据分片进行聚合,获得所述目标数据,包括:
    所述目标数据管理装置根据所述聚合策略对所述多个数据分片进行聚合,获得所述目标数据。
  8. 根据权利要求6所述的方法,其特征在于,所述目标数据管理装置对所述多个数据分片进行聚合,获得所述目标数据,包括:
    所述目标数据管理装置获取所述多个数据分片的本地哈希值和链上哈希值,所述本地哈希值通过哈希算法得到,所述链上哈希值为存储在区块链网络中的哈希值;
    所述目标数据管理装置确定所述本地哈希值与所述链上哈希值匹配,启动对所述多个数据分片的聚合,获得聚合数据;
    所述目标数据管理装置确定所述聚合数据的哈希值,以及从所述区块链网络获取所述目标数据的哈希值,当所述聚合数据的哈希值与所述目标数据的哈希值匹配,确定所述聚合数据为所述目标数据。
  9. 根据权利要求1至8任一项所述的方法,其特征在于,所述方法还包括:
    所述目标数据管理装置从所述目标数据管理装置对应的区块链节点获取所述目标数据管理装置挂载的存储中数据分片的第一元信息,以及从所述目标数据管理装置挂载的存储中获取所述数据分片的第二元信息;
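    When the first metadata does not match the second metadata, the target data management apparatus determines that a fault has occurred and stores fault information in the distributed ledger of the blockchain network.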
  10. 根据权利要求1至8任一项所述的方法,其特征在于,所述方法还包括:
    所述目标数据管理装置从所述区块链网络读取故障信息;
    当所述故障信息表征所述目标数据管理装置挂载的存储中数据分片被篡改、被删除或丢失,所述目标数据管理装置从其他数据管理装置挂载的存储中获取所述数据分片,并进行本地存储;
    所述目标数据管理装置将更新后的存储地址存储至所述区块链网络的分布式账本。
  11. 一种分布式数据管理系统,其特征在于,所述分布式数据管理系统包括多个数据管理装置;所述多个数据管理装置中的第一数据管理装置对应区块链网络的第一区块链节点,所述多个数据管理装置中的第二数据管理装置对应区块链网络的第二区块链节点;所述第一数据管理装置挂载的存储和所述第二数据管理装置挂载的存储,用于形成所述区块链网络的存储资源池;
    所述多个数据管理装置中的目标数据管理装置,用于接收数据操作请求,所述数据操作请求用于对目标数据进行输入输出IO操作;
    所述目标数据管理装置,还用于根据所述数据操作请求,从所述区块链网络获取所述目标数据的多个数据分片的存储地址,根据所述多个数据分片的存储地址在所述存储资源池对所述目标数据进行IO。
  12. 根据权利要求11所述的系统,其特征在于,所述数据操作请求为写请求,所述目标数据管理装置具体用于:
    根据所述数据操作请求,基于所述区块链网络的智能合约获取分配策略;
    根据所述分配策略,从所述存储资源池为所述目标数据的多个数据分片分配存储资源,获得所述多个数据分片的存储地址;
    根据所述至少一个数据分片的存储地址,将所述多个数据分片写入所述存储资源池,并将所述多个数据分片的存储地址存储至所述区块链网络的分布式账本。
  13. 根据权利要求12所述的系统,其特征在于,所述目标数据管理装置还用于:
    根据所述数据操作请求,基于所述区块链网络的智能合约获取分片策略;
    根据所述分片策略,获得分片算法、分片数量和每个数据分片的副本数量;
    所述目标数据管理装置具体用于:
    根据所述分片算法、分片数量,对所述目标数据进行分片,获得所述目标数据的多个数据分片;
    根据多个数据分片中每个数据分片的各个副本的存储地址,将每个数据分片的各个副本写入所述存储资源池,并将每个数据分片的各个副本的存储地址存储至所述区块链网络的分布式账本。
  14. 根据权利要求13所述的系统,其特征在于,每个数据分片包括多个副本;
    所述目标数据管理装置具体用于:
    将每个数据分片的多个副本写入所述存储资源池的不同类型存储介质。
  15. The system according to any one of claims 12 to 14, wherein the target data management apparatus is further configured to:
    确定所述目标数据的哈希值、所述多个数据分片中每个数据分片的哈希值、所述目标数据的数据属性中的至少一个;
    将所述目标数据的哈希值、所述多个数据分片中每个数据分片的哈希值、所述目标数据的数据属性中的至少一个存储至所述区块链网络的分布式账本。
  16. 根据权利要求11所述的系统,其特征在于,所述数据操作请求为读请求,所述目标数据管理装置具体用于:
    根据所述读请求,从所述区块链网络的分布式账本获取所述目标数据的多个数据分片的存储地址;
    所述目标数据管理装置具体用于:
    根据所述多个数据分片的存储地址,从所述存储资源池获取所述多个数据分片;
    对所述多个数据分片进行聚合,获得所述目标数据。
  17. 根据权利要求16所述的系统,其特征在于,所述目标数据管理装置还用于:
    根据所述数据操作请求,基于所述区块链网络的智能合约获取聚合策略;
    所述目标数据管理装置具体用于:
    根据所述聚合策略对所述多个数据分片进行聚合,获得所述目标数据。
  18. 根据权利要求16所述的系统,其特征在于,所述目标数据管理装置具体用于:
    获取所述多个数据分片的本地哈希值和链上哈希值,所述本地哈希值通过哈希算法得到,所述链上哈希值为存储在区块链网络中的哈希值;
    确定所述本地哈希值与所述链上哈希值匹配,启动对所述多个数据分片的聚合,获得聚合数据;
    确定所述聚合数据的哈希值,以及从所述区块链网络获取所述目标数据的哈希值,当所述聚合数据的哈希值与所述目标数据的哈希值匹配,确定所述聚合数据为所述目标数据。
  19. 根据权利要求11至18任一项所述的系统,其特征在于,所述目标数据管理装置还用于:
    从所述目标数据管理装置对应的区块链节点获取所述目标数据管理装置挂载的存储中数据分片的第一元信息,以及从所述目标数据管理装置挂载的存储中获取所述数据分片的第二元信息;
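    When the first metadata does not match the second metadata, determine that a fault has occurred and store fault information in the distributed ledger of the blockchain network.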
  20. 根据权利要求11至18任一项所述的系统,其特征在于,所述目标数据管理装置还用于:
    从所述区块链网络读取故障信息;
    当所述故障信息表征所述目标数据管理装置挂载的存储中数据分片被篡改、被删除或丢失,从其他数据管理装置挂载的存储中获取所述数据分片,并进行本地存储;
    将更新后的存储地址存储至所述区块链网络的分布式账本。
  21. 一种数据管理装置,其特征在于,所述数据管理装置对应区块链网络中的区块链节点,所述数据管理装置挂载的存储和分布式数据管理系统中其他数据管理装置挂载的存储,用于形成所述区块链网络的存储资源池,所述数据管理装置包括:
    通信模块,用于接收数据操作请求,所述数据操作请求用于对目标数据进行输入输出IO操作;
    管理模块,还用于根据所述数据操作请求,从所述区块链网络获取所述目标数据的多个数据分片的存储地址,根据所述多个数据分片的存储地址在所述存储资源池对所述目标数据进行IO。
  22. 一种计算设备集群,其特征在于,所述计算设备集群包括至少一台计算设备,所述至少一台计算设备包括至少一个处理器和至少一个存储器,所述至少一个存储器中存储有计算机可读指令;所述至少一个处理器执行所述计算机可读指令,以使得所述计算设备集群执行如权利要求1至10中任一项所述的方法。
  23. 一种计算机可读存储介质,其特征在于,包括计算机可读指令;所述计算机可读指令用于实现权利要求1至10任一项所述的方法。
  24. 一种计算机程序产品,其特征在于,包括计算机可读指令;所述计算机可读指令用于实现权利要求1至10任一项所述的方法。
PCT/CN2023/081418 2022-06-30 2023-03-14 一种数据处理方法及相关设备 WO2024001304A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202210770817 2022-06-30
CN202210770817.1 2022-06-30
CN202210983123.6 2022-08-16
CN202210983123.6A CN117376364A (zh) 2022-06-30 2022-08-16 一种数据处理方法及相关设备

Publications (1)

Publication Number Publication Date
WO2024001304A1 true WO2024001304A1 (zh) 2024-01-04


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23829495

Country of ref document: EP

Kind code of ref document: A1