CN117390118A

CN117390118A - Quick data retrieval method and system based on block chain

Info

Publication number: CN117390118A
Application number: CN202311386638.9A
Authority: CN
Inventors: 郑泽桂
Original assignee: Guangzhou Harsh Aviation Technology Co ltd
Current assignee: Guangzhou Harsh Aviation Technology Co ltd
Priority date: 2023-10-24
Filing date: 2023-10-24
Publication date: 2024-01-12

Abstract

The invention discloses a blockchain-based fast data retrieval method and system, relating to the technical field of blockchains. The system comprises a blockchain structure design module, a database module, a mechanism setting module, a similarity calculation module and a matching output module. The key technical points are as follows: the different kinds of information in the transaction data are each stored in a corresponding database, which provides better query capability, facilitates encrypted data management, and allows a database to be expanded without affecting the performance of the other databases; similarity calculation and verification are performed in sequence using the Pearson correlation coefficient and visual analysis, which ensures the accuracy of the calculation results; on this basis a similarity evaluation value Pgxs is obtained and compared with a similarity threshold, so that matching results meeting the output condition are identified accurately and the accuracy of data retrieval is further improved.

Description

Quick data retrieval method and system based on block chain

Technical Field

The invention relates to the technical field of blockchains, in particular to a method and a system for quickly retrieving data based on a blockchain.

Background

Blockchain technology is a distributed ledger technology that achieves decentralized data storage and transaction verification by chaining transaction records together in the form of blocks. Each block contains a plurality of transaction records and is protected and verified by cryptographic algorithms. A blockchain network is composed of a plurality of nodes, each of which stores a complete copy of the blockchain, and a consensus algorithm ensures the consistency of the network.

Chinese patent application publication No. CN113468549A discloses a blockchain-based method for retrieving encrypted credential information, which comprises: encrypting the storage location of the credential information using asymmetric encryption; adding the encrypted storage location information to the credential information; calculating the hash value of the credential information using the SHA-256 hash function; recording the credential information and the corresponding hash value in an on-chain smart contract; for credentials that have been put on the chain, constructing a hash retrieval tree in the smart contract according to the hash values of the credential information, wherein each node of the hash retrieval tree stores one piece of credential information; and acquiring the credential information according to the hash value, verifying the credential information, and decrypting the verified storage location information.

Chinese invention application publication No. CN110232080A discloses a blockchain-based fast retrieval method, which comprises: a verification design for a database management model based on blockchain technology, the model being applied at the electronic archive stage, after an electronic file has been archived and entered into the database; and a retrieval design for the database management model based on blockchain technology, in which a primary database is divided into a plurality of secondary databases according to different fields, the target field information being the field corresponding to a target data block; the target database is searched within the target secondary database by keyword, the target data block is determined, and non-target data blocks are excluded.

In the above applications, the retrieved data can be encrypted separately, which prevents arbitrary tampering with the data and provides finer-grained management for data retrieval, and retrieval efficiency is improved by excluding a large number of non-target data blocks.

Disclosure of Invention

(I) Technical problem to be solved

To address the deficiencies of the prior art, the invention provides a blockchain-based fast data retrieval method and system. The different kinds of information in the transaction data are each stored in a corresponding database; on the one hand this provides better query capability and facilitates encrypted data management, and on the other hand it allows a database to be expanded without affecting the performance of the other databases. Similarity calculation and verification are performed in sequence using the Pearson correlation coefficient and visual analysis, which ensures the accuracy of the calculation results. On this basis a similarity evaluation value Pgxs is obtained and compared with a similarity threshold, so that matching results meeting the output condition are identified accurately and the accuracy of data retrieval is further improved, thereby solving the problems described in the background.

(II) Technical solution

In order to achieve the above purpose, the invention is realized by the following technical scheme:

a blockchain-based fast data retrieval system, comprising:

the block chain structure design module is used for designing a block chain structure and comprises the steps of defining the attribute of a block, determining the data format stored in the block and determining a hash algorithm, wherein each block comprises transaction data and corresponding hash values in the block chain, the transaction data comprises a plurality of transactions, and specific transactions comprise: insurance contract information, claim information, and user information;

the database module is used for creating a comprehensive information database, storing transaction data in a blockchain, organizing and storing the comprehensive information database according to different attributes, wherein the comprehensive information database comprises a corresponding user database, an insurance contract database and a claim information database, and the creation processes of the databases are the same;

the mechanism setting module is used for accelerating the searching of the data by using an index structure, and the index structure maps the transaction data and the corresponding hash value to the corresponding block according to the attribute appointed by the rule engine;

the similarity calculation module comprises a data extraction unit, an analysis calculation unit and a threshold comparison unit, wherein the data extraction unit is used for extracting data attributes of the search content, the analysis calculation unit is used for generating a similarity evaluation value Pgxs, and the threshold comparison unit is used for setting a similarity threshold value compared with the similarity evaluation value Pgxs, so that a corresponding result is obtained after comparison;

and the matching output module is used for obtaining a result meeting the output condition after the similarity evaluation value Pgxs is compared with the similarity threshold value, marking the corresponding block as a matching result and outputting the matching result.

Further, the block chain structure design module specifically performs the following steps:

s101, defining basic attributes of a block: the basic attributes include the index of the block, a timestamp, the hash value of the previous block, the number of transactions and a random number;

s102, determining a data format stored in the block: determining a specific data format stored in each block, wherein the blocks comprise insurance contract information, claim information and user information;

s103, determining a hash algorithm: selecting SHA-256 as the hash algorithm and performing a hash calculation on the block to generate a unique hash value;

s104, block linking: blocks are linked by including in each block the hash value of the previous block.

Through the above steps, a blockchain structure can be designed to support data storage and transactions in the insurance industry. The structure ensures the security, integrity and tamper resistance of the data while providing traceability and shareability; the specific design may be adjusted and optimized according to actual requirements and scenarios.
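Steps s101 to s104 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the class name `Block`, the helper `append_block`, the JSON serialization and the all-zero hash for the first block are assumptions made only for illustration; the source specifies only the block attributes, SHA-256 and linking by the previous block's hash.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field


@dataclass
class Block:
    # Basic attributes from s101; the field names are illustrative.
    index: int
    timestamp: float
    previous_hash: str
    transactions: list  # insurance contract, claim and user records (s102)
    nonce: int = 0      # the "random number" attribute
    hash: str = field(default="", init=False)

    def __post_init__(self):
        self.hash = self.compute_hash()

    def compute_hash(self) -> str:
        # s103: SHA-256 over a canonical JSON form of the block contents.
        payload = json.dumps(
            {
                "index": self.index,
                "timestamp": self.timestamp,
                "previous_hash": self.previous_hash,
                "transaction_count": len(self.transactions),
                "transactions": self.transactions,
                "nonce": self.nonce,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: list, transactions: list) -> Block:
    # s104: each new block stores the hash of the block before it.
    # Assumption: the first block points at an all-zero hash.
    previous_hash = chain[-1].hash if chain else "0" * 64
    block = Block(len(chain), time.time(), previous_hash, transactions)
    chain.append(block)
    return block


chain = []
append_block(chain, [{"type": "user", "name": "Alice"}])
append_block(chain, [{"type": "claim", "claim_number": "C-001", "claim_amount": 1200}])
```

Because `previous_hash` is part of the hashed payload, the hash of every block depends on all blocks before it.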

Further, the specific steps in creating the user database are as follows:

s201, defining user data attributes: the specific attributes of each piece of user information data are determined, including name, ID card number, contact information and insurance authentication information;

s202, data encryption and privacy protection: encrypting and desensitizing user information data in insurance industry;

s203, storing and organizing user data: each user information data is used as a transaction and stored in one or more blocks, and the user information data is stored by adopting a NoSQL database;

s204, hash and link of user data: carrying out hash calculation on each piece of user information, storing a corresponding hash value in each block, and linking the hash values of the user information data in each block;

s205, data access control and rights management: a permission unit is set up in the database module, through which authorized users access and modify user data.

Through the above steps, the creation of the user database makes use of the characteristics of the blockchain: user data are stored in a tamper-resistant blockchain, which provides a high degree of data security and confidentiality and suits the insurance industry's requirements for the confidentiality of user information.
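As a rough illustration of steps s202 and s204, the sketch below desensitizes a user record by masking and then hashes the stored form. The masking rule, the field names (`id_number`, `contact`, `insurance_auth`) and the function names are hypothetical; the source does not say how desensitization is done, and the encryption part of s202 is deliberately left out because it would need a cryptography library.

```python
import hashlib
import json


def mask(value: str, keep_start: int = 3, keep_end: int = 4) -> str:
    """Desensitize a string by replacing its middle characters with '*'."""
    if len(value) <= keep_start + keep_end:
        return "*" * len(value)
    hidden = len(value) - keep_start - keep_end
    return value[:keep_start] + "*" * hidden + value[len(value) - keep_end:]


def record_hash(record: dict) -> str:
    """SHA-256 over a canonical JSON form of the record (s204)."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def to_transaction(user: dict) -> dict:
    """Build the transaction stored on chain: masked data plus its hash."""
    stored = dict(user)
    stored["id_number"] = mask(user["id_number"])
    stored["contact"] = mask(user["contact"], keep_start=3, keep_end=2)
    return {"type": "user", "data": stored, "hash": record_hash(stored)}


# Illustrative record only; the values are made up.
user = {
    "name": "Zhang San",
    "id_number": "440101199001011234",
    "contact": "13800001111",
    "insurance_auth": "verified",
}
tx = to_transaction(user)
```

Hashing the stored (masked) form means the hash can be re-checked later without access to the original sensitive values.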

Further, the specific steps of the indexing mechanism are as follows:

s301, setting an index structure: the index structure comprises a hash table and a B+ tree, the hash table is used as a primary index structure, the B+ tree is used as an alternative index structure of the hash table, and if the system does not respond within 0.5s, the system is switched to the B+ tree to be used as the index structure;

s302, selecting attributes: determining the attribute needing to establish the index through a rule engine;

s303, data mapping and storage: transaction data and the corresponding hash values are mapped to the corresponding blocks according to the set index structure and stored in the blockchain.

Through the above mechanism-setting steps, the index structure is combined with the data attributes to realize a blockchain-based fast data retrieval system; this design improves data retrieval efficiency and user experience and enhances the usability and functionality of the system.
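The primary/alternative arrangement in s301 can be sketched as follows. This is an assumption-laden illustration: a Python `dict` plays the hash table, a sorted key list with binary search (`OrderedIndex`) stands in for a real B+ tree, and the 0.5 s switch is modeled as a timeout on the primary lookup. The names `OrderedIndex`, `lookup` and `slow_hash_get` are hypothetical.

```python
import bisect
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as LookupTimeout


class OrderedIndex:
    """Sorted keys plus binary search; a stand-in for a B+ tree."""

    def __init__(self):
        self._keys = []
        self._values = []

    def insert(self, key, block_index):
        pos = bisect.bisect_left(self._keys, key)
        if pos < len(self._keys) and self._keys[pos] == key:
            self._values[pos] = block_index
        else:
            self._keys.insert(pos, key)
            self._values.insert(pos, block_index)

    def get(self, key):
        pos = bisect.bisect_left(self._keys, key)
        if pos < len(self._keys) and self._keys[pos] == key:
            return self._values[pos]
        return None

    def range(self, low, high):
        """Ordered traversal of all keys in [low, high]."""
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_right(self._keys, high)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))


def lookup(key, primary_get, fallback_get, timeout_s=0.5):
    """Try the hash table first; switch to the ordered index on timeout."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(primary_get, key)
    try:
        return future.result(timeout=timeout_s), "hash table"
    except LookupTimeout:
        # The timed-out call keeps running in its worker thread; a real
        # system would need proper cancellation.
        return fallback_get(key), "B+ tree"
    finally:
        pool.shutdown(wait=False)


hash_index = {"P-1001": 3, "P-1002": 4}  # policy number -> block index
tree_index = OrderedIndex()
for policy_number, block_index in hash_index.items():
    tree_index.insert(policy_number, block_index)


def slow_hash_get(key):
    # Simulates an unresponsive primary index.
    time.sleep(0.2)
    return hash_index.get(key)


fast_result = lookup("P-1001", hash_index.get, tree_index.get)
slow_result = lookup("P-1001", slow_hash_get, tree_index.get, timeout_s=0.05)
```

The ordered index also supports range queries such as `tree_index.range("P-1001", "P-1002")`, which a hash table cannot answer without scanning every key.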

Further, in the analysis and calculation unit, it is first determined which database the data attributes of the search content A1 belong to; then the similarity between the data attributes is obtained using the Pearson correlation coefficient, forming a similarity data set Xsd_i, i.e. {Xsd_1, Xsd_2, ..., Xsd_i}; finally, a data analysis model is built and a similarity evaluation value Pgxs is generated from the output similarity data set Xsd_i. The formula according to which the similarity evaluation value Pgxs is generated is as follows:

in the formula, i represents the number of data attributes in the search content A1, and K represents a constant correction coefficient.

Further, after the similarity between the data attributes has been obtained using the Pearson correlation coefficient, each similarity result in the similarity data set Xsd_i is verified by visual analysis: if a trend exists in the scatter diagram, the corresponding similarity result is output; if no trend exists in the scatter diagram, the similarity calculation is carried out again.
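For reference, the Pearson correlation coefficient used in this step can be computed as in the sketch below. It assumes that each data attribute has already been encoded as a numeric sequence, which the source does not describe; the function name `pearson` and the sample values are illustrative. The formula for Pgxs itself is not reproduced here and is not reconstructed.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical numeric encodings of a query attribute and a stored attribute.
query_attribute = [1.0, 2.0, 3.0]
stored_attribute = [2.0, 4.0, 6.0]
similarity = pearson(query_attribute, stored_attribute)
```

The coefficient lies between -1 and 1; a value near 1 corresponds to the clear upward trend in a scatter diagram that the verification step looks for.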

Further, the similarity evaluation value Pgxs is compared with the similarity threshold, with the following result: if the similarity evaluation value Pgxs reaches or exceeds the similarity threshold, the output condition is satisfied; if the similarity evaluation value Pgxs does not reach the similarity threshold, the output condition is not satisfied.

A fast data retrieval method based on block chain comprises the following steps:

step one, designing a block chain structure, which comprises defining attributes of blocks, determining data formats stored in the blocks and determining a hash algorithm, wherein each block comprises transaction data and corresponding hash values in the block chain, the transaction data comprises a plurality of transactions, and the specific transactions comprise: insurance contract information, claim information, and user information;

step two, creating a comprehensive information database and storing transaction data in the blockchain, wherein the comprehensive information database is organized and stored according to different attributes and comprises a corresponding user database, an insurance contract database and a claim information database, and the creation process is the same for each database;

step three, using an index structure to accelerate the searching of the data, wherein the index structure maps the transaction data and the corresponding hash value to the corresponding block according to the attribute appointed by the rule engine;

step four, extracting the data attributes of the search content, obtaining the similarity between the data attributes using the Pearson correlation coefficient, forming a similarity data set Xsd_i after verification, constructing a data analysis model, generating a similarity evaluation value Pgxs from the output similarity data set Xsd_i, setting a similarity threshold against which the similarity evaluation value Pgxs is compared, and obtaining the corresponding result after comparison;

step five, after the similarity evaluation value Pgxs has been compared with the similarity threshold and a result meeting the output condition has been obtained, marking the corresponding block as a matching result and outputting the matching result.

(III) Beneficial effects

The invention provides a block chain-based rapid data retrieval method and a block chain-based rapid data retrieval system, which have the following beneficial effects:

The different kinds of information in the transaction data are each stored in a corresponding database, which provides better query capability, facilitates encrypted data management, and allows a database to be expanded without affecting the performance of the other databases. When similarity is calculated, the data attributes of the search content are compared with the data attributes in the corresponding database; similarity calculation and verification are performed in sequence using the Pearson correlation coefficient and visual analysis, which ensures the accuracy of the calculation results. On this basis a similarity evaluation value Pgxs is obtained and compared with the similarity threshold, so that matching results meeting the output condition are identified accurately and efficient, accurate data retrieval is achieved.

Drawings

FIG. 1 is an overall flow chart of a blockchain-based fast data retrieval method of the present invention;

FIG. 2 is a block chain architecture diagram of a block chain based fast data retrieval system according to the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Example 1: referring to fig. 2, the present invention provides a fast data retrieval system based on blockchain, comprising:

the block chain structure design module designs a block chain structure, comprising defining the attribute of the block, determining the data format stored in the block and determining a hash algorithm, wherein each block comprises a certain amount of transaction data and corresponding hash values in the block chain, the transaction data comprises a plurality of transactions, and the specific transactions comprise: insurance contract information, claim information, and user information;

the block chain structure design module specifically comprises the following steps:

s101, defining basic attributes of a block: these attributes include the index of the block, the timestamp, the hash value of the previous block, the number of transactions and the random number; they help determine the uniqueness of each block and its position in the chain;

s102, determining the data format stored in the block: the specific data format stored in each block is determined according to the application scenario and requirements; in the insurance industry scenario of the present application, the blocks contain insurance contract information, claim information and user information, and these data may be structured as required and stored in the data field of the block;

s103, determining a hash algorithm: a suitable hash algorithm is selected to perform a hash calculation on the block and generate a unique hash value; common hash algorithms include SHA-256 and SHA-3. Choosing a suitable hash algorithm ensures the integrity and security of the data, because any tampering with the block data changes the hash value. In the present application, SHA-256 is used as the hash algorithm;

s104, block linking: linking is achieved by including in each block the hash value of the previous block. This linking mechanism ensures the integrity and tamper resistance of the blockchain, since any modification to a block changes the hash values of all subsequent blocks in the chain. A concurrency control mechanism may be designed into the blockchain structure as needed, and a distributed consensus algorithm is adopted so that all nodes reach consensus on operations on the blockchain; the specific design and content are not described further here;
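The tamper-evidence property described in s104 can be checked with a short sketch. The dictionary-based block layout and the names `block_hash`, `make_block` and `first_invalid_block` are assumptions for illustration; only the use of SHA-256 and of the previous block's hash comes from the text.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """SHA-256 over every field of the block except the stored hash."""
    body = {k: v for k, v in block.items() if k != "hash"}
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_block(index: int, previous_hash: str, transactions: list) -> dict:
    block = {
        "index": index,
        "previous_hash": previous_hash,
        "transactions": transactions,
    }
    block["hash"] = block_hash(block)
    return block


def first_invalid_block(chain: list):
    """Return the index of the first block that fails verification, or None."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return i  # contents no longer match the stored hash
        if i > 0 and block["previous_hash"] != chain[i - 1]["hash"]:
            return i  # link to the previous block is broken
    return None


genesis = make_block(0, "0" * 64, [{"policy_number": "P-1001"}])
second = make_block(1, genesis["hash"], [{"claim_number": "C-001"}])
chain = [genesis, second]
valid_before = first_invalid_block(chain)

# Tamper with the first block: its stored hash no longer matches.
chain[0]["transactions"][0]["policy_number"] = "P-9999"
after_tamper = first_invalid_block(chain)

# Recomputing the first block's hash only moves the failure to the next block.
chain[0]["hash"] = block_hash(chain[0])
after_rehash = first_invalid_block(chain)
```

To hide the change, every later block would have to be rehashed as well, which is what the consensus among nodes is meant to prevent.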

The database module creates a comprehensive information database and stores the transaction data in the blockchain; each piece of user information, insurance contract information and claim information can be treated as a transaction and stored in one or more blocks. Within the blockchain, the comprehensive information database is organized and stored according to different attributes and comprises a corresponding user database, insurance contract database and claim information database. The creation process is the same for each database; the specific steps for creating the user database are as follows:

s201, defining user data attributes: the specific attributes of each piece of user information data are determined, including name, identification number, contact information and insurance authentication information; these attributes may be defined and designed according to specific insurance business requirements. The specific attributes of insurance contract data include the insured amount, the insurance period and the insurance policy number; the specific attributes of claim information include the claim amount, the contact information of the claim processor and the claim number;

s202, data encryption and privacy protection: user information data in the insurance industry involve sensitive information; to protect user privacy and data security, the user information data are encrypted and desensitized using cryptographic algorithms and data protection techniques;

s203, storing and organizing user data: each piece of user information data is treated as a transaction and stored in one or more blocks; a NoSQL database, such as MongoDB or Cassandra, is chosen to store the user information data, since NoSQL databases provide more flexible storage and query modes and can be adapted to the data model and service requirements;

s204, hashing and linking of user data: to ensure the integrity and tamper resistance of the data, a hash is calculated for each piece of user information and the corresponding hash value is stored in each block; traceability and consistency of the data are achieved by linking the hash values of the user information data in each block;

s205, data access control and rights management: data access control and rights management measures are set according to the data privacy and compliance requirements of the insurance industry; a permission unit is set up in the database module, through which authorized users access and modify user data, ensuring the security and confidentiality of the data.

Through the steps, the creation of the user database combines the characteristics of the blockchain, ensures that the user data is stored in the non-tamperable blockchain, provides high data security and confidentiality, and meets the requirements for user information confidentiality in the insurance industry.
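The permission unit of s205 might look like the following sketch. The source states only that a permission unit lets authorized users access and modify user data; the `Permission` flags, the class names and the in-memory record store are all hypothetical.

```python
from enum import Flag


class Permission(Flag):
    NONE = 0
    READ = 1
    MODIFY = 2


class PermissionUnit:
    """Keeps track of which user may read or modify user data."""

    def __init__(self):
        self._grants = {}

    def grant(self, user_id: str, permission: Permission) -> None:
        current = self._grants.get(user_id, Permission.NONE)
        self._grants[user_id] = current | permission

    def check(self, user_id: str, permission: Permission) -> None:
        granted = self._grants.get(user_id, Permission.NONE)
        if permission not in granted:
            raise PermissionError(f"{user_id} lacks {permission.name} permission")


class UserDatabase:
    """In-memory stand-in for the user database behind the permission unit."""

    def __init__(self, permissions: PermissionUnit):
        self._permissions = permissions
        self._records = {}

    def read(self, user_id: str, key: str):
        self._permissions.check(user_id, Permission.READ)
        return self._records.get(key)

    def modify(self, user_id: str, key: str, record: dict) -> None:
        self._permissions.check(user_id, Permission.MODIFY)
        self._records[key] = record


unit = PermissionUnit()
unit.grant("agent-1", Permission.READ | Permission.MODIFY)
unit.grant("auditor-1", Permission.READ)

db = UserDatabase(unit)
db.modify("agent-1", "U-1", {"name": "Alice"})
record_seen_by_auditor = db.read("auditor-1", "U-1")
```

Every read and write passes through `check`, so a user without a grant gets a `PermissionError` instead of data.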

The mechanism setting module is used for selecting a suitable mechanism to realize fast data retrieval: an index structure (e.g., a hash table and a B+ tree) is used to accelerate data lookup, mapping the transaction data and the corresponding hash values to the corresponding blocks according to the attributes specified by the rule engine.

The specific steps of setting the indexing mechanism after the user database is built are as follows:

s301, setting an index structure: an index structure is selected according to the specific requirements and system design to accelerate data retrieval. The index structure comprises a hash table and a B+ tree: the hash table is suited to fast key-value lookup, while the B+ tree is suited to range queries and ordered traversal. The hash table is used as the primary index structure and the B+ tree as its alternative; if the system does not respond within 0.5 s, it switches to the B+ tree as the index structure. Rather than relying on a single type of index, the system first exploits the speed of the hash table index and then the more comprehensive indexing of the B+ tree, optimizing data retrieval efficiency according to actual conditions;

s302, selecting attributes: the attributes that need to be indexed are determined by the rule engine. Key attributes are selected as indexes according to the data characteristics and retrieval requirements of the insurance industry, so that the relevant blocks can be located quickly; for example, indexes can be built on attributes such as customer name, insurance policy number or claim number;

s303, data mapping and storage: mapping the transaction data and the corresponding hash values into corresponding blocks according to the set index structure, and storing the transaction data and the corresponding hash values in a block chain;
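Steps s302 and s303 can be pictured with the sketch below, in which a small rule table decides which attribute of each transaction type is indexed and each indexed value is mapped to the block index and transaction hash. The rule table `INDEX_RULES`, the transaction layout and the sample hashes are hypothetical; the source does not describe the rule engine's format.

```python
from collections import defaultdict

# Hypothetical rule table: transaction type -> attributes to index.
INDEX_RULES = {
    "user": ["name"],
    "insurance_contract": ["policy_number"],
    "claim": ["claim_number"],
}


def build_index(chain, rules=INDEX_RULES):
    """Map (type, attribute, value) to a list of (block index, tx hash)."""
    index = defaultdict(list)
    for block in chain:
        for tx in block["transactions"]:
            for attribute in rules.get(tx["type"], []):
                if attribute in tx["data"]:
                    key = (tx["type"], attribute, tx["data"][attribute])
                    index[key].append((block["index"], tx["hash"]))
    return dict(index)


# Illustrative chain; the hashes are shortened placeholders.
chain = [
    {
        "index": 0,
        "transactions": [
            {"type": "user", "data": {"name": "Alice"}, "hash": "a1"},
        ],
    },
    {
        "index": 1,
        "transactions": [
            {"type": "insurance_contract",
             "data": {"policy_number": "P-1001"}, "hash": "b2"},
            {"type": "claim",
             "data": {"claim_number": "C-001", "claim_amount": 1200}, "hash": "c3"},
        ],
    },
]
index = build_index(chain)
```

A lookup such as `index[("claim", "claim_number", "C-001")]` then returns the block position directly, without scanning the chain.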

The similarity calculation module comprises a data extraction unit, an analysis calculation unit and a threshold comparison unit;

the steps performed in the similarity calculation module are as follows:

s401, extracting data attributes of the search content A1 through a data extraction unit;

s402, in the analysis and calculation unit, first determining which specific database within the comprehensive information database the data attributes of the search content A1 belong to; then obtaining the similarity between the data attributes using the Pearson correlation coefficient to form a similarity data set Xsd_i, i.e. {Xsd_1, Xsd_2, ..., Xsd_i}; verifying each similarity result in the similarity data set Xsd_i by visual analysis, outputting the corresponding similarity result if a trend exists in the scatter diagram and repeating the similarity calculation if no trend exists; and constructing a data analysis model and generating the similarity evaluation value Pgxs from the output similarity data set Xsd_i according to the following formula:

In the formula, i represents the number of data attributes in the search content A1, and K represents a constant correction coefficient whose value may be set according to actual requirements, for example 1.357.

s403, setting a similarity threshold in the threshold comparison unit and comparing it with the similarity evaluation value Pgxs: if the similarity evaluation value Pgxs reaches or exceeds the similarity threshold, the output condition is met; if the similarity evaluation value Pgxs does not reach the similarity threshold, the output condition is not met.
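The comparison in s403 and the marking of matching blocks can be sketched as below. The Pgxs values are taken as given inputs, since the generating formula is not reproduced in this text; the function name `matching_blocks`, the sample values, the threshold of 0.75 and the best-first ordering are assumptions for illustration.

```python
def matching_blocks(evaluations: dict, threshold: float) -> list:
    """Return block indices whose Pgxs reaches or exceeds the threshold.

    `evaluations` maps a block index to its similarity evaluation value.
    Ordering the matches best-first is an assumption, not from the source.
    """
    matches = [
        (block_index, pgxs)
        for block_index, pgxs in evaluations.items()
        if pgxs >= threshold  # "reaches or exceeds" the threshold
    ]
    matches.sort(key=lambda item: item[1], reverse=True)
    return [block_index for block_index, _ in matches]


# Hypothetical evaluation values for three candidate blocks.
pgxs_by_block = {3: 0.92, 7: 0.40, 9: 0.75}
result = matching_blocks(pgxs_by_block, threshold=0.75)
```

Block 9 is included because its value equals the threshold, which matches the "reaches or exceeds" wording of the output condition.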

Specifically, the different kinds of information in the transaction data are each stored in a corresponding database, which provides better query capability, facilitates encrypted data management, and allows a database to be expanded without affecting the performance of the other databases. When similarity is calculated, the data attributes of the search content are compared with the data attributes in the corresponding database; similarity calculation and verification are performed in sequence using the Pearson correlation coefficient and visual analysis, which ensures the accuracy of the calculation results. On this basis a similarity evaluation value Pgxs is obtained and compared with the similarity threshold, so that matching results meeting the output condition are identified and efficient, accurate data retrieval is achieved.

Example 2: referring to fig. 1, a fast data retrieval method based on a blockchain includes the following steps:

step five, after the similarity evaluation value Pgxs has been compared with the similarity threshold and a result meeting the output condition has been obtained, marking the corresponding block as a matching result and outputting the matching result.

The above formulas are all dimensionless and operate on numerical values; each was obtained by collecting a large amount of data and running software simulations to approximate the real situation, and the preset parameters in the formulas are set by those skilled in the art according to actual conditions.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any other combination. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application.

Claims

1. A blockchain-based fast data retrieval system, characterized by: comprising the following steps:

the database module is used for creating a comprehensive information database, storing transaction data in a blockchain, organizing and storing the comprehensive information database according to different attributes, wherein the comprehensive information database comprises a corresponding user database, an insurance contract database and a claim information database, and the creation process of each database is the same;

2. A blockchain-based fast data retrieval system as in claim 1 wherein: the block chain structure design module specifically comprises the following steps:

3. A blockchain-based fast data retrieval system as in claim 1 wherein: the specific steps in creating the user database are as follows:

4. A blockchain-based fast data retrieval system as in claim 1 wherein: the specific steps of the index mechanism setting are as follows:

5. A blockchain-based fast data retrieval system as in claim 1 wherein: in the analysis and calculation unit, it is first determined which database the data attributes of the search content A1 belong to; then the similarity between the data attributes is obtained using the Pearson correlation coefficient, forming a similarity data set Xsd_i, i.e. {Xsd_1, Xsd_2, ..., Xsd_i}; finally, a data analysis model is built and a similarity evaluation value Pgxs is generated from the output similarity data set Xsd_i.

6. The blockchain-based fast data retrieval system of claim 5, wherein: the formula according to which the similarity evaluation value Pgxs is generated is as follows:

7. The blockchain-based fast data retrieval system of claim 6, wherein: after the similarity between the data attributes has been obtained using the Pearson correlation coefficient, each similarity result in the similarity data set Xsd_i is verified by visual analysis; if a trend exists in the scatter diagram, the corresponding similarity result is output, and if no trend exists in the scatter diagram, the similarity calculation is carried out again.

8. A blockchain-based fast data retrieval system as in claim 1 wherein: the similarity evaluation value Pgxs is compared with the similarity threshold, with the following result: if the similarity evaluation value Pgxs reaches or exceeds the similarity threshold, the output condition is satisfied, and if the similarity evaluation value Pgxs does not reach the similarity threshold, the output condition is not satisfied.

9. A fast data retrieval method based on a blockchain, using the system of any of claims 1 to 8, characterized in that: the method comprises the following steps:

creating a comprehensive information database, storing transaction data in a blockchain, organizing and storing the comprehensive information database according to different attributes, wherein the comprehensive information database comprises a corresponding user database, an insurance contract database and a claim information database, and the creation processes of the databases are the same;