CN108399099A - File secure storage and content protection method - Google Patents
File secure storage and content protection method
- Publication number
- CN108399099A CN108399099A CN201810186569.XA CN201810186569A CN108399099A CN 108399099 A CN108399099 A CN 108399099A CN 201810186569 A CN201810186569 A CN 201810186569A CN 108399099 A CN108399099 A CN 108399099A
- Authority
- CN
- China
- Prior art keywords
- data
- block
- unit
- datanode
- division
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4812—Task transfer initiation or dispatching by interrupt, e.g. masked
- G06F9/4818—Priority circuits therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1448—Management of the data involved in backup or backup restore
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/48—Indexing scheme relating to G06F9/48
- G06F2209/484—Precedence
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Quality & Reliability (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a file secure storage and content protection method, the method comprising: receiving data from the division unit of a DataNode, and placing the received data into a queue named after the pending task; starting pending tasks based on the priority of each pending task, and sending the pending tasks to the computing unit of the DataNode; receiving result data from the computing unit and forwarding it to the division unit; the division unit splitting the received data by data-block name and transferring the computed result data to the corresponding cloud storage server. The present invention proposes a file secure storage and content protection method that realizes efficient real-time processing of large data sets changing in real time.
Description
Technical field
The present invention relates to cloud storage, and more particularly to a file secure storage and content protection method.
Background technology
With the rapid development of information technology, massive data sources have brought about explosive growth in data scale, and performing complex computations on big data far exceeds the processing capacity of a single computer; this has driven the evolution toward big data cloud computing systems. In a cloud computing system, big data requiring complex computation is divided into small blocks, which are handed to multiple DataNodes for parallel processing, and the local computation results are integrated into a final result. In a heterogeneous big data environment, however, there exist unstructured data that are transmitted in real time and generated continuously, such as monitoring data produced by sensors and real-time communication data generated by social networks. If such ever-changing big data cannot be processed efficiently in real time, the key information carried in the data blocks will be missed. Existing cloud computing systems cannot integrate data from multiple heterogeneous data sources, including numerical computation, data mining and model prediction, to provide users in real time with the results they care about, nor can storage resources be shared across different servers. They cannot satisfy the storage demands of multi-path environments and multi-node access in cloud computing systems, including the prevention of access conflicts and the realization of resource balancing.
Summary of the invention
To solve the problems of the above prior art, the present invention proposes a file secure storage and content protection method, comprising:
receiving data from the division unit of a DataNode, and placing the received data into a queue named after the pending task;
starting pending tasks based on the priority of each pending task, and sending the pending tasks to the computing unit of the DataNode;
receiving result data from the computing unit and forwarding it to the division unit;
the division unit splitting the received data by data-block name, and transferring the computed result data to the corresponding cloud storage server;
the computing unit computing the input data according to the started pending task, and outputting the processed data blocks.
Preferably, the division unit isolates the data transmission of the DataNode from its internal logical computation and splits the input data by data-block name; according to the association between the data and the pending tasks of the current DataNode, a layered queue of all pending tasks in the ready state is maintained.
Preferably, each DataNode further includes a data backup unit. When the center scheduling node receives result data that the computing unit has finished processing, the result data is sent through the corresponding channel to the data backup unit; the data backup unit stores the result data on a blade disk and, according to the association between the result data and the cloud storage servers, places the result data into the shared-memory queues named after the cloud storage servers, to be sent uniformly by the division unit.
Preferably, the units share and multiplex information through explicit inter-process communication.
Compared with the prior art, the present invention has the following advantages:
The present invention proposes a file secure storage and content protection method that realizes efficient real-time processing of large data sets changing in real time.
Description of the drawings
Fig. 1 is a flow chart of the file secure storage and content protection method according to an embodiment of the present invention.
Detailed description of the embodiments
A detailed description of one or more embodiments of the invention is provided below together with the accompanying drawing illustrating the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment; the scope of the invention is limited only by the claims, and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for exemplary purposes, and the invention may also be practiced according to the claims without some or all of these details.
One aspect of the present invention provides a file secure storage and content protection method. Fig. 1 is a flow chart of the file secure storage and content protection method according to an embodiment of the present invention. The cloud computing system for big data processing of the present invention comprises multiple DataNodes. Each DataNode includes:
a division unit, which splits the received data by data-block name and passes it through the channel corresponding to the data-block name to the relay unit, and which also transfers result data from the relay unit to the corresponding cloud storage server;
a relay unit, which receives the split data from the division unit, places the received data into queues named after the pending tasks, starts pending tasks based on the priority of each pending task and sends them to the computing unit, and receives result data from the computing unit and forwards it to the division unit;
a computing unit, which computes the data from the relay unit according to the started pending task and outputs the processed data blocks to the relay unit.
The division unit implements data forwarding between the DataNode and external nodes, and isolates the data transmission of the DataNode from its internal logical computation. Specifically, the division unit splits the input data by data-block name and transfers the data to the relay unit. According to the association between the data and the pending tasks of the current DataNode, the relay unit maintains a layered queue of all pending tasks in the ready state. Based on the load of the DataNode, the relay unit determines how many tasks to start and selects that number of the highest-priority pending tasks from the layered queue to start. In addition, the relay unit transfers data to the computing unit that executes the pending task, and receives the result data processed by the computing unit.
In a cloud computing system comprising the above DataNodes, all splitting, merging and processing of input or result data is completed in memory. To guarantee the accuracy of the system's computation results, each DataNode preferably further includes a data backup unit. When the center scheduling node receives result data that the computing unit has finished processing, the result data is sent through the corresponding channel to the data backup unit; the data backup unit stores the result data on a blade disk and, according to the association between the result data and the cloud storage servers, places the result data into the shared-memory queues named after the cloud storage servers, to be sent uniformly by the division unit.
The above four units share and multiplex information through explicit inter-process communication and, cooperating with one another, together constitute a node of the cloud computing system.
In addition, the relay unit monitors client requests on a port, establishes connections, and assigns the connections to suitable computing units for execution. Each DataNode manages multiple client connection requests using an I/O multiplexing interface. The division unit, relay unit and data backup unit all manage multiple event sources through the I/O multiplexing interface and are coupled through channels.
The relay unit and the data backup unit manage, through the I/O multiplexing interface, the channel ports used for data transmission between the units. All units process in parallel and execute their logic asynchronously. The relay unit also performs initialization and listening when the DataNode starts: it listens on a designated port, accepts connection requests from external nodes, and initializes the threads of the computing unit. According to the load of each computing unit's threads, the relay unit decides to which thread the task packaged from a data connection is dispatched for execution.
The cloud computing system uses an adaptive load-balancing strategy to decide how many threads the relay unit starts and into which thread a newly received task is placed for execution. Specifically, the relay unit monitors the load of the DataNode in real time; when CPU utilization exceeds a threshold, a thread is selected at random and closed after its current task finishes, reducing the concurrency of the DataNode. A connection is assigned to the thread with the fewest connections; the assignment is done by sending the task packaged from the connection to that thread's task queue.
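A minimal sketch of the adaptive load-balancing policy described above, assuming a simple in-memory worker pool: when CPU usage crosses a threshold a randomly chosen worker is retired, and new connections are packaged as tasks and sent to the worker with the fewest connections. The names WorkerThread, rebalance and assign_connection, and the threshold value, are assumptions for illustration.

```python
import random

class WorkerThread:
    """Illustrative worker: holds a task queue and a connection count."""
    def __init__(self, name):
        self.name = name
        self.tasks = []           # task queue the relay unit pushes into
        self.connections = 0

def rebalance(workers, cpu_usage, cpu_threshold=0.85):
    # When CPU usage exceeds the threshold, pick a random worker and retire it
    # (in a real system: after its current task finishes).
    if cpu_usage > cpu_threshold and len(workers) > 1:
        victim = random.choice(workers)
        workers.remove(victim)
    return workers

def assign_connection(workers, connection):
    # Least-connections policy: package the connection as a task and send it
    # to the task queue of the worker with the fewest connections.
    target = min(workers, key=lambda w: w.connections)
    target.tasks.append(connection)
    target.connections += 1
    return target

# Example usage under assumed numbers.
pool = [WorkerThread(f"t{i}") for i in range(4)]
pool = rebalance(pool, cpu_usage=0.92)          # concurrency reduced by one
chosen = assign_connection(pool, connection="conn-1")
print(chosen.name, len(pool))
```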
The division unit is responsible for data transmission with peer nodes, including receiving data from clients and pushing result data to the cloud storage servers; it completely separates the data transmission of the DataNode from the upper-layer application logic. To manage multiple I/O data sources, the division unit uses the I/O multiplexing interface model. Both parties of a data transfer negotiate before formally transmitting a data block: the cloud storage server informs the client of the position at which the previous data-block transfer ended. Using the asynchronous read/write characteristics of the I/O multiplexing interface, the division unit implements a data-transfer state machine, providing support for resumable (breakpoint) transfer of data.
Each thread starts a division unit at initialization. When there is a task sent by the relay unit in the thread's task queue, the division unit takes the connection port out of the task and adds it to its own I/O multiplexing event loop. The division unit reads data from the connection and splits it by data name. When a data block belonging to some data is received by the division unit for the first time, the division unit creates the channel corresponding to the data-block name, opens the channel with the write flag and transfers the data; at the same time the data-block name is sent to the relay unit through a socket. After receiving the data-block name, the relay unit opens the corresponding channel with the read flag and receives the split data sent by the division unit.
The relay unit determines the number of task operators to start according to the load of the DataNode, and computes the starting order according to task priority; the priority is determined by the importance of the task within the overall business and the operating condition of the DataNode. The relay unit obtains the association between data blocks and external processing tasks from the server in real time, and places each received data block into the queue corresponding to the task name.
After storing result data on the blade disk, the data backup unit assigns each data block a certain expiration time and periodically deletes expired blocks from the blade disk. When the transmission speed of the client exceeds the processing speed of the cloud storage server, so that data packets accumulate in the client kernel's buffer and cannot be sent, the data backup unit serves as a cache in front of the cloud storage server.
The cloud computing system node implements a positioning-and-sending protocol in the division unit, locating the position at which the last transfer of a data block ended and taking the data at the corresponding position out of the backup unit to realize recovery of the data block. Similar to the division unit, the data backup unit completes data transmission in cooperation with the relay unit: it listens on a designated port for a long time, with the port added to the I/O multiplexing interface handle. When it receives the result-data name sent by the relay unit, it opens the channel with the read flag and adds the channel file descriptor to the I/O multiplexing interface loop. The data backup unit keeps reading the processed result data from the channel and stores it into the blade disk array under the result-data name. Data blocks in the blade disk array are stored as key-value pairs, where the key is the timestamp of the data block, which makes it convenient to quickly locate a block in the disk queue upon retransmission.
By querying the server for the association between result data and cloud storage servers, the data backup unit places the data backed up in the blade disk array into the shared-memory queues named after the cloud storage servers. To let the division unit behave consistently when sending and receiving data, the data backup unit packages the cloud storage server name into a task and puts it into the task queue; the division unit takes the cloud storage server name out of the task, takes the data out of the corresponding shared-memory queue, and transmits the data according to the configuration of that cloud storage server.
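The sketch below illustrates, under assumed names (DataBackupUnit, persist, purge_expired), how a backup unit could store result blocks as timestamp-keyed key-value pairs with an expiry and route each block into a queue named after its cloud storage server; ordinary Python dicts and deques stand in for the blade disk array and the shared-memory queues.

```python
import time
from collections import defaultdict, deque

class DataBackupUnit:
    """Illustrative backup unit: stores result blocks as (timestamp -> bytes)
    key-value pairs with an expiry, and routes them to queues named after
    the associated cloud storage server."""

    def __init__(self, ttl_seconds=3600):
        self.store = {}                          # timestamp key -> result block
        self.ttl = ttl_seconds
        self.server_queues = defaultdict(deque)  # stands in for shared-memory queues

    def persist(self, result_block, server_name):
        key = time.time()                        # timestamp used as the block key
        self.store[key] = result_block
        self.server_queues[server_name].append(result_block)
        return key

    def purge_expired(self, now=None):
        # Periodically delete blocks whose expiry time has passed.
        now = time.time() if now is None else now
        expired = [k for k in self.store if now - k > self.ttl]
        for k in expired:
            del self.store[k]
        return len(expired)

backup = DataBackupUnit(ttl_seconds=60)
backup.persist(b"result-bytes", server_name="cloud-server-A")
print(len(backup.server_queues["cloud-server-A"]))  # 1
```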
The cloud computing system of the present invention uses a virtual-address mechanism to allow each DataNode in the cloud computing system to access the blade disk array. Once a DataNode fails, the blade-disk-array access gateway can switch between DataNodes, providing high availability of blade-disk-array access. Meanwhile, a feedback-based virtual-address balancing policy reasonably allocates virtual addresses to the DataNodes of the cloud computing system, guaranteeing the processing capacity and service quality of the blade disk array.
The specific blade disk array access method includes the following:
(1) An access-path list for accessing the blade disk array is provided; each access path includes a virtual address, a port and a channel ID. The corresponding logical disk is obtained through each access path. The logical disk is the logical mapping of the blade disk array at the presentation layer.
The logical unit number (LUN) information of the same disk array is exactly the same on all nodes of the cloud computing system; the access-path lists of all presentation layers are exactly the same; each disk array corresponds one-to-one with a unique virtual address. Specifically, a logical unit number is added to the disk arrays of all DataNodes of the cloud computing system; each DataNode allows multiple disk arrays, but the same disk array has one and only one logical unit number. Any presentation layer may access any disk array and its logical unit number information.
The virtual-address mechanism is implemented by the component manager of the cloud computing system. The component manager includes a cloud platform data manager, a local data manager and a message manager.
The cloud platform data manager reacts to and makes decisions on various events of the cloud computing system, where the events include creation and deletion of virtual addresses and link exceptions. The local data manager provides the metadata of virtual addresses and block storage operations. The logical relations of the metadata operations involved in allocating address resources between virtual addresses and block storage are decided by the cloud platform data manager. The cloud platform data manager configures the local metadata.
The message manager is responsible for message passing between the cloud platform data manager and the local data manager, and for membership management in the cloud computing system.
Specifically, in the cloud storage system, virtual addresses are managed in the following way (an illustrative sketch of this flow is given after the steps):
Step S11: create a data block vector using local metadata: <virtual address, blade disk array, logical unit number>.
Step S12: set the attributes of the data block vector. Specifically, the data block vector attributes include the boot order of the resource, etc.
Step S13: if attribute setting fails, delete the data block vector; if attribute setting succeeds, map the data block vector to the presentation layer; if mapping the data block vector to the presentation layer fails, delete the data block vector; if mapping the resource vector to the presentation layer succeeds, update the resource vector database information.
The resource vector database is stored in the data backup unit.
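A minimal sketch of the create-set-map-rollback flow of steps S11-S13, with the attribute setting and presentation-layer mapping passed in as callbacks; the function name and the dict-based metadata store are illustrative assumptions.

```python
def create_data_block_vector(metadata_store, virtual_address, blade_array, lun,
                             set_attributes, map_to_presentation_layer):
    """Illustrative S11-S13 flow: create <virtual address, blade disk array, LUN>,
    set its attributes, map it to the presentation layer, and roll back on failure."""
    vector = {"virtual_address": virtual_address,
              "blade_array": blade_array,
              "lun": lun}
    if not set_attributes(vector):               # S12/S13: discard on attribute failure
        return None
    if not map_to_presentation_layer(vector):    # S13: discard on mapping failure
        return None
    metadata_store[virtual_address] = vector     # update the resource vector database
    return vector

# Example with trivial stand-in callbacks.
db = {}
v = create_data_block_vector(db, "va-01", "array-7", 3,
                             set_attributes=lambda vec: True,
                             map_to_presentation_layer=lambda vec: True)
print(v is not None, "va-01" in db)
```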
(2) Based on feedback, virtual addresses are assigned to the DataNodes of the cloud computing system so that the DataNodes of the cloud computing system are evenly loaded. The following operations are executed iteratively when allocating virtual addresses (a sketch of one iteration is given after the steps):
Step S21: set the minimum allocatable virtual-address redundancy M_min.
Step S22: compute the virtual-address redundancy of each DataNode of the cloud computing system according to the formula M_i = M_n + k1·Δt·C/L − k2·R_n/C, where M_n is the load redundancy of the DataNode passed in by the relay unit of the cloud computing system when the last timestamp arrived, k1·Δt·C/L is the load completed by the DataNode within the period Δt, and k2·R_n/C is the load added by the new requests submitted by the relay unit of the cloud computing system within the period Δt. Here k1 and k2 are predefined coefficients; R_n is the number of requests added within the period Δt; C is the capacity of the DataNode; L is the current load of the DataNode; Δt is the time difference between the current time and the arrival of the last timestamp.
Step S23: select all DataNodes of the cloud computing system satisfying the condition M_i > M_min, where M_i is the virtual-address redundancy of a cloud computing system node; if no DataNode in the cloud computing system satisfies the condition, reset the minimum allocatable virtual-address redundancy until DataNodes satisfying the condition can be selected.
Step S24: add the selected DataNodes to a candidate set.
Step S25: compute the weight of each DataNode in the candidate set. Specifically, the weight of each DataNode in the candidate set is computed as W = C/L.
Step S26: choose the DataNode with the largest weight in the candidate set.
Step S27: compute the load change of the largest-weight DataNode in the candidate set. Specifically, −k1·Δt·C/L + k/C is the load change of the largest-weight DataNode within the time Δt, where k is a user-defined parameter. Therefore the current load of the largest-weight DataNode is L_i − k1·Δt·C/L + k/C, where L_i is the load value of the largest-weight DataNode passed in by the relay unit of the cloud computing system when the last timestamp arrived.
Step S28: according to the load change of the largest-weight DataNode in the candidate set, modify the virtual-address redundancy of the largest-weight DataNode in the candidate set. Specifically, the virtual-address redundancy of the largest-weight DataNode in the candidate set is modified according to the formula M = M_i + k1·Δt·C/L − k/C, where M_i is the virtual-address redundancy of the largest-weight DataNode and −k1·Δt·C/L + k/C is the load change of the largest-weight DataNode within the time Δt, i.e. the change of its virtual-address redundancy within Δt. Therefore the virtual-address redundancy of the largest-weight DataNode is modified to the difference between its former virtual-address redundancy and its load change within the time Δt.
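The following Python sketch implements one iteration of steps S21-S28 as read above; the node fields M, L, C and R and the example coefficient values are assumptions, and the caller is expected to lower m_min and retry when no candidate qualifies.

```python
def allocate_virtual_address(nodes, m_min, k1, k2, k, dt):
    """One iteration of the feedback-based allocation (steps S21-S28).
    Each node dict carries: M (redundancy), L (load), C (capacity), R (new requests)."""
    # S22: recompute each node's virtual-address redundancy.
    for n in nodes:
        n["M"] = n["M"] + k1 * dt * n["C"] / n["L"] - k2 * n["R"] / n["C"]
    # S23/S24: candidate set of nodes whose redundancy exceeds the minimum.
    candidates = [n for n in nodes if n["M"] > m_min]
    if not candidates:
        return None                           # caller lowers m_min and retries
    # S25/S26: weight W = C/L, pick the largest.
    best = max(candidates, key=lambda n: n["C"] / n["L"])
    # S27: load change of the chosen node over dt.
    delta_load = -k1 * dt * best["C"] / best["L"] + k / best["C"]
    # S28: new redundancy = old redundancy minus the load change
    # (i.e. M = M_i + k1*dt*C/L - k/C).
    best["M"] = best["M"] - delta_load
    best["L"] = best["L"] + delta_load
    return best

nodes = [{"id": "dn1", "M": 5.0, "L": 2.0, "C": 8.0, "R": 3},
         {"id": "dn2", "M": 4.0, "L": 6.0, "C": 8.0, "R": 1}]
chosen = allocate_virtual_address(nodes, m_min=1.0, k1=0.1, k2=0.1, k=0.5, dt=1.0)
print(chosen["id"])   # the lightly loaded node receives the virtual address
```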
In the big data cloud computing system of the present invention, N blade disks are located on the component manager side, and each blade disk array is divided into discs of equal size; the discs within each blade disk are numbered in ascending order of address. Each of the N blade disks has a blade array ID; each disc is identified by a disc identifier, obtained by combining the blade array ID with the number of the disc.
The storage space of a parallel computing task of the cloud computing system is composed of target blade disks. The target blade disks are composed as follows: the component manager monitors the allocation state of the discs and the hotness of the N blade disks; after receiving a parallel-computing-task creation request, the component manager determines the storage space requirement of the parallel computing task to be created; it determines the discs in the unallocated state according to the allocation state of the discs; it selects M discs from the unallocated discs as target blade disks, the storage space of the M discs being greater than or equal to the storage space requirement;
the M discs are each located on different blade disks; the component manager responds to the parallel-computing-task creation request and builds the parallel computing task on the target blade disks;
if the parallel computing task has a data storage requirement during operation, it first obtains the identifiers of the target blade disks and sends a hotness query request to the target blade disks, the hotness query request carrying the identifiers of the target blade disks;
the parallel computing task receives the hotness returned by the blade disks corresponding to the M discs;
the parallel computing task splits the data to be stored into no more than M pieces of target data and, in ascending order of the hotness of the blade disks corresponding to the M discs, stores each piece of target data into a disc of the target blade disks. From the system level, the realization process includes the following (steps 102 and 106 are sketched after the list):
101: The component manager monitors the allocation state of the discs and the hotness of the N blade disks. The hotness is the current or historically accumulated data throughput of a blade disk, or the ratio of that data throughput to the data storage capacity of the corresponding blade disk.
102: After receiving a parallel-computing-task creation request, the component manager determines the storage space requirement of the parallel computing task to be created; determines the discs in the unallocated state according to the allocation state of the discs; and selects M discs from the unallocated discs as target blade disks, the storage space of the M discs being greater than or equal to the storage space requirement, with the M discs each located on different blade disks. Since different parallel computing tasks are used with different probabilities, a first level of balancing is achieved through this selective allocation of discs.
103: The component manager responds to the parallel-computing-task creation request and builds the parallel computing task on the target blade disks; the parallel computing task knows the target blade disks assigned to it and their locations.
104: If there is a data storage requirement during the operation of the parallel computing task, it first obtains the identifiers of the target blade disks and sends a hotness query request to the target blade disks, the hotness query request carrying the identifiers of the target blade disks. On the parallel-computing-task side, it is necessary to record which blade disk each disc of the target blade disks corresponds to; based on this, the parallel computing task can query the hotness without going through the component manager.
105: The parallel computing task receives the hotness returned by the blade disks corresponding to the M discs.
106: The parallel computing task splits the data to be stored into at most M/2 pieces of target data and, in ascending order of the hotness of the blade disks corresponding to the M discs, stores each piece of target data into a disc of the target blade disks. In the embodiments of the present invention, the construction of the blade-disk identifier is specially designed to facilitate subsequent lookup of blade disks; in addition, in the blade-disk assignment process of a parallel computing task, the task can be assigned more suitable blade disks, which reduces congestion; furthermore, splitting the data to be stored and distributing it according to the hotness of the blade disks improves the safety of data storage.
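A sketch of steps 102 and 106 under assumed data structures: discs are dicts carrying their blade disk, size and allocation state, and hotness is a per-blade number; pick_target_discs and distribute_by_hotness are illustrative names.

```python
def pick_target_discs(discs, space_needed):
    """Illustrative step 102: choose M unallocated discs, at most one per blade
    disk, whose combined space covers the requirement.
    Each disc dict: {"id", "blade", "size", "allocated"}."""
    chosen, used_blades, total = [], set(), 0
    for d in sorted(discs, key=lambda d: d["size"], reverse=True):
        if d["allocated"] or d["blade"] in used_blades:
            continue
        chosen.append(d)
        used_blades.add(d["blade"])
        total += d["size"]
        if total >= space_needed:
            return chosen
    return None                               # not enough free space

def distribute_by_hotness(target_discs, blade_hotness, pieces):
    """Illustrative step 106: place data pieces on the coolest blade disks first."""
    ordered = sorted(target_discs, key=lambda d: blade_hotness[d["blade"]])
    return {d["id"]: piece for d, piece in zip(ordered, pieces)}

discs = [{"id": "A-0", "blade": "A", "size": 64, "allocated": False},
         {"id": "B-0", "blade": "B", "size": 64, "allocated": False},
         {"id": "C-0", "blade": "C", "size": 64, "allocated": False}]
targets = pick_target_discs(discs, space_needed=128)
placement = distribute_by_hotness(targets, {"A": 0.7, "B": 0.2, "C": 0.4},
                                  pieces=[b"p1", b"p2"])
print(placement)   # {'B-0': b'p1', 'A-0': b'p2'} -- coolest chosen disc first
```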
The blade array ID is a P-digit hexadecimal value, and the disc identifier is a Q-digit hexadecimal value; the storage space of each disc is R. The method further includes:
after determining that an access operation is required, the parallel computing task determines the virtual address specified by the access operation. The target blade disks are composed of the discs they contain, ordered from low to high by the blade array ID of each disc, and the virtual addresses are numbered sequentially starting from the initial address of the target blade disks. An address mapping table is stored in the parallel computing task; an entry of the address mapping table includes: virtual disk number, disc identifier.
The parallel computing task computes the integer quotient of the virtual address and R to obtain the virtual disk number of the virtual address, and computes the remainder of the virtual address and R to obtain the offset.
The parallel computing task looks up the address mapping table to obtain the entry containing the virtual disk number of the virtual address, and takes the disc identifier contained in that entry as the target disc identifier.
The parallel computing task takes the first P digits of the disc identifier as the target blade array ID and sends a read request to the blade disk corresponding to the target blade array ID, the read request containing the disc identifier and the offset, so that the disc corresponding to the disc identifier returns the data at the physical address corresponding to the offset from the starting position of the disc.
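The address resolution described above reduces to an integer division and a remainder plus a prefix cut; the sketch below assumes R is the disc size in addressable units, P is the number of hex digits in the blade array ID, and a plain dict stands in for the address mapping table.

```python
def resolve_virtual_address(virtual_address, mapping_table, R, P):
    """Illustrative address resolution: virtual disk number = address // R,
    offset = address % R; the first P hex digits of the disc identifier give
    the target blade array ID. mapping_table: virtual disk number -> disc id."""
    virtual_disk = virtual_address // R
    offset = virtual_address % R
    disc_id = mapping_table[virtual_disk]
    blade_array_id = disc_id[:P]
    return blade_array_id, disc_id, offset

# Example under assumed sizes: R = 4096 units per disc, P = 4 hex digits.
table = {0: "00A1F3", 1: "00B207"}
print(resolve_virtual_address(5000, table, R=4096, P=4))  # ('00B2', '00B207', 904)
```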
After the parallel computing task has been created, the method further includes: if the parallel computing task needs to be deleted, the allocation state of each disc included in the target blade disks is set to unallocated, without deleting the data already written to those discs. After the allocation state of each disc included in the target blade disks has been set to unallocated, the method further includes: when a new parallel computing task is created next time, the discs required by the new parallel computing task are obtained in a random manner, and at most two of the obtained discs belong to the discs included in the previous target blade disks.
The division unit also dynamically splits data blocks through a group of division vectors F according to the data block size, the number of blade disks and the load. For small files smaller than a selection threshold T, or when the number of usable blade disks in the system is N = 1, the byte partition strategy is used for variable splitting; and on the premise that more than one blade disk is usable, files exceeding the selection threshold are processed with the reconstruction partition strategy. When a data block is split, the split data is stored uniformly across the blade disks, making full use of the storage space of each blade disk while reducing the amount of file metadata.
When a data block is split with the byte partition strategy, the remaining sub-block produced by the split is encrypted, backed up and then transmitted to the corresponding blade disk, and the file split information, key information and file storage directory information generated in the split-and-store process are saved into the encrypted area of the local flash memory chip. When data is split with the reconstruction partition strategy, the cross-assignment function f_c and the reconstruction function f_r are called to cross-reconstruct the split data; after redundancy coding and encryption, the reconstructed data blocks are transmitted in parallel to the corresponding blade disks, and the file split information, key information and file storage directory information generated in the split-and-store process are saved into the encrypted area of the local flash memory chip.
In the byte partition strategy, when a blade-disk data block Block is used, the client splits the data block Block into two parts, a byte sub-block and a remaining sub-block, according to the size Size of the data block Block and the number W of usable blade disks. The byte sub-block is formed by extracting a small number of bytes from the user file, and the remaining sub-block consists of the file data remaining after those bytes are extracted. After the data block is split, the client encrypts and backs up the remaining sub-block and transmits it to the corresponding remote blade disk, and the file control information generated in the split-and-store process is stored into the encrypted area of the local flash memory chip. The byte partition strategy splits the data block Block through the following two processes:
(1) The value range of the position sequence Array is determined to be 1 to Size according to the size Size of the data block Block, and the default size r of the position sequence Array is determined according to the size Size of the data block Block; random numbers of the corresponding quantity are then generated within the range 1 to Size, and the generated values, sorted by size, become the elements of the position sequence Array in turn.
The default size d of the position sequence Array is determined according to the size Size of the data block Block. According to the number N of usable blade disks in the system, N seeds E_i (i ∈ {1, 2, 3, ..., N}) in ascending order are generated within the determined value range 1 to Size; this group of seeds is called the seed sequence S. Finally a position sequence Array of size k is generated by the cycle function f, where k < d. The cycle function f = f_s + f_j; its inputs are the seed sequence S and the number of blade disks N, and its outputs are the elements p_ji (i, j ∈ {1, 2, 3, ..., N}) of the position sequence Array, where p_ji denotes the i-th position element in the j-th cycle; f_s = E_i is a constant, an element of the seed sequence S; f_j = (j−1) × N indicates the cycle number. The detailed process is as follows (a simplified sketch is given after Step 3):
Step 1: The seed sequence S = {E_1, E_2, E_3, ..., E_N}, randomly generated within the range 1 to Size, and the number N of blade disks enter the first cycle as inputs, so the cycle function gives f(E_i, N) = p_1i. Each time the cycle function runs, the number of generated position elements is compared with d; if they are equal, the cycle is exited directly and the generated position elements are sorted to produce the position sequence Array. If, when f(E_N, N) is computed, the number of generated position elements is less than d and the cycle function f(E_N, N) < Size, the second cycle is carried out; otherwise the generated position element values are returned as the final result and the generation of the position sequence Array ends, each position element value being an element of the seed sequence S.
Step 2: In the second cycle, when i = 1 the cycle function gives p_21. When p_21 > Size, or when the number of position elements already generated equals d, the cycle is exited and the generated position elements are sorted to produce the position sequence Array. When p_21 < Size and the number of position elements already generated is less than d, f(E_2, N) = p_22 is computed.
When p_22 > Size and the number of generated position elements is less than d, the seed E_2 and all seeds after it are deleted from the seed sequence S, the seed sequence S is regenerated as {E_1}, and this cycle is exited to enter the next cycle; if the number of generated position elements equals d, the cycle is exited and the generated position elements are sorted to produce the position sequence Array. And so on: when i = N, the cycle function gives f(E_N, N) = p_2N; on the premise that p_2N > Size, if the number of generated position elements is less than d, the seed E_N is deleted from the seed sequence S and the seed sequence S is regenerated as {E_1, E_2, E_3, ..., E_{N−1}}, and this cycle is exited to enter the next cycle; if the number of generated position elements equals d, the cycle is exited and the generated position elements are sorted to produce the position sequence Array.
Step 3: From the second cycle onwards, each round is handled in exactly the same way. Each time the cycle function runs, the generated position element p_ji is compared once with the data block size Size. When p_ji < Size, the number of generated position elements is compared with d; if this number is less than d the cycle continues; if it equals d, the cycle is exited and the generated position elements are sorted to produce the position sequence Array. When p_ji ≥ Size, the number of generated position elements is likewise compared with d; if it is less than d, the current seed E_i and all seeds after it are deleted from the seed sequence S, the seed sequence S is regenerated, and this cycle is exited to enter the next cycle; if it equals d, the cycle is exited and the generated position elements are sorted to produce the position sequence Array.
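The step description above is terse, so the sketch below is only a simplified reading of the cycle function f = f_s + f_j = E_i + (j−1)·N: each cycle advances every surviving seed by N, seeds whose value reaches Size are pruned, and generation stops once d in-range position elements exist. Function and parameter names are assumptions.

```python
def generate_position_sequence(seeds, n_disks, size, d):
    """Simplified reading of the position-sequence generation: walk the seed
    sequence cycle by cycle (p = E_i + (j-1)*N), drop seeds whose value reaches
    Size, and stop once d in-range position elements have been produced."""
    positions = []
    j = 1
    while seeds and len(positions) < d:
        survivors = []
        for e in seeds:
            if len(positions) == d:
                survivors.append(e)
                break
            p = e + (j - 1) * n_disks        # f = f_s + f_j
            if p < size:
                positions.append(p)
                survivors.append(e)          # seed survives into the next cycle
            # a seed whose value reaches Size is pruned, as in Steps 2-3
        seeds = survivors
        j += 1
    return sorted(positions)

# Example: 3 usable blade disks, a 64-byte block, 6 byte positions wanted.
print(generate_position_sequence(seeds=[5, 17, 40], n_disks=3, size=64, d=6))
```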
(2) After the position sequence Array has been successfully generated, the bytes at the corresponding positions in the original file are extracted one by one according to the value of each position element in the position sequence Array, and the extracted bytes, arranged in order, form the byte sub-block. The byte sub-block and the position sequence Array occur in pairs and are stored together in the local flash memory chip; the data remaining after the bytes are extracted is called the remaining sub-block, and this block is stored in the remote blade disk.
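A small sketch of the byte extraction and its inverse: the byte sub-block holds the bytes at the positions given by the position sequence, the remaining sub-block holds everything else, and reassembly interleaves them again. Encryption and backup are omitted; function names are illustrative.

```python
def byte_partition(data: bytes, positions):
    """Split data into a byte sub-block (bytes pulled from the given positions,
    kept locally with the position sequence) and the remaining sub-block
    (everything else, sent to the remote blade disk)."""
    pos = sorted(set(p for p in positions if p < len(data)))
    byte_subblock = bytes(data[p] for p in pos)
    remaining = bytes(b for i, b in enumerate(data) if i not in set(pos))
    return byte_subblock, remaining

def byte_reassemble(byte_subblock: bytes, remaining: bytes, positions):
    """Inverse operation used when the file is read back."""
    pos = sorted(set(positions))
    out, rem_iter, sub_iter = [], iter(remaining), iter(byte_subblock)
    total = len(byte_subblock) + len(remaining)
    for i in range(total):
        out.append(next(sub_iter) if i in pos else next(rem_iter))
    return bytes(out)

data = b"confidential-report"
sub, rem = byte_partition(data, positions=[0, 3, 7, 12])
assert byte_reassemble(sub, rem, [0, 3, 7, 12]) == data
```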
The idea of the reconstruction strategy is: when multi-blade-disk data blocks Block are used, if the size Size of the data block Block exceeds the selection threshold T, the data block Block is split into multiple data blocks of the same size after the splitting is completed, and the system transmits the equal-sized data blocks to the multiple usable blade disks, improving file access efficiency through parallel transmission.
The underlying principle is to reduce the amount of metadata as much as possible, on the premise of improving the parallel access efficiency of multiple blade disks, when the data block Block is evenly split. When the data block Block is split for storage, it is first segmented, i.e. evenly divided according to the number of usable blade disks; each part produced by segmentation is called a sub-block, and the size of each sub-block is P, with P = Size/N. The suitable partition threshold of each sub-block is then determined through the division vector F; after the partition threshold is determined, the client uses it to split each sub-block, after which each sub-block contains one or more storage blocks, each storage block having size L_j. After the data block Block has been split, the cross-assignment function f_c and the reconstruction function f_r combine the generated storage blocks crosswise into disk blocks, and each disk block maintains a corresponding mapping relation with a blade disk. The size B_i of a disk block takes the storage block as its basic unit and its length is variable, defaulting to N storage blocks; if there are fewer than N storage blocks, they are combined in units of the actual number of storage blocks. Finally the disk blocks are transmitted in parallel over the network to the corresponding blade disks.
After the splitting of the data block Block is completed, the client applies redundancy coding and encryption to each generated disk block and transmits them in parallel to the remote blade disks, while the file control information is stored into the encrypted area of the local flash memory chip. When the user retrieves the required data block Block, the client reads the file control information in the flash chip, establishes communication connections with the blade disks, and downloads the required disk blocks in parallel; at the same time the client decrypts the obtained disk blocks and assembles them into the data block Block required by the user.
The reconstruction partition strategy splits and stores the data block Block in the following two stages:
(1) After the data block Block is segmented, if the size P of each generated sub-block is less than the minimum partition threshold specified by the division vector F, the client splits each segment using the byte partition strategy; if the size P of each generated sub-block is greater than the minimum partition threshold specified by the division vector F, the client determines the optimal partition threshold Z_t through the division vector F and then uses this threshold to split each sub-block. The detailed process of splitting each partition is:
1. A group of division vectors F = {Z_0, Z_1, Z_2, Z_3, ..., Z_t, ..., Z_s} is defined, where Z_0 < Z_1 < Z_2 < Z_3 < ... < Z_t < Z_s and each Z_t is a positive integer; through the partition thresholds Z_t in this group of division vectors, each sub-block can be split flexibly.
2. After the data block Block is segmented, the client determines the most suitable partition threshold Z_t through the division vector F. First, the client computes, for each generated sub-block, the division count obtained with each partition threshold in the division vector F; different partition thresholds yield different division counts S. When the size P of a sub-block is divisible by Z_t, the division count S = P/Z_t; when a sub-block is not divisible by Z_t, the division count S = ⌊P/Z_t⌋, where ⌊·⌋ denotes the floor operation. Then the client compares each computed division count S with the number N of usable blade disks in turn. If there exist division counts S ≤ N, the partition threshold whose S is closest to the number N of blade disks is taken as the optimal partition threshold Z_t; if the obtained division counts S are greater than the number N of blade disks, the partition threshold whose S mod N is closest to N is taken as the optimal partition threshold Z_t. Finally, the client splits each sub-block using the optimal partition threshold Z_t (the threshold selection is sketched after these sub-steps).
3. When the size P of each sub-block does not exceed the minimum partition threshold Z_0 in the division vector F, the byte partition strategy is called to carry out the splitting, in which case the default size r of each position sequence is equal; after each partition has been processed, the client applies redundancy coding to each remaining sub-block, transmits it to the corresponding blade disk and stores it.
When the size P of each sub-block is greater than the minimum partition threshold Z_0 in the division vector F, the optimal partition threshold Z_t determined in sub-step 2 is used to split each sub-block. After each partition has been processed, each sub-block will have been split into more than one storage block; if each sub-block contains n storage blocks, the data block Block is split into n × N storage blocks in total, the generated storage blocks being denoted chunk_1, chunk_2, ..., chunk_{n×N}, and the intersection of any two storage blocks is empty; therefore the union of all storage blocks is the data block Block, i.e. chunk_1 ∪ chunk_2 ∪ ... ∪ chunk_{n×N} = Block. In the splitting process, if each sub-block is divisible by the optimal partition threshold Z_t, the size L_j of each generated storage block is exactly the partition threshold Z_t; if a sub-block is not divisible by the optimal partition threshold Z_t, then, apart from the last storage block generated from that sub-block, the size L_j of every other storage block is the partition threshold Z_t, and the size of the last storage block of the sub-block is P − (n−1) × Z_t. The effect of the division vector F is to make the storage blocks obtained by splitting as even as possible.
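As referenced in sub-step 2, the choice of the optimal partition threshold Z_t can be sketched as follows; the division-count rule S = ⌊P/Z⌋ and the two tie-breaking branches follow the text, while the function name and the example numbers are assumptions.

```python
def choose_partition_threshold(P, F, N):
    """Pick the optimal partition threshold Z_t from the division vector F for a
    sub-block of size P and N usable blade disks: prefer the threshold whose
    division count S = P // Z satisfies S <= N and is closest to N, otherwise
    the one whose S mod N is closest to N."""
    counts = [(z, P // z) for z in F if z > 0]
    within = [(z, s) for z, s in counts if s <= N]
    if within:
        return min(within, key=lambda zs: N - zs[1])[0]
    return min(counts, key=lambda zs: N - (zs[1] % N))[0]

# Example: a 100-unit sub-block, division vector F, 4 usable blade disks.
print(choose_partition_threshold(P=100, F=[8, 16, 32, 64], N=4))   # 32 -> S = 3
```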
(2) After the client has split the data block Block, the cross-assignment function f_c and the reconstruction function f_r are called to combine the n × N storage blocks contained in the file Block crosswise into disk blocks. The detailed process is: the storage blocks contained in each sub-block are serialized uniformly; if the last storage block of a sub-block differs in size from the other storage blocks, the sequence number of the last storage block of each sub-block is set in turn to n × N − (N − i), where i is the ID of the sub-block. After serialization, each storage block in the data block Block possesses a unique sequence number A, A ∈ {1, 2, 3, ..., n × N}. All storage blocks are then shuffled by the cross-assignment function f_c, where f_c = {A} mod N, {A} is the set of sequence numbers of the storage blocks contained in the data block Block, and N is the number of blade disks usable for storage; the set of sequence numbers {A} yields N groups of storage-block sets after being processed by the function f_c.
After all storage blocks have been shuffled, the reconstruction function f_r is applied to each group of storage-block sets in turn; after processing by the reconstruction function f_r, every group of storage-block sets contains the same number of disk blocks, where the reconstruction function f_r = T_i / N and T_i denotes the number of storage blocks in the i-th group of storage-block sets. Finally, the disk blocks contained in each group of storage-block sets are transmitted to the corresponding blade disk, the number of groups of storage-block sets corresponding to the number of usable blade disks; each group interacts in parallel with its corresponding blade disk.
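A sketch of the cross-assignment and bundling described above, under one plausible reading: storage blocks with sequence numbers A are grouped by A mod N, and each group is then packed into disk blocks of up to N storage blocks before being shipped to its blade disk. Names and the packing granularity are assumptions.

```python
def cross_assign(chunks, n_disks):
    """Group the serialized storage blocks by f_c = A mod N, so the block with
    sequence number A lands in group A mod N; each group is then shipped to
    one blade disk."""
    groups = [[] for _ in range(n_disks)]
    for a, chunk in enumerate(chunks, start=1):   # A in {1, ..., n*N}
        groups[a % n_disks].append((a, chunk))
    return groups

def pack_disk_blocks(group, n_disks):
    """Reconstruction step f_r = T_i / N: bundle each group of T_i storage
    blocks into disk blocks of (up to) N storage blocks each."""
    return [group[i:i + n_disks] for i in range(0, len(group), n_disks)]

chunks = [f"chunk{i}".encode() for i in range(1, 13)]   # n*N = 12 storage blocks
groups = cross_assign(chunks, n_disks=3)
disk_blocks_per_disk = [pack_disk_blocks(g, 3) for g in groups]
print([len(g) for g in groups], [len(db) for db in disk_blocks_per_disk])
# [4, 4, 4] storage blocks per group -> [2, 2, 2] disk blocks per blade disk
```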
When a request related to cloud computing system business is received from a user, an API based on the HTTP protocol is provided to the user, and the user submits requests related to the cloud computing system business by calling the API. When a user needs to use the cloud computing system service, the user first packages his own algorithm into an open-source engine image according to a certain specification, uploads it to the open-source engine image repository, and then calls the API to submit the request. After the user has submitted the request, the request related to the cloud computing system business is received from the user.
In determining the DataNodes, when the above request is received, the one or more DataNodes that will carry out the above cloud computing system business are determined according to the virtual address information of the DataNodes of the cloud computing system. Specifically, there may be a situation in which the sum of the allocated virtual resources of all computing tasks on a physical machine exceeds the virtual resources within the limit of the physical machine, while the remaining virtual resources, obtained by subtracting the sum of the virtual resources actually used by all ordinary parallel computing tasks from the virtual resources within the limit of the physical machine, are not less than the allocated virtual resources of the DataNode.
In the storage management of the blade disks, the cloud computing system of the present invention simulates multiple independent blade disks as a single logical disk and, through the corresponding file splitting, encryption and transmission mechanisms, realizes local secure processing of cloud data blocks, improving the user's ability to manage the data he possesses.
The logical disk is loaded through the flash chip, and only a legitimate user who possesses the flash chip can load the logical disk and obtain the required service. A secure storage management mechanism is established through data block splitting and encryption. Through the described splitting of data blocks, it is guaranteed that no single blade disk stores the complete information of a user file, guaranteeing the privacy of user data. The file split information is saved at the user terminal, and the data blocks are transmitted to the blade disks.
The flash chip possessed by the user is the mark of the user's legal identity. After identity authentication passes, the terminal device loads the logical disk according to the volume file specified in the flash chip; the user synchronously completes the data management of multiple blade disks through the logical disk. When the user stores a data block through the logical disk, the user terminal first splits the target file, the target file being split into one or more storage blocks; each storage block is then encrypted to enhance the confidentiality of the stored data, and finally each encrypted block is transmitted to multiple blade disks. The file control information generated in the splitting and encryption of the target file is saved into the encrypted area of the flash chip; by separating the control information from the data objects themselves, the transfer of data-block control is realized.
When the user reads data of the cloud computing system through the logical disk, the terminal first reads the control information of the corresponding file in the flash chip, then downloads the corresponding data blocks from the blade disks in parallel, and finally decrypts each data block and verifies its integrity. If data integrity verification succeeds, the data blocks are assembled and reconstructed into the required file, which is presented to the user in plaintext; if data integrity verification fails, redundant data blocks are downloaded from the corresponding blade disks to recover the lost or damaged data.
In conclusion the present invention proposes a kind of storage of file security and content protecting method, real-time change is realized
The efficient real-time processing of big data set.
Obviously, those skilled in the art should appreciate that the units or steps of the present invention described above can be realized with a general-purpose computing system; they can be concentrated on a single computing system or distributed over a network formed by multiple computing systems. Optionally, they can be realized with program code executable by a computing system, so that they can be stored in a storage system and executed by a computing system. Thus, the present invention is not limited to any specific combination of hardware and software.
It should be understood that the above specific embodiments of the present invention are used only to exemplarily illustrate or explain the principles of the present invention and do not limit the present invention. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the present invention shall be included in the protection scope of the present invention. Moreover, the appended claims of the present invention are intended to cover all variations and modifications falling within the scope and boundaries of the appended claims, or equivalents of such scope and boundaries.
Claims (4)
1. A file secure storage and content protection method, characterized by comprising:
receiving data from the division unit of a DataNode, and placing the received data into a queue named after the pending task;
starting pending tasks based on the priority of each pending task, and sending the pending tasks to the computing unit of the DataNode;
receiving result data from the computing unit and forwarding it to the division unit;
the division unit splitting the received data by data-block name, and transferring the computed result data to the corresponding cloud storage server;
the computing unit computing the input data according to the started pending task, and outputting the processed data blocks.
2. The method according to claim 1, characterized in that the division unit isolates the data transmission of the DataNode from its internal logical computation and splits the input data by data-block name;
according to the association between the data and the pending tasks of the current DataNode, a layered queue of all pending tasks in the ready state is maintained.
3. The method according to claim 1, characterized by further comprising:
each DataNode further includes a data backup unit; when the center scheduling node receives result data that the computing unit has finished processing, the result data is sent through the corresponding channel to the data backup unit; the data backup unit stores the result data on a blade disk and, according to the association between the result data and the cloud storage servers, places the result data into the shared-memory queues named after the cloud storage servers, to be sent uniformly by the division unit.
4. The method according to claim 1, characterized in that the units share and multiplex information through explicit inter-process communication.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810186569.XA CN108399099A (en) | 2018-03-07 | 2018-03-07 | File security stores and content protecting method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810186569.XA CN108399099A (en) | 2018-03-07 | 2018-03-07 | File security stores and content protecting method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108399099A true CN108399099A (en) | 2018-08-14 |
Family
ID=63091511
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810186569.XA Pending CN108399099A (en) | 2018-03-07 | 2018-03-07 | File security stores and content protecting method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108399099A (en) |
- 2018-03-07: Application CN201810186569.XA filed in China (CN); publication CN108399099A; status: Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106453360A (en) * | 2016-10-26 | 2017-02-22 | 上海爱数信息技术股份有限公司 | Distributed block storage data access method and system based on iSCSI (Internet Small Computer System Interface) protocol |
CN106681834A (en) * | 2016-12-28 | 2017-05-17 | 上海优刻得信息科技有限公司 | Distributed calculating method and management device and system |
CN107046510A (en) * | 2017-01-13 | 2017-08-15 | 广西电网有限责任公司电力科学研究院 | A kind of node and its system of composition suitable for distributed computing system |
CN106970830A (en) * | 2017-03-22 | 2017-07-21 | 佛山科学技术学院 | The storage controlling method and virtual machine of a kind of distributed virtual machine |
Non-Patent Citations (1)
Title |
---|
Wang Shuai, "Research and Implementation of a Terminal-Transparent Encrypted Storage System for Multiple Cloud Disks", China Master's Theses Full-text Database, Information Science and Technology Series *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109862119A (en) * | 2019-03-15 | 2019-06-07 | 深圳市网心科技有限公司 | Memory capacity sharing method, device, service server, user terminal and system |
CN113918092A (en) * | 2021-09-18 | 2022-01-11 | 中国长城科技集团股份有限公司 | Method and system for allocating storage space |
CN115587393A (en) * | 2022-08-17 | 2023-01-10 | 广州红海云计算股份有限公司 | Distributed performance data processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20180814 |