CN106649385B - Data reordering method and device based on HBase database - Google Patents
- Publication number: CN106649385B
- Application number: CN201510733850.7A
- Authority: CN (China)
- Prior art keywords: data, row key, key value, split, split data
- Legal status: Expired - Fee Related (Google Patents notes that the legal status is an assumption, not a legal conclusion, and that it has not performed a legal analysis.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/221—Column-oriented storage; Management thereof
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This application discloses an HBase-based data sorting method and device. The method comprises: splitting the data to be sorted across multiple cluster nodes of an HBase database, where each cluster node, after receiving its split data, applies the HBase row-key ordering; reading the sorting result of each cluster node to obtain multiple sorting results, where each cluster node produces one sorting result after sorting its split data with the row-key ordering; and taking the set of the multiple sorting results as the sorting result of the data to be sorted. This application addresses the technical problem of low data-sorting efficiency in the prior art.
Description
Technical field
This application relates to the computer field, and in particular to an HBase-based data sorting method and device.
Background art
Data sorting is widely used in big-data statistics. For example, ranking websites by access volume shows which websites are visited most, and decisions can then be based on that ranking. When the data volume is small, sorting is simple with any of the many existing fast sorting algorithms. Once the data volume reaches a certain scale, however, even simple sorting becomes complex. For example, when 100 GB of data must be sorted, the system cannot simply read the data into memory and sort it on a single machine: a server typically does not have 100 GB of memory, and even a server with that much memory could never devote all of it to sorting.
Sorting schemes built on current technical architectures can use an existing framework to distribute the sorting task across the nodes of a cluster. That is, the 100 GB of data is split and distributed to the cluster nodes; under the framework's unified scheduling, each node reads its data and sorts it; finally, the sorted results from all nodes are merged and the overall result is written to a file system.
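The background scheme can be modelled in a few lines of Python. This is an illustrative single-process sketch, not code from the patent: each inner list stands in for the unordered slice held by one cluster node, and `prior_art_sort` is a hypothetical name.

```python
import heapq


def prior_art_sort(partitions: list[list[int]]) -> list[int]:
    """Model of the background scheme: local sort on each node, then a global merge."""
    # Each inner list stands for the unordered slice held by one node;
    # every node sorts its own slice independently.
    locally_sorted = [sorted(part) for part in partitions]
    # The locally sorted runs overlap in value range, so a k-way merge
    # across all nodes is still needed to produce one global ordering.
    return list(heapq.merge(*locally_sorted))


slices = [[5, 1, 9], [3, 8], [7, 2]]
print(prior_art_sort(slices))  # [1, 2, 3, 5, 7, 8, 9]
# Concatenating the local runs without merging is not globally sorted:
print([x for part in slices for x in sorted(part)])  # [1, 5, 9, 3, 8, 2, 7]
```

Because slices cut from unordered input overlap in value range, the concatenated local runs are not sorted, which is why that scheme cannot simply drop its merge step.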
This sorting scheme has two problems:
The first problem is merging the results: the sorted results on the individual nodes must be merged. Because the data itself is unordered, the per-node results follow no pattern relative to one another, so merging them is either very slow or requires a new distribution mechanism that redistributes the sorted data and then sorts and merges it again. Either way, the process is relatively slow.
The second problem is viewing the results: once the whole data set is sorted, it must be saved to a file, which makes the sorted results inconvenient to view; the results for an arbitrary specified range cannot be viewed quickly.
No effective solution to the above problems has yet been proposed.
Summary of the invention
Embodiments of this application provide an HBase-based data sorting method and device, to at least solve the technical problem of low data-sorting efficiency in the prior art.
According to one aspect of the embodiments of this application, an HBase-based data sorting method is provided. The method comprises: splitting the data to be sorted across multiple cluster nodes of the HBase database, where each cluster node, after receiving its split data, applies the row-key ordering of the HBase database; reading the sorting result of each cluster node to obtain multiple sorting results, where each cluster node obtains its sorting result after sorting its split data with the row-key ordering; and determining the set of the multiple sorting results as the sorting result of the data to be sorted.
Further, obtaining the sorting result after each cluster node sorts its split data with the row-key ordering comprises: cluster node Ai applies the row-key ordering to the split data Di assigned to cluster node Ai and obtains sorting result Ri, where i takes the values 1 to n in turn, n is the number of cluster nodes in the HBase database, cluster nodes A1 to An constitute the multiple cluster nodes of the HBase database, and split data D1 to Dn constitute the data to be sorted. Determining the set of the multiple sorting results as the sorting result of the data to be sorted comprises: cluster node Ai stores the data key-value pair of split data Di in the HBase database to obtain the sorting result of the data to be sorted, where the data key-value pair of Di is the key-value pair formed by the identifier of Di and the total data amount of Di.
Further, storing the data key-value pair of split data Di in the HBase database by cluster node Ai comprises: querying whether the HBase database already stores the row key of split data Di, where the row key of Di is the negative of the total data amount of Di; if the HBase database already stores the row key of Di, storing Di in a first target column, where the first target column is any column in the column family of the row identified by Di's row key; and if the HBase database does not store the row key of Di, storing the key-value pair of Di according to the row keys already stored in the HBase database.
Further, if the HBase database does not store the row key of split data Di, storing the key-value pair of Di according to the row keys already stored in the HBase database comprises: successively comparing the row key of Di with the row keys already stored in the HBase database; inserting the row key of Di into a target row in the HBase database, where the target row is the row immediately after the row of a first row key or the row immediately before the row of a second row key, the first and second row keys are row keys already stored in the HBase database, the first row key is the stored row key that is less than Di's row key and closest to it, and the second row key is the stored row key that is greater than Di's row key and closest to it; storing Di in a second target column corresponding to the target row, where the second target column is any column in the column family of the target row; and updating the row keys stored in the HBase database.
Further, the method also comprises: receiving a query instruction from a user through a query interface of the HBase database, where the query instruction queries the split data corresponding to the row keys lying between any two row keys stored in the HBase database; and displaying, with a preset mark added, the split data corresponding to the queried row keys in the HBase database.
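A minimal Python sketch of the range query follows, assuming the stored rows are available as an ascending list of `(row_key, identifiers)` pairs. The function name `query_range`, the half-open interval, and the `*` prefix standing in for the preset mark are all assumptions for illustration; a real deployment would issue an HBase scan rather than search a Python list.

```python
from bisect import bisect_left

# Hypothetical in-memory view of stored rows: (row_key, identifiers),
# kept in ascending row-key order as HBase keeps rows.
Rows = list[tuple[int, list[str]]]


def query_range(rows: Rows, start_key: int, stop_key: int, mark: str = "*") -> list[str]:
    """Return rows with start_key <= row_key < stop_key, each prefixed by a mark.

    The half-open interval mirrors the usual HBase scan convention (start row
    inclusive, stop row exclusive); "*" is an illustrative stand-in for the
    preset mark.
    """
    keys = [key for key, _ in rows]
    lo = bisect_left(keys, start_key)  # first row at or after start_key
    hi = bisect_left(keys, stop_key)   # first row at or after stop_key (excluded)
    return [f"{mark}{key}: {', '.join(ids)}" for key, ids in rows[lo:hi]]


rows = [(-10000, ["www.baidu.com"]), (-1000, ["www.google.com"]), (-10, ["www.example.com"])]
print(query_range(rows, -10000, -10))
# ['*-10000: www.baidu.com', '*-1000: www.google.com']
```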
According to another aspect of the embodiments of this application, an HBase-based data sorting device is further provided. The device comprises: a splitting unit, configured to split the data to be sorted across multiple cluster nodes of the HBase database, where each cluster node, after receiving its split data, applies the row-key ordering of the HBase database; a reading unit, configured to read the sorting result of each cluster node to obtain multiple sorting results, where each cluster node obtains its sorting result after sorting its split data with the row-key ordering; and a determining unit, configured to determine the set of the multiple sorting results as the sorting result of the data to be sorted.
Further, the reading unit comprises a sorting subunit, configured for cluster node Ai to apply the row-key ordering to the split data Di assigned to cluster node Ai and obtain sorting result Ri, where i takes the values 1 to n in turn, n is the number of cluster nodes in the HBase database, cluster nodes A1 to An constitute the multiple cluster nodes of the HBase database, and split data D1 to Dn constitute the data to be sorted. The determining unit comprises a storage subunit, configured for cluster node Ai to store the data key-value pair of split data Di in the HBase database to obtain the sorting result of the data to be sorted, where the data key-value pair of Di is the key-value pair formed by the identifier of Di and the total data amount of Di.
Further, the storage subunit comprises: a query module, configured to query whether the HBase database already stores the row key of split data Di, where the row key of Di is the negative of the total data amount of Di; a first storage module, configured to store Di in a first target column if the HBase database already stores the row key of Di, where the first target column is any column in the column family of the row identified by Di's row key; and a second storage module, configured to store the key-value pair of Di according to the row keys already stored in the HBase database if the HBase database does not store the row key of Di.
Further, the second storage module comprises: a comparison submodule, configured to successively compare the row key of split data Di with the row keys already stored in the HBase database; an insertion submodule, configured to insert the row key of Di into a target row in the HBase database, where the target row is the row immediately after the row of a first row key or the row immediately before the row of a second row key, the first and second row keys are row keys already stored in the HBase database, the first row key is the stored row key that is less than Di's row key and closest to it, and the second row key is the stored row key that is greater than Di's row key and closest to it; a storage submodule, configured to store Di in a second target column corresponding to the target row, where the second target column is any column in the column family of the target row; and an update submodule, configured to update the row keys stored in the HBase database.
Further, the device also comprises: a receiving unit, configured to receive a query instruction from a user through a query interface of the HBase database, where the query instruction queries the split data corresponding to the row keys lying between any two row keys stored in the HBase database; and a display unit, configured to display, with a preset mark added, the split data corresponding to the queried row keys in the HBase database.
In the embodiments of this application, the data to be sorted is split across multiple cluster nodes of the HBase database, where each cluster node, after receiving its split data, applies the HBase row-key ordering; the sorting result of each cluster node is read to obtain multiple sorting results, each cluster node producing its sorting result after sorting its split data with the row-key ordering; and the set of the multiple sorting results is determined as the sorting result of the data to be sorted. Because HBase sorts automatically by row key, the data is sorted automatically once it has been split across the cluster nodes where the HBase database resides. The sorting results in the cluster nodes are then read, and the already sorted split data in each node are ordered as a whole, again by row key, to obtain multiple sorting results whose set is the sorting result of the data to be sorted. By relying on HBase row-key ordering, this application omits the prior-art step of merging the data from all cluster nodes before the data can be sorted. This shortens the sorting time, achieves sorting without merging the data held in each database, solves the technical problem of low data-sorting efficiency in the prior art, and improves sorting performance.
Brief description of the drawings
The drawings described here are provided for a further understanding of this application and constitute a part of it. The illustrative embodiments of this application and their description are used to explain this application and do not constitute an undue limitation on it. In the drawings:
Fig. 1 is a flowchart of an HBase-based data sorting method according to an embodiment of this application; and
Fig. 2 is a schematic diagram of an HBase-based data sorting device according to an embodiment of this application.
Detailed description of embodiments
To help those skilled in the art better understand the solution of this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application without creative effort fall within the protection scope of this application.
It should be noted that the terms "first", "second", and so on in the description, claims, and drawings of this application are used to distinguish similar objects, not to describe a particular order or sequence. Data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion: a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units explicitly listed, and may include other steps or units that are not explicitly listed or are inherent to that process, method, product, or device.
First, the technical terms used in this embodiment are explained as follows:
HBase is an open-source, distributed, column-oriented database: a distributed storage system for structured data. Unlike a general relational database, HBase is suited to storing unstructured data. Another difference is that HBase is column-based rather than row-based.
Rowkey is the unique identifier of a row of data in HBase. HBase stores data sorted by rowkey, and HBase queries are also based on rowkey: either all data for a single specified rowkey is fetched directly, or the entire data range from a start rowkey to an end rowkey is scanned.
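HBase compares row keys as raw byte arrays in lexicographic order, so the byte encoding of a numeric row key decides whether rows come back in numeric order. The text does not specify how the negative total is encoded. The Python sketch below, with hypothetical helper names, shows that a plain decimal string of the negative count does not sort numerically, and one possible fixed-width alternative (`inverted_key`, with an assumed 19-digit width) that does.

```python
WIDTH = 19                 # digits in the padded key (assumption)
MAX_COUNT = 10**WIDTH - 1  # largest count the encoding supports


def naive_key(count: int) -> bytes:
    """Row key as the decimal string of the negative count."""
    return str(-count).encode("ascii")


def inverted_key(count: int) -> bytes:
    """Hypothetical fixed-width key whose byte order is descending count order."""
    if not 0 <= count <= MAX_COUNT:
        raise ValueError(f"count out of range: {count}")
    # MAX_COUNT - count shrinks as count grows; zero-padding to a fixed width
    # makes lexicographic byte order agree with numeric order.
    return str(MAX_COUNT - count).zfill(WIDTH).encode("ascii")


counts = [1, 2, 10, 11]
print(sorted(counts, key=naive_key))     # [1, 10, 11, 2]
print(sorted(counts, key=inverted_key))  # [11, 10, 2, 1]
```

Python compares `bytes` values lexicographically, byte by byte, which is the comparison this sketch relies on to imitate row-key ordering.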
Family (column family) is HBase's unit for physically separating data and must be defined in advance when a table is created. Most of the time only a single family is used, so that the data of one row are not physically separated and can be queried conveniently.
Column is the column name in HBase. HBase is an unstructured database: columns need not be predefined when an HBase table is created and can be added and used at any time.
Value is the data value that HBase ultimately stores; a stored value can be located by rowkey + family + column.
According to an embodiment of this application, a method embodiment of an HBase-based data sorting method is provided. It should be noted that the steps shown in the flowchart of the accompanying drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in a different order.
Fig. 1 is a flowchart of an HBase-based data sorting method according to an embodiment of this application. As shown in Fig. 1, the method comprises the following steps S102 to S106:
Step S102: split the data to be sorted across multiple cluster nodes of the HBase database, where each cluster node, after receiving its split data, applies the HBase row-key ordering.
Specifically, the data to be sorted may be the access volume of websites over a certain period, such as the access volume of www.baidu.com or of www.google.com; it may also be the search volume of keywords over a certain period, such as the search volume on September 3, 2015 of "War of Resistance victory parade live broadcast" or of "Beijing driving restrictions". It should be noted that the data to be sorted is not limited to the website access volumes and keyword search volumes above, and may be any data that needs to be sorted.
Because the HBase database has a row-key (rowkey) ordering, when the data to be sorted is split across the cluster nodes of the HBase database, the split data in each cluster node are sorted according to that row-key ordering.
Step S104: read the sorting result of each cluster node to obtain multiple sorting results, where each cluster node produces one sorting result after sorting its split data with the row-key ordering.
Specifically, after the split data in each cluster node are sorted by row key, each node yields one sorting result, and reading the sorting results of the cluster nodes in turn yields multiple sorting results. It should be noted that the multiple sorting results are stored in multiple cluster nodes of the HBase database. For example, when the split data are website access volumes, cluster node a may hold the sorted web pages with access volumes from 10000 down to 1000, cluster node b those from 999 down to 100, and cluster node c those from 99 down to 0. After the split data are sorted as a whole, the number of resulting sorting results can be chosen according to the user's actual needs.
Step S106: determine the set of the multiple sorting results as the sorting result of the data to be sorted. The set of the sorting results distributed across the cluster nodes is the sorting result of the data to be sorted.
In this embodiment, the data to be sorted is split across the cluster nodes where the HBase database resides. Because HBase automatically sorts by row key, the data is sorted automatically once it has been split across the cluster nodes. The sorting results in the cluster nodes are then read, and the already sorted split data in each node are ordered as a whole by row key to obtain multiple sorting results, whose set is the sorting result of the data to be sorted. By relying on HBase row-key ordering, this application omits the prior-art step of merging the data from all cluster nodes before the data can be sorted. This shortens the sorting time, achieves sorting without merging the data held in each database, solves the technical problem of low data-sorting efficiency in the prior art, and improves sorting performance.
Optionally, obtaining a sorting result after each cluster node sorts its split data with the row-key ordering comprises the following step S1041:
Step S1041: cluster node Ai applies the row-key ordering to the split data Di assigned to cluster node Ai and obtains sorting result Ri, where i takes the values 1 to n in turn, n is the number of cluster nodes in the HBase database, cluster nodes A1 to An constitute the multiple cluster nodes of the HBase database, and split data D1 to Dn constitute the data to be sorted.
Specifically, when the data to be sorted is split across cluster nodes A1 to An of the HBase database, split data Di are stored unordered in each cluster node as data key-value pairs. The data key-value pairs of Di are then sorted by row key, and each cluster node obtains one sorting result Ri. Suppose Di is the access volume of several websites over a certain period. If the website with domain name www.baidu.com has an access volume of 10000, then its domain name and its access volume form the data key-value pair (www.baidu.com 10000). Similarly, if the website with domain name www.google.com has an access volume of 1000, its domain name and access volume form the data key-value pair (www.google.com 1000).
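The data key-value pairs and their row keys can be modelled as follows. This Python sketch is illustrative only: `KeyValue`, `row_key`, and `sort_split` are hypothetical names, and the row key is kept as a Python integer rather than the byte array HBase actually stores.

```python
from typing import NamedTuple


class KeyValue(NamedTuple):
    identifier: str  # e.g. a website domain name
    total: int       # e.g. the access count for the period


def row_key(kv: KeyValue) -> int:
    # The row key is the negative of the total, so ascending row-key
    # order lists the largest totals first.
    return -kv.total


def sort_split(split: list[KeyValue]) -> list[KeyValue]:
    """Order one node's split data the way ascending row keys would."""
    return sorted(split, key=row_key)


split = [KeyValue("www.google.com", 1000), KeyValue("www.baidu.com", 10000)]
for kv in sort_split(split):
    print(row_key(kv), kv.identifier)
# -10000 www.baidu.com
# -1000 www.google.com
```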
It should be noted that, when splitting the data to be sorted, the user can decide how much split data Di to place on each cluster node according to that node's remaining storage space.
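The text leaves the split policy to the user. One possible policy, assumed here purely for illustration, is to give each node a share of the records proportional to its remaining storage space; `plan_split` is a hypothetical name and the free-space figures are made up.

```python
def plan_split(total_records: int, free_space: list[int]) -> list[int]:
    """Hypothetical policy: give each node a share proportional to its free space."""
    capacity = sum(free_space)
    if capacity <= 0 or any(f < 0 for f in free_space):
        raise ValueError("free space must be non-negative with a positive total")
    # Floor of each node's proportional share.
    shares = [total_records * f // capacity for f in free_space]
    # The floors leave a remainder smaller than the number of nodes with free
    # space; hand it out one record each, roomiest nodes first (ties keep
    # their original order because Python's sort is stable).
    remainder = total_records - sum(shares)
    roomiest = sorted(range(len(free_space)), key=lambda i: free_space[i], reverse=True)
    for i in roomiest[:remainder]:
        shares[i] += 1
    return shares


print(plan_split(100, [50, 30, 20]))  # [50, 30, 20]
print(plan_split(10, [1, 1, 1]))      # [4, 3, 3]
```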
Determining the set of the multiple sorting results as the sorting result of the data to be sorted comprises step S1061: cluster node Ai stores the data key-value pair of split data Di in the HBase database to obtain the sorting result of the data to be sorted, where the data key-value pair of Di is the key-value pair formed by the identifier of Di and the total data amount of Di.
Specifically, the data key-value pairs of split data Di in sorting result Ri are read and stored in the HBase database in a fixed format. As illustrated in the steps above, when Di is the access volume of a website over a certain period, the data key-value pair consists of the website's domain name (the identifier of Di) and its access volume over that period (the total data amount of Di). Similarly, when the split data are the search volume of a keyword, the data key-value pair consists of the keyword (the identifier of Di) and its search volume over a certain period (the total data amount of Di). In summary, the data key-value pair of Di consists of the identifier of Di and the total data amount of Di.
When Di is the access volume of websites over a certain period and the split data Di in cluster node Ai are sorted as a whole, the data key-value pairs of the sorted Di must be stored in order across the cluster nodes of the HBase database; the sorting results formed by these cluster nodes constitute the sorting result of the data to be sorted. For example, with 4 cluster nodes, cluster node 1 may store the split data with access volumes from 100000 to 10000, cluster node 2 those from 9999 to 1000, cluster node 3 those from 999 to 100, and cluster node 4 those from 99 to 0. Cluster nodes 1 to 4 are cluster nodes of the HBase database, and the number of cluster nodes can be chosen according to the user's actual needs.
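Routing a record to one of the four nodes in this example amounts to a range lookup on its access count. The Python sketch below hard-codes the example's boundaries; `LOWER_BOUNDS` and `node_for` are hypothetical names, and treating node 1 as open-ended above 100000 is an assumption.

```python
# Inclusive lower bounds of the access-count ranges from the four-node
# example: node 1 holds >= 10000, node 2 >= 1000, node 3 >= 100, node 4 >= 0.
LOWER_BOUNDS = [10000, 1000, 100, 0]


def node_for(count: int) -> int:
    """Return the 1-based node number whose range contains count."""
    # Assumption: node 1 is treated as open-ended above 100000, which the
    # example range (100000 to 10000) does not cover explicitly.
    for node, lower in enumerate(LOWER_BOUNDS, start=1):
        if count >= lower:
            return node
    raise ValueError(f"negative count: {count}")


print([node_for(c) for c in (100000, 10000, 9999, 1000, 999, 100, 99, 0)])
# [1, 1, 2, 2, 3, 3, 4, 4]
```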
Optionally, storing the data key-value pair of split data Di in the HBase database by cluster node Ai in step S1061 comprises the following steps S1 to S5:
Step S1: query whether the HBase database already stores the row key of split data Di, where the row key of Di is the negative of the total data amount of Di.
Step S3: if the HBase database already stores the row key of Di, store Di in a first target column, where the first target column is any column in the column family of the row identified by Di's row key.
Step S5: if the HBase database does not store the row key of Di, store the key-value pair of Di according to the row keys already stored in the HBase database.
Specifically, in this embodiment the split data Di in cluster node Ai are sorted by row key. Therefore, when Di in cluster node Ai are sorted as a whole, the row key (rowkey) of Di is obtained first, and the HBase database is queried for whether that rowkey is already stored, where the rowkey is the negative of the total data amount in Di's data key-value pair.
Suppose the data key-value pair of some split data in Di is (www.baidu.com 10000); its row key is then -10000. If row key -10000 is found in the HBase database, the domain name "www.baidu.com" from the pair (www.baidu.com 10000) is stored in the HBase database, specifically in any column of the column family of the row identified by row key -10000. If row key -10000 is not found in the HBase database, -10000 is compared with the row keys already stored in the HBase database and inserted into the corresponding cluster node of the HBase database.
Split data Di stored in the database takes the form rowkey:-10000, family:f, column:www.baidu.com, value:1, where family identifies the column family by which HBase physically separates data, column is the HBase column name, and value is the stored cell value. When rowkey -10000 is stored as a row, family:f, column:www.baidu.com and value:1 are stored in one column of the column family belonging to the row with row key -10000; that is, family:f, column:www.baidu.com and value:1 are stored together in the same column.
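The storage form above can be modelled as cell coordinates. In this hypothetical sketch, `to_cell` and `group_cells` are illustrative names, the family is fixed to "f" as in the example, and a nested dictionary stands in for an HBase row. With a Python client such as happybase, the same cell would roughly correspond to `table.put(b"-10000", {b"f:www.baidu.com": b"1"})`, assuming the row key is encoded as a decimal string.

```python
from collections import defaultdict

FAMILY = "f"  # column family name used in the storage form above


def to_cell(identifier, total):
    """Map one data key-value pair, e.g. ("www.baidu.com", 10000), to the
    (rowkey, family, column, value) coordinates given in the text."""
    return (-total, FAMILY, identifier, 1)


def group_cells(cells):
    """Nest cells as rowkey -> family -> column -> value, the way HBase
    addresses a cell by row key, column family and column qualifier."""
    table = defaultdict(lambda: defaultdict(dict))
    for rowkey, family, column, value in cells:
        table[rowkey][family][column] = value
    return table
```

Two sites with the same total land in the same row and family but in different columns, which is why they do not overwrite each other.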
Further, in step S5, storing the key-value pair of Di according to the row keys already stored in the HBase database, when the HBase database has not stored the row key of Di, includes the following steps S51 to S57:
Step S51: compare the row key of Di, one by one, with the row keys already stored in the HBase database.
Step S53: insert the row key of Di into a target row in the HBase database, where the target row is the row immediately after the row of the first row key or the row immediately before the row of the second row key. The first and second row keys are row keys already stored in the HBase database: the first row key is the stored row key that is smaller than the row key of Di and closest to it, and the second row key is the stored row key that is larger than the row key of Di and closest to it.
Step S55: store Di into a second target column corresponding to the target row, where the second target column is any column in the column family of the target row.
Step S57: update the record of row keys stored in the HBase database.
After each update of the stored row keys, the next time a not-yet-stored key-value pair of Di is stored, its first and second row keys are determined anew from the row keys then stored in the HBase database. For a key-value pair of Di whose first and second row keys both exist, the first row key is smaller than the second row key.
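Steps S51 to S57 amount to finding the nearest stored neighbour on each side. A minimal sketch, assuming the stored row keys are held as an ascending Python list and compared numerically; `target_position` and `insert_row_key` are hypothetical helpers, and binary search (`bisect`) stands in for the one-by-one comparison of step S51, which yields the same position.

```python
from bisect import bisect_left


def target_position(stored_keys, new_key):
    """Return (index, first, second) for a row key that is not yet stored.

    index  : position of the target row in the ascending key list
    first  : first row key, the largest stored key smaller than new_key
    second : second row key, the smallest stored key larger than new_key
    Either neighbour is None when it does not exist.
    """
    i = bisect_left(stored_keys, new_key)
    first = stored_keys[i - 1] if i > 0 else None
    second = stored_keys[i] if i < len(stored_keys) else None
    return i, first, second


def insert_row_key(stored_keys, new_key):
    # Steps S53 and S57: insert at the target row; the list is the updated record
    i, _, _ = target_position(stored_keys, new_key)
    stored_keys.insert(i, new_key)
    return stored_keys
```

On the document's worked examples, `target_position([-100000, -50000, -40000], -10000)` gives `(3, -40000, None)`, `target_position([-5000, -4000, -1000], -10000)` gives `(0, None, -5000)`, and `target_position([-5000, -4000, -1000, -500], -2000)` gives `(2, -4000, -1000)`. The row after the first row key and the row before the second row key are the same index. Cluster-node boundaries are not modelled here.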
Specifically, suppose the HBase database stores the row keys -100000, -50000 and -40000 but not -10000. The row key -10000 is then compared with -100000, -50000 and -40000 in turn.
When Di is the access count of websites or the search count of keywords, users in practice prefer to see the few websites with the highest access counts or the few keywords with the highest search counts. Row keys are therefore stored as negative numbers in the HBase database, so that arranging the row keys in ascending order arranges the access counts in descending order.
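The effect of storing negated totals can be checked directly. The sketch below orders hypothetical (identifier, total) pairs by row key = -total, comparing numerically as the text assumes. Because HBase actually compares row keys as bytes, decimal strings such as "-9999" and "-10000" would not sort numerically; `padded_row_key` shows one hypothetical byte-sortable encoding (subtract from a fixed maximum, then zero-pad), which is an illustration and not part of the described method.

```python
def rank_by_row_key(pairs):
    """Order (identifier, total) pairs by row key = -total, ascending.
    Ascending negated totals means descending totals."""
    return sorted(pairs, key=lambda pair: -pair[1])


def padded_row_key(total, width=10):
    """Hypothetical byte-sortable row key: lexicographic order of the
    returned strings equals descending order of total.
    Assumes 0 <= total < 10**width."""
    if not 0 <= total < 10**width:
        raise ValueError("total out of range for this width")
    return str(10**width - 1 - total).zfill(width)
```

For instance, plain strings sort as "-10000", "-40000", "-9999" (not numeric order), whereas `padded_row_key(10000)` is "9999989999" and sorts before the key for 1000.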
Continuing the example, comparing -10000 with -100000, -50000 and -40000 shows that all three are less than -10000 and that -40000 is the closest to it, so -10000 is inserted into the row immediately after the row of row key -40000; -40000 is thus the first row key. As described for step S1061, if cluster node 1 stores split data with access counts from 100000 down to 10000, the row keys -40000 and -10000 both belong in cluster node 1. The row key -10000 is therefore inserted into the row following the row of -40000, and the split data Di with row key -10000 is stored in any column of the column family of that target row, where -40000 is the first row key. With this sorting method, the access counts of websites are ordered from largest to smallest.
As another example, suppose the HBase database stores the row keys -5000, -4000 and -1000 but not -10000. The comparison shows that -5000, -4000 and -1000 are all greater than -10000 and that -5000 is the closest, so -10000 would be inserted into the row immediately before the row of -5000 (the target row), making -5000 the second row key. However, as described for step S1061, cluster node 2 stores split data with access counts from 9999 down to 1000, so the row keys -10000 and -5000 are not stored in the same cluster node. Row key -10000 should instead be stored in cluster node 1, and because -10000 is the largest of all row keys in node 1, it is stored in the last row of cluster node 1 (the target row).
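The node boundary in this example can be sketched by routing each total to its node first and only then searching for neighbours inside that node. `NODE_RANGES` copies the four-node ranges described for step S1061; `node_for` and `insert_into_node` are hypothetical helpers, and row keys are compared numerically.

```python
from bisect import bisect_left

# Access-count range (low, high) held by each cluster node, from the
# four-node example in the text; row keys are the negated counts.
NODE_RANGES = {1: (10000, 100000), 2: (1000, 9999), 3: (100, 999), 4: (0, 99)}


def node_for(total):
    """Return the cluster node whose access-count range contains total."""
    for node, (low, high) in NODE_RANGES.items():
        if low <= total <= high:
            return node
    raise ValueError(f"no cluster node covers a total of {total}")


def insert_into_node(nodes, total):
    """Insert row key -total into the ascending key list of its own node.

    Neighbours are searched only inside that node, so -10000 never ends up
    next to -5000 held by node 2. Returns (node, index in that node).
    """
    node = node_for(total)
    keys = nodes.setdefault(node, [])
    row_key = -total
    i = bisect_left(keys, row_key)
    if i == len(keys) or keys[i] != row_key:
        keys.insert(i, row_key)
    return node, i
```

With node 1 holding [-100000, -50000, -40000] and node 2 holding [-5000, -4000, -1000], inserting a total of 10000 returns (1, 3), the last row of node 1, and node 2 is left untouched.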
As a further example, suppose the HBase database stores the row keys -5000, -4000, -1000 and -500, and the row key -2000 needs to be inserted. Comparing -2000 with -5000, -4000 and -1000 shows that -2000 is greater than -4000 and -5000 and closest to -4000, so the row key -2000 is inserted into the row immediately after the row of -4000 (the target row); equivalently, -2000 is less than -1000 and -500 and closest to -1000, so it is inserted into the row immediately before the row of -1000 (the same target row). The split data Di with row key -2000 is then stored in any column of the column family of the target row. Here -4000 is the first row key and -1000 is the second row key.
Because the split data Di in cluster node Ai is sorted in this way, the split data in the individual cluster nodes Ai no longer needs to be merged: Di can be sorted quickly just by reading its row keys. For example, in this embodiment the domain name www.baidu.com is saved as a column name in the row whose row key is -10000; if another website also has 10000 visits, its domain name can be saved in any free column of the column family of the row with row key -10000 without conflicting with the www.baidu.com column. The negative of the access count is saved as the row key so that data with larger access counts comes first in HBase, which makes querying convenient. The sorting method provided by this embodiment can therefore sort large volumes of data quickly.
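Reading the sorted result needs no merge: scanning rows in row-key order and listing each row's columns yields the ranking, with tied totals sharing one row. A minimal sketch over a hypothetical dictionary model of the table; `read_ranking` is an illustrative name and `www.example.org` is a made-up second site with 10000 visits.

```python
def read_ranking(rows):
    """Read rows in ascending row-key order and emit (identifier, total).

    rows maps row key (-total) -> {column name: value}. Because row keys are
    negated totals, the output runs from the largest total to the smallest;
    identifiers sharing a row (equal totals) are read in column-name order,
    as HBase returns the columns of a family sorted by qualifier.
    """
    ranking = []
    for row_key in sorted(rows):
        for column in sorted(rows[row_key]):
            ranking.append((column, -row_key))
    return ranking
```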
Optionally, the sorting method provided by this application further includes the following steps S7 to S9:
Step S7: receive a query instruction from the user through the query interface of the HBase database, where the query instruction is an instruction to query the stored split data whose row keys lie between any two row keys in the HBase database.
Step S9: display the split data found in the HBase database with a preset mark added.
As a database, HBase provides a convenient query interface through which the split data in any specified row key range can be queried. For example, the domain names of the websites with row keys between -1000 and -900 can be found very quickly and displayed with a preset mark, such as an added background color, a font change or a floating bubble.
If the HBase database stores no row key strictly between -1000 and -900 but does store the row keys -1000 and -900, the query returns the domain names of the websites with row key -1000 and of those with row key -900.
If the HBase database stores no row key between -1000 and -900 and stores neither -1000 nor -900, a prompt can pop up telling the user "the queried data does not exist", or empty data can be shown to indicate that the HBase database does not store those row keys.
If the HBase database stores neither -1000 nor -900 but does store a row key between them, for example -950, the query returns the domain names of the websites with row key -950.
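Steps S7 and S9, together with the three cases above, behave like a range scan over sorted row keys. This sketch treats both ends of the range as inclusive, which is an assumption: a real HBase scan treats its stop row as exclusive by default. `query_range` and the site names in the test are hypothetical, and `None` stands for the "data does not exist" prompt.

```python
from bisect import bisect_left, bisect_right


def query_range(row_keys, rows, low, high):
    """Return {row key: sorted identifiers} for stored row keys in [low, high].

    row_keys must be ascending; rows maps row key -> {column name: value}.
    None signals that nothing is stored in the range, so a caller can show a
    "the queried data does not exist" prompt or an empty view.
    """
    start = bisect_left(row_keys, low)   # first key >= low
    stop = bisect_right(row_keys, high)  # one past the last key <= high
    hits = {key: sorted(rows[key]) for key in row_keys[start:stop]}
    return hits or None
```

Querying -1000 to -900 returns both end rows when only -1000 and -900 are stored, `None` when nothing in that span is stored, and the -950 row when only -950 lies inside it.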
An embodiment of this application further provides a data sorting device based on the HBase database. The device is mainly used to execute the HBase-based data sorting method provided above, and it is described in detail below.
Fig. 2 is a schematic diagram of a data sorting device based on the HBase database according to an embodiment of this application. As shown in Fig. 2, the device includes a splitting unit 10, a reading unit 20 and a determination unit 30, where:
Splitting unit 10 is configured to split the data to be sorted into multiple cluster nodes of the HBase database, where each cluster node executes the row key ordering of the HBase database after obtaining its split data.
Specifically, the data to be sorted may be the access counts of certain websites over a period, such as the access count of www.baidu.com or of www.google.com, or the search counts of certain keywords over a period, such as the search count of "War of Resistance victory parade live broadcast" on September 3, 2015 or of "Beijing driving restrictions" on September 3, 2015. Note that the data to be sorted is not limited to website access counts and keyword search counts; it may be any data that needs sorting.
Because the HBase database has its own row key (rowkey) ordering, when the data to be sorted is split into multiple cluster nodes of the HBase database, the split data in each cluster node is sorted according to that row key ordering.
Reading unit 20 is configured to read the sorting result of each cluster node to obtain multiple sorting results, where each cluster node obtains one sorting result after sorting its split data with the row key ordering.
Specifically, after the split data in each cluster node is sorted by row key, one sorting result is obtained per node, and reading the sorting results of the cluster nodes in turn yields multiple sorting results. Note that these sorting results are stored in multiple cluster nodes of the HBase database. For example, when the split data is the access count of websites, cluster node a may hold the sorting result of webpages with access counts from 10000 down to 1000, cluster node b that of webpages with access counts from 999 down to 100, and cluster node c that of webpages with access counts from 99 down to 0. The number of sorting results obtained after the split data is sorted as a whole can be chosen according to the user's actual needs.
Determination unit 30 is configured to determine the set of the multiple sorting results as the sorting result of the data to be sorted.
In this embodiment, the data to be sorted is split into multiple cluster nodes of the HBase database. Because the HBase database orders row keys automatically, the data is sorted automatically once it has been split into the cluster nodes. The sorting results of the cluster nodes are then read: the sorted split data in each cluster node is ordered as a whole by row key, giving multiple sorting results whose set is the sorting result of the data to be sorted. By using the row key ordering of the HBase database, this application removes the step, required in the prior art, of merging the data to be sorted from each cluster node before it can be sorted. This shortens the sorting time, achieves data sorting without merging the data held in each database, solves the prior-art problem of low sorting efficiency, and improves sorting performance.
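The division of labour between units 10, 20 and 30 can be sketched end to end. The sketch assumes, as in the four-node example, that each node holds a contiguous, non-overlapping range of totals; only under such range partitioning does reading the per-node results in node order give the global order without a merge. `split_by_range`, `sort_node` and `global_ranking` are hypothetical names, and ties are broken by identifier for a deterministic result.

```python
def split_by_range(pairs, boundaries):
    """Route (identifier, total) pairs to nodes by total.

    boundaries lists each node's lower bound in descending order, except for
    the last node, e.g. [10000, 1000, 100] for the four-node example; node 0
    receives the largest totals and the last node everything below 100.
    """
    nodes = [[] for _ in range(len(boundaries) + 1)]
    for identifier, total in pairs:
        index = next(
            (i for i, low in enumerate(boundaries) if total >= low),
            len(boundaries),
        )
        nodes[index].append((identifier, total))
    return nodes


def sort_node(pairs):
    # Local sort inside one node: ascending row key (-total), ties by identifier
    return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))


def global_ranking(pairs, boundaries):
    # Read each node's sorted result in node order and concatenate; with
    # range partitioning this is already the overall ranking, so no merge.
    ranking = []
    for node_pairs in split_by_range(pairs, boundaries):
        ranking.extend(sort_node(node_pairs))
    return ranking
```

If the data were instead split arbitrarily (for example by hashing), the per-node results would overlap in value and a merge step would be needed again.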
Optionally, the reading unit 20 includes a sorting subunit, where:
The sorting subunit is configured so that cluster node Ai executes the row key ordering on the split data Di split into cluster node Ai to obtain sorting result Ri, where i takes the values 1 to n in turn, n is the number of cluster nodes in the HBase database, cluster nodes A1 to An constitute the multiple cluster nodes of the HBase database, and split data D1 to Dn constitute the data to be sorted.
The determination unit 30 includes a storage subunit, configured so that cluster node Ai stores the data key-value pairs of split data Di into the HBase database to obtain the sorting result of the data to be sorted, where a data key-value pair of Di is the key-value pair formed by the identifier of Di and the data total of Di.
Specifically, when the data to be sorted is split into cluster nodes A1 to An of the HBase database, the split data Di is stored in each cluster node as unordered data key-value pairs. The data key-value pairs of Di are then sorted with the row key ordering, and each cluster node obtains one sorting result Ri. Suppose Di is the access counts of several websites over a period: if the website with domain name www.baidu.com has 10000 visits, its domain name www.baidu.com and access count 10000 form the data key-value pair (www.baidu.com 10000); likewise, if the website with domain name www.google.com has 1000 visits, the pair is (www.google.com 1000).
Note that, while the data to be sorted is being split, the user can decide how much split data Di goes to each cluster node according to each node's remaining storage space.
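The text leaves the allocation policy to the user. One hypothetical policy, shown only as an illustration, gives each node a share proportional to its remaining storage and uses the largest-remainder method so that the shares add up exactly; `allocate_by_free_space` and the integer free-space figures are assumptions.

```python
def allocate_by_free_space(total_records, free_space):
    """Hypothetical policy: split total_records across nodes in proportion to
    each node's remaining storage (integers), using the largest-remainder
    method so that the shares always sum to total_records."""
    capacity = sum(free_space)
    if capacity <= 0:
        raise ValueError("no free storage on any cluster node")
    # Integer floor of each proportional share and its remainder
    shares = [total_records * free // capacity for free in free_space]
    remainders = [total_records * free % capacity for free in free_space]
    leftover = total_records - sum(shares)
    # Give the leftover records to the nodes with the largest remainders;
    # sorted() is stable, so ties go to the earlier node
    order = sorted(range(len(free_space)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares
```

Integer arithmetic avoids floating-point rounding, so equal free space splits 100 records as [34, 33, 33].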
The data key-value pairs of Di in sorting result Ri are read and stored into the HBase database in a fixed format. As explained above, when Di is a website's access count over a period, the data key-value pair can consist of the website's domain name (the identifier of Di) and its access count over that period (the data total of Di). Similarly, when Di is the search count of a keyword, the pair can consist of the keyword (the identifier of Di) and its search count over a period (the data total of Di). In summary, a data key-value pair of Di consists of the identifier of Di and the data total of Di.
When Di is the access count of certain websites over a period, once the split data Di in cluster node Ai has been sorted as a whole, its data key-value pairs must be stored in order into the multiple cluster nodes of the HBase database, and the sorting results formed by these cluster nodes are the sorting result of the data to be sorted.
For example, with 4 cluster nodes, cluster node 1 can store split data with access counts from 100000 to 10000, cluster node 2 from 9999 to 1000, cluster node 3 from 999 to 100, and cluster node 4 from 99 to 0. Cluster nodes 1 to 4 are cluster nodes in the HBase database, and the number of cluster nodes can be chosen according to the user's actual needs.
Optionally, the storage subunit includes a query module, a first storage module and a second storage module, where: the query module is configured to query whether the HBase database has stored the row key of Di, the row key being the negative of the data total of Di; the first storage module is configured to store Di into a first target column when the HBase database has stored the row key of Di, the first target column being any column in the column family of the row to which the row key of Di belongs; and the second storage module is configured to store the key-value pair of Di according to the row keys already stored in the HBase database when the HBase database has not stored the row key of Di.
Specifically, in this embodiment the split data Di in cluster node Ai is sorted using the row key ordering. Therefore, when the split data Di in cluster node Ai is sorted as a whole, the row key (rowkey) of Di is obtained first, and the HBase database is queried to check whether that row key has already been stored, where the rowkey is the negative of the data total in the data key-value pair of Di.
For example, suppose a data key-value pair in Di is (www.baidu.com 10000); the row key of that split data is then -10000. If row key -10000 is found in the HBase database, the domain name "www.baidu.com" from the pair (www.baidu.com 10000) is stored into the HBase database, specifically in any column of the column family belonging to the row whose row key is -10000. If row key -10000 is not found, it is compared with the row keys already stored in the HBase database and inserted into the corresponding cluster node.
Split data Di stored in the database takes the form rowkey:-10000, family:f, column:www.baidu.com, value:1, where family identifies the column family by which HBase physically separates data, column is the HBase column name, and value is the stored cell value. When rowkey -10000 is stored as a row, family:f, column:www.baidu.com and value:1 are stored in one column of the column family belonging to the row with row key -10000; that is, family:f, column:www.baidu.com and value:1 are stored together in the same column.
Optionally, the second storage module includes a comparison submodule, an insertion submodule, a storage submodule and an update submodule, where:
the comparison submodule is configured to compare the row key of Di, one by one, with the row keys already stored in the HBase database; the insertion submodule is configured to insert the row key of Di into a target row in the HBase database, where the target row is the row immediately after the row of the first row key or the row immediately before the row of the second row key, the first and second row keys being row keys already stored in the HBase database: the first row key is the stored row key that is smaller than the row key of Di and closest to it, and the second row key is the stored row key that is larger than the row key of Di and closest to it; the storage submodule is configured to store Di into a second target column corresponding to the target row, the second target column being any column in the column family of the target row; and the update submodule is configured to update the record of row keys stored in the HBase database.
Specifically, suppose the HBase database stores the row keys -100000, -50000 and -40000 but not -10000. The row key -10000 is then compared with -100000, -50000 and -40000 in turn.
When Di is the access count of websites or the search count of keywords, users in practice prefer to see the few websites with the highest access counts or the few keywords with the highest search counts. Row keys are therefore stored as negative numbers in the HBase database, so that arranging the row keys in ascending order arranges the access counts in descending order.
Continuing the example, comparing -10000 with -100000, -50000 and -40000 shows that all three are less than -10000 and that -40000 is the closest to it, so -10000 is inserted into the row immediately after the row of row key -40000; -40000 is thus the first row key. As described above, if cluster node 1 stores split data with access counts from 100000 down to 10000, the row keys -40000 and -10000 both belong in cluster node 1. The row key -10000 is therefore inserted into the row following the row of -40000, and the split data Di with row key -10000 is stored in any column of the column family of that target row, where -40000 is the first row key. With this sorting method, the access counts of websites are ordered from largest to smallest.
As another example, suppose the HBase database stores the row keys -5000, -4000 and -1000 but not -10000. The comparison shows that -5000, -4000 and -1000 are all greater than -10000 and that -5000 is the closest, so -10000 would be inserted into the row immediately before the row of -5000, making -5000 the second row key. However, as described above, cluster node 2 stores split data with access counts from 9999 down to 1000, so the row keys -10000 and -5000 are not stored in the same cluster node. Row key -10000 should instead be stored in cluster node 1, and because -10000 is the largest of all row keys in node 1, it is stored in the last row of cluster node 1.
As a further example, suppose the HBase database stores the row keys -5000, -4000, -1000 and -500, and the row key -2000 needs to be inserted. Comparing -2000 with -5000, -4000 and -1000 shows that -2000 is greater than -4000 and -5000 and closest to -4000, so the row key -2000 is inserted into the row immediately after the row of -4000; equivalently, -2000 is less than -1000 and -500 and closest to -1000, so it is inserted into the row immediately before the row of -1000. The split data Di with row key -2000 is then stored in any column of the column family of the target row. Here -4000 is the first row key and -1000 is the second row key.
Optionally, the HBase-based data sorting device provided by this application further includes a receiving unit and a display unit, where:
the receiving unit is configured to receive a query instruction from the user through the query interface of the HBase database, the query instruction being an instruction to query the split data whose stored row keys lie between any two row keys in the HBase database; and the display unit is configured to display, with a preset mark added, the split data corresponding to the row keys found in the HBase database.
As a database, HBase provides a convenient query interface through which the split data in any specified row key range can be queried. For example, the domain names of the websites with row keys between -1000 and -900 can be found very quickly and displayed with a preset mark, such as an added background color, a font change or a floating bubble.
If the HBase database stores no row key strictly between -1000 and -900 but does store the row keys -1000 and -900, the query returns the domain names of the websites with row key -1000 and of those with row key -900.
If the HBase database stores no row key between -1000 and -900 and stores neither -1000 nor -900, a prompt can pop up telling the user "the queried data does not exist", or empty data can be shown to indicate that the HBase database does not store those row keys.
If the HBase database stores neither -1000 nor -900 but does store a row key between them, for example -950, the query returns the domain names of the websites with row key -950.
The serial numbers of the above embodiments are for description only and do not indicate the relative merits of the embodiments.
In the above embodiments of this application, each description has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed technical content may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or take other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple units. Some or all of the units may be selected as actually needed to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, may each exist physically alone, or two or more of them may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, the technical solution of this application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied as a software product. The software product is stored in a storage medium and includes instructions that cause a computer device (a personal computer, server, network device, etc.) to execute all or some of the steps of the methods described in the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), removable hard disk, magnetic disk or optical disc.
The above are only preferred embodiments of this application. It should be noted that a person of ordinary skill in the art can make several improvements and modifications without departing from the principles of this application, and these improvements and modifications shall also be regarded as falling within the protection scope of this application.
Claims (8)
1. A data sorting method based on an HBase database, characterized by comprising:
splitting data to be processed across a plurality of cluster nodes of the HBase database, wherein each cluster node, after obtaining its split data, executes the row key sorting mode of the HBase database;
reading the sorting result of each cluster node to obtain a plurality of sorting results, wherein each cluster node obtains its sorting result after sorting its split data using the row key sorting mode; and
determining that the set of the plurality of sorting results is the sorting result of the data to be processed.
Each cluster node sorting its split data using the row key sorting mode to obtain a sorting result comprises: cluster node Ai sorting, using the row key sorting mode, the split data Di assigned to cluster node Ai, to obtain sorting result Ri. Here i takes the values 1 to n in turn, and n is the number of cluster nodes in the HBase database. Cluster nodes A1 to An constitute the plurality of cluster nodes of the HBase database, and split data D1 to Dn constitute the data to be processed.
Determining that the set of the plurality of sorting results is the sorting result of the data to be processed comprises: cluster node Ai storing the data key-value pair of split data Di into the HBase database, to obtain the sorting result of the data to be processed. The data key-value pair of split data Di is the key-value pair formed by the identifier of split data Di and the total data amount of split data Di.
2. The method according to claim 1, wherein cluster node Ai storing the data key-value pair of split data Di into the HBase database comprises:
querying whether the HBase database already stores the row key value of split data Di, wherein the row key value of split data Di is the negative of the total data amount of split data Di;
if the HBase database already stores the row key value of split data Di, storing split data Di in a first target column, wherein the first target column is any column in the column family to which the row of that row key value belongs; and
if the HBase database does not store the row key value of split data Di, storing the key-value pair of split data Di according to the row key values already stored in the HBase database.
3. The method according to claim 2, wherein, if the HBase database does not store the row key value of split data Di, storing the key-value pair of split data Di according to the row key values already stored in the HBase database comprises:
comparing the row key value of split data Di in turn with the row key values stored in the HBase database;
inserting the row key value of split data Di into a target row in the HBase database. The target row is the row following the row of a first row key value, or the row preceding the row of a second row key value. The first and second row key values are row key values already stored in the HBase database. The first row key value is the stored row key value that is smaller than the row key value of split data Di and closest to it. The second row key value is the stored row key value that is larger than the row key value of split data Di and closest to it;
storing split data Di in a second target column corresponding to the target row, wherein the second target column is any column in the column family to which the target row belongs; and
updating the row key values stored in the HBase database.
4. The method according to claim 1, wherein the method further comprises:
receiving a query instruction from a user through a query interface of the HBase database, wherein the query instruction requests the split data corresponding to the row key values stored in the HBase database that lie between any two row key values; and
displaying, with a preset mark added, the split data corresponding to the queried row key values in the HBase database.
5. A data sorting device based on an HBase database, characterized by comprising:
a splitting unit, configured to split data to be processed across a plurality of cluster nodes of the HBase database, wherein each cluster node, after obtaining its split data, executes the row key sorting mode of the HBase database;
a reading unit, configured to read the sorting result of each cluster node to obtain a plurality of sorting results, wherein each cluster node obtains its sorting result after sorting its split data using the row key sorting mode; and
a determination unit, configured to determine that the set of the plurality of sorting results is the sorting result of the data to be processed.
The reading unit comprises a sorting subunit, configured for cluster node Ai to sort, using the row key sorting mode, the split data Di assigned to cluster node Ai, to obtain sorting result Ri. Here i takes the values 1 to n in turn, and n is the number of cluster nodes in the HBase database. Cluster nodes A1 to An constitute the plurality of cluster nodes of the HBase database, and split data D1 to Dn constitute the data to be processed.
The determination unit comprises a storage subunit, configured for cluster node Ai to store the data key-value pair of split data Di into the HBase database, to obtain the sorting result of the data to be processed. The data key-value pair of split data Di is the key-value pair formed by the identifier of split data Di and the total data amount of split data Di.
6. The device according to claim 5, wherein the storage subunit comprises:
a query module, configured to query whether the HBase database already stores the row key value of split data Di, wherein the row key value of split data Di is the negative of the total data amount of split data Di;
a first storage module, configured to store split data Di in a first target column if the HBase database already stores the row key value of split data Di, wherein the first target column is any column in the column family to which the row of that row key value belongs; and
a second storage module, configured to store the key-value pair of split data Di according to the row key values already stored in the HBase database if the HBase database does not store the row key value of split data Di.
7. The device according to claim 6, wherein the second storage module comprises:
a comparison submodule, configured to compare the row key value of split data Di in turn with the row key values stored in the HBase database;
an insertion submodule, configured to insert the row key value of split data Di into a target row in the HBase database. The target row is the row following the row of a first row key value, or the row preceding the row of a second row key value. The first and second row key values are row key values already stored in the HBase database. The first row key value is the stored row key value that is greater than the row key value of split data Di and closest to it. The second row key value is the stored row key value that is smaller than the row key value of split data Di and closest to it;
a storage submodule, configured to store split data Di in a second target column corresponding to the target row, wherein the second target column is any column in the column family to which the target row belongs; and
an update submodule, configured to update the row key values stored in the HBase database.
8. The device according to claim 5, wherein the device further comprises:
a receiving unit, configured to receive a query instruction from a user through a query interface of the HBase database, wherein the query instruction requests the split data corresponding to the row key values stored in the HBase database that lie between any two row key values; and
a display unit, configured to display, with a preset mark added, the split data corresponding to the queried row key values in the HBase database.
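The storage steps of claims 1 to 3 (and the matching units of claims 5 to 7) can be sketched in a short single-process Python example. This is a minimal sketch under stated assumptions, not the patent's implementation:

- The cluster nodes are simulated by a loop.
- The round-robin split is an assumption.
- The names `split_data`, `ToyRowKeyTable`, `store_chunk` and `sort_on_nodes`, and the column family `cf`, are all hypothetical.
- A chunk's "total data amount" is taken to be its record count.

The toy table keeps row keys in ascending order. It applies the two storage branches described in the claims: reuse an existing row, or insert a new row between its nearest stored neighbours.

```python
from typing import Dict, List, Optional, Tuple


def split_data(records: List[int], n: int) -> List[List[int]]:
    """Split the data to be processed into chunks D1..Dn (round-robin is an assumption)."""
    return [records[i::n] for i in range(n)]


class ToyRowKeyTable:
    """Hypothetical in-memory stand-in for an HBase table; rows kept in ascending row-key order."""

    def __init__(self, family: str = "cf") -> None:
        self.family = family
        self.row_keys: List[int] = []               # stored row keys, ascending
        self.rows: Dict[int, Dict[str, int]] = {}   # row key -> {column: value}

    def neighbours(self, key: int) -> Tuple[Optional[int], Optional[int]]:
        """Compare `key` in turn with the stored keys; return (closest smaller, closest larger)."""
        smaller: Optional[int] = None
        larger: Optional[int] = None
        for k in self.row_keys:
            if k < key:
                smaller = k
            elif k > key:
                larger = k
                break
        return smaller, larger

    def store_chunk(self, chunk_id: str, total: int) -> int:
        """Store the pair (chunk_id, total) under row key -total and return that row key."""
        row_key = -total                            # row key is the negative of the total
        column = f"{self.family}:{chunk_id}"        # "any column" of the family; naming is assumed
        if row_key in self.rows:
            # Row already stored: put Di into a column of that row's column family.
            self.rows[row_key][column] = total
            return row_key
        # Row not stored yet: place it just before its closest larger neighbour
        # (equivalently, just after its closest smaller one).
        _, larger = self.neighbours(row_key)
        pos = self.row_keys.index(larger) if larger is not None else len(self.row_keys)
        self.row_keys.insert(pos, row_key)          # update the stored row keys
        self.rows[row_key] = {column: total}
        return row_key


def sort_on_nodes(records: List[int], n: int) -> Tuple[List[List[int]], ToyRowKeyTable]:
    """Simulate n nodes: each sorts its chunk (R1..Rn) and stores its key-value pair."""
    chunks = split_data(records, n)
    results = [sorted(chunk) for chunk in chunks]   # stand-in for each node's own sort
    table = ToyRowKeyTable()
    for i, chunk in enumerate(chunks, start=1):
        table.store_chunk(f"D{i}", len(chunk))      # total taken as record count (assumption)
    return results, table
```

Each row key is the negative of a chunk's total, so scanning the toy table in ascending row-key order visits the largest chunks first. That is one plausible reading of why the claims negate the total. A real HBase table orders row keys as raw bytes, so the actual scan order there depends on how the negative amount is encoded. The toy table compares Python integers instead.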
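Claims 4 and 8 describe a range query over stored row keys whose hits are displayed with a preset mark. Here is a minimal sketch, assuming the stored row keys are held in an ascending Python list with a `{row key: {column: value}}` mapping beside it. The function name `query_between` and the `"[*] "` mark are hypothetical. The half-open range [start, stop) mirrors an HBase `Scan`, whose start row is inclusive and stop row exclusive by default. The claims themselves do not say whether the bounds are inclusive.

```python
import bisect
from typing import Dict, List


def query_between(row_keys: List[int], rows: Dict[int, Dict[str, str]],
                  start: int, stop: int, mark: str = "[*] ") -> List[str]:
    """Return one marked line per stored cell whose row key lies in [start, stop)."""
    lo = bisect.bisect_left(row_keys, start)  # index of first stored key >= start
    hi = bisect.bisect_left(row_keys, stop)   # index of first stored key >= stop (excluded)
    lines: List[str] = []
    for key in row_keys[lo:hi]:
        # Columns are listed in sorted order to give a stable display.
        for column, value in sorted(rows[key].items()):
            lines.append(f"{mark}{key} {column}={value}")
    return lines
```

For example, with row keys `[-3, -2, -1]`, querying from `-3` to `-1` returns the marked cells of rows `-3` and `-2` only.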
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510733850.7A (CN106649385B) | 2015-11-02 | 2015-11-02 | Data reordering method and device based on HBase database |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106649385A | 2017-05-10 |
CN106649385B | 2019-12-03 |
Family ID: 58809823
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510733850.7A (CN106649385B, Expired - Fee Related) | 2015-11-02 | 2015-11-02 | Data reordering method and device based on HBase database |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106649385B |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107819828B * | 2017-10-16 | 2020-03-10 | 平安科技(深圳)有限公司 | Data transmission method and device, computer equipment and storage medium |
CN108733790B * | 2018-05-11 | 2021-07-02 | 广州虎牙信息科技有限公司 | Data sorting method, device, server and storage medium |
CN113254488A * | 2020-08-05 | 2021-08-13 | 深圳市汉云科技有限公司 | Data sorting method and system of distributed database |
CN112925809A * | 2021-02-24 | 2021-06-08 | 浙江大华技术股份有限公司 | Data storage method, device and system |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103942108A * | 2014-04-25 | 2014-07-23 | 四川大学 | Resource parameter optimization method under Hadoop homogenous cluster |
CN103995827A * | 2014-04-10 | 2014-08-20 | 北京大学 | High-performance ordering method for MapReduce calculation frame |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104216683A * | 2013-05-31 | 2014-12-17 | 国际商业机器公司 | Method and system for data processing through simultaneous multithreading (SMT) |
US10241709B2 * | 2013-12-09 | 2019-03-26 | Vmware, Inc. | Elastic temporary filesystem |
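Non-Patent Citations (2)
Title |
---|
"Hadoop中TeraSort算法分析" (Analysis of the TeraSort algorithm in Hadoop); CharlieQiao; Baidu Wenku (百度文库); 2011-07-15; Sections 2-3, Fig. 2 * |
"一种周期性MapReduce作业的负载均衡策略" (A load balancing strategy for periodic MapReduce jobs); 傅杰 et al.; Computer Science (计算机科学); 2013-03-31; Vol. 40, No. 3; Section 3 * |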
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing; Applicant after: BEIJING GRIDSUM TECHNOLOGY Co.,Ltd.; Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing; Applicant before: BEIJING GRIDSUM TECHNOLOGY Co.,Ltd. |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2019-12-03 |