CN106649385A - Data ranking method and device based on HBase database - Google Patents
- Publication number
- CN106649385A (application CN201510733850.7A)
- Authority
- CN
- China
- Prior art keywords
- row key
- data
- key value
- split data
- splitting
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/221—Column-oriented storage; Management thereof
Abstract
The invention discloses a data sorting method and device based on an HBase database. The method comprises: splitting the data to be sorted across multiple cluster nodes of the HBase database, wherein each cluster node, after obtaining its split data, applies the row key sorting mode of the HBase database; reading the sorted result of each cluster node to obtain multiple sorted results, wherein each cluster node obtains one sorted result after sorting its split data by row key; and determining the set of the multiple sorted results to be the sorted result of the data to be sorted. This solves the technical problem of low data sorting efficiency in the prior art.
Description
Technical field
The present application relates to the field of computers, and in particular to a data sorting method and device based on an HBase database.
Background
Data sorting is widely used in big-data statistics. For example, sorting websites by visit count makes it possible to see which websites receive the most visits and then make decisions accordingly. When the amount of data is small, sorting is simple using any of the many fast sorting algorithms available today. Once the data volume reaches a certain scale, however, even simple sorting becomes complicated. For example, when 100 GB of data needs to be sorted, the system cannot simply read it into memory and sort it on a single machine, because no server has 100 GB of memory, and even if a server with such a large memory existed, its entire memory would never be devoted to sorting.
With the sorting approach built on current technical architectures, an existing framework can distribute the sorting task to the nodes of a cluster for computation. That is, the 100 GB of data is split and distributed to the nodes of the cluster; under the unified scheduling of the framework, each node reads its data and sorts it; finally, the sorted results on the nodes are merged and the overall result is written to a file system.
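The prior-art flow described above can be sketched as follows. This is only an illustration under stated assumptions: the function name sort_then_merge, the round-robin split and the sample visit counts are hypothetical, and a real framework would run each local sort on a separate machine.

```python
import heapq

def sort_then_merge(data, node_count):
    """Split data across nodes, sort each share locally, then merge the shares."""
    # Round-robin split: each node receives an arbitrary, unordered share.
    chunks = [data[i::node_count] for i in range(node_count)]
    # Each node sorts only its own share.
    sorted_chunks = [sorted(chunk) for chunk in chunks]
    # The per-node results overlap in range, so a k-way merge is still
    # required before a global order exists.
    return list(heapq.merge(*sorted_chunks))

visits = [120, 7, 9981, 43, 560, 10000, 3, 875]  # hypothetical visit counts
print(sort_then_merge(visits, 3))
```

The final merge is the step the background identifies as slow, because the per-node results bear no ordering relationship to one another.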
This sorting approach has two problems:
The first problem is merging the results. The results sorted on each node need to be merged. Because the data itself is unordered, the sorted results of the nodes bear no regular relationship to one another. As a result, either the merging process is very slow, or a new distribution mechanism must be introduced that redistributes the results to be merged in a relatively ordered way and then sorts and merges them again. Either way, the process is slow.
The second problem is viewing the results. After the whole data set has been sorted, it has to be saved to a file, which makes the sorted results inconvenient to view: the results for an arbitrary specified range cannot be viewed quickly.
No effective solution to the above problems has yet been proposed.
Summary of the invention
The embodiments of the present application provide a data sorting method and device based on an HBase database, to at least solve the technical problem of low data sorting efficiency in the prior art.
According to one aspect of the embodiments of the present application, a data sorting method based on an HBase database is provided. The method includes: splitting the data to be sorted across multiple cluster nodes of the HBase database, wherein each cluster node applies the row key sorting mode of the HBase database after obtaining its split data; reading the sorted result of each cluster node to obtain multiple sorted results, wherein each cluster node obtains one sorted result after applying the row key sorting mode to its split data; and determining the set of the multiple sorted results to be the sorted result of the data to be sorted.
Further, each cluster node obtaining the sorted result after applying the row key sorting mode to its split data includes: cluster node Ai applies the row key sorting mode to the split data Di assigned to cluster node Ai and obtains sorted result Ri, where i takes the values 1 to n in turn, n is the number of cluster nodes in the HBase database, cluster nodes A1 to An constitute the multiple cluster nodes of the HBase database, and split data D1 to Dn constitute the data to be sorted. Determining the set of the multiple sorted results to be the sorted result of the data to be sorted includes: cluster node Ai stores the data key-value pairs of split data Di in the HBase database to obtain the sorted result of the data to be sorted, where a data key-value pair of split data Di is a key-value pair consisting of the identifier of the split data Di and the data total of the split data Di.
Further, cluster node Ai storing the data key-value pairs of split data Di in the HBase database includes: querying whether the HBase database has already stored the row key of split data Di, where the row key of split data Di is the negative of the data total of split data Di; if the row key of split data Di has already been stored in the HBase database, storing split data Di in a first target column, where the first target column is any column in the column family of the row in which that row key is located; and if the row key of split data Di has not been stored in the HBase database, storing the key-value pair of split data Di according to the row keys already stored in the HBase database.
Further, if the row key of split data Di has not been stored in the HBase database, storing the key-value pair of split data Di according to the row keys already stored in the HBase database includes: comparing, in turn, the row key of split data Di with the row keys already stored in the HBase database; inserting the row key of split data Di at a target row in the HBase database, where the target row is the row immediately after the row of a first row key or immediately before the row of a second row key, the first row key and the second row key are both row keys already stored in the HBase database, the first row key is the stored row key that is less than the row key of split data Di and differs from it by the smallest amount, and the second row key is the stored row key that is greater than the row key of split data Di and differs from it by the smallest amount; storing split data Di in a second target column corresponding to the target row, where the second target column is any column in the column family of the target row; and updating the row keys stored in the HBase database.
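The insertion rule just described, in which the new row key is placed directly after its closest smaller stored key and directly before its closest larger one, is what a binary-search insertion into a sorted list does. The sketch below illustrates that idea only: the function name insert_row_key and the use of a plain Python list of integers are assumptions, and the text does not say how HBase performs the comparison internally.

```python
import bisect

def insert_row_key(stored_keys, new_key):
    """Insert new_key into the ascending list stored_keys; return its row index."""
    # Index of the first stored key that is >= new_key.
    pos = bisect.bisect_left(stored_keys, new_key)
    if pos < len(stored_keys) and stored_keys[pos] == new_key:
        return pos  # the key is already stored, so no row is inserted
    # stored_keys[pos - 1], if present, is the closest smaller key
    # (the "first row key"); stored_keys[pos], if present, is the closest
    # larger key (the "second row key").
    stored_keys.insert(pos, new_key)
    return pos

keys = [-10000, -1000, -100]           # row keys already stored (hypothetical)
index = insert_row_key(keys, -5000)    # row key of a newly arriving pair
print(index, keys)
```

Because the list stays sorted after every insertion, the "updating the stored row keys" step reduces here to the in-place list update.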
Further, the method also includes: receiving a query instruction from a user through a query interface of the HBase database, where the query instruction is an instruction to query the split data corresponding to the row keys lying between any two row keys already stored in the HBase database; and displaying, in the HBase database, the split data corresponding to the queried row keys with a preset mark added.
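A range query between two stored row keys can be sketched as a slice of the sorted rows. This is an illustration, not the query interface itself: the function name query_between, the inclusive bounds, the "*" mark and the example.com-style identifiers are all assumptions, since the text specifies neither the boundary behaviour nor the form of the preset mark.

```python
import bisect

def query_between(rows, low_key, high_key, mark="*"):
    """Return rows whose row key lies in [low_key, high_key], each tagged with mark."""
    keys = [key for key, _ in rows]               # rows are already sorted by row key
    start = bisect.bisect_left(keys, low_key)     # first key >= low_key
    end = bisect.bisect_right(keys, high_key)     # one past the last key <= high_key
    return [f"{mark} {key} {value}" for key, value in rows[start:end]]

rows = [
    (-10000, "www.baidu.com"),
    (-5000, "site-b.example"),    # hypothetical identifier
    (-1000, "www.google.com"),
    (-100, "site-d.example"),     # hypothetical identifier
]
for line in query_between(rows, -5000, -1000):
    print(line)
```

Because the rows are kept in row key order, the query touches only the requested range rather than the whole result set, which addresses the second problem raised in the background.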
According to another aspect of the embodiments of the present application, a data sorting device based on an HBase database is also provided. The device includes: a splitting unit, configured to split the data to be sorted across multiple cluster nodes of the HBase database, wherein each cluster node applies the row key sorting mode of the HBase database after obtaining its split data; a reading unit, configured to read the sorted result of each cluster node to obtain multiple sorted results, wherein each cluster node obtains the sorted result after applying the row key sorting mode to its split data; and a determining unit, configured to determine the set of the multiple sorted results to be the sorted result of the data to be sorted.
Further, the reading unit includes a sorting subunit, by which cluster node Ai applies the row key sorting mode to the split data Di assigned to cluster node Ai and obtains sorted result Ri, where i takes the values 1 to n in turn, n is the number of cluster nodes in the HBase database, cluster nodes A1 to An constitute the multiple cluster nodes of the HBase database, and split data D1 to Dn constitute the data to be sorted. The determining unit includes a storing subunit, by which cluster node Ai stores the data key-value pairs of split data Di in the HBase database to obtain the sorted result of the data to be sorted, where a data key-value pair of split data Di is a key-value pair consisting of the identifier of the split data Di and the data total of the split data Di.
Further, the storing subunit includes: a query module, configured to query whether the HBase database has already stored the row key of split data Di, where the row key of split data Di is the negative of the data total of split data Di; a first storage module, configured to store split data Di in a first target column if the row key of split data Di has already been stored in the HBase database, where the first target column is any column in the column family of the row in which that row key is located; and a second storage module, configured to store the key-value pair of split data Di according to the row keys already stored in the HBase database if the row key of split data Di has not been stored in the HBase database.
Further, the second storage module includes: a comparison submodule, configured to compare, in turn, the row key of split data Di with the row keys already stored in the HBase database; an insertion submodule, configured to insert the row key of split data Di at a target row in the HBase database, where the target row is the row immediately after the row of a first row key or immediately before the row of a second row key, the first row key and the second row key are both row keys already stored in the HBase database, the first row key is the stored row key that is less than the row key of split data Di and differs from it by the smallest amount, and the second row key is the stored row key that is greater than the row key of split data Di and differs from it by the smallest amount; a storage submodule, configured to store split data Di in a second target column corresponding to the target row, where the second target column is any column in the column family of the target row; and an update submodule, configured to update the row keys stored in the HBase database.
Further, the device also includes: a receiving unit, configured to receive a query instruction from a user through a query interface of the HBase database, where the query instruction is an instruction to query the split data corresponding to the row keys lying between any two row keys already stored in the HBase database; and a display unit, configured to display, in the HBase database, the split data corresponding to the queried row keys with a preset mark added.
In the embodiments of the present application, the data to be sorted is split across multiple cluster nodes of the HBase database, each cluster node applies the row key sorting mode of the HBase database after obtaining its split data, the sorted result of each cluster node is read to obtain multiple sorted results, and the set of the multiple sorted results is determined to be the sorted result of the data to be sorted. Because the HBase database has a row key sorting mode that sorts automatically, the data is sorted automatically once it has been split across the cluster nodes. The sorted results in the cluster nodes are then read, and the split data already sorted within each cluster node is sorted again as a whole according to the row key sorting mode to obtain multiple sorted results, whose set is the sorted result of the data to be sorted. By using the row key sorting of the HBase database, the application omits the prior-art step in which the data in each cluster node must be merged before the data to be sorted can be sorted, which shortens the sorting time. Data can thus be sorted without merging the data to be sorted that is held on each node, which solves the technical problem of low data sorting efficiency in the prior art and improves sorting performance.
Brief description of the drawings
The accompanying drawings described here are provided for a further understanding of the present application and form a part of it. The illustrative embodiments of the application and their description are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of a data sorting method based on an HBase database according to an embodiment of the present application; and
Fig. 2 is a schematic diagram of a data sorting device based on an HBase database according to an embodiment of the present application.
Detailed description
To enable those skilled in the art to better understand the solution of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the application.
It should be noted that the terms "first", "second" and so on in the description, the claims and the above drawings are used to distinguish similar objects, not to describe a particular order or sequence. Data so used may be interchanged where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. In addition, the terms "comprise" and "have" and any variants of them are intended to cover non-exclusive inclusion: for example, a process, method, system, product or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to that process, method, product or device.
First, the technical terms used in this embodiment are explained as follows:
HBase is a distributed, column-oriented open-source database and a distributed storage system for structured data. HBase differs from a typical relational database in that it is suited to storing unstructured data. Another difference is that HBase is column-based rather than row-based.
Rowkey is the unique identifier of a piece of data in HBase. HBase stores data sorted by rowkey, and HBase queries are also based on rowkey: either direct access to the complete data of a single specified rowkey, or a scan of the whole data range from a start rowkey to an end rowkey.
Family is the identifier by which HBase physically separates data. It must be defined in advance when the table is created. In most cases only a single family is used, so that the data of one row is not physically separated, which makes queries convenient.
Column is the column name in HBase. HBase is an unstructured database: columns do not need to be defined in advance when an HBase table is created, and can be added whenever they are used.
Value is the data value that HBase finally stores. A stored value can be located by rowkey + family + column.
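The relationship between these four terms can be pictured as a nested mapping kept in rowkey order. The class below is a minimal in-memory stand-in written for illustration, not the HBase client API: the name TinyTable, its methods and the sample row keys are hypothetical, and the scan bounds (start inclusive, stop exclusive) are a convention chosen for the sketch.

```python
class TinyTable:
    """In-memory stand-in for one table: rowkey -> family -> column -> value."""

    def __init__(self):
        self.rows = {}

    def put(self, rowkey, family, column, value):
        # Columns need no prior declaration; they appear when first written.
        self.rows.setdefault(rowkey, {}).setdefault(family, {})[column] = value

    def get(self, rowkey):
        # Direct access to the complete data of one rowkey.
        return self.rows.get(rowkey)

    def scan(self, start, stop):
        # Rows come back in rowkey order; byte strings compare lexicographically.
        return [(key, self.rows[key]) for key in sorted(self.rows) if start <= key < stop]

table = TinyTable()
table.put(b"row-002", "f", "www.google.com", 1)
table.put(b"row-001", "f", "www.baidu.com", 1)
print(table.get(b"row-001"))
print(table.scan(b"row-001", b"row-003"))
```

The scan returns row-001 before row-002 even though row-002 was written first, which is the ordering property the method relies on.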
According to the embodiments of the present application, a method embodiment of a data sorting method based on an HBase database is provided. It should be noted that the steps shown in the flowchart may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in a different order.
Fig. 1 is a flowchart of a data sorting method based on an HBase database according to an embodiment of the present application. As shown in Fig. 1, the method includes steps S102 to S106:
Step S102, split the data to be sorted across multiple cluster nodes of the HBase database, where each cluster node applies the row key sorting mode of the HBase database after obtaining its split data.
Specifically, the data to be sorted may be the visit count of a website over a certain period, such as the visit count of www.baidu.com or of www.google.com, or the search volume of a keyword over a certain period, for example the search volume of "live broadcast of the War of Resistance military parade" on September 3, 2015, or of "Beijing traffic restrictions" on September 3, 2015. It should be noted that the data to be sorted is not limited to website visit counts and keyword search volumes; it includes any data that needs to be sorted.
Because the HBase database sorts by row key (rowkey), when the data to be sorted is split across multiple cluster nodes of the HBase database, the split data assigned to each cluster node is sorted according to the row key sorting mode.
Step S104, read the sorted result of each cluster node to obtain multiple sorted results, where each cluster node obtains one sorted result after applying the row key sorting mode to its split data.
Specifically, after the split data in each cluster node has been sorted according to the row key sorting mode, one sorted result is obtained per node, and the sorted results in the cluster nodes are read in turn to obtain multiple sorted results. It should be noted that the multiple sorted results are stored in the multiple cluster nodes of the HBase database. For example, when the split data is the visit count of a website, cluster node a may hold the sorted result for web pages with visit counts from 10000 down to 1000, cluster node b the sorted result for web pages with visit counts from 999 down to 100, and cluster node c the sorted result for web pages with visit counts from 99 down to 0. After the split data has been sorted as a whole, the number of sorted results obtained may be chosen according to the user's actual needs.
Step S106, determine the set of the multiple sorted results to be the sorted result of the data to be sorted. The set of the multiple sorted results distributed across the cluster nodes is the sorted result of the data to be sorted.
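When each node holds a disjoint, already ordered range, as in the example with cluster nodes a, b and c, the overall result is simply the per-node results taken in node order. The sketch below illustrates this under stated assumptions: the function name concatenate_node_results and the individual visit counts are hypothetical values chosen to fall inside the ranges given in the text.

```python
def concatenate_node_results(node_results):
    """Join per-node sorted lists in node order; no merge step is needed."""
    combined = []
    for result in node_results:
        combined.extend(result)
    return combined

node_a = [10000, 8200, 1000]   # visit counts from 10000 down to 1000
node_b = [999, 450, 100]       # visit counts from 999 down to 100
node_c = [99, 12, 0]           # visit counts from 99 down to 0

overall = concatenate_node_results([node_a, node_b, node_c])
# The concatenation is already in descending order of visit count.
print(overall == sorted(overall, reverse=True))
```

This is the contrast with the prior-art flow: no comparison between results from different nodes is required once the ranges do not overlap.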
In the embodiments of the present application, the data to be sorted is split across the multiple cluster nodes on which the HBase database resides. Because the HBase database has a row key sorting mode that sorts automatically, the data is sorted automatically once it has been split across the cluster nodes. The sorted results in the cluster nodes are then read, and the split data already sorted within each cluster node is sorted again as a whole according to the row key sorting mode to obtain multiple sorted results, whose set is the sorted result of the data to be sorted. By using the row key sorting of the HBase database, the application omits the prior-art step in which the data in each cluster node must be merged before the data to be sorted can be sorted, which shortens the sorting time. Data can thus be sorted without merging the data to be sorted that is held on each node, which solves the technical problem of low data sorting efficiency in the prior art and improves sorting performance.
Optionally, each cluster node obtaining one sorted result after applying the row key sorting mode to its split data includes the following step S1041:
Step S1041, cluster node Ai applies the row key sorting mode to the split data Di assigned to cluster node Ai and obtains sorted result Ri, where i takes the values 1 to n in turn, n is the number of cluster nodes in the HBase database, cluster nodes A1 to An constitute the multiple cluster nodes of the HBase database, and split data D1 to Dn constitute the data to be sorted.
Specifically, when the data to be sorted is split across cluster nodes A1 to An of the HBase database, the split data Di is stored in each cluster node, unordered, in the form of data key-value pairs. The data key-value pairs of the split data Di are then sorted according to the row key sorting mode, and each cluster node obtains one sorted result Ri after this sorting. Suppose the split data Di is the visit count of several websites over a certain period. If the website with domain name www.baidu.com has a visit count of 10000, the domain name www.baidu.com and the visit count 10000 form a data key-value pair, written (www.baidu.com 10000). Likewise, if the website with domain name www.google.com has a visit count of 1000, the domain name www.google.com and the visit count 1000 form a data key-value pair, written (www.google.com 1000).
It should be noted that, when the data to be sorted is split, the user may determine the amount of split data Di assigned to each cluster node according to the remaining storage space of that cluster node.
Determining the set of the multiple sorted results to be the sorted result of the data to be sorted includes step S1061: cluster node Ai stores the data key-value pairs of split data Di in the HBase database to obtain the sorted result of the data to be sorted, where a data key-value pair of split data Di is a key-value pair consisting of the identifier of the split data Di and the data total of the split data Di.
Specifically, the data key-value pairs of the split data Di in sorted result Ri are read, and are stored in the HBase database in a fixed format. As illustrated in the steps above, when the split data Di is the visit count of a website over a certain period, the data key-value pair consists of the domain name of the website (the identifier of the split data Di) and the visit count of the website over that period (the data total of the split data Di). Similarly, if the split data is the search volume of a keyword, the data key-value pair may consist of the keyword (the identifier of the split data Di) and the search volume of the keyword over a certain period (the data total of the split data Di). In summary, a data key-value pair of split data Di consists of the identifier of the split data Di and the data total of the split data Di.
When the split data Di is the visit count of a website over a period of time, sorting the split data Di in cluster node Ai as a whole requires that the data key-value pairs of the sorted split data Di be stored in order in the multiple cluster nodes of the HBase database; the sorted results held by the multiple cluster nodes together form the sorted result of the data to be sorted. Suppose there are 4 cluster nodes: cluster node 1 may store the split data with visit counts from 100000 to 10000, cluster node 2 the split data with visit counts from 9999 to 1000, cluster node 3 the split data with visit counts from 999 to 100, and cluster node 4 the split data with visit counts from 99 to 0. Cluster nodes 1 to 4 are cluster nodes of the HBase database, and the number of cluster nodes may be chosen according to the user's actual needs.
Optionally, in step S1061, cluster node Ai storing the data key-value pairs of split data Di in the HBase database includes the following steps S1 to S5:
Step S1, query whether the HBase database has already stored the row key of split data Di, where the row key of split data Di is the negative of the data total of split data Di.
Step S3, if the row key of split data Di has already been stored in the HBase database, store split data Di in a first target column, where the first target column is any column in the column family of the row in which the row key of split data Di is located.
Step S5, if the row key of split data Di has not been stored in the HBase database, store the key-value pair of split data Di according to the row keys already stored in the HBase database.
Specifically, in this embodiment the split data Di in cluster node Ai is sorted using the row key sorting mode. Therefore, when the split data Di in cluster node Ai is sorted as a whole, the row key (rowkey) of the split data Di is obtained first, and the HBase database is queried to determine whether that rowkey has already been stored, where the rowkey is the negative of the data total in the data key-value pair of the split data Di.
Suppose the data key-value pair of one item of split data Di is (www.baidu.com 10000); its row key is then -10000. If row key -10000 is found in the HBase database, the domain name "www.baidu.com" from the data key-value pair (www.baidu.com 10000) is stored in any family of the HBase database, specifically in the column family of the row in which row key -10000 is located. If row key -10000 is not found in the HBase database, the row key -10000 is compared with the row keys already stored in the HBase database and inserted into the designated cluster node of the HBase database.
The storage format of split data Di in the database is rowkey: -10000, family: f, column: www.baidu.com, value: 1, where family is the identifier HBase uses to physically separate data and column is the HBase column name. When rowkey: -10000 is stored in a given row, family: f, column: www.baidu.com and value: 1 are stored in any column of the column family of the row in which row key -10000 is located, and family: f, column: www.baidu.com and value: 1 are stored in the same column.
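The storage rule described above can be illustrated with a minimal Python sketch. It models the HBase table as an in-memory dictionary rather than calling any HBase client, and the names `store_pair`, `row_key_for`, `FAMILY` and the second domain `www.example.com` are assumptions introduced only for illustration.

```python
FAMILY = "f"  # column family name taken from the storage format above


def row_key_for(total):
    """The row key is the negative of the data total (e.g. the visit count)."""
    return -total


def store_pair(table, identifier, total):
    """Store one (identifier, total) pair in the in-memory table.

    Returns True if the row key was already stored (the step S3 case)
    and False if a new row had to be created (the step S5 case).
    """
    row_key = row_key_for(total)
    existed = row_key in table               # step S1: is the row key stored?
    if not existed:
        table[row_key] = {FAMILY: {}}        # step S5: create the row
    # The identifier becomes a column name in family "f", with value 1.
    table[row_key][FAMILY][identifier] = 1
    return existed


table = {}
first = store_pair(table, "www.baidu.com", 10000)     # new row key -10000
second = store_pair(table, "www.example.com", 10000)  # hypothetical second site, same row
```

In this sketch both sites end up as separate columns of the row with row key -10000, which mirrors the statement that sites with equal visit counts do not conflict with each other.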
Further, in step S5, when the row key of split data Di is not stored in the HBase database, storing the key-value pair of split data Di according to the row keys already stored in the HBase database comprises the following steps S51 to S57:
Step S51: compare, one by one, the row key of split data Di with the row keys already stored in the HBase database.
Step S53: insert the row key of split data Di into a target row in the HBase database, where the target row is the row following the row of the first row key or the row preceding the row of the second row key. The first row key and the second row key are both row keys already stored in the HBase database: the first row key is the stored row key that is less than the row key of split data Di and differs from it by the smallest amount, and the second row key is the stored row key that is greater than the row key of split data Di and differs from it by the smallest amount.
Step S55: store split data Di in a second target column corresponding to the target row, where the second target column is any column in the column family of the target row.
Step S57: update the row keys stored in the HBase database.
After each update of the stored row keys, the corresponding first row key and second row key are determined afresh the next time the key-value pair of split data Di is stored according to the row keys already stored in the HBase database. For the key-value pair of a given split data Di, if both its first row key and its second row key exist, the first row key is less than the second row key.
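Steps S51 to S57 can be sketched in Python as a neighbour search over an ascending list of stored row keys. This is only an illustration under stated assumptions: the stored row keys are kept as a sorted list of integers, the new row key is not yet stored (the step S5 case), and the function names `find_neighbours` and `insert_row_key` are hypothetical.

```python
import bisect


def find_neighbours(stored_keys, new_key):
    """Return (first_row_key, second_row_key) for new_key.

    first_row_key  : the largest stored key that is less than new_key, or None
    second_row_key : the smallest stored key that is greater than new_key, or None
    stored_keys must be sorted in ascending order and must not contain new_key.
    """
    pos = bisect.bisect_left(stored_keys, new_key)   # step S51: compare with stored keys
    first = stored_keys[pos - 1] if pos > 0 else None
    second = stored_keys[pos] if pos < len(stored_keys) else None
    return first, second


def insert_row_key(stored_keys, new_key):
    """Insert new_key after the first row key / before the second row key."""
    first, second = find_neighbours(stored_keys, new_key)
    bisect.insort(stored_keys, new_key)              # steps S53 and S57
    return first, second


stored = [-5000, -4000, -1000, -500]
neighbours = insert_row_key(stored, -2000)
```

Here `neighbours` is `(-4000, -1000)` and the list becomes `[-5000, -4000, -2000, -1000, -500]`, so the first row key is less than the second row key whenever both exist.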
Specifically, if the row keys stored in the HBase database are -100000, -50000 and -40000 and no row key with the value -10000 is stored, the row key -10000 is compared in turn with the row keys -100000, -50000 and -40000.
When split data Di is the visit count of a website or the search volume of a keyword, the user usually wants to see the websites with the highest visit counts or the keywords with the highest search volumes. The row keys are therefore saved in the HBase database as negative numbers, so that arranging the row keys in ascending order arranges the visit counts in descending order.
For example, the row key -10000 is compared with the row keys -100000, -50000 and -40000. The comparison shows that -100000, -50000 and -40000 are all less than -10000 and that -40000 differs least from -10000, so -10000 is inserted into the row following the row of row key -40000, and row key -40000 is the first row key. As described for step S1061 above, if cluster node 1 stores split data with visit counts from 100000 to 10000, row keys -40000 and -10000 should both be stored in cluster node 1. The row key -10000 is therefore inserted into the row following the row of row key -40000, and the split data Di whose row key is -10000 is stored in any column of the column family of the target row in which row key -10000 is located. With this sorting method, the visit counts of the websites are sorted in descending order.
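The effect of negating the visit count can be checked with a short Python sketch. It compares row keys as plain integers, which is an assumption of the sketch: HBase itself orders row keys as byte strings, so a real table would need an encoding that preserves numeric order. The site `www.example.com` and its count are hypothetical.

```python
# Visit counts per site; the first two pairs come from the examples in the text.
visits = {
    "www.baidu.com": 10000,
    "www.google.com": 1000,
    "www.example.com": 50000,  # hypothetical entry added for illustration
}

# Build (row_key, site) pairs, where the row key is the negative of the visit count.
rows = sorted((-total, site) for site, total in visits.items())

# Reading the rows in ascending row key order yields descending visit counts.
ranking = [(site, -row_key) for row_key, site in rows]
```

The resulting `ranking` lists `www.example.com` (50000) first, then `www.baidu.com` (10000), then `www.google.com` (1000).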
As another example, suppose the row keys stored in the HBase database are -5000, -4000 and -1000 and no row key with the value -10000 is stored. The comparison shows that -5000, -4000 and -1000 are all greater than -10000 and that -5000 differs least from -10000, so -10000 would be inserted into the row preceding the row of row key -5000 (that is, the target row), and row key -5000 is the second row key. However, as described for step S1061 above, if cluster node 2 stores split data with visit counts from 9999 to 1000, row key -10000 and row key -5000 are not stored in the same cluster node. Row key -10000 should instead be stored in cluster node 1. Because -10000 is the largest of all row keys in cluster node 1, row key -10000 is stored in the last row of cluster node 1 (that is, the target row).
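The choice of cluster node by visit-count range can be sketched as follows. The four ranges are the ones given in the text for cluster nodes 1 to 4; the dictionary `NODE_RANGES` and the function `node_for` are hypothetical names, and the lookup is a plain Python illustration rather than how HBase assigns rows to nodes.

```python
# Inclusive visit-count ranges per cluster node, as (low, high).
NODE_RANGES = {
    1: (10000, 100000),
    2: (1000, 9999),
    3: (100, 999),
    4: (0, 99),
}


def node_for(row_key):
    """Return the cluster node whose range contains the visit count -row_key."""
    total = -row_key  # the row key is the negative of the visit count
    for node, (low, high) in NODE_RANGES.items():
        if low <= total <= high:
            return node
    raise ValueError(f"no cluster node covers visit count {total}")


node_of_new_key = node_for(-10000)  # row key from the example above
node_of_neighbour = node_for(-5000)  # its nearest stored neighbour
```

In this sketch row key -10000 maps to cluster node 1 while row key -5000 maps to cluster node 2, which is why -10000 is placed in the last row of cluster node 1 rather than next to -5000.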
As a further example, suppose row keys -5000, -4000, -1000 and -500 are stored in the HBase database and row key -2000 needs to be inserted. Comparing -2000 with the stored row keys shows that -2000 is greater than -4000 and -5000 and, of these two, differs least from -4000, so row key -2000 should be inserted into the row following the row of row key -4000 (that is, the target row). Equivalently, -2000 is less than -1000 and -500 and, of these two, differs least from -1000, so row key -2000 is inserted into the row preceding the row of row key -1000 (that is, the target row). The split data Di whose row key is -2000 is then stored in any column of the column family of the target row in which row key -2000 is located, where row key -4000 is the first row key and row key -1000 is the second row key.
Sorting split data Di in cluster node Ai with the above method removes the need to merge the split data Di held in the individual cluster nodes: split data Di can be sorted quickly simply by reading its row keys. For example, in this embodiment the domain name www.baidu.com is saved as a column name in the row whose row key is -10000. If other websites also have a visit count of 10000, they can be saved in any free column of the column family of the row whose row key is -10000, without conflicting with the www.baidu.com column. In this embodiment, the negative of the visit count is saved to the HBase database as the row key so that data with large visit counts comes first in HBase and is convenient to query. In addition, the sorting method provided by this embodiment can sort the data to be sorted quickly even when the data volume is large.
Optionally, the sorting method provided by this application further comprises the following steps S7 to S9:
Step S7: receive a query instruction from a user through the query interface of the HBase database, where the query instruction is an instruction to query the split data corresponding to the row keys stored in the HBase database between any two row keys.
Step S9: display the split data found in the HBase database with a preset mark added.
As a database, HBase provides a convenient query interface through which the split data in any specified row key interval can be queried. For example, the domain names of the websites whose row keys lie between -1000 and -900 can be found quickly and displayed with a preset mark, where the preset mark may be an added background color, a font change, a floating bubble, or a similar form.
If the HBase database stores no row key strictly between -1000 and -900 but does store row key -1000 and row key -900, the query returns the website domain names stored under row key -1000 and those stored under row key -900.
If the HBase database stores neither a row key between -1000 and -900 nor row key -1000 or row key -900, a message can be popped up to inform the user that "the queried data does not exist"; empty data can also be displayed to show that no such row key is stored in the HBase database.
If the HBase database does not store row key -1000 or row key -900 but does store a row key between -1000 and -900, for example row key -950, the query returns the website domain names stored under row key -950.
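The three query cases above can be reproduced with a small Python sketch of a row key interval query. The table is an in-memory dictionary mapping row keys to sets of domain names, not an HBase client call; the function name `query_range` and the `*.example` domains are hypothetical. Both bounds are treated as inclusive here, following the cases described above.

```python
import bisect


def query_range(table, low_key, high_key):
    """Return {row_key: sorted domain names} for low_key <= row_key <= high_key."""
    keys = sorted(table)                          # row keys in ascending order
    start = bisect.bisect_left(keys, low_key)     # first key >= low_key
    end = bisect.bisect_right(keys, high_key)     # one past the last key <= high_key
    return {key: sorted(table[key]) for key in keys[start:end]}


# Case 1: only the two boundary row keys are stored.
bounds_only = query_range({-1000: {"a.example"}, -900: {"c.example"}}, -1000, -900)

# Case 2: nothing is stored in the interval, so the result is empty and the
# caller could show "the queried data does not exist".
nothing = query_range({-500: {"d.example"}}, -1000, -900)

# Case 3: only an inner row key is stored.
inner_only = query_range({-950: {"b.example"}, -500: {"d.example"}}, -1000, -900)
```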
The embodiment of this application further provides a data sorting device based on an HBase database. The data sorting device is mainly used to perform the HBase-based data sorting method provided above, and is described in detail below.
Fig. 2 is a schematic diagram of a data sorting device based on an HBase database according to an embodiment of this application. As shown in Fig. 2, the data sorting device comprises a splitting unit 10, a reading unit 20 and a determining unit 30, where:
The splitting unit 10 is configured to split the data to be sorted into multiple cluster nodes of the HBase database, where each cluster node applies the row key sorting of the HBase database after obtaining its split data.
Specifically, the data to be sorted may be the visit count of a website in a given period, such as the visit count of www.baidu.com or of www.google.com, or the search volume of a keyword in a given period, such as the search volume of "live broadcast of the War of Resistance military parade" on September 3, 2015 or of "Beijing traffic restrictions" on September 3, 2015. Note that the data to be sorted is not limited to website visit counts and keyword search volumes; it may be any data that needs to be sorted.
Because the HBase database has its own row key (rowkey) sorting, when the data to be sorted is split into multiple cluster nodes of the HBase database, the split data in each cluster node is sorted by row key.
The reading unit 20 is configured to read the sorting result of each cluster node to obtain multiple sorting results, where each cluster node obtains one sorting result after sorting its split data by row key.
Specifically, after the split data in each cluster node is sorted by row key, one sorting result is obtained per node; reading the sorting results of the cluster nodes in turn yields multiple sorting results. Note that the multiple sorting results are stored in the multiple cluster nodes of the HBase database. For example, when the split data is the visit count of a website, cluster node a may hold the sorting result for web pages with visit counts from 10000 to 1000, cluster node b the sorting result for web pages with visit counts from 999 to 100, and cluster node c the sorting result for web pages with visit counts from 99 to 0. The number of sorting results obtained after the overall sorting of the split data can be chosen according to the user's actual needs.
The determining unit 30 is configured to determine that the set of the multiple sorting results is the sorting result of the data to be sorted.
In this embodiment, the data to be sorted is split into multiple cluster nodes of the HBase database. Because the HBase database sorts automatically by row key, the data is sorted automatically once it has been split into the cluster nodes. The sorting results in the cluster nodes are then read, and the split data already sorted in each cluster node undergoes overall sorting by row key, which yields multiple sorting results whose set is the sorting result of the data to be sorted. By using the row key sorting of the HBase database, this application removes the prior-art step of merging the data to be sorted from every cluster node before it can be sorted, and so shortens the data sorting time. The data can be sorted without merging the data to be sorted from each database, which solves the prior-art technical problem of low data sorting efficiency and improves data sorting performance.
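The idea that the set of per-node sorting results is already the overall sorting result can be illustrated with a minimal Python sketch. It assumes, as in the example above, that the cluster nodes hold disjoint visit-count ranges and are read in range order; the function name `overall_result` and the sample row keys are hypothetical.

```python
from itertools import chain


def overall_result(node_results):
    """Concatenate per-node sorting results without re-sorting or merging."""
    return list(chain.from_iterable(node_results))


# Each node sorts only its own row keys (negated visit counts).
node_a = sorted([-1000, -10000, -5000])  # visit counts from 10000 to 1000
node_b = sorted([-100, -999])            # visit counts from 999 to 100
node_c = sorted([0, -99])                # visit counts from 99 to 0

result = overall_result([node_a, node_b, node_c])
```

Because the ranges do not overlap, `result` is already in ascending row key order, that is, in descending order of visit count, and no cross-node merge sort is needed.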
Optionally, the reading unit 20 comprises a sorting subunit, where:
The sorting subunit is configured so that cluster node Ai sorts, by row key, the split data Di split into cluster node Ai to obtain sorting result Ri, where i takes the values 1 to n in turn, n is the number of cluster nodes in the HBase database, cluster nodes A1 to An constitute the multiple cluster nodes of the HBase database, and split data D1 to Dn constitute the data to be sorted.
The determining unit 30 comprises a storing subunit, where the storing subunit is configured so that cluster node Ai stores the data key-value pairs of split data Di into the HBase database to obtain the sorting result of the data to be sorted, where a data key-value pair of split data Di is the key-value pair formed by the identifier of split data Di and the data total of split data Di.
Specifically, when the data to be sorted is split into cluster nodes A1 to An of the HBase database, split data Di is stored in each cluster node, unordered, in the form of data key-value pairs. The data key-value pairs of split data Di are then sorted by row key, and each cluster node obtains one sorting result Ri after this sorting. Assume that split data Di is the visit counts of multiple websites in a given period. If the website with the domain name www.baidu.com has a visit count of 10000, the domain name www.baidu.com and the visit count 10000 form a data key-value pair, written (www.baidu.com 10000). Likewise, if the website with the domain name www.google.com has a visit count of 1000, the domain name www.google.com and the visit count 1000 form a data key-value pair, written (www.google.com 1000).
Note that while the data to be sorted is being split, the user can determine the volume of split data Di assigned to each cluster node according to the remaining storage space of that node.
The data key-value pairs of split data Di in sorting result Ri are read in turn and stored into the HBase database in a fixed format. As explained above, when split data Di is the visit count of a website in a given period, the data key-value pair consists of the domain name of the website (the identifier of split data Di) and the visit count of the website in that period (the data total of split data Di). Similarly, if the split data is the search volume of a keyword, the data key-value pair consists of the keyword (the identifier of split data Di) and the search volume of the keyword in a given period (the data total of split data Di). In summary, the data key-value pair of split data Di consists of the identifier of split data Di and the data total of split data Di.
When split data Di is the visit count of websites in a given period and split data Di in cluster node Ai undergoes overall sorting, the data key-value pairs of split data Di must, after the overall sorting, be stored in order in the multiple cluster nodes of the HBase database; the sorting results of the multiple cluster nodes together form the sorting result of the data to be sorted. Assume there are 4 cluster nodes: cluster node 1 can store split data with visit counts from 100000 to 10000, cluster node 2 split data with visit counts from 9999 to 1000, cluster node 3 split data with visit counts from 999 to 100, and cluster node 4 split data with visit counts from 99 to 0. Cluster node 1 to cluster node 4 are cluster nodes in the HBase database, and the number of cluster nodes can be chosen according to the user's actual needs.
Optionally, the storing subunit comprises a query module, a first storage module and a second storage module, where:
The query module is configured to query whether the HBase database already stores the row key of split data Di, where the row key is the negative of the data total of split data Di. The first storage module is configured to store split data Di in a first target column when the row key of split data Di is already stored in the HBase database, where the first target column is any column in the column family of the row in which the row key of split data Di is located. The second storage module is configured to store the key-value pair of split data Di according to the row keys already stored in the HBase database when the row key of split data Di is not stored in the HBase database.
Specifically, in this embodiment of the application, split data Di in cluster node Ai is sorted by row key. Therefore, when split data Di in cluster node Ai undergoes overall sorting, the row key (rowkey) of split data Di is obtained first, and the HBase database is queried to determine whether that rowkey is already stored, where the rowkey is the negative of the data total in the data key-value pair of split data Di.
Assume that the data key-value pair of one item in split data Di is (www.baidu.com 10000); the row key of that item is then -10000. If row key -10000 is found in the HBase database, the domain name "www.baidu.com" in the pair (www.baidu.com 10000) is stored in any column of the HBase database, specifically in the column family of the row in which row key -10000 is located. If row key -10000 is not found in the HBase database, row key -10000 is compared in size with the row keys already stored in the HBase database and is then inserted into the designated cluster node of the HBase database.
The storage format of split data Di in the database is rowkey: -10000, family: f, column: www.baidu.com, value: 1, where family is the identifier HBase uses to physically separate data and column is the HBase column name. When rowkey: -10000 is stored in a given row, family: f, column: www.baidu.com and value: 1 are stored in any column of the column family of the row in which row key -10000 is located, and family: f, column: www.baidu.com and value: 1 are stored in the same column.
Optionally, the second storage module comprises a comparison submodule, an insertion submodule, a storage submodule and an update submodule, where:
The comparison submodule is configured to compare, one by one, the row key of split data Di with the row keys already stored in the HBase database. The insertion submodule is configured to insert the row key of split data Di into a target row in the HBase database, where the target row is the row following the row of the first row key or the row preceding the row of the second row key; the first row key and the second row key are both row keys already stored in the HBase database, the first row key being the stored row key that is less than the row key of split data Di and differs from it by the smallest amount, and the second row key being the stored row key that is greater than the row key of split data Di and differs from it by the smallest amount. The storage submodule is configured to store split data Di in a second target column corresponding to the target row, where the second target column is any column in the column family of the target row. The update submodule is configured to update the row keys stored in the HBase database.
Specifically, if the row keys stored in the HBase database are -100000, -50000 and -40000 and no row key with the value -10000 is stored, the row key -10000 is compared in turn with the row keys -100000, -50000 and -40000.
When split data Di is the visit count of a website or the search volume of a keyword, the user usually wants to see the websites with the highest visit counts or the keywords with the highest search volumes. The row keys are therefore saved in the HBase database as negative numbers, so that arranging the row keys in ascending order arranges the visit counts in descending order.
For example, the row key -10000 is compared with the row keys -100000, -50000 and -40000. The comparison shows that -100000, -50000 and -40000 are all less than -10000 and that -40000 differs least from -10000, so -10000 is inserted into the row following the row of row key -40000, and row key -40000 is the first row key. As described above, if cluster node 1 stores split data with visit counts from 100000 to 10000, row keys -40000 and -10000 should both be stored in cluster node 1. The row key -10000 is therefore inserted into the row following the row of row key -40000, and the split data Di whose row key is -10000 is stored in any column of the column family of the target row in which row key -10000 is located. With this sorting method, the visit counts of the websites are sorted in descending order.
As another example, suppose the row keys stored in the HBase database are -5000, -4000 and -1000 and no row key with the value -10000 is stored. The comparison shows that -5000, -4000 and -1000 are all greater than -10000 and that -5000 differs least from -10000, so -10000 would be inserted into the row preceding the row of row key -5000, and row key -5000 is the second row key. However, as described above, if cluster node 2 stores split data with visit counts from 9999 to 1000, row key -10000 and row key -5000 are not stored in the same cluster node. Row key -10000 should instead be stored in cluster node 1. Because -10000 is the largest of all row keys in cluster node 1, row key -10000 is stored in the last row of cluster node 1.
As a further example, suppose row keys -5000, -4000, -1000 and -500 are stored in the HBase database and row key -2000 needs to be inserted. Comparing -2000 with the stored row keys shows that -2000 is greater than -4000 and -5000 and, of these two, differs least from -4000, so row key -2000 should be inserted into the row following the row of row key -4000. Equivalently, -2000 is less than -1000 and -500 and, of these two, differs least from -1000, so row key -2000 is inserted into the row preceding the row of row key -1000. The split data Di whose row key is -2000 is then stored in any column of the column family of the target row in which row key -2000 is located, where row key -4000 is the first row key and row key -1000 is the second row key.
Optionally, the data sorting device based on an HBase database provided by this application further comprises a receiving unit and a display unit, where:
The receiving unit is configured to receive a query instruction from a user through the query interface of the HBase database, where the query instruction is an instruction to query the split data corresponding to the row keys stored in the HBase database between any two row keys. The display unit is configured to display, with a preset mark added, the split data corresponding to the row keys found in the HBase database.
As a database, HBase provides a convenient query interface through which the split data in any specified row key interval can be queried. For example, the domain names of the websites whose row keys lie between -1000 and -900 can be found quickly and displayed with a preset mark, where the preset mark may be an added background color, a font change, a floating bubble, or a similar form.
If the HBase database stores no row key strictly between -1000 and -900 but does store row key -1000 and row key -900, the query returns the website domain names stored under row key -1000 and those stored under row key -900.
If the HBase database stores neither a row key between -1000 and -900 nor row key -1000 or row key -900, a message can be popped up to inform the user that "the queried data does not exist"; empty data can also be displayed to show that no such row key is stored in the HBase database.
If the HBase database does not store row key -1000 or row key -900 but does store a row key between -1000 and -900, for example row key -950, the query returns the website domain names stored under row key -950.
The serial numbers of the above embodiments of this application are for description only and do not indicate the relative merits of the embodiments.
In the above embodiments of this application, each embodiment is described with its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed technical content may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units may be a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, may each exist physically on their own, or two or more of them may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, the technical solution of this application in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The storage medium includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk or an optical disc.
The above are only preferred embodiments of this application. It should be noted that a person of ordinary skill in the art may make improvements and refinements without departing from the principle of this application, and such improvements and refinements shall also be regarded as falling within the scope of protection of this application.
Claims (10)
1. A data sorting method based on an HBase database, characterized by comprising:
splitting data to be sorted across multiple cluster nodes of the HBase database, wherein each cluster node, after obtaining its split data, executes the row key sorting mode of the HBase database;
reading the sorting result of each cluster node to obtain multiple sorting results, wherein each cluster node obtains one sorting result after executing the row key sorting mode to sort its split data; and
determining that the set of the multiple sorting results is the sorting result of the data to be sorted.
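As a rough illustration of the flow in claim 1, the following Python sketch simulates cluster nodes with in-memory lists. The helper names (`split_round_robin`, `sort_on_node`, `collect_results`), the round-robin split, and the `(row_key, value)` record shape are assumptions made for illustration only; the claim does not say how the data is split, and no real HBase API is used here.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str]  # (row key, payload); a hypothetical record shape


def split_round_robin(records: List[Record], n_nodes: int) -> List[List[Record]]:
    """Split the data to be sorted across n_nodes simulated cluster nodes."""
    shards: List[List[Record]] = [[] for _ in range(n_nodes)]
    for idx, rec in enumerate(records):
        shards[idx % n_nodes].append(rec)
    return shards


def sort_on_node(shard: List[Record]) -> List[Record]:
    """Stand-in for a node ordering its rows by row key."""
    return sorted(shard, key=lambda rec: rec[0])


def collect_results(per_node: List[List[Record]]) -> Dict[str, List[Record]]:
    """Read every node's result; the set of results is the overall result."""
    return {f"A{i + 1}": result for i, result in enumerate(per_node)}


records = [(5, "e"), (1, "a"), (4, "d"), (2, "b"), (3, "c")]
shards = split_round_robin(records, n_nodes=2)
combined = collect_results([sort_on_node(shard) for shard in shards])
```

In this sketch the combined result is a collection of per-node sorted lists, not a single merged list, which mirrors the wording of the claim rather than a global merge sort.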
2. The method according to claim 1, characterized in that each cluster node obtaining the sorting result after executing the row key sorting mode to sort its split data comprises:
cluster node Ai executing the row key sorting mode to sort the split data Di assigned to cluster node Ai, obtaining sorting result Ri, wherein i takes the values 1 to n in turn, n is the number of cluster nodes in the HBase database, cluster nodes A1 to An constitute the multiple cluster nodes of the HBase database, and split data D1 to Dn constitute the data to be sorted; and
determining that the set of the multiple sorting results is the sorting result of the data to be sorted comprises: cluster node Ai storing the data key-value pair of split data Di in the HBase database to obtain the sorting result of the data to be sorted, wherein the data key-value pair of split data Di is the key-value pair formed by the identifier of split data Di and the total data amount of split data Di.
3. The method according to claim 2, characterized in that cluster node Ai storing the data key-value pair of split data Di in the HBase database comprises:
querying whether the HBase database has already stored the row key of split data Di, wherein the row key of split data Di is the negative of the total data amount of split data Di;
if the row key of split data Di is already stored in the HBase database, storing split data Di in a first target column, wherein the first target column is any column in the column family to which the row of that row key belongs; and
if the row key of split data Di is not stored in the HBase database, storing the key-value pair of split data Di according to the row keys already stored in the HBase database.
4. The method according to claim 3, characterized in that, if the row key of split data Di is not stored in the HBase database, storing the key-value pair of split data Di according to the row keys already stored in the HBase database comprises:
comparing, in turn, the row key of split data Di with the row keys already stored in the HBase database;
inserting the row key of split data Di into a target row in the HBase database, wherein the target row is the row following the row of a first row key or the row preceding the row of a second row key, the first row key and the second row key are both row keys already stored in the HBase database, the first row key is the stored row key that is smaller than the row key of split data Di and differs from it by the least, and the second row key is the stored row key that is larger than the row key of split data Di and differs from it by the least;
storing split data Di in a second target column corresponding to the target row, wherein the second target column is any column in the column family to which the target row belongs; and
updating the row keys stored in the HBase database.
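The neighbour search in claim 4 can be illustrated with a sorted list of stored row keys. Note the swap: the claim describes comparing the new row key with the stored keys in turn, whereas this sketch uses a binary search from Python's standard `bisect` module to find the same two neighbours. The function names are hypothetical, and the sketch assumes the new key is not already stored, which is the case the claim covers.

```python
import bisect
from typing import List, Optional, Tuple


def find_neighbours(stored_keys: List[int], new_key: int) -> Tuple[Optional[int], Optional[int], int]:
    """Return (first row key, second row key, insert position).

    first row key:  largest stored key smaller than new_key (None if absent)
    second row key: smallest stored key larger than new_key (None if absent)
    stored_keys must be sorted ascending and must not contain new_key.
    """
    pos = bisect.bisect_left(stored_keys, new_key)
    first = stored_keys[pos - 1] if pos > 0 else None
    second = stored_keys[pos] if pos < len(stored_keys) else None
    return first, second, pos


def insert_row_key(stored_keys: List[int], new_key: int) -> Tuple[Optional[int], Optional[int]]:
    """Insert new_key between its neighbours and update the stored key list in place."""
    first, second, pos = find_neighbours(stored_keys, new_key)
    stored_keys.insert(pos, new_key)
    return first, second


stored = [-300, -120, -40]
first, second = insert_row_key(stored, -200)
```

After the call, the new key sits in the row after the first row key and before the second row key, and `stored` reflects the updated set of row keys.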
5. The method according to claim 1, characterized in that the method further comprises:
receiving a query instruction from a user through a query interface of the HBase database, wherein the query instruction is an instruction to query the split data corresponding to the row keys lying between any two row keys already stored in the HBase database; and
displaying, with a preset mark added, the split data corresponding to the row keys found in the HBase database.
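A range query of the kind described in claim 5 might look like the sketch below. It is an in-memory simulation: the dict-based table, the function names, the `*` marker, and the choice to treat both bounds as inclusive are all assumptions, since the claim only says the row keys lie between two stored row keys and that a preset mark is added on display.

```python
import bisect
from typing import Dict, List, Tuple


def range_query(table: Dict[int, List[str]], low: int, high: int) -> List[Tuple[int, List[str]]]:
    """Return rows whose row key lies in [low, high], in ascending key order."""
    keys = sorted(table)
    start = bisect.bisect_left(keys, low)
    end = bisect.bisect_right(keys, high)
    return [(key, table[key]) for key in keys[start:end]]


def render(rows: List[Tuple[int, List[str]]], mark: str = "*") -> List[str]:
    """Format each matching row with a preset mark in front of it."""
    return [f"{mark} {key}: {', '.join(ids)}" for key, ids in rows]


table = {-300: ["D2"], -200: ["D4"], -120: ["D1", "D3"], -40: ["D5"]}
rows = range_query(table, low=-300, high=-120)
lines = render(rows)
```

Because the row keys are negated totals, querying from -300 to -120 returns the split data whose totals lie between 120 and 300, largest first.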
6. A data sorting device based on an HBase database, characterized by comprising:
a splitting unit, configured to split data to be sorted across multiple cluster nodes of the HBase database, wherein each cluster node, after obtaining its split data, executes the row key sorting mode of the HBase database;
a reading unit, configured to read the sorting result of each cluster node to obtain multiple sorting results, wherein each cluster node obtains the sorting result after executing the row key sorting mode to sort its split data; and
a determining unit, configured to determine that the set of the multiple sorting results is the sorting result of the data to be sorted.
7. The device according to claim 6, characterized in that the reading unit comprises:
a sorting subunit, configured for cluster node Ai to execute the row key sorting mode to sort the split data Di assigned to cluster node Ai, obtaining sorting result Ri, wherein i takes the values 1 to n in turn, n is the number of cluster nodes in the HBase database, cluster nodes A1 to An constitute the multiple cluster nodes of the HBase database, and split data D1 to Dn constitute the data to be sorted; and
the determining unit comprises: a storing subunit, configured for cluster node Ai to store the data key-value pair of split data Di in the HBase database to obtain the sorting result of the data to be sorted, wherein the data key-value pair of split data Di is the key-value pair formed by the identifier of split data Di and the total data amount of split data Di.
8. The device according to claim 7, characterized in that the storing subunit comprises:
a query module, configured to query whether the HBase database has already stored the row key of split data Di, wherein the row key of split data Di is the negative of the total data amount of split data Di;
a first storage module, configured to, if the row key of split data Di is already stored in the HBase database, store split data Di in a first target column, wherein the first target column is any column in the column family to which the row of that row key belongs; and
a second storage module, configured to, if the row key of split data Di is not stored in the HBase database, store the key-value pair of split data Di according to the row keys already stored in the HBase database.
9. The device according to claim 8, characterized in that the second storage module comprises:
a comparison submodule, configured to compare, in turn, the row key of split data Di with the row keys already stored in the HBase database;
an insertion submodule, configured to insert the row key of split data Di into a target row in the HBase database, wherein the target row is the row following the row of a first row key or the row preceding the row of a second row key, the first row key and the second row key are both row keys already stored in the HBase database, the first row key is the stored row key that is larger than the row key of split data Di and differs from it by the least, and the second row key is the stored row key that is smaller than the row key of split data Di and differs from it by the least;
a storage submodule, configured to store split data Di in a second target column corresponding to the target row, wherein the second target column is any column in the column family to which the target row belongs; and
an update submodule, configured to update the row keys stored in the HBase database.
10. The device according to claim 6, characterized in that the device further comprises:
a receiving unit, configured to receive a query instruction from a user through a query interface of the HBase database, wherein the query instruction is an instruction to query the split data corresponding to the row keys lying between any two row keys already stored in the HBase database; and
a display unit, configured to display, with a preset mark added, the split data corresponding to the row keys found in the HBase database.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510733850.7A CN106649385B (en) | 2015-11-02 | 2015-11-02 | Data reordering method and device based on HBase database |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106649385A (en) | 2017-05-10 |
CN106649385B (en) | 2019-12-03 |
Family
ID=58809823
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201510733850.7A | Expired - Fee Related, CN106649385B (en) | 2015-11-02 | 2015-11-02 | Data reordering method and device based on HBase database |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106649385B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103942108A (en) * | 2014-04-25 | 2014-07-23 | 四川大学 | Resource parameter optimization method under Hadoop homogenous cluster |
CN103995827A (en) * | 2014-04-10 | 2014-08-20 | 北京大学 | High-performance ordering method for MapReduce calculation frame |
US20140359635A1 (en) * | 2013-05-31 | 2014-12-04 | International Business Machines Corporation | Processing data by using simultaneous multithreading |
US20150160884A1 (en) * | 2013-12-09 | 2015-06-11 | Vmware, Inc. | Elastic temporary filesystem |
Non-Patent Citations (2)
Title |
---|
CHARLIEQIAO: ""Hadoop中TeraSort算法分析"", 《百度文库》 * |
傅杰 等: ""一种周期性MapReduce作业的负载均衡策略"", 《计算机科学》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107819828A (en) * | 2017-10-16 | 2018-03-20 | 平安科技(深圳)有限公司 | Data transmission method, device, computer equipment and storage medium |
CN107819828B (en) * | 2017-10-16 | 2020-03-10 | 平安科技(深圳)有限公司 | Data transmission method and device, computer equipment and storage medium |
CN108733790A (en) * | 2018-05-11 | 2018-11-02 | 广州虎牙信息科技有限公司 | Data reordering method, device, server and storage medium |
CN108733790B (en) * | 2018-05-11 | 2021-07-02 | 广州虎牙信息科技有限公司 | Data sorting method, device, server and storage medium |
CN113254488A (en) * | 2020-08-05 | 2021-08-13 | 深圳市汉云科技有限公司 | Data sorting method and system of distributed database |
CN112925809A (en) * | 2021-02-24 | 2021-06-08 | 浙江大华技术股份有限公司 | Data storage method, device and system |
Also Published As
Publication number | Publication date |
---|---|
CN106649385B (en) | 2019-12-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing; Applicant after: BEIJING GRIDSUM TECHNOLOGY Co.,Ltd.; Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing; Applicant before: BEIJING GRIDSUM TECHNOLOGY Co.,Ltd. |
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20191203 |