CN106649385A

CN106649385A - Data ranking method and device based on HBase database

Info

Publication number: CN106649385A
Application number: CN201510733850.7A
Authority: CN
Inventors: 陈克凡
Original assignee: Beijing Gridsum Technology Co Ltd
Current assignee: Beijing Gridsum Technology Co Ltd
Priority date: 2015-11-02
Filing date: 2015-11-02
Publication date: 2017-05-10
Anticipated expiration: 2035-11-02
Also published as: CN106649385B

Abstract

The invention discloses a data ranking method and device based on an HBase database. The method comprises: splitting the data to be ranked across multiple cluster nodes of the HBase database, wherein each cluster node, after receiving its split data, applies the row-key ranking mechanism of the HBase database; reading the ranking result of each cluster node to obtain multiple ranking results, wherein each cluster node obtains one ranking result after applying the row-key ranking mechanism to its split data; and determining the set of the multiple ranking results as the ranking result of the data to be ranked. This solves the technical problem of low data ranking efficiency in the prior art.

Description

Data sorting method and device based on an HBase database

Technical field

This application relates to the computer field, and in particular to a data sorting method and device based on an HBase database.

Background art

Data sorting is now widely used in big-data statistics. For example, sorting websites by visit volume makes it possible to see which sites rank highest and then make decisions accordingly. When the data volume is small, sorting is very simple with any of the many fast sorting algorithms available today. Once the data volume reaches a certain scale, however, even simple sorting becomes complicated. For example, when 100 GB of data needs to be sorted, the system cannot simply read it into memory and sort it on a single machine, because no server has 100 GB of memory, and even a server with that much memory could never devote all of it to sorting.

With the sorting approach built on current technical architectures, an existing framework can distribute the sorting task to the nodes of a cluster for computation. That is, the 100 GB of data is split and distributed to the nodes of the cluster; under the framework's unified scheduling, each node reads its data and sorts it; finally, the sorted results on the nodes are merged and the overall result is written to the file system.
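The prior-art flow described above can be sketched as follows. This is a minimal single-process illustration, not any particular framework: the function names `split` and `sort_then_merge` and the round-robin split are assumptions made for the example.

```python
import heapq

def split(data, n_nodes):
    """Deal records round-robin onto n_nodes: an arbitrary, unordered split."""
    return [data[i::n_nodes] for i in range(n_nodes)]

def sort_then_merge(data, n_nodes):
    # Each "node" sorts its own slice independently.
    local_results = [sorted(chunk) for chunk in split(data, n_nodes)]
    # The slices overlap in value range, so a k-way merge is still required
    # before the overall result can be written out.
    return list(heapq.merge(*local_results))

visits = [500, 10000, 42, 999, 7, 1000, 100]
merged = sort_then_merge(visits, 3)
```

The final `heapq.merge` call stands in for the merge stage of the prior-art flow.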

This sorting approach has two problems:

The first problem is merging the results: the sorted results on each node have to be merged. Because the data itself is unordered, there is no regularity between the sorted results of different nodes. As a result, either the merge is very slow, or a new distribution mechanism has to be introduced that redistributes the results to be merged in a relatively ordered way and then sorts and merges them again. Either option is relatively slow.

The second problem is viewing the results: once the whole data set has been sorted, it has to be saved to a file, which makes the sorted results inconvenient to view; the results for an arbitrarily specified interval cannot be viewed quickly.

No effective solution to the above problems has yet been proposed.

Summary of the invention

The embodiments of this application provide a data sorting method and device based on an HBase database, to solve at least the technical problem of low data sorting efficiency in the prior art.

According to one aspect of the embodiments of this application, a data sorting method based on an HBase database is provided. The method includes: splitting the data to be sorted across multiple cluster nodes of the HBase database, wherein each cluster node, after obtaining its split data, applies the row-key sorting mechanism of the HBase database; reading the sorting result of each cluster node to obtain multiple sorting results, wherein each cluster node obtains one sorting result after applying the row-key sorting mechanism to its split data; and determining the set of the multiple sorting results as the sorting result of the data to be sorted.

Further, each cluster node obtaining one sorting result after applying the row-key sorting mechanism to its split data includes: cluster node Ai applies the row-key sorting mechanism to the split data Di assigned to cluster node Ai and obtains sorting result Ri, where i takes the values 1 to n in turn, n is the number of cluster nodes in the HBase database, cluster nodes A1 to An constitute the multiple cluster nodes of the HBase database, and split data D1 to Dn constitute the data to be sorted. Determining the set of the multiple sorting results as the sorting result of the data to be sorted includes: cluster node Ai stores the data key-value pairs of split data Di in the HBase database to obtain the sorting result of the data to be sorted, wherein a data key-value pair of split data Di is a key-value pair consisting of the identifier of split data Di and the data total of split data Di.

Further, cluster node Ai storing the data key-value pairs of split data Di in the HBase database includes: querying whether the HBase database has already stored the row key of split data Di, wherein the row key of split data Di is the negative of the data total of split data Di; if the row key of split data Di is already stored in the HBase database, storing split data Di in a first target column, wherein the first target column is any column in the column family of the row in which the row key of split data Di is located; and if the row key of split data Di is not stored in the HBase database, storing the key-value pair of split data Di according to the row keys already stored in the HBase database.

Further, if the row key of split data Di is not stored in the HBase database, storing the key-value pair of split data Di according to the row keys already stored in the HBase database includes: comparing, in turn, the row key of split data Di with the row keys already stored in the HBase database; inserting the row key of split data Di into a target row in the HBase database, wherein the target row is the row following the row of a first row key or the row preceding the row of a second row key, the first row key and the second row key are both row keys already stored in the HBase database, the first row key is the stored row key that is less than the row key of split data Di and differs from it by the smallest amount, and the second row key is the stored row key that is greater than the row key of split data Di and differs from it by the smallest amount; storing split data Di in a second target column corresponding to the target row, wherein the second target column is any column in the column family of the target row; and updating the row keys stored in the HBase database.

Further, the method also includes: receiving a query instruction from a user through a query interface of the HBase database, wherein the query instruction is an instruction to query the split data corresponding to the row keys lying between any two row keys stored in the HBase database; and displaying, in the HBase database, the split data corresponding to the queried row keys with a preset mark added.

According to another aspect of the embodiments of this application, a data sorting device based on an HBase database is also provided. The device includes: a splitting unit, configured to split the data to be sorted across multiple cluster nodes of the HBase database, wherein each cluster node, after obtaining its split data, applies the row-key sorting mechanism of the HBase database; a reading unit, configured to read the sorting result of each cluster node to obtain multiple sorting results, wherein each cluster node obtains one sorting result after applying the row-key sorting mechanism to its split data; and a determining unit, configured to determine the set of the multiple sorting results as the sorting result of the data to be sorted.

Further, the reading unit includes: a sorting subunit, by which cluster node Ai applies the row-key sorting mechanism to the split data Di assigned to cluster node Ai and obtains sorting result Ri, where i takes the values 1 to n in turn, n is the number of cluster nodes in the HBase database, cluster nodes A1 to An constitute the multiple cluster nodes of the HBase database, and split data D1 to Dn constitute the data to be sorted. The determining unit includes: a storage subunit, by which cluster node Ai stores the data key-value pairs of split data Di in the HBase database to obtain the sorting result of the data to be sorted, wherein a data key-value pair of split data Di is a key-value pair consisting of the identifier of split data Di and the data total of split data Di.

Further, the storage subunit includes: a query module, configured to query whether the HBase database has already stored the row key of split data Di, wherein the row key of split data Di is the negative of the data total of split data Di; a first storage module, configured to store split data Di in a first target column if the row key of split data Di is already stored in the HBase database, wherein the first target column is any column in the column family of the row in which the row key of split data Di is located; and a second storage module, configured to store the key-value pair of split data Di according to the row keys already stored in the HBase database if the row key of split data Di is not stored in the HBase database.

Further, the second storage module includes: a comparison submodule, configured to compare, in turn, the row key of split data Di with the row keys already stored in the HBase database; an insertion submodule, configured to insert the row key of split data Di into a target row in the HBase database, wherein the target row is the row following the row of a first row key or the row preceding the row of a second row key, the first row key and the second row key are both row keys already stored in the HBase database, the first row key is the stored row key that is less than the row key of split data Di and differs from it by the smallest amount, and the second row key is the stored row key that is greater than the row key of split data Di and differs from it by the smallest amount; a storage submodule, configured to store split data Di in a second target column corresponding to the target row, wherein the second target column is any column in the column family of the target row; and an update submodule, configured to update the row keys stored in the HBase database.

Further, the device also includes: a receiving unit, configured to receive a query instruction from a user through a query interface of the HBase database, wherein the query instruction is an instruction to query the split data corresponding to the row keys lying between any two row keys stored in the HBase database; and a display unit, configured to display, in the HBase database, the split data corresponding to the queried row keys with a preset mark added.

In the embodiments of this application, the data to be sorted is split across multiple cluster nodes of the HBase database, wherein each cluster node, after obtaining its split data, applies the row-key sorting mechanism of the HBase database; the sorting result of each cluster node is read to obtain multiple sorting results, wherein each cluster node obtains one sorting result after applying the row-key sorting mechanism to its split data; and the set of the multiple sorting results is determined as the sorting result of the data to be sorted. Because the HBase database has a row-key sorting mechanism that sorts automatically, the data is sorted automatically once it has been split across the multiple cluster nodes of the HBase database. The sorting results in the multiple cluster nodes are then read, and the sorted split data in each cluster node is ordered as a whole according to the row-key sorting mechanism, yielding multiple sorting results whose set is the sorting result of the data to be sorted. By using the row-key sorting mechanism of the HBase database, this application omits the prior-art step in which the data to be sorted in each cluster node can only be sorted after being merged. This shortens the sorting time, achieves the technical effect of sorting data without merging the data to be sorted in each database, solves the technical problem of low data sorting efficiency in the prior art, and improves sorting performance.

Description of the drawings

The drawings described here are provided for a further understanding of this application and form part of it. The illustrative embodiments of this application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:

Fig. 1 is a flow chart of a data sorting method based on an HBase database according to an embodiment of this application; and

Fig. 2 is a schematic diagram of a data sorting device based on an HBase database according to an embodiment of this application.

Detailed description of the embodiments

To help those skilled in the art better understand the solution of this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some of the embodiments of this application, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in this application, without creative effort, shall fall within the scope of protection of this application.

Note that the terms "first", "second" and the like in the description, the claims and the above drawings of this application are used to distinguish similar objects, not to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described here. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.

First, the technical terms used in this embodiment are explained as follows:

HBase is a distributed, column-oriented open-source database, a distributed storage system for structured data. HBase differs from ordinary relational databases in that it is suited to storing unstructured data. Another difference is that HBase uses a column-based rather than a row-based model.

Rowkey is the unique identifier of a piece of data in HBase. HBase stores data sorted by rowkey, and HBase queries are also based on rowkey: either directly fetching the whole record for a single specified rowkey, or scanning the whole data interval from a start rowkey to an end rowkey.
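The two rowkey access patterns just described (single-key fetch and range scan over sorted keys) can be modelled with a small in-memory sketch. The class `SortedTable` and its methods are hypothetical names for illustration and are not the HBase client API; the half-open interval used by `scan` is a choice made for this sketch.

```python
import bisect

class SortedTable:
    """Toy in-memory stand-in for a table kept sorted by row key."""

    def __init__(self):
        self._keys = []   # row keys, always kept in ascending order
        self._rows = {}   # row key -> row data

    def put(self, rowkey, data):
        if rowkey not in self._rows:
            bisect.insort(self._keys, rowkey)  # keep keys sorted on insert
        self._rows[rowkey] = data

    def get(self, rowkey):
        """Fetch the record for one row key, or None if absent."""
        return self._rows.get(rowkey)

    def scan(self, start, stop):
        """Return (rowkey, data) for start <= rowkey < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]

table = SortedTable()
for key, site in [("row3", "c.example"), ("row1", "a.example"), ("row2", "b.example")]:
    table.put(key, site)
```

Because the keys are kept sorted at write time, a range scan needs only two binary searches and a slice, with no sorting at read time.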

Family is the identifier HBase uses to physically separate data. It must be defined in advance when the table is created. In most cases only a single family is used, so that the data in one row is not physically separated, which makes querying convenient.

Column is the column name in HBase. HBase is an unstructured database, so columns do not need to be defined in advance when an HBase table is created; they can be added whenever they are used.

Value is the data value that HBase ultimately stores. A stored value can be located by rowkey + family + column.

According to an embodiment of this application, a method embodiment of a data sorting method based on an HBase database is provided. Note that the steps shown in the flow chart of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flow chart, in some cases the steps shown or described may be performed in a different order.

Fig. 1 is a flow chart of a data sorting method based on an HBase database according to an embodiment of this application. As shown in Fig. 1, the method includes the following steps S102 to S106:

Step S102: split the data to be sorted across multiple cluster nodes of the HBase database, wherein each cluster node, after obtaining its split data, applies the row-key sorting mechanism of the HBase database.

Specifically, the data to be sorted may be the visit volume of websites in a certain period, such as the visit volume of www.baidu.com or the visit volume of www.google.com, or the search volume of keywords in a certain period, such as the search volume of "War of Resistance military parade live broadcast" on September 3, 2015, or the search volume of "Beijing traffic restrictions" on September 3, 2015. Note that the data to be sorted is not limited to the website visit volumes and keyword search volumes above; it includes any data that needs to be sorted.

Because the HBase database has a row-key (rowkey) sorting mechanism, when the data to be sorted is split across multiple cluster nodes in the HBase database, the split data assigned to each cluster node is sorted according to the row-key sorting mechanism.

Step S104: read the sorting result of each cluster node to obtain multiple sorting results, wherein each cluster node obtains one sorting result after applying the row-key sorting mechanism to its split data.

Specifically, after the split data in each cluster node is sorted according to the row-key sorting mechanism, one sorting result is obtained; the sorting results in the cluster nodes are read in turn to obtain multiple sorting results. Note that the multiple sorting results are stored in the multiple cluster nodes of the HBase database. For example, when the split data is the visit volume of a website, cluster node a may hold the sorting result for web pages with visit volumes from 10000 down to 1000, cluster node b may hold the sorting result for web pages with visit volumes from 999 down to 100, and cluster node c may hold the sorting result for web pages with visit volumes from 99 down to 0. After the split data has been sorted as a whole, the number of sorting results obtained can be chosen according to the user's actual needs.

Step S106: determine the set of the multiple sorting results as the sorting result of the data to be sorted. That is, the set of the multiple sorting results distributed across the multiple cluster nodes is the sorting result of the data to be sorted.

In the embodiments of this application, the data to be sorted is split across the multiple cluster nodes of the HBase database. Because the HBase database has a row-key sorting mechanism that sorts automatically, the data is sorted automatically once it has been split across the multiple cluster nodes. The sorting results in the multiple cluster nodes are then read, and the sorted split data in each cluster node is ordered as a whole according to the row-key sorting mechanism, yielding multiple sorting results whose set is the sorting result of the data to be sorted. By using the row-key sorting mechanism of the HBase database, this application omits the prior-art step in which the data to be sorted in each cluster node can only be sorted after being merged. This shortens the sorting time, achieves the technical effect of sorting data without merging the data to be sorted in each database, solves the technical problem of low data sorting efficiency in the prior art, and improves sorting performance.
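A minimal sketch of why no merge step is needed, assuming (as in the node a, b, c example above) that each node holds a disjoint, ordered range of visit volumes. The function name and the sample numbers are hypothetical; the point is only that locally sorted range partitions can be read back in node order.

```python
def concat_sorted_partitions(partitions):
    """Sort each partition locally and concatenate in partition order."""
    result = []
    for part in partitions:
        # Local sort only; nodes never compare data with each other.
        result.extend(sorted(part, reverse=True))
    return result

node_a = [1000, 10000, 2500]   # visit volumes 10000 down to 1000
node_b = [100, 999, 450]       # visit volumes 999 down to 100
node_c = [0, 99, 7]            # visit volumes 99 down to 0
ordered = concat_sorted_partitions([node_a, node_b, node_c])
```

Reading the nodes one after another already yields the globally descending order, because every value on node a is at least as large as every value on node b, and so on.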

Optionally, each cluster node obtaining one sorting result after applying the row-key sorting mechanism to its split data includes the following step S1041:

Step S1041: cluster node Ai applies the row-key sorting mechanism to the split data Di assigned to cluster node Ai and obtains sorting result Ri, where i takes the values 1 to n in turn, n is the number of cluster nodes in the HBase database, cluster nodes A1 to An constitute the multiple cluster nodes of the HBase database, and split data D1 to Dn constitute the data to be sorted.

Specifically, when the data to be sorted is split across cluster nodes A1 to An in the HBase database, split data Di is stored in each cluster node, unordered, in the form of data key-value pairs. The data key-value pairs of split data Di are then sorted according to the row-key sorting mechanism, and each cluster node obtains one sorting result Ri. Suppose split data Di is the visit volume of multiple websites in a certain period. If the website with domain name www.baidu.com has a visit volume of 10000, the domain name www.baidu.com and the visit volume 10000 form a data key-value pair, written (www.baidu.com 10000). Similarly, if the website with domain name www.google.com has a visit volume of 1000, the domain name www.google.com and the visit volume 1000 form a data key-value pair, written (www.google.com 1000).
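As a small illustration of the (identifier, total) pairs described above, the sketch below builds them from a raw access log. The log format (one domain name per visit) and the function name `to_key_value_pairs` are assumptions for this example; the source does not say how the totals are computed.

```python
from collections import Counter

def to_key_value_pairs(access_log):
    """Aggregate one-domain-per-visit log entries into (identifier, total) pairs."""
    return list(Counter(access_log).items())

# Hypothetical log: four visits to one site, two to another.
log = ["www.baidu.com"] * 3 + ["www.google.com"] * 2 + ["www.baidu.com"]
pairs = to_key_value_pairs(log)
```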

Note that, when splitting the data to be sorted, the user can determine the amount of split data Di assigned to each cluster node according to the remaining storage space of that cluster node.

Determining the set of the multiple sorting results as the sorting result of the data to be sorted includes step S1061: cluster node Ai stores the data key-value pairs of split data Di in the HBase database to obtain the sorting result of the data to be sorted, wherein a data key-value pair of split data Di is a key-value pair consisting of the identifier of split data Di and the data total of split data Di.

Specifically, the data key-value pairs of split data Di in sorting result Ri are read and stored in the HBase database in a fixed format. As illustrated in the steps above, when split data Di is the visit volume of a website in a certain period, the data key-value pair consists of the website's domain name (the identifier of split data Di) and the website's visit volume in that period (the data total of split data Di). Similarly, if the split data is the search volume of a keyword, the data key-value pair may consist of the keyword (the identifier of split data Di) and the keyword's search volume in a certain period (the data total of split data Di). In summary, a data key-value pair of split data Di consists of the identifier of split data Di and the data total of split data Di.

When split data Di is the visit volume of websites in a certain period and split data Di in cluster node Ai is sorted as a whole, the data key-value pairs of split data Di must, after the overall sort, be stored in order in the multiple cluster nodes of the HBase database; the sorting results held by the multiple cluster nodes together form the sorting result of the data to be sorted. Suppose there are 4 cluster nodes: cluster node 1 may store the split data with visit volumes from 100000 to 10000, cluster node 2 from 9999 to 1000, cluster node 3 from 999 to 100, and cluster node 4 from 99 to 0. Cluster nodes 1 to 4 are cluster nodes in the HBase database, and the number of cluster nodes can be chosen according to the user's actual needs.
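The four-node example above amounts to routing each record to a node by value range. The following sketch assumes exactly those ranges; `LOWER_BOUNDS`, `NODE_IDS` and `node_for` are hypothetical names, and how a real deployment assigns ranges to nodes is not specified by the source.

```python
import bisect

# Lower bounds of the visit-volume ranges held by cluster nodes 4, 3, 2 and 1.
LOWER_BOUNDS = [0, 100, 1000, 10000]
NODE_IDS = [4, 3, 2, 1]

def node_for(visits):
    """Return the cluster node whose range contains this visit volume."""
    if not 0 <= visits <= 100000:
        raise ValueError("visit volume outside the example ranges")
    # bisect_right finds the last lower bound that is <= visits.
    return NODE_IDS[bisect.bisect_right(LOWER_BOUNDS, visits) - 1]
```

For instance, `node_for(10000)` selects cluster node 1 and `node_for(9999)` selects cluster node 2, matching the boundaries given in the text.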

Optionally, in step S1061, cluster node Ai storing the data key-value pairs of split data Di in the HBase database includes the following steps S1 to S5:

Step S1: query whether the HBase database has already stored the row key of split data Di, wherein the row key of split data Di is the negative of the data total of split data Di.

Step S3: if the row key of split data Di is already stored in the HBase database, store split data Di in a first target column, wherein the first target column is any column in the column family of the row in which the row key of split data Di is located.

Step S5: if the row key of split data Di is not stored in the HBase database, store the key-value pair of split data Di according to the row keys already stored in the HBase database.

Specifically, in this embodiment, split data Di in cluster node Ai is sorted using the row-key sorting mechanism. Therefore, when split data Di in cluster node Ai is sorted as a whole, the row key (rowkey) of split data Di is obtained first, and the HBase database is queried to see whether it already stores that rowkey, where the rowkey is the negative of the data total in the data key-value pair of split data Di.

Suppose the data key-value pair of one item of split data Di is (www.baidu.com 10000); the row key of that item is then -10000. If row key -10000 is found in the HBase database, the domain name "www.baidu.com" from the data key-value pair (www.baidu.com 10000) is stored under any family in the HBase database; specifically, it is stored in the column family of the row in which row key -10000 is located. If row key -10000 is not found in the HBase database, row key -10000 is compared with the row keys already stored in the HBase database and inserted into the designated cluster node in the HBase database.

Split data Di is stored in the database in the form rowkey: -10000, family: f, column: www.baidu.com, value: 1, where family is the identifier HBase uses to physically separate data and column is the column name in HBase. When rowkey -10000 is stored in a given row, family f, column www.baidu.com and value 1 are stored under any family in the column family of the row in which row key -10000 is located, and family f, column www.baidu.com and value 1 are stored in the same column.
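The storage form above, together with the existence check of step S1, can be sketched with a nested dictionary standing in for the table. The names `rows` and `store` are hypothetical, the second domain is an invented example of a site with the same total, and this is an in-memory model rather than HBase client code.

```python
rows = {}  # rowkey -> {family: {column: value}}

def store(identifier, total, family="f"):
    """Store one (identifier, total) pair; return True if a new row was created."""
    rowkey = -total                      # row key is the negated data total
    is_new_row = rowkey not in rows      # step S1: is this row key already stored?
    # Existing row: add a column to it (step S3). New row: create it (step S5).
    cells = rows.setdefault(rowkey, {}).setdefault(family, {})
    cells[identifier] = 1                # column = identifier, value = 1 as in the text
    return is_new_row

store("www.baidu.com", 10000)
store("www.example.com", 10000)   # hypothetical second site with the same total
store("www.google.com", 1000)
```

Identifiers that share a total end up as separate columns in the same row, so ties do not need their own row keys.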

Further, in step S5, if the row key of split data Di is not stored in the HBase database, storing the key-value pair of split data Di according to the row keys already stored in the HBase database includes the following steps S51 to S57:

Step S51: compare, in turn, the row key of split data Di with the row keys already stored in the HBase database.

Step S53: insert the row key of split data Di into a target row in the HBase database, wherein the target row is the row following the row of a first row key or the row preceding the row of a second row key, the first row key and the second row key are both row keys already stored in the HBase database, the first row key is the stored row key that is less than the row key of split data Di and differs from it by the smallest amount, and the second row key is the stored row key that is greater than the row key of split data Di and differs from it by the smallest amount.

Step S55: store split data Di in a second target column corresponding to the target row, wherein the second target column is any column in the column family of the target row.

Step S57: update the row keys stored in the HBase database.

Each time the stored row keys have been updated, the corresponding first row key and second row key are determined anew the next time a key-value pair of split data Di is stored according to the row keys stored in the HBase database. For the key-value pair of one item of split data Di, if both its first row key and its second row key exist, the first row key is less than the second row key.

Specifically, suppose the row keys stored in the HBase database are -100000, -50000 and -40000, and the row key -10000 is not stored. The row key -10000 is then compared with the row keys -100000, -50000 and -40000 in turn.

When split data Di is the visit volume of a website or the search volume of a keyword, users are in practice more interested in the websites with the highest visit volumes or the keywords with the highest search volumes. Row keys are therefore stored in the HBase database as negative numbers, so that arranging row keys in ascending order arranges visit volumes in descending order.
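The effect of negating the total can be checked with a short sketch. One caveat from general HBase behaviour: row keys are byte arrays compared lexicographically, so a signed number written as text would not sort numerically. The function `encode_rowkey` below is a hypothetical fixed-width encoding that preserves the same order under byte comparison; it is not part of the source, and `max_total` is an assumed upper bound.

```python
def rowkey(total):
    """Row key used in the description: the negated data total."""
    return -total

def encode_rowkey(total, max_total=10**9):
    """Hypothetical byte-sortable encoding: fixed-width complement of the total."""
    if not 0 <= total <= max_total:
        raise ValueError("total out of range")
    # Larger totals map to smaller zero-padded numbers, hence earlier rows.
    return f"{max_total - total:010d}".encode("ascii")

totals = [1000, 100000, 40000, 50000]
numeric_order = sorted(totals, key=rowkey)        # ascending row key
bytes_order = sorted(totals, key=encode_rowkey)   # ascending byte string
```

Both orderings list the largest total first, which is the descending-by-volume view the text aims for.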

For example, the row key value -10000 is compared with the row key values -100000, -50000 and -40000 respectively. The comparison shows that -100000, -50000 and -40000 are all less than -10000, and that -40000 has the smallest difference from -10000; therefore -10000 is inserted into the row following the row in which row key value -40000 is located, where row key value -40000 is the first row key value. As can be seen from the description of step S1061 above, if cluster node 1 stores the segmented data whose visit counts range from 100000 to 10000, the row key values -40000 and -10000 should both be stored in cluster node 1. Therefore, the row key value -10000 is inserted into the row following the row of row key value -40000, and the segmented data Di with row key value -10000 is stored into any column in the column family to which the target row of row key value -10000 belongs. With the above sorting method, the visit counts of websites are sorted in descending order.

As another example, suppose the row key values stored in the HBase database are -5000, -4000 and -1000, and the row key value -10000 is not stored. The comparison shows that -5000, -4000 and -1000 are all greater than -10000, and that -5000 has the smallest difference from -10000; therefore -10000 would be inserted into the row preceding the row in which row key value -5000 is located (that is, the target row), where row key value -5000 is the second row key value. However, as can be seen from the description of step S1061 above, if cluster node 2 stores the segmented data whose visit counts range from 9999 to 1000, the row key value -10000 and the row key value -5000 are not stored in the same cluster node. The row key value -10000 should therefore be stored in cluster node 1. Because -10000 is the largest of all row key values in cluster node 1, it is stored in the last row of cluster node 1 (that is, the target row).
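The cross-node case above can be sketched as follows. The visit-count ranges of cluster nodes 1 and 2 are taken from the example; the dictionary layout, the initial contents of node 1, and the names `NODE_RANGES`, `node_for` and `place` are assumptions made only for this illustration.

```python
import bisect

# Inclusive visit-count ranges per cluster node, as in the example.
NODE_RANGES = {1: (10000, 100000), 2: (1000, 9999)}

def node_for(visit_count):
    """Return the cluster node whose range contains visit_count."""
    for node, (low, high) in NODE_RANGES.items():
        if low <= visit_count <= high:
            return node
    raise ValueError(f"no cluster node covers visit count {visit_count}")

def place(nodes, visit_count):
    """Insert the row key value into its node; return (node, row index)."""
    node = node_for(visit_count)
    keys = nodes[node]              # ascending row key values of that node
    row_key = -visit_count
    if row_key not in keys:
        bisect.insort(keys, row_key)
    return node, keys.index(row_key)

# Assumed initial contents of each node (row key values only).
nodes = {1: [-100000, -50000, -40000], 2: [-5000, -4000, -1000]}
print(place(nodes, 10000))  # (1, 3): last row of cluster node 1
```

Routing by range first, then inserting within the node, keeps each node internally sorted while the nodes themselves remain ordered by range.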

As a further example, suppose the row key values -5000, -4000, -1000 and -500 are stored in the HBase database and the row key value -2000 needs to be inserted. Comparing -2000 with the stored values shows that -2000 is greater than -4000 and -5000, and that of these -4000 has the smallest difference from -2000; therefore the row key value -2000 should be inserted into the row following the row in which row key value -4000 is located (that is, the target row). Equivalently, -2000 is less than -1000 and -500, and of these -1000 has the smallest difference from -2000; therefore the row key value -2000 is inserted into the row preceding the row in which row key value -1000 is located (that is, the target row). The segmented data Di with row key value -2000 is then stored into any column in the column family to which the target row of row key value -2000 belongs. Here the row key value -4000 is the first row key value and the row key value -1000 is the second row key value.

When the segmented data Di in cluster node Ai is sorted by the above method, there is no need to integrate the segmented data Di of the individual cluster nodes again; the segmented data Di can be sorted quickly simply by reading its row key values. For example, in the embodiment of the present application, the domain name www.baidu.com is saved as a column name in the row whose row key value is -10000. If other websites also have a visit count of 10000, they can be saved in any free column of the column family to which the row with row key value -10000 belongs, without any conflict with the www.baidu.com column. In the embodiment of the present application, the negative of the visit count is saved to the HBase database as the row key value so that data with large visit counts come first in HBase and are convenient to query. Moreover, the sorting method provided by the embodiment of the present application can sort quickly even when the amount of data to be sorted is large.
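The storage of several websites under one row key value can be pictured with a small Python sketch. A nested dictionary stands in for one column family of the HBase table; storing the visit count as the cell value and the name `store` are assumptions of the sketch, since the text only fixes the row key value and the column name.

```python
from collections import defaultdict

def store(table, identifier, total):
    """Store one data key-value pair; return True if the row already existed."""
    row_key = -total                      # negative of the total amount
    already_stored = row_key in table     # membership test does not create the row
    table[row_key][identifier] = total    # identifier is used as the column name
    return already_stored

table = defaultdict(dict)                 # row key value -> columns of one family
store(table, "www.baidu.com", 10000)      # creates row -10000
store(table, "other.example", 10000)      # same row, another free column
print(table[-10000])  # {'www.baidu.com': 10000, 'other.example': 10000}
```

Websites with equal visit counts share one row and differ only in column name, so no row key value is ever duplicated.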

Optionally, the sorting method provided by the present application further includes the following steps S7 to S9:

Step S7: receive a query instruction from a user through the query interface of the HBase database, wherein the query instruction is an instruction for querying the segmented data corresponding to the row key values stored in the HBase database between any two row key values.

Step S9: display the queried segmented data in the HBase database with a preset mark added.

As a database, HBase provides a convenient query interface through which the segmented data in any specified range of row key values can be queried. For example, it is possible to query quickly which website domain names have row key values between -1000 and -900, and to display them with a preset mark, such as an added background color, a changed font, or a floating bubble.

If the HBase database stores no row key value strictly between -1000 and -900 but does store the row key values -1000 and -900, the query returns the website domain names whose row key value is -1000 and those whose row key value is -900.

If the HBase database stores no row key value strictly between -1000 and -900 and also does not store the row key values -1000 and -900, a prompt can be displayed to inform the user that "the queried data does not exist", or empty data can be displayed to indicate that no such row key value is stored in the HBase database.

If the HBase database does not store the row key values -1000 and -900 but does store a row key value between them, for example -950, the query returns the website domain names whose row key value is -950.
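The three query cases above can be reproduced with a small range-query sketch in Python. It treats both bounds as inclusive, as the cases in the text imply, and uses an in-memory dictionary in place of the HBase query interface; the function name `range_query` and the sample domains are hypothetical.

```python
import bisect

def range_query(table, low, high):
    """Return {row_key: columns} for every stored row key with low <= key <= high."""
    keys = sorted(table)
    start = bisect.bisect_left(keys, low)
    end = bisect.bisect_right(keys, high)
    return {key: table[key] for key in keys[start:end]}

# Case 1: only the two boundary row key values are stored.
boundaries = {-1000: {"a.example": 1000}, -900: {"b.example": 900}}
print(range_query(boundaries, -1000, -900))

# Case 2: nothing in the range is stored, so the result is empty and the
# caller can report that the queried data does not exist.
outside = {-2000: {"c.example": 2000}}
print(range_query(outside, -1000, -900) or "the queried data does not exist")

# Case 3: only a value inside the range is stored.
inside = {-950: {"d.example": 950}}
print(range_query(inside, -1000, -900))
```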

The embodiment of the present application further provides a data sorting device based on an HBase database. The data sorting device is mainly used to perform the data sorting method based on an HBase database provided above by the embodiment of the present application. The data sorting device based on an HBase database provided by the embodiment of the present application is described in detail below.

Fig. 2 is a schematic diagram of a data sorting device based on an HBase database according to the embodiment of the present application. As shown in Fig. 2, the data sorting device includes a segmentation unit 10, a reading unit 20 and a determining unit 30, wherein:

The segmentation unit 10 is used to segment the data to be sorted into a plurality of cluster nodes in the HBase database, wherein each cluster node executes the row key value sorting mode of the HBase database after obtaining its segmented data.

The reading unit 20 is used to read the sorting result of each cluster node to obtain a plurality of sorting results, wherein each cluster node obtains one sorting result after executing the row key value sorting mode to sort its segmented data.

Specifically, after the segmented data in each cluster node is sorted according to the row key value sorting mode, one sorting result is obtained per node; the sorting results in the cluster nodes are read in turn to obtain a plurality of sorting results. It should be noted that the plurality of sorting results are stored in the plurality of cluster nodes of the HBase database. For example, when the segmented data is the visit count of a website, cluster node a may hold the sorting result of the web pages whose visit counts range from 10000 to 1000, cluster node b may hold the sorting result of the web pages whose visit counts range from 999 to 100, and cluster node c may hold the sorting result of the web pages whose visit counts range from 99 to 0. After the segmented data has been sorted as a whole, the number of sorting results obtained can be chosen according to the actual needs of the user.
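Reading the per-node results in turn can be sketched as a simple concatenation. The page names and counts below are invented, and the sketch assumes that the nodes are read in the order of their visit-count ranges, as in the example with cluster nodes a, b and c.

```python
from itertools import chain

def overall_ranking(node_results):
    """Concatenate per-node sorting results, read in node order."""
    return list(chain.from_iterable(node_results))

# Each node holds (row key value, page) pairs already sorted by row key value.
node_a = [(-10000, "page-1"), (-1000, "page-2")]   # visit counts 10000 to 1000
node_b = [(-999, "page-3"), (-100, "page-4")]      # visit counts 999 to 100
node_c = [(-99, "page-5"), (0, "page-6")]          # visit counts 99 to 0

ranking = overall_ranking([node_a, node_b, node_c])
print([page for _, page in ranking])
# ['page-1', 'page-2', 'page-3', 'page-4', 'page-5', 'page-6']
```

Because the node ranges do not overlap, no merge step is needed: the collection of the per-node results is already the overall sorting result.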

The determining unit 30 is used to determine that the collection of the plurality of sorting results is the sorting result of the data to be sorted.

Optionally, the reading unit 20 includes a sorting subunit, wherein:

The sorting subunit is used for cluster node Ai to execute the row key value sorting mode to sort the segmented data Di segmented to cluster node Ai, obtaining sorting result Ri, wherein i takes the values 1 to n in turn, n is the number of cluster nodes in the HBase database, cluster nodes A1 to An constitute the plurality of cluster nodes of the HBase database, and segmented data D1 to Dn constitute the data to be sorted.

The determining unit 30 includes a storage subunit, wherein the storage subunit is used for cluster node Ai to store the data key-value pair of segmented data Di into the HBase database to obtain the sorting result of the data to be sorted, and wherein the data key-value pair of segmented data Di is a key-value pair consisting of the identifier of segmented data Di and the total data amount of segmented data Di.

The data key-value pairs of segmented data Di in sorting result Ri are read respectively, and the data key-value pairs of segmented data Di are stored into the HBase database in a fixed form. It should be noted that, as described above, when segmented data Di is the visit count of a website in a certain period, the data key-value pair may consist of the domain name of the website (i.e. the identifier of segmented data Di) and the visit count of the website in that period (i.e. the total data amount of segmented data Di). Similarly, if the segmented data is the search volume of a keyword, the data key-value pair consists of the keyword (i.e. the identifier of segmented data Di) and the search volume of the keyword in a certain period (i.e. the total data amount of segmented data Di). In summary, the data key-value pair of segmented data Di consists of the identifier of segmented data Di and the total data amount of segmented data Di.
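A data key-value pair of this kind can be modeled as a small record type. The class name `DataPair`, its field names, and the sample keyword are hypothetical; only the pairing of identifier and total data amount, and the use of the negative total as the row key value, come from the text.

```python
from typing import NamedTuple

class DataPair(NamedTuple):
    identifier: str   # e.g. a website domain name or a keyword
    total: int        # e.g. the visit count or search volume in a period

    @property
    def row_key(self) -> int:
        """Row key value: the negative of the total data amount."""
        return -self.total

site = DataPair("www.baidu.com", 10000)
keyword = DataPair("example keyword", 2000)
print(site.row_key, keyword.row_key)  # -10000 -2000
```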

Optionally, the storage subunit includes a query module, a first storage module and a second storage module, wherein:

The query module is used to query whether the HBase database has stored the row key value of segmented data Di, wherein the row key value is the negative of the total data amount of segmented data Di.

The first storage module is used to store segmented data Di into a first target column when the row key value of segmented data Di has been stored in the HBase database, wherein the first target column is any column in the column family to which the row of the row key value of segmented data Di belongs.

The second storage module is used to store the key-value pair of segmented data Di according to the row key values stored in the HBase database when the row key value of segmented data Di has not been stored in the HBase database.

Optionally, the second storage module includes a comparison submodule, an insertion submodule, a storage submodule and an update submodule, wherein:

The comparison submodule is used to compare, in turn, the row key value of segmented data Di with the row key values stored in the HBase database.

The insertion submodule is used to insert the row key value of segmented data Di into a target row in the HBase database, wherein the target row is the row following the row in which a first row key value is located, or the row preceding the row in which a second row key value is located. The first row key value and the second row key value are both row key values stored in the HBase database. Consistent with step S53 and with the examples below, the first row key value is the stored row key value that is less than the row key value of segmented data Di and has the smallest difference from it, and the second row key value is the stored row key value that is greater than the row key value of segmented data Di and has the smallest difference from it.

The storage submodule is used to store segmented data Di into a second target column corresponding to the target row, wherein the second target column is any column in the column family to which the target row belongs.

The update submodule is used to update the row key values stored in the HBase database.

For example, the row key value -10000 is compared with the row key values -100000, -50000 and -40000 respectively. The comparison shows that -100000, -50000 and -40000 are all less than -10000, and that -40000 has the smallest difference from -10000; therefore -10000 is inserted into the row following the row in which row key value -40000 is located, where row key value -40000 is the first row key value. As can be seen from the foregoing description, if cluster node 1 stores the segmented data whose visit counts range from 100000 to 10000, the row key values -40000 and -10000 should both be stored in cluster node 1. Therefore, the row key value -10000 is inserted into the row following the row of row key value -40000, and the segmented data Di with row key value -10000 is stored into any column in the column family to which the target row of row key value -10000 belongs. With the above sorting method, the visit counts of websites are sorted in descending order.

As another example, suppose the row key values stored in the HBase database are -5000, -4000 and -1000, and the row key value -10000 is not stored. The comparison shows that -5000, -4000 and -1000 are all greater than -10000, and that -5000 has the smallest difference from -10000; therefore -10000 would be inserted into the row preceding the row in which row key value -5000 is located, where row key value -5000 is the second row key value. However, as can be seen from the foregoing description, if cluster node 2 stores the segmented data whose visit counts range from 9999 to 1000, the row key value -10000 and the row key value -5000 are not stored in the same cluster node. The row key value -10000 should therefore be stored in cluster node 1. Because -10000 is the largest of all row key values in cluster node 1, it is stored in the last row of cluster node 1.

As a further example, suppose the row key values -5000, -4000, -1000 and -500 are stored in the HBase database and the row key value -2000 needs to be inserted. Comparing -2000 with the stored values shows that -2000 is greater than -4000 and -5000, and that of these -4000 has the smallest difference from -2000; therefore the row key value -2000 should be inserted into the row following the row in which row key value -4000 is located. Equivalently, -2000 is less than -1000 and -500, and of these -1000 has the smallest difference from -2000; therefore the row key value -2000 is inserted into the row preceding the row in which row key value -1000 is located. The segmented data Di with row key value -2000 is then stored into any column in the column family to which the target row of row key value -2000 belongs. Here the row key value -4000 is the first row key value and the row key value -1000 is the second row key value.

Optionally, the data sorting device based on an HBase database provided by the present application further includes a receiving unit and a display unit, wherein:

The receiving unit is used to receive a query instruction from a user through the query interface of the HBase database, wherein the query instruction is an instruction for querying the segmented data corresponding to the row key values stored in the HBase database between any two row key values.

The display unit is used to display, with a preset mark added, the segmented data corresponding to the row key values queried in the HBase database.

The serial numbers of the above embodiments of the present application are for description only and do not represent the relative merits of the embodiments.

In the above embodiments of the present application, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.

In the several embodiments provided in the present application, it should be understood that the disclosed technical content may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units may be a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or take other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk, or an optical disc.

The above are only preferred embodiments of the present application. It should be noted that a person of ordinary skill in the art may make several improvements and refinements without departing from the principle of the present application, and these improvements and refinements shall also be regarded as falling within the protection scope of the present application.

Claims

1. A data sorting method based on an HBase database, characterized by comprising:

segmenting data to be sorted into a plurality of cluster nodes of the HBase database, wherein each cluster node executes the row key value sorting mode of the HBase database after obtaining segmented data;

reading the sorting result of each cluster node to obtain a plurality of sorting results, wherein each cluster node obtains one sorting result after executing the row key value sorting mode to sort the segmented data; and

determining that the collection of the plurality of sorting results is the sorting result of the data to be sorted.

2. The method according to claim 1, characterized in that each cluster node obtaining the sorting result after executing the row key value sorting mode to sort the segmented data comprises:

cluster node Ai executing the row key value sorting mode to sort the segmented data Di segmented to the cluster node Ai, obtaining sorting result Ri, wherein i takes the values 1 to n in turn, n is the number of cluster nodes in the HBase database, cluster nodes A1 to An constitute the plurality of cluster nodes of the HBase database, and segmented data D1 to Dn constitute the data to be sorted,

and determining that the collection of the plurality of sorting results is the sorting result of the data to be sorted comprises: the cluster node Ai storing the data key-value pair of the segmented data Di into the HBase database to obtain the sorting result of the data to be sorted, wherein the data key-value pair of the segmented data Di is a key-value pair consisting of the identifier of the segmented data Di and the total data amount of the segmented data Di.

3. The method according to claim 2, characterized in that the cluster node Ai storing the data key-value pair of the segmented data Di into the HBase database comprises:

querying whether the HBase database has stored the row key value of the segmented data Di, wherein the row key value of the segmented data Di is the negative of the total data amount of the segmented data Di;

when the row key value of the segmented data Di has been stored in the HBase database, storing the segmented data Di into a first target column, wherein the first target column is any column in the column family to which the row of the row key value of the segmented data Di belongs; and

when the row key value of the segmented data Di has not been stored in the HBase database, storing the key-value pair of the segmented data Di according to the row key values stored in the HBase database.

4. The method according to claim 3, characterized in that, when the row key value of the segmented data Di has not been stored in the HBase database, storing the key-value pair of the segmented data Di according to the row key values stored in the HBase database comprises:

comparing, in turn, the row key value of the segmented data Di with the row key values stored in the HBase database;

inserting the row key value of the segmented data Di into a target row in the HBase database, wherein the target row is the row following the row in which a first row key value is located or the row preceding the row in which a second row key value is located, the first row key value and the second row key value are row key values stored in the HBase database, the first row key value is the row key value that is less than the row key value of the segmented data Di and has the smallest difference from the row key value of the segmented data Di, and the second row key value is the row key value that is greater than the row key value of the segmented data Di and has the smallest difference from the row key value of the segmented data Di;

storing the segmented data Di into a second target column corresponding to the target row, wherein the second target column is any column in the column family to which the target row belongs; and

updating the row key values stored in the HBase database.

5. The method according to claim 1, characterized in that the method further comprises:

receiving a query instruction from a user through the query interface of the HBase database, wherein the query instruction is an instruction for querying the segmented data corresponding to the row key values stored in the HBase database between any two row key values; and

displaying, with a preset mark added, the segmented data corresponding to the row key values queried in the HBase database.

6. A data sorting device based on an HBase database, characterized by comprising:

a segmentation unit, used to segment data to be sorted into a plurality of cluster nodes of the HBase database, wherein each cluster node executes the row key value sorting mode of the HBase database after obtaining segmented data;

a reading unit, used to read the sorting result of each cluster node to obtain a plurality of sorting results, wherein each cluster node obtains one sorting result after executing the row key value sorting mode to sort the segmented data; and

a determining unit, used to determine that the collection of the plurality of sorting results is the sorting result of the data to be sorted.

7. The device according to claim 6, characterized in that the reading unit comprises:

a sorting subunit, used for cluster node Ai to execute the row key value sorting mode to sort the segmented data Di segmented to the cluster node Ai, obtaining sorting result Ri, wherein i takes the values 1 to n in turn, n is the number of cluster nodes in the HBase database, cluster nodes A1 to An constitute the plurality of cluster nodes of the HBase database, and segmented data D1 to Dn constitute the data to be sorted,

and the determining unit comprises: a storage subunit, used for the cluster node Ai to store the data key-value pair of the segmented data Di into the HBase database to obtain the sorting result of the data to be sorted, wherein the data key-value pair of the segmented data Di is a key-value pair consisting of the identifier of the segmented data Di and the total data amount of the segmented data Di.

8. The device according to claim 7, characterized in that the storage subunit comprises:

a query module, used to query whether the HBase database has stored the row key value of the segmented data Di, wherein the row key value of the segmented data Di is the negative of the total data amount of the segmented data Di;

a first storage module, used to store the segmented data Di into a first target column when the row key value of the segmented data Di has been stored in the HBase database, wherein the first target column is any column in the column family to which the row of the row key value of the segmented data Di belongs; and

a second storage module, used to store the key-value pair of the segmented data Di according to the row key values stored in the HBase database when the row key value of the segmented data Di has not been stored in the HBase database.

9. The device according to claim 8, characterized in that the second storage module comprises:

a comparison submodule, used to compare, in turn, the row key value of the segmented data Di with the row key values stored in the HBase database;

an insertion submodule, used to insert the row key value of the segmented data Di into a target row in the HBase database, wherein the target row is the row following the row in which a first row key value is located or the row preceding the row in which a second row key value is located, the first row key value and the second row key value are row key values stored in the HBase database, the first row key value is the row key value that is less than the row key value of the segmented data Di and has the smallest difference from the row key value of the segmented data Di, and the second row key value is the row key value that is greater than the row key value of the segmented data Di and has the smallest difference from the row key value of the segmented data Di;

a storage submodule, used to store the segmented data Di into a second target column corresponding to the target row, wherein the second target column is any column in the column family to which the target row belongs; and

an update submodule, used to update the row key values stored in the HBase database.

10. The device according to claim 6, characterized in that the device further comprises:

a receiving unit, used to receive a query instruction from a user through the query interface of the HBase database, wherein the query instruction is an instruction for querying the segmented data corresponding to the row key values stored in the HBase database between any two row key values; and

a display unit, used to display, with a preset mark added, the segmented data corresponding to the row key values queried in the HBase database.