CN106649385B

CN106649385B - Data sorting method and device based on HBase database

Info

Publication number: CN106649385B
Application number: CN201510733850.7A
Authority: CN
Inventors: 陈克凡
Original assignee: Beijing Gridsum Technology Co Ltd
Current assignee: Beijing Gridsum Technology Co Ltd
Priority date: 2015-11-02
Filing date: 2015-11-02
Publication date: 2019-12-03
Anticipated expiration: 2035-11-02
Also published as: CN106649385A

Abstract

This application discloses a data sorting method and device based on an HBase database. The method comprises: splitting the data to be sorted across multiple cluster nodes of the HBase database, where each cluster node, after obtaining its split data, applies the row key sorting mode of the HBase database; reading the sorting result of each cluster node to obtain multiple sorting results, where each cluster node obtains one sorting result after sorting its split data by the row key sorting mode; and determining that the set of the multiple sorting results is the sorting result of the data to be sorted. The application solves the technical problem of low data sorting efficiency in the prior art.

Description

Data sorting method and device based on HBase database

Technical field

This application relates to the computer field, and in particular to a data sorting method and device based on an HBase database.

Background technique

Data sorting is widely used in big-data statistics. For example, sorting websites by visit count makes it possible to see which websites receive the most visits and then to make decisions accordingly. When the data volume is small, sorting is simple with any of the many existing fast sorting algorithms. Once the data volume reaches a certain scale, however, even a simple sort becomes complicated. For example, when 100 GB of data needs to be sorted, the system cannot simply read the data into memory and sort it on a single machine, because no server has 100 GB of memory, and even a server with that much memory could never devote all of it to sorting.

With the sorting approach built on current technical architectures, an existing framework can distribute the sorting task to the nodes of a cluster for computation. That is, the 100 GB of data is split and distributed across the nodes of the cluster; under the unified scheduling of the framework, each node reads its data and sorts it; finally, the sorted results of the nodes are merged and the overall result is written to the file system.
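The merge step of this prior-art approach can be illustrated with a minimal sketch. The source does not say which merge algorithm the existing frameworks use; a k-way merge via Python's `heapq.merge` is swapped in here purely for illustration, and `merge_sorted_runs` and the sample numbers are hypothetical.

```python
import heapq


def merge_sorted_runs(runs):
    """Merge lists that are each already sorted into one sorted list."""
    # heapq.merge lazily performs a k-way merge of sorted inputs.
    return list(heapq.merge(*runs))


# Each inner list stands for the locally sorted output of one node.
node_runs = [[3, 8, 15], [1, 9, 20], [4, 5, 30]]
merged = merge_sorted_runs(node_runs)
```

Because the ranges of the per-node runs overlap, every element has to pass through the merge, which is the cost the problems below refer to.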

The above sorting approach has two problems:

The first problem is merging the results: the sorted results of the nodes need to be merged. Because the data itself is unordered, there is also no regular relationship between the sorting results of the nodes. Consequently, either the merge process is very slow, or a new distribution mechanism has to be introduced that redistributes the relatively ordered results and then sorts and merges them again. Either scheme is relatively slow.

The second problem is viewing the results: after the entire data set has been sorted, it has to be saved to a file, which makes the sorting result inconvenient to view; the sorting result for an arbitrarily specified interval cannot be viewed quickly.

No effective solution to the above problems has yet been proposed.

Summary of the invention

The embodiments of the present application provide a data sorting method and device based on an HBase database, to at least solve the technical problem of low data sorting efficiency in the prior art.

According to one aspect of the embodiments of the present application, a data sorting method based on an HBase database is provided. The method includes: splitting the data to be sorted across multiple cluster nodes of the HBase database, where each cluster node, after obtaining its split data, applies the row key sorting mode of the HBase database; reading the sorting result of each cluster node to obtain multiple sorting results, where each cluster node obtains one sorting result after sorting its split data by the row key sorting mode; and determining that the set of the multiple sorting results is the sorting result of the data to be sorted.

Further, each cluster node obtaining one sorting result after sorting its split data by the row key sorting mode includes: cluster node Ai applies the row key sorting mode to sort the split data Di assigned to cluster node Ai, obtaining sorting result Ri, where i takes the values 1 to n in turn, n is the number of cluster nodes in the HBase database, cluster nodes A1 to An constitute the multiple cluster nodes of the HBase database, and split data D1 to Dn constitute the data to be sorted. Determining that the set of the multiple sorting results is the sorting result of the data to be sorted includes: cluster node Ai stores the data key-value pairs of the split data Di in the HBase database, obtaining the sorting result of the data to be sorted, where a data key-value pair of the split data Di is a key-value pair composed of the identifier of the split data Di and the total data amount of the split data Di.

Further, cluster node Ai storing the data key-value pairs of the split data Di in the HBase database includes: querying whether the HBase database has already stored the row key of the split data Di, where the row key of the split data Di is the negative of the total data amount of the split data Di; if the HBase database has already stored the row key of the split data Di, storing the split data Di in a first target column, where the first target column is any column in the column family of the row in which the row key of the split data Di resides; and if the HBase database has not stored the row key of the split data Di, storing the key-value pair of the split data Di according to the row keys already stored in the HBase database.

Further, if the HBase database has not stored the row key of the split data Di, storing the key-value pair of the split data Di according to the row keys already stored in the HBase database includes: comparing, in turn, the row key of the split data Di with the row keys already stored in the HBase database; inserting the row key of the split data Di into a target row of the HBase database, where the target row is the row following the row of a first row key or the row preceding the row of a second row key, the first row key and the second row key are row keys already stored in the HBase database, the first row key is the stored row key that is less than the row key of the split data Di and differs from it the least, and the second row key is the stored row key that is greater than the row key of the split data Di and differs from it the least; storing the split data Di in a second target column corresponding to the target row, where the second target column is any column in the column family of the target row; and updating the row keys stored in the HBase database.

Further, the method also includes: receiving a query instruction from a user through a query interface of the HBase database, where the query instruction is an instruction to query the split data corresponding to the row keys that lie between any two row keys stored in the HBase database; and displaying, with a preset mark added, the split data corresponding to the queried row keys in the HBase database.

According to another aspect of the embodiments of the present application, a data sorting device based on an HBase database is also provided. The device includes: a splitting unit, configured to split the data to be sorted across multiple cluster nodes of the HBase database, where each cluster node, after obtaining its split data, applies the row key sorting mode of the HBase database; a reading unit, configured to read the sorting result of each cluster node to obtain multiple sorting results, where each cluster node obtains one sorting result after sorting its split data by the row key sorting mode; and a determination unit, configured to determine that the set of the multiple sorting results is the sorting result of the data to be sorted.

Further, the reading unit includes: a sorting subunit, by which cluster node Ai applies the row key sorting mode to sort the split data Di assigned to cluster node Ai, obtaining sorting result Ri, where i takes the values 1 to n in turn, n is the number of cluster nodes in the HBase database, cluster nodes A1 to An constitute the multiple cluster nodes of the HBase database, and split data D1 to Dn constitute the data to be sorted. The determination unit includes: a storage subunit, by which cluster node Ai stores the data key-value pairs of the split data Di in the HBase database, obtaining the sorting result of the data to be sorted, where a data key-value pair of the split data Di is a key-value pair composed of the identifier of the split data Di and the total data amount of the split data Di.

Further, the storage subunit includes: a query module, configured to query whether the HBase database has already stored the row key of the split data Di, where the row key of the split data Di is the negative of the total data amount of the split data Di; a first storage module, configured to store the split data Di in a first target column if the HBase database has already stored the row key of the split data Di, where the first target column is any column in the column family of the row in which the row key of the split data Di resides; and a second storage module, configured to store the key-value pair of the split data Di according to the row keys already stored in the HBase database if the HBase database has not stored the row key of the split data Di.

Further, the second storage module includes: a comparison submodule, configured to compare, in turn, the row key of the split data Di with the row keys already stored in the HBase database; an insertion submodule, configured to insert the row key of the split data Di into a target row of the HBase database, where the target row is the row following the row of a first row key or the row preceding the row of a second row key, the first row key and the second row key are row keys already stored in the HBase database, the first row key is the stored row key that is less than the row key of the split data Di and differs from it the least, and the second row key is the stored row key that is greater than the row key of the split data Di and differs from it the least; a storage submodule, configured to store the split data Di in a second target column corresponding to the target row, where the second target column is any column in the column family of the target row; and an update submodule, configured to update the row keys stored in the HBase database.

Further, the device also includes: a receiving unit, configured to receive a query instruction from a user through a query interface of the HBase database, where the query instruction is an instruction to query the split data corresponding to the row keys that lie between any two row keys stored in the HBase database; and a display unit, configured to display, with a preset mark added, the split data corresponding to the queried row keys in the HBase database.

In the embodiments of the present application, the data to be sorted is split across multiple cluster nodes of the HBase database, where each cluster node, after obtaining its split data, applies the row key sorting mode of the HBase database; the sorting result of each cluster node is read to obtain multiple sorting results, where each cluster node obtains one sorting result after sorting its split data by the row key sorting mode; and the set of the multiple sorting results is determined to be the sorting result of the data to be sorted. Because the HBase database has a row key sorting mode that sorts automatically, the data to be sorted is sorted automatically once it has been split across the cluster nodes. The sorting results in the cluster nodes are then read, and the split data already sorted in each cluster node is sorted as a whole, again by the row key sorting mode, yielding multiple sorting results whose set is the sorting result of the data to be sorted. By using the row key sorting mode of the HBase database, the application omits the prior-art step in which the data to be sorted in the cluster nodes must be merged before it can be sorted, and thereby shortens the data sorting time. This achieves the technical effect of sorting data without merging the data to be sorted in each database, which solves the technical problem of low data sorting efficiency in the prior art and improves sorting performance.

Brief description of the drawings

The drawings described here provide a further understanding of the present application and form part of it. The illustrative embodiments of the application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:

Fig. 1 is a flowchart of a data sorting method based on an HBase database according to an embodiment of the present application; and

Fig. 2 is a schematic diagram of a data sorting device based on an HBase database according to an embodiment of the present application.

Specific embodiment

To help those skilled in the art better understand the solution of the present application, the technical solutions in the embodiments of the application are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are evidently only some of the embodiments of the application, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in this application, without creative work, shall fall within the scope of protection of the application.

It should be noted that the terms "first", "second", and so on in the description, the claims, and the above drawings are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.

First, the technical terms used in this embodiment are explained as follows:

HBase is a distributed, column-oriented open-source database, a distributed storage system for structured data. HBase differs from a general relational database in that it is a database suited to storing unstructured data. Another difference is that HBase uses a column-based rather than a row-based model.

Rowkey is the unique identifier of data in HBase. HBase stores data sorted by rowkey, and HBase queries are also based on rowkey: either all the data of a single specified rowkey is fetched directly, or the entire data interval from a start rowkey to an end rowkey is scanned.
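The two access patterns just described (fetch one rowkey, or scan an interval of rowkeys) can be illustrated with a small in-memory sketch. This is not HBase client code: `SortedTable`, `put`, `get`, and `scan` are hypothetical names, row keys are plain Python integers, and treating the scan interval as half-open (start included, stop excluded) is an assumption of the sketch.

```python
import bisect


class SortedTable:
    """Toy stand-in for a table whose rows are kept ordered by row key."""

    def __init__(self):
        self._keys = []  # row keys, always kept in ascending order
        self._rows = {}  # row key -> row data

    def put(self, rowkey, data):
        if rowkey not in self._rows:
            bisect.insort(self._keys, rowkey)  # keep key order on insert
        self._rows[rowkey] = data

    def get(self, rowkey):
        """Return the data of a single row key, or None if it is absent."""
        return self._rows.get(rowkey)

    def scan(self, start, stop):
        """Return (rowkey, data) pairs with start <= rowkey < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(key, self._rows[key]) for key in self._keys[lo:hi]]


table = SortedTable()
for key, site in [(30, "c.example"), (10, "a.example"), (20, "b.example")]:
    table.put(key, site)
```

Because the keys are held in sorted order, an interval scan only needs two binary searches and a slice, which is what makes viewing an arbitrary interval of a ranking cheap.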

Family is HBase's identifier for physically isolated data and must be defined in advance when the table is created. In most cases only a single family is used, so that the data of one row is not physically isolated, which makes queries easier.

Column is the column name in HBase. HBase is an unstructured database, so columns do not need to be defined in advance when an HBase table is created; they can be added at any time as they are used.

Value is the data value that HBase finally stores. A stored value can be located by rowkey + family + column.
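The addressing scheme rowkey + family + column can be pictured as a nested mapping. The following is a minimal sketch with plain Python dictionaries, not the HBase storage format; `lookup` is a hypothetical helper and the sample cells are illustrative.

```python
# row key -> column family -> column name -> value
table = {
    -10000: {"f": {"www.baidu.com": 1}},
    -1000: {"f": {"www.google.com": 1}},
}


def lookup(table, rowkey, family, column):
    """Return the value addressed by (rowkey, family, column), or None."""
    return table.get(rowkey, {}).get(family, {}).get(column)


value = lookup(table, -10000, "f", "www.baidu.com")
```

Since columns are just keys of the innermost mapping, a new column can be added to a row at any time without declaring it beforehand.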

According to an embodiment of the present application, a method embodiment of a data sorting method based on an HBase database is provided. It should be noted that the steps shown in the flowchart of the drawings can be executed in a computer system, such as one running a set of computer-executable instructions, and that, although a logical order is shown in the flowchart, in some cases the steps shown or described can be executed in an order different from the one given here.

Fig. 1 is a flowchart of a data sorting method based on an HBase database according to an embodiment of the present application. As shown in Fig. 1, the method includes the following steps S102 to S106:

Step S102: split the data to be sorted across multiple cluster nodes of the HBase database, where each cluster node, after obtaining its split data, applies the row key sorting mode of the HBase database.

Specifically, the data to be sorted can be the visit count of websites within a certain period, such as the visit count of www.baidu.com or of www.google.com. It can also be the search volume of keywords within a certain period, such as the search volume of "War of Resistance military parade live broadcast" on September 3, 2015, or the search volume of "Beijing traffic restrictions" on September 3, 2015. It should be noted that the data to be sorted is not limited to the website visit counts and keyword search volumes above; it includes any data that needs to be sorted.

Because the HBase database has a row key (rowkey) sorting mode, when the data to be sorted is split across multiple cluster nodes of the HBase database, the split data assigned to each cluster node is sorted according to that row key sorting mode.

Step S104: read the sorting result of each cluster node to obtain multiple sorting results, where each cluster node obtains one sorting result after sorting its split data by the row key sorting mode.

Specifically, after the split data in each cluster node has been sorted by the row key sorting mode, one sorting result is obtained per node; reading the sorting results of the cluster nodes in turn yields multiple sorting results. It should be noted that the multiple sorting results are stored in the multiple cluster nodes of the HBase database. For example, when the split data is the visit count of a website, cluster node a may hold the sorting result for pages with visit counts from 10000 down to 1000, cluster node b the sorting result for pages with visit counts from 999 down to 100, and cluster node c the sorting result for pages with visit counts from 99 down to 0. After the split data has been sorted as a whole, the number of sorting results obtained can be chosen according to the user's actual needs.

Step S106: determine that the set of the multiple sorting results is the sorting result of the data to be sorted. That is, the set of the multiple sorting results distributed across the cluster nodes is the sorting result of the data to be sorted.
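Steps S104 and S106 can be sketched as follows, under the assumption (taken from the node a, b, c example above) that each node holds a disjoint range of visit counts and is already ordered internally. The node names match that example; `overall_ranking` and the site names are hypothetical.

```python
# Per-node results, each already ordered from highest to lowest visit count.
node_results = {
    "a": [("s1.example", 10000), ("s2.example", 1000)],
    "b": [("s3.example", 999), ("s4.example", 100)],
    "c": [("s5.example", 99), ("s6.example", 0)],
}


def overall_ranking(node_results, node_order):
    """Concatenate per-node results in range order; no merge step is needed."""
    ranking = []
    for node in node_order:
        ranking.extend(node_results[node])
    return ranking


ranking = overall_ranking(node_results, ["a", "b", "c"])
counts = [count for _, count in ranking]
```

The overall result is simply the per-node results read one after another, because every count on node a is at least as large as every count on node b, and so on.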

In the embodiments of the present application, the data to be sorted is split across the multiple cluster nodes of the HBase database. Because the HBase database has a row key sorting mode that sorts automatically, the data to be sorted is sorted automatically once it has been split across the cluster nodes. The sorting results in the cluster nodes are then read, and the split data already sorted in each cluster node is sorted as a whole, again by the row key sorting mode, yielding multiple sorting results whose set is the sorting result of the data to be sorted. By using the row key sorting mode of the HBase database, the application omits the prior-art step in which the data to be sorted in the cluster nodes must be merged before it can be sorted, and thereby shortens the data sorting time. This achieves the technical effect of sorting data without merging the data to be sorted in each database, which solves the technical problem of low data sorting efficiency in the prior art and improves sorting performance.

Optionally, each cluster node obtaining one sorting result after sorting its split data by the row key sorting mode includes the following step S1041:

Step S1041: cluster node Ai applies the row key sorting mode to sort the split data Di assigned to cluster node Ai, obtaining sorting result Ri, where i takes the values 1 to n in turn, n is the number of cluster nodes in the HBase database, cluster nodes A1 to An constitute the multiple cluster nodes of the HBase database, and split data D1 to Dn constitute the data to be sorted.

Specifically, when the data to be sorted is split across cluster nodes A1 to An of the HBase database, the split data Di is stored in each cluster node, unordered, in the form of data key-value pairs. The data key-value pairs of the split data Di are then sorted by the row key sorting mode; after each cluster node has sorted by the row key sorting mode, one sorting result Ri is obtained. Suppose the split data Di is the visit counts of multiple websites within a certain period. If the website with the domain name www.baidu.com has a visit count of 10000, the domain name www.baidu.com and the visit count 10000 form a data key-value pair, written as (www.baidu.com 10000). Similarly, if the website with the domain name www.google.com has a visit count of 1000, the domain name www.google.com and the visit count 1000 form a data key-value pair, written as (www.google.com 1000).

It should be noted that, while the data to be sorted is being split, the user can determine the amount of split data Di assigned to each cluster node according to the remaining storage space of that cluster node.

Determining that the set of the multiple sorting results is the sorting result of the data to be sorted includes step S1061: cluster node Ai stores the data key-value pairs of the split data Di in the HBase database, obtaining the sorting result of the data to be sorted, where a data key-value pair of the split data Di is a key-value pair composed of the identifier of the split data Di and the total data amount of the split data Di.

Specifically, the data key-value pairs of the split data Di in sorting result Ri are read one by one and stored in the HBase database in a fixed format. As illustrated in the steps above, when the split data Di is the visit count of a website within a certain period, the data key-value pair can consist of the domain name of the website (that is, the identifier of the split data Di) and the visit count of the website within that period (that is, the total data amount of the split data Di). Similarly, if the split data is the search volume of a keyword, the data key-value pair can consist of the keyword (that is, the identifier of the split data Di) and the search volume of the keyword within a certain period (that is, the total data amount of the split data Di). In summary, the data key-value pair of the split data Di consists of the identifier of the split data Di and the total data amount of the split data Di.

When the split data Di is the visit count of a website within a certain period and the split data Di in cluster node Ai is sorted as a whole, the data key-value pairs of the sorted split data Di need to be stored, in order, in the multiple cluster nodes of the HBase database; the sorting results held by the multiple cluster nodes together form the sorting result of the data to be sorted. Suppose there are 4 cluster nodes: cluster node 1 can store the split data with visit counts of 100000 to 10000, cluster node 2 the split data with visit counts of 9999 to 1000, cluster node 3 the split data with visit counts of 999 to 100, and cluster node 4 the split data with visit counts of 99 to 0. Cluster nodes 1 to 4 are cluster nodes of the HBase database, and the number of cluster nodes can be chosen according to the user's actual needs.
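The four-node example above amounts to routing each visit count to the node whose range contains it. A minimal sketch follows; the range boundaries come from the example, while `node_for`, the boundary list layout, and the use of `bisect` are assumptions of the sketch rather than part of the described method.

```python
import bisect

# Lower bounds of the ranges held by nodes 4, 3, 2 and 1 in the example.
LOWER_BOUNDS = [0, 100, 1000, 10000]
NODE_IDS = [4, 3, 2, 1]
MAX_VISITS = 100000  # upper end of node 1's range in the example


def node_for(visits):
    """Return the number of the cluster node whose range contains visits."""
    if not 0 <= visits <= MAX_VISITS:
        raise ValueError("visit count outside the ranges of the example")
    # Index of the last lower bound that is <= visits.
    index = bisect.bisect_right(LOWER_BOUNDS, visits) - 1
    return NODE_IDS[index]
```

For instance, `node_for(40000)` and `node_for(10000)` both select node 1, while `node_for(5000)` selects node 2.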

Optionally, in step S1061, cluster node Ai storing the data key-value pairs of the split data Di in the HBase database includes the following steps S1 to S5:

Step S1: query whether the HBase database has already stored the row key of the split data Di, where the row key of the split data Di is the negative of the total data amount of the split data Di.

Step S3: if the HBase database has already stored the row key of the split data Di, store the split data Di in a first target column, where the first target column is any column in the column family of the row in which the row key of the split data Di resides.

Step S5: if the HBase database has not stored the row key of the split data Di, store the key-value pair of the split data Di according to the row keys already stored in the HBase database.

Specifically, in the embodiments of the present application, the split data Di in cluster node Ai is sorted by the row key sorting mode. Therefore, when the split data Di in cluster node Ai is sorted as a whole, the row key (rowkey) of the split data Di is obtained first, and the HBase database is queried to find out whether that rowkey has already been stored, where the rowkey is the negative of the total data amount in the data key-value pair of the split data Di.

Suppose the data key-value pair of a piece of split data in the split data Di is (www.baidu.com 10000); the row key of that split data is then -10000. If row key -10000 is found in the HBase database, the domain name "www.baidu.com" in the data key-value pair (www.baidu.com 10000) is stored in the HBase database; specifically, it is stored in any column of the column family of the row in which row key -10000 resides. If row key -10000 is not found in the HBase database, the row key -10000 is compared with the row keys already stored in the HBase database and inserted into the appropriate cluster node of the HBase database.

Split data Di that has been stored in the database has the storage form rowkey: -10000, family: f, column: www.baidu.com, value: 1, where family is HBase's identifier for physically isolated data and column is the HBase column name. When rowkey: -10000 is stored in a row, family: f, column: www.baidu.com, and value: 1 are stored in any column of the column family of the row in which row key -10000 resides, and family: f, column: www.baidu.com, and value: 1 are stored in the same column.
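Steps S1 to S5 and the storage form above can be sketched with a dictionary standing in for the table. This is an illustration, not HBase client code: `store_pair` is a hypothetical helper, www.example.com is a made-up site sharing the total 10000, and the ordering of new rows (step S5) is left to a separate sorted index and not modeled here.

```python
def store_pair(table, identifier, total, family="f"):
    """Store one (identifier, total) pair; return True if a new row was created."""
    rowkey = -total                    # the row key is the negated total
    is_new_row = rowkey not in table   # step S1: is the row key already stored?
    # Step S3 reuses the existing row; step S5 creates the row first.
    row = table.setdefault(rowkey, {family: {}})
    # The identifier becomes the column name and the cell value is 1.
    row.setdefault(family, {})[identifier] = 1
    return is_new_row


table = {}
store_pair(table, "www.baidu.com", 10000)
store_pair(table, "www.example.com", 10000)  # hypothetical site, same total
store_pair(table, "www.google.com", 1000)
```

Two identifiers with the same total end up as two columns of the same row, so equal totals do not need separate row keys.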

Further, in step S5, if the HBase database has not stored the row key of the split data Di, storing the key-value pair of the split data Di according to the row keys already stored in the HBase database includes the following steps S51 to S57:

Step S51: compare, in turn, the row key of the split data Di with the row keys already stored in the HBase database.

Step S53: insert the row key of the split data Di into a target row of the HBase database, where the target row is the row following the row of a first row key or the row preceding the row of a second row key, the first row key and the second row key are row keys already stored in the HBase database, the first row key is the stored row key that is less than the row key of the split data Di and differs from it the least, and the second row key is the stored row key that is greater than the row key of the split data Di and differs from it the least.

Step S55: store the split data Di in a second target column corresponding to the target row, where the second target column is any column in the column family of the target row.

Step S57: update the row keys stored in the HBase database.

After each update of the stored row keys, the next time the key-value pair of a not-yet-stored piece of split data Di is stored according to the row keys already stored in the HBase database, the corresponding first row key and second row key are determined anew. For the key-value pair of a given piece of split data Di, if both its first row key and its second row key exist, the first row key is less than the second row key.

Specifically, if the row keys stored in the HBase database are -100000, -50000, and -40000, and the row key -10000 has not been stored, the row key -10000 is compared with each of the row keys -100000, -50000, and -40000.

When the split data Di is the visit count of a website or the search volume of a keyword, what users actually prefer to view is the top-ranked websites by visit count or the top-ranked keywords by search volume. For this reason, row keys are stored in the HBase database as negative numbers, so that arranging the row keys in ascending order arranges the visit counts in descending order.
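The effect of negating the total can be checked with a short sketch. It compares row keys as Python integers; how a numeric row key is encoded for storage is not specified in the source and is not modeled here. `top_sites` is a hypothetical helper, and www.example.com with a total of 40000 is a made-up entry.

```python
def top_sites(pairs, n):
    """Return the n identifiers with the largest totals, read in ascending key order."""
    # Row key = negated total, so ascending row keys mean descending totals.
    rows = sorted((-total, identifier) for identifier, total in pairs)
    return [identifier for _, identifier in rows[:n]]


pairs = [
    ("www.google.com", 1000),
    ("www.baidu.com", 10000),
    ("www.example.com", 40000),  # hypothetical entry
]
top_two = top_sites(pairs, 2)
```

Reading the first rows in key order is therefore enough to obtain the top-ranked entries, without scanning to the end of the table.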

For example, the row key value -10000 is compared in turn with the row key values -100000, -50000 and -40000. The comparison shows that -100000, -50000 and -40000 are all smaller than -10000 and that the difference between -40000 and -10000 is the smallest, so -10000 is inserted into the row immediately below the row containing the row key value -40000; the row key value -40000 is thus the first row key value. As described for step S1061 above, if cluster node 1 stores the split data with visit counts from 100000 to 10000, the row key values -40000 and -10000 should both be stored in cluster node 1. The row key value -10000 is therefore inserted into the row immediately below the row containing the row key value -40000, and the split data Di whose row key value is -10000 is stored in any column of the column family to which that target row belongs. With this sorting method, the visit counts of websites are sorted in descending order.

As another example, if the row key values stored in the HBase database are -5000, -4000 and -1000, and the row key value -10000 has not been stored, the comparison shows that -5000, -4000 and -1000 are all greater than -10000 and that the difference between -10000 and -5000 is the smallest, so -10000 would be inserted into the row (that is, the target row) immediately above the row containing the row key value -5000; the row key value -5000 is thus the second row key value. However, as described for step S1061 above, if cluster node 2 stores the split data with visit counts from 9999 to 1000, the row key value -10000 and the row key value -5000 are not stored in the same cluster node. The row key value -10000 should therefore be stored in cluster node 1. Because -10000 is the largest of all the row key values in cluster node 1, it is stored in the last row (that is, the target row) of cluster node 1.

As a further example, suppose the HBase database stores the row key values -5000, -4000, -1000 and -500, and the row key value -2000 needs to be inserted. Comparing -2000 with the stored row key values shows that -2000 is greater than -4000 and -5000 and differs least from -4000, so the row key value -2000 should be inserted into the row (that is, the target row) immediately below the row containing the row key value -4000. Equivalently, -2000 is smaller than -1000 and -500 and differs least from -1000, so the row key value -2000 is inserted into the row (that is, the target row) immediately above the row containing the row key value -1000. The split data Di whose row key value is -2000 is then stored in any column of the column family to which that target row belongs. Here the row key value -4000 is the first row key value and the row key value -1000 is the second row key value.
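As a non-limiting illustration, the comparison and insertion of steps S53 to S57 may be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the stored row key values are modelled as an ascending Python list of integers compared numerically, and the function names `find_neighbours` and `insert_row_key` are hypothetical. How the values would be encoded as actual HBase row keys is not modelled.

```python
from bisect import bisect_left, insort


def find_neighbours(stored_keys, new_key):
    """Return (first_key, second_key) for a row key that is not yet stored.

    stored_keys is an ascending list of row key values (assumed model).
    first_key is the largest stored key smaller than new_key, or None;
    second_key is the smallest stored key greater than new_key, or None.
    """
    pos = bisect_left(stored_keys, new_key)
    first_key = stored_keys[pos - 1] if pos > 0 else None
    second_key = stored_keys[pos] if pos < len(stored_keys) else None
    return first_key, second_key


def insert_row_key(stored_keys, new_key):
    """Place new_key between its neighbours and return the updated key list."""
    insort(stored_keys, new_key)  # keeps the list in ascending order
    return stored_keys


# The three examples from the description.
print(find_neighbours([-100000, -50000, -40000], -10000))    # (-40000, None)
print(find_neighbours([-5000, -4000, -1000], -10000))        # (None, -5000)
print(find_neighbours([-5000, -4000, -1000, -500], -2000))   # (-4000, -1000)
```

A result of `None` for the first row key value corresponds to the second example, where no smaller row key value is stored and the node assignment decides where the row is placed.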

Sorting the split data Di in the cluster nodes Ai with the above method removes the need to consolidate the split data Di from each cluster node again; the split data Di can be sorted quickly simply by reading its row key values. For example, in this embodiment of the application the domain name www.baidu.com is saved as a column name in the row whose row key value is -10000. If another website also has a visit count of 10000, it can be saved in any free column of the column family to which the row with row key value -10000 belongs, without any conflict with the www.baidu.com column. In this embodiment, the negative of the visit count is saved to the HBase database as the row key value so that data with large visit counts come first in HBase, which makes querying convenient. Moreover, the sorting method provided by this embodiment can sort the data to be sorted quickly even when the data volume is large.
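The storage layout described above may be sketched as follows, purely by way of illustration. The sketch assumes an in-memory dictionary standing in for the table (row key value mapped to a dictionary of columns); the function name `store` and the domain names other than www.baidu.com are hypothetical.

```python
def store(table, domain, visits):
    """Save a (domain, visits) pair under the row key -visits.

    The domain name is used as the column name, so two websites with the
    same visit count share one row without overwriting each other.
    """
    row = table.setdefault(-visits, {})
    row[domain] = visits


table = {}
store(table, "www.baidu.com", 10000)
store(table, "www.example.org", 10000)   # hypothetical site, same visit count
store(table, "www.example.net", 40000)   # hypothetical site, more visits

# Reading the rows in ascending row key order yields descending visit counts.
ranking = [(d, v) for key in sorted(table) for d, v in table[key].items()]
print(ranking)
```

Because the row key is the negative of the visit count, no separate sorting pass is needed at read time in this model; the key order alone gives the ranking.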

Optionally, the sorting method provided by the present application further includes the following steps S7 to S9:

Step S7: receive a query instruction from a user through the query interface of the HBase database, where the query instruction is an instruction to query the split data corresponding to the row key values stored in the HBase database between any two row key values.

Step S9: display the queried split data in the HBase database with a preset mark added.

As a database, HBase provides a quite convenient query interface, through which the split data in any specified row key value interval can be queried. For example, one can very quickly query which website domain names have row key values between -1000 and -900, and the results can be displayed with a preset mark, such as an added background color, a modified font, or a floating bubble.

If the HBase database stores no row key value strictly between -1000 and -900 but does store the row key values -1000 and -900, the query returns the website domain names whose row key value is -1000 and those whose row key value is -900.

If the HBase database stores no row key value strictly between -1000 and -900 and also does not store the row key values -1000 and -900, a prompt can be popped up to tell the user that "the queried data does not exist"; alternatively, empty data can be displayed to show that no such row key value is stored in the HBase database.

If the HBase database does not store the row key values -1000 and -900 but does store a row key value between -1000 and -900, for example -950, the query returns the website domain names whose row key value is -950.
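The three query cases above may be illustrated with the following sketch. It assumes the same in-memory model as an illustration only (a dictionary from row key value to columns), treats both ends of the interval as inclusive, and uses the hypothetical function name `query_range`; it does not call the HBase query interface itself.

```python
from bisect import bisect_left, bisect_right


def query_range(table, low, high):
    """Return {row_key: [column names]} for stored keys with low <= key <= high."""
    keys = sorted(table)
    start = bisect_left(keys, low)     # first key >= low
    stop = bisect_right(keys, high)    # one past the last key <= high
    return {k: sorted(table[k]) for k in keys[start:stop]}


# Case 1: only the two end points are stored.
ends_only = {-1000: {"a.example": 1000}, -900: {"b.example": 900}}
# Case 2: nothing in the interval is stored.
nothing = {-5000: {"c.example": 5000}}
# Case 3: only a key inside the interval is stored.
inside = {-950: {"d.example": 950}}

print(query_range(ends_only, -1000, -900))
print(query_range(nothing, -1000, -900))   # empty: "the queried data does not exist"
print(query_range(inside, -1000, -900))
```

An empty result corresponds to the case in which a prompt or empty data is shown to the user.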

The embodiments of the present application also provide a data sorting device based on an HBase database. The device is mainly used to execute the data sorting method based on an HBase database provided above in the embodiments of the present application, and is described in detail below.

Fig. 2 is a schematic diagram of a data sorting device based on an HBase database according to an embodiment of the present application. As shown in Fig. 2, the device includes a splitting unit 10, a reading unit 20 and a determination unit 30, where:

The splitting unit 10 is configured to split the data to be sorted across multiple cluster nodes of the HBase database, where each cluster node executes the row key value sorting method of the HBase database after obtaining its split data.

The reading unit 20 is configured to read the sorting result of each cluster node to obtain multiple sorting results, where one sorting result is obtained after each cluster node sorts its split data by executing the row key value sorting method.

Specifically, after the split data in each cluster node is sorted by row key value, one sorting result is obtained, and the sorting results in the cluster nodes are read in turn to obtain multiple sorting results. It should be noted that the multiple sorting results are stored in multiple cluster nodes of the HBase database. For example, when the split data is the visit counts of a website, cluster node a may hold the sorting result for web pages with visit counts from 10000 down to 1000, cluster node b the sorting result for web pages with visit counts from 999 down to 100, and cluster node c the sorting result for web pages with visit counts from 99 down to 0. After the split data as a whole has been sorted, the number of sorting results obtained can be chosen according to the actual needs of the user.

The determination unit 30 is configured to determine that the set of the multiple sorting results is the sorting result of the data to be sorted.
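The cooperation of the three units may be sketched as follows. This is an illustrative model only, under the assumption that data is assigned to nodes by visit count range as in the example of cluster nodes a, b and c; the function names `split_by_range`, `sort_node` and `collect`, and the page identifiers, are hypothetical.

```python
def split_by_range(pairs, bounds):
    """Assign each (identifier, total) pair to the node whose range contains total.

    bounds lists one (high, low) range per node, highest range first.
    """
    nodes = [[] for _ in bounds]
    for identifier, total in pairs:
        for i, (high, low) in enumerate(bounds):
            if low <= total <= high:
                nodes[i].append((identifier, total))
                break
    return nodes


def sort_node(node_pairs):
    """Sort one node's pairs by row key value, i.e. by the negative of total."""
    return sorted(node_pairs, key=lambda pair: -pair[1])


def collect(results):
    """Concatenate the per-node results in node order to get the overall ranking."""
    return [pair for result in results for pair in result]


bounds = [(10000, 1000), (999, 100), (99, 0)]            # nodes a, b and c
pairs = [("p1", 50), ("p2", 5000), ("p3", 120), ("p4", 1000), ("p5", 999)]
overall = collect([sort_node(n) for n in split_by_range(pairs, bounds)])
print(overall)
```

Because the node ranges do not overlap and are ordered, the per-node results only need to be read in node order; no cross-node merge step appears in this model.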

Optionally, the reading unit 20 includes a sorting subunit, where:

The sorting subunit is configured for cluster node Ai to sort the split data Di assigned to cluster node Ai by executing the row key value sorting method, obtaining a sorting result Ri, where i takes the values 1 to n in turn, n is the number of cluster nodes in the HBase database, cluster nodes A1 to An constitute the multiple cluster nodes of the HBase database, and split data D1 to Dn constitute the data to be sorted.

The determination unit 30 includes a storage subunit, where the storage subunit is configured for cluster node Ai to store the data key-value pair of the split data Di in the HBase database to obtain the sorting result of the data to be sorted. The data key-value pair of the split data Di is a key-value pair composed of the identifier of the split data Di and the total data amount of the split data Di.

The data key-value pairs of the split data Di in the sorting results Ri are read respectively and stored in the HBase database in a fixed format. As illustrated in the foregoing description, when the split data Di is the visit count of a website in a certain period, the data key-value pair may consist of the domain name of the website (that is, the identifier of the split data Di) and the visit count of the website in that period (that is, the total data amount of the split data Di). Similarly, when the split data is the search volume of a keyword, the data key-value pair may consist of the keyword (that is, the identifier of the split data Di) and the search volume of the keyword in a certain period (that is, the total data amount of the split data Di). In summary, the data key-value pair of the split data Di consists of the identifier of the split data Di and the total data amount of the split data Di.

Optionally, the storage subunit includes a query module, a first storage module and a second storage module, where:

The query module is configured to query whether the HBase database has stored the row key value of the split data Di, where the row key value is the negative of the total data amount of the split data Di. The first storage module is configured to store the split data Di in a first target column when the HBase database has stored the row key value of the split data Di, where the first target column is any column in the column family to which the row containing the row key value of the split data Di belongs. The second storage module is configured to store the key-value pair of the split data Di according to the row key values already stored in the HBase database when the HBase database has not stored the row key value of the split data Di.

Optionally, the second storage module includes a comparison submodule, an insertion submodule, a storage submodule and an update submodule, where:

The comparison submodule is configured to compare, in turn, the row key value of the split data Di with the row key values stored in the HBase database. The insertion submodule is configured to insert the row key value of the split data Di into a target row in the HBase database, where the target row is the row immediately below the row containing a first row key value, or the row immediately above the row containing a second row key value. The first row key value and the second row key value are both row key values already stored in the HBase database. Consistent with step S53 and with the examples below, the first row key value is the stored row key value that is smaller than the row key value of the split data Di and differs from it by the least, and the second row key value is the stored row key value that is greater than the row key value of the split data Di and differs from it by the least. The storage submodule is configured to store the split data Di in a second target column corresponding to the target row, where the second target column is any column in the column family to which the target row belongs. The update submodule is configured to update the row key values stored in the HBase database.

For example, the row key value -10000 is compared in turn with the row key values -100000, -50000 and -40000. The comparison shows that -100000, -50000 and -40000 are all smaller than -10000 and that the difference between -40000 and -10000 is the smallest, so -10000 is inserted into the row immediately below the row containing the row key value -40000; the row key value -40000 is thus the first row key value. As described above, if cluster node 1 stores the split data with visit counts from 100000 to 10000, the row key values -40000 and -10000 should both be stored in cluster node 1. The row key value -10000 is therefore inserted into the row immediately below the row containing the row key value -40000, and the split data Di whose row key value is -10000 is stored in any column of the column family to which that target row belongs. With this sorting method, the visit counts of websites are sorted in descending order.

As another example, if the row key values stored in the HBase database are -5000, -4000 and -1000, and the row key value -10000 has not been stored, the comparison shows that -5000, -4000 and -1000 are all greater than -10000 and that the difference between -10000 and -5000 is the smallest, so -10000 would be inserted into the row immediately above the row containing the row key value -5000; the row key value -5000 is thus the second row key value. However, as described above, if cluster node 2 stores the split data with visit counts from 9999 to 1000, the row key value -10000 and the row key value -5000 are not stored in the same cluster node. The row key value -10000 should therefore be stored in cluster node 1. Because -10000 is the largest of all the row key values in cluster node 1, it is stored in the last row of cluster node 1.

As a further example, suppose the HBase database stores the row key values -5000, -4000, -1000 and -500, and the row key value -2000 needs to be inserted. Comparing -2000 with the stored row key values shows that -2000 is greater than -4000 and -5000 and differs least from -4000, so the row key value -2000 should be inserted into the row immediately below the row containing the row key value -4000. Equivalently, -2000 is smaller than -1000 and -500 and differs least from -1000, so the row key value -2000 is inserted into the row immediately above the row containing the row key value -1000. The split data Di whose row key value is -2000 is then stored in any column of the column family to which that target row belongs. Here the row key value -4000 is the first row key value and the row key value -1000 is the second row key value.

Optionally, the data sorting device based on an HBase database provided by the present application further includes a receiving unit and a display unit, where:

The receiving unit is configured to receive a query instruction from a user through the query interface of the HBase database, where the query instruction is an instruction to query the split data corresponding to the row key values stored in the HBase database between any two row key values. The display unit is configured to display, in the HBase database and with a preset mark added, the split data corresponding to the queried row key values.

The serial numbers of the above embodiments of the present application are for description only and do not represent the relative merits of the embodiments.

In the above embodiments of the present application, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.

In the several embodiments provided by the present application, it should be understood that the disclosed technical content may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units may be a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or take other forms.

The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present application may be integrated in one processing unit, each unit may exist physically on its own, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute all or some of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk or an optical disc.

The above are only preferred embodiments of the present application. It should be noted that those of ordinary skill in the art may make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications shall also be regarded as falling within the protection scope of the present application.

Claims

1. A data sorting method based on an HBase database, characterized by comprising:

splitting data to be sorted across multiple cluster nodes of the HBase database, wherein each cluster node executes the row key value sorting method of the HBase database after obtaining its split data;

reading the sorting result of each cluster node to obtain multiple sorting results, wherein one sorting result is obtained after each cluster node sorts its split data by executing the row key value sorting method; and

determining that the set of the multiple sorting results is the sorting result of the data to be sorted;

wherein obtaining one sorting result after each cluster node sorts its split data by executing the row key value sorting method comprises: cluster node Ai sorting the split data Di assigned to the cluster node Ai by executing the row key value sorting method to obtain a sorting result Ri, wherein i takes the values 1 to n in turn, n is the number of cluster nodes in the HBase database, cluster nodes A1 to An constitute the multiple cluster nodes of the HBase database, and split data D1 to Dn constitute the data to be sorted; and determining that the set of the multiple sorting results is the sorting result of the data to be sorted comprises: the cluster node Ai storing the data key-value pair of the split data Di in the HBase database to obtain the sorting result of the data to be sorted, wherein the data key-value pair of the split data Di is a key-value pair composed of the identifier of the split data Di and the total data amount of the split data Di.

2. The method according to claim 1, characterized in that the cluster node Ai storing the data key-value pair of the split data Di in the HBase database comprises:

querying whether the HBase database has stored the row key value of the split data Di, wherein the row key value of the split data Di is the negative of the total data amount of the split data Di;

when the HBase database has stored the row key value of the split data Di, storing the split data Di in a first target column, wherein the first target column is any column in the column family to which the row containing the row key value of the split data Di belongs; and

when the HBase database has not stored the row key value of the split data Di, storing the key-value pair of the split data Di according to the row key values already stored in the HBase database.

3. The method according to claim 2, characterized in that, when the HBase database has not stored the row key value of the split data Di, storing the key-value pair of the split data Di according to the row key values already stored in the HBase database comprises:

comparing, in turn, the row key value of the split data Di with the row key values stored in the HBase database;

inserting the row key value of the split data Di into a target row in the HBase database, wherein the target row is the row immediately below the row containing a first row key value, or the row immediately above the row containing a second row key value, the first row key value and the second row key value are both row key values already stored in the HBase database, the first row key value is the row key value that is smaller than the row key value of the split data Di and differs from it by the least, and the second row key value is the row key value that is greater than the row key value of the split data Di and differs from it by the least;

storing the split data Di in a second target column corresponding to the target row, wherein the second target column is any column in the column family to which the target row belongs; and

updating the row key values stored in the HBase database.

4. The method according to claim 1, characterized in that the method further comprises:

receiving a query instruction from a user through the query interface of the HBase database, wherein the query instruction is an instruction to query the split data corresponding to the row key values stored in the HBase database between any two row key values; and

displaying, in the HBase database and with a preset mark added, the split data corresponding to the queried row key values.

5. A data sorting device based on an HBase database, characterized by comprising:

a splitting unit, configured to split data to be sorted across multiple cluster nodes of the HBase database, wherein each cluster node executes the row key value sorting method of the HBase database after obtaining its split data;

a reading unit, configured to read the sorting result of each cluster node to obtain multiple sorting results, wherein one sorting result is obtained after each cluster node sorts its split data by executing the row key value sorting method; and

a determination unit, configured to determine that the set of the multiple sorting results is the sorting result of the data to be sorted;

wherein the reading unit comprises: a sorting subunit, configured for cluster node Ai to sort the split data Di assigned to the cluster node Ai by executing the row key value sorting method to obtain a sorting result Ri, wherein i takes the values 1 to n in turn, n is the number of cluster nodes in the HBase database, cluster nodes A1 to An constitute the multiple cluster nodes of the HBase database, and split data D1 to Dn constitute the data to be sorted; and the determination unit comprises: a storage subunit, configured for the cluster node Ai to store the data key-value pair of the split data Di in the HBase database to obtain the sorting result of the data to be sorted, wherein the data key-value pair of the split data Di is a key-value pair composed of the identifier of the split data Di and the total data amount of the split data Di.

6. The device according to claim 5, characterized in that the storage subunit comprises:

a query module, configured to query whether the HBase database has stored the row key value of the split data Di, wherein the row key value of the split data Di is the negative of the total data amount of the split data Di;

a first storage module, configured to store the split data Di in a first target column when the HBase database has stored the row key value of the split data Di, wherein the first target column is any column in the column family to which the row containing the row key value of the split data Di belongs; and

a second storage module, configured to store the key-value pair of the split data Di according to the row key values already stored in the HBase database when the HBase database has not stored the row key value of the split data Di.

7. The device according to claim 6, characterized in that the second storage module comprises:

a comparison submodule, configured to compare, in turn, the row key value of the split data Di with the row key values stored in the HBase database;

an insertion submodule, configured to insert the row key value of the split data Di into a target row in the HBase database, wherein the target row is the row immediately below the row containing a first row key value, or the row immediately above the row containing a second row key value, the first row key value and the second row key value are both row key values already stored in the HBase database, the first row key value is the row key value that is greater than the row key value of the split data Di and differs from it by the least, and the second row key value is the row key value that is less than the row key value of the split data Di and differs from it by the least;

a storage submodule, configured to store the split data Di in a second target column corresponding to the target row, wherein the second target column is any column in the column family to which the target row belongs; and

an update submodule, configured to update the row key values stored in the HBase database.

8. The device according to claim 5, characterized in that the device further comprises:

a receiving unit, configured to receive a query instruction from a user through the query interface of the HBase database, wherein the query instruction is an instruction to query the split data corresponding to the row key values stored in the HBase database between any two row key values; and

a display unit, configured to display, in the HBase database and with a preset mark added, the split data corresponding to the queried row key values.