Detailed Description
To help those skilled in the art better understand the technical solutions of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present application. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
FIG. 2 is a flow chart of an embodiment of a method for window statistics of data according to the present application. Although the present application provides method operational steps or apparatus configurations as illustrated in the following examples or figures, more or fewer operational steps or module configurations may be included in the method or apparatus based on conventional or non-inventive efforts. In the case of steps or structures where there is no logically necessary cause-and-effect relationship, the execution order of the steps or the block structure of the apparatus is not limited to the execution order or the block structure provided in the embodiments of the present application. The described methods or modular structures, as applied to a device or end product in practice, may be executed sequentially or in parallel (e.g., in the context of parallel processors or multi-threaded processing) according to the embodiments or the modular structures shown in the figures.
Specifically, as shown in FIG. 2, a method for window statistics of data provided in an embodiment of the present application may include:
S1: acquiring unit time level data of the service dimension at the current time, and a historical window statistical result of the service dimension at the unit time immediately preceding the current time.
This embodiment is described using keyword statistics in a search system as the application scenario. In this scenario, the service system may perform service dimension statistics on the recorded data once per unit time, generating unit time level data together with the periodic window negative data of that unit time level data. The unit time is the interval at which the recorded data is periodically counted. For example, in this embodiment the search system may take one minute as the unit time and count the data every minute to generate minute-level data of the search keywords for the current minute. Of course, the unit time may be customized according to the actual data processing, scenario, or design requirements: hours, days, weeks, and so on may also be used as the unit time, with service dimension statistics then performed per that unit time to generate the corresponding unit time level data. Window statistics also generally require a time window length, which is the preset period over which the window statistics are computed, such as 24 hours, a week, or a month. Generally, the system counts the recorded data once per unit time and stores the result in a database (for example, counting and storing the keyword search records of each minute), and real-time window statistics are then computed from the stored unit time level (e.g., minute-level) data. The time window length is therefore usually longer than the unit time used for service dimension statistics, as with minute-level keyword counting and a 24-hour window.
The embodiment is described with one minute as the unit time for service dimension statistics, 24 hours as the time window length, the keyword searched by the user as the service dimension, and, as the application scenario, computing every minute the TOP100 keywords searched in the search system over the most recent 24 hours. Specifically, in this embodiment the window statistical result for each minute may be stored in a "keyword TOP100 minute-level result calculation table", which may then be updated in real time according to the window statistical result at the current minute. Accordingly, in the embodiment of the application, the statistical data to be processed may be obtained, service dimension statistics by keyword may be performed on it to generate minute-level data, and the minute-level data may be stored in the database HBase. Then, when performing window statistics of the service dimension at the current time, the unit time level data of the service dimension at the current time may be obtained together with the historical window statistical result of the service dimension at the previous unit time. For example, to calculate the "keyword TOP100 minute-level result calculation table" for the current time 2016-3-12 10:20, the minute-level keyword data for the minute 2016-3-12 10:20 and the "keyword TOP100 minute-level result calculation table" holding the historical window statistical result of the previous minute, 2016-3-12 10:19, may be obtained.
Of course, the service dimension described in this application may be set by an operator according to the specific application scenario. In the application scenario of this embodiment, the keyword searched by a user is used as the service dimension in the search system. In other application scenarios, the service dimension may include, but is not limited to, the traffic of a data interface, a geographic location, an access source, a specified key field, and the like, and may be set according to the actual service data processing scenario of the window statistics. The application scenario of this embodiment is described taking minute-level statistics with the keyword as the service dimension as an example.
The statistical data described in this application may include log information generated by data operations, execution, and the like. For example, in the application scenario of this embodiment, the statistical data may include the original log information of the search application, recording keywords and their search counts. In this embodiment, the original log information of keyword searches may be acquired in real time every minute, and the minute-level data of each keyword at the current time may then be generated from the original log information of the current minute and stored in the database HBase.
When real-time window statistics are performed at the current time, the unit time level data of the service dimension at the current time and the historical window statistical result of the service dimension at the previous unit time may be obtained.
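The generation of minute-level data from raw log records can be sketched as follows. This is a minimal illustration, not the implementation of the application: the record layout (a timestamp paired with a keyword), the function name `minute_level_counts`, and the use of an in-memory `Counter` in place of HBase are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime


def minute_level_counts(log_records):
    """Count searches per (minute, keyword) from (timestamp, keyword) records.

    The record layout is a hypothetical stand-in for the original log
    information; a real system would parse its own log format.
    """
    counts = Counter()
    for timestamp, keyword in log_records:
        # Truncate the timestamp to the start of its minute (the unit time).
        minute = timestamp.replace(second=0, microsecond=0)
        counts[(minute, keyword)] += 1
    return counts


# Hypothetical raw log: three searches at 10:20 and one at 10:21.
raw_log = [
    (datetime(2016, 3, 12, 10, 20, 5), "mobile phone"),
    (datetime(2016, 3, 12, 10, 20, 41), "mobile phone"),
    (datetime(2016, 3, 12, 10, 20, 59), "home appliance"),
    (datetime(2016, 3, 12, 10, 21, 0), "mobile phone"),
]
minute_data = minute_level_counts(raw_log)
```

Each key of `minute_data` identifies one keyword in one minute, which is the granularity at which the later steps read and combine data.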
S2: querying the periodic window negative data of the current time from the stored historical unit time level data, and calculating the service dimension incremental data of the current time from the periodic window negative data and the unit time level data of the service dimension at the current time.
In the embodiment of the application, whenever unit time level data is computed, the periodic window negative data for the time obtained by adding the time window length to the current time may be generated at the same time. Specifically, in an embodiment of the present application, the periodic window negative data may include the negative value -N of the statistical value N counted at statistical time T, recorded as statistical data of the corresponding service dimension at time (T + L), where L is the set time window length. In this way, when real-time window statistics are performed, the expired statistical data that has fallen outside the time window is cancelled out, so the service dimension incremental data can be calculated accurately and the window statistics can be carried out effectively on the basis of incremental data. The specific way in which the periodic window negative data is set, converted, and generated may be chosen according to the type of statistical data or the requirements of the scenario. In an implementation provided by the embodiment of the present application, the periodic window negative data at the current time includes:
S201: a negative statistical value of the service dimension for the current time, generated at the previous window statistics time (the current time minus the time window length) by taking the negative of the service dimension statistical value in the unit time level data of the service dimension at that previous window statistics time.
Specifically, for example, in the application scenario of this embodiment, in which keywords are the service dimension of the search application, suppose the original log information obtained at the current time 2016-3-12 10:20 shows that the keyword "mobile phone" was searched 10 times in that minute. The unit time level data generated for the keyword "mobile phone" at 2016-3-12 10:20 may then include a data record with a search count of 10. At the same time, periodic window negative data may be generated for the keyword "mobile phone" at the periodic time 2016-3-13 10:20, that is, the current time 2016-3-12 10:20 plus the 24-hour time window length; this periodic window negative data may include a data record for 2016-3-13 10:20 in which the search count of the keyword "mobile phone" is -10. FIG. 3 is a schematic diagram of records of minute-level data and the corresponding periodic window negative data generated by the present application. Of course, as shown in FIG. 3, minute-level statistics may likewise be computed from the original log information for the other keywords at the current time 2016-3-12 10:20, such as "home appliance", "iPhone SE", and "car", to generate the corresponding unit time level data and periodic window negative data. For example, at 2016-3-12 10:20 the one-minute search count of the keyword "home appliance" is 20 and that of the keyword "iPhone SE" is 0, and the periodic window negative data of the keyword "home appliance" may be recorded for the time 24 hours after 2016-3-12 10:20. In the application scenario of this embodiment, the search count of "iPhone SE" is 0 in this minute because the "iPhone SE" had not yet been publicly announced on 2016-3-12. Of course, when there is no search record, either no "iPhone SE" record may be stored or its search count may be set to 0.
In the application scenario of this embodiment, when unit time level data is generated by per-unit-time keyword statistics on the statistical data, the negative statistical value of each keyword for the time one time window length after the current time may be generated correspondingly and stored in a database such as HBase.
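The pairing of a unit time record with its periodic window negative data can be illustrated with a short sketch. The two dictionaries below stand in for database tables, and the function name `store_unit_time_record` is hypothetical; only the rule itself (a count N at time T yields -N at time T + L) comes from the description above.

```python
from datetime import datetime, timedelta

WINDOW_LENGTH = timedelta(hours=24)  # time window length L used in the example


def store_unit_time_record(minute_level, negative_data, stat_time, keyword,
                           count, window_length=WINDOW_LENGTH):
    """Store a count N at time T and its negative -N at time T + L."""
    minute_level[(stat_time, keyword)] = count
    negative_data[(stat_time + window_length, keyword)] = -count


# Plain dictionaries used here in place of database tables (an assumption).
minute_level = {}
negative_data = {}

now = datetime(2016, 3, 12, 10, 20)
store_unit_time_record(minute_level, negative_data, now, "mobile phone", 10)

# The negative record is keyed by the periodic time 2016-3-13 10:20.
periodic_time = datetime(2016, 3, 13, 10, 20)
print(negative_data[(periodic_time, "mobile phone")])  # -10
```

Keeping the negative records in a separate table avoids overwriting the positive count that will later be written for the same keyword at the periodic time.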
In the embodiment of the application, the periodic window negative data of the current time may be queried from the historical unit time level data stored in the database, and the service dimension incremental data of the current time may then be calculated from the periodic window negative data and the unit time level data of the service dimension at the current time.
Specifically, for example, in the keyword statistics scenario of the search system of this embodiment, the historical minute-level statistics stored in the database HBase may show that at the current time 2016-3-13 10:20 the one-minute search count is 35 for the keyword "mobile phone", 20 for the keyword "home appliance", and 50 for the keyword "iPhone SE". The database HBase also holds the periodic window negative data for the periodic time 2016-3-13 10:20, generated during the statistics at the previous window statistics time 2016-3-12 10:20: a one-minute search count of -10 for the keyword "mobile phone" and -25 for the keyword "home appliance". The keyword incremental data for the 24-hour window at the current minute 2016-3-13 10:20 may then include: a 24-hour window increment of 35 + (-10) = 25 for the keyword "mobile phone", 20 + (-25) = -5 for the keyword "home appliance", and 50 + 0 = 50 for the keyword "iPhone SE". The keyword incremental data thus shows that, with the Apple iPhone SE about to come to market, searches for mobile phones and for the SE model have both increased. By contrast, the keyword "home appliance" was searched 25 times at the statistical time 2016-3-12 10:20 and 20 times at the same time on the following day, 2016-3-13 10:20, a decrease of 5. Because the periodic window negative data of the keyword "home appliance" for 2016-3-13 10:20 was set at the statistical time 2016-3-12 10:20 with the 24-hour time window length, the 24-hour window increment of the keyword "home appliance" can be accurately determined to be -5 when window statistics are performed at 2016-3-13 10:20.
In the application scenario of the embodiment of the application, the keyword incremental data at the current time may thus be calculated from the minute-level data generated by the minute-level keyword statistics and the periodic window negative data generated for the time 24 hours (the time window length) after each statistical time.
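The increment calculation of S2 can be sketched with the figures from the example for 2016-3-13 10:20. The function name `window_increment` and the dictionary representation are assumptions made for illustration; a keyword with no negative record is treated as having a negative value of 0.

```python
def window_increment(current_counts, negative_data):
    """Add each keyword's current count to its periodic window negative value.

    Keywords missing from either input contribute 0 on that side.
    """
    keywords = set(current_counts) | set(negative_data)
    return {
        keyword: current_counts.get(keyword, 0) + negative_data.get(keyword, 0)
        for keyword in keywords
    }


# Minute-level counts at 2016-3-13 10:20 (from the example).
current = {"mobile phone": 35, "home appliance": 20, "iPhone SE": 50}
# Periodic window negative data generated 24 hours earlier (from the example).
negative = {"mobile phone": -10, "home appliance": -25}

increments = window_increment(current, negative)
# increments == {"mobile phone": 25, "home appliance": -5, "iPhone SE": 50}
```

Only the records of two minutes are read here, regardless of how long the time window is.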
S3: determining the window statistical result of the service dimension at the current time based on the historical window statistical result and the service dimension incremental data of the current time.
Clearly, the amount of computation needed for the service dimension incremental data of one unit time is usually much smaller than that needed to process all the unit time level data within the time window length. In the embodiment based on periodic window negative data, the expired data that has left the time window is cancelled out, so the window statistical result at the current time can be calculated from the service dimension incremental data of the current time. Specifically, the window statistical result determined from the service dimension incremental data in the present application may be obtained by whatever calculation suits the specific application scenario or the designed window statistics method.
The window statistics method for data provided by the embodiment of the application performs window statistics on the basis of incremental data. The incremental data of one unit time is much smaller in volume than the data that must be processed for window statistics in the prior art, so completing the window statistics on the basis of incremental data can greatly reduce the memory overhead and network overhead of the system and improve its overall processing performance. In the embodiment of the present application, when the service dimension statistics of a unit time are processed, a negative statistical value is generated at the same time for the periodic time, that is, the current statistical time plus the statistical period (the time window length in the example), and is stored in the database. When the per-unit-time window statistics are later performed at that periodic time, the expired service dimension statistical data from one statistical period earlier is thereby cancelled out, which ensures that the service dimension incremental data is calculated accurately. The embodiment of the application can therefore substantially reduce system load and network overhead and improve the data processing efficiency and server performance of window statistics.
Of course, when the window statistics of the service dimension at the current time are performed, the periodic window negative data derived from the current time may also be generated and stored in the database, to facilitate the subsequent real-time window statistics. Therefore, in another embodiment of the method for window statistics of data described in this application, the method may further include:
S4: taking the negative of the service dimension statistical value in the unit time level data of the service dimension at the current time, and using it as the periodic window negative data of the service dimension at the corresponding periodic time, that is, the current time plus the set time window length.
In an embodiment of the present application, service dimension window result data, updated every unit time, may be used to represent the window statistical result of the statistical data in real time. For example, in the application scenario of the above embodiment, an "HBase keyword window calculation result table" may record, and update in real time, the result of the 24-hour window keyword search statistics at the current minute, which may specifically include which keywords users searched in the past 24 hours, the number of searches for each keyword, and the like. Therefore, in an embodiment of the method for window statistics of data according to the present application, determining the window statistical result of the service dimension at the current time based on the historical window statistical result and the service dimension incremental data of the current time includes:
S301: querying, from the stored historical service dimension window result data, the service dimension window result data within the window statistics period of the current time; merging the service dimension window result data with the service dimension incremental data; and updating the merged result into the historical service dimension window result data.
FIG. 4 is a flow chart of another embodiment of a method for window statistics of data provided herein. The service dimension window result data within the window statistics period of the current time is queried from the stored historical service dimension window result data and merged with the service dimension incremental data to obtain the service dimension window result data of the current time, which is then written back into the stored historical service dimension window result data. The service dimension window result data described in the present application may take the form of the aforementioned HBase keyword window calculation result table, or any other form in which calculated service dimension window result data is stored.
In the embodiment of the application, the stored service dimension window result data may be used to represent the real-time window statistical result of the statistical data at the current time. Specifically, the service dimension window result data within the time window length at the current time may be queried from the unit-time-level keyword window statistical records, and this historical service dimension window result data may be merged with the service dimension incremental data to obtain the updated service dimension window result data at the current time. For example, in the application scenario of the above embodiment, an "HBase keyword window calculation result table" may be provided to record the minute-level keyword search records of the 24-hour window statistics. At the current time 2016-3-13 10:20, the keyword window data record of the historical 24-hour window as of the previous unit time, 2016-3-13 10:19, may be queried from the "HBase keyword window calculation result table". Assume its 24-hour window search counts are 15000 for the keyword "mobile phone", 25000 for the keyword "home appliance", and 50000 for the keyword "iPhone SE". The keyword incremental data of the current time 2016-3-13 10:20, namely 25, -5, and 50, is added to the respective historical records. After merging, the 24-hour keyword window data record at the current time 2016-3-13 10:20 comprises a 24-hour window search count of 15025 for the keyword "mobile phone", 24995 for the keyword "home appliance", and 50050 for the keyword "iPhone SE". The merged keyword window data record of the current time 2016-3-13 10:20 is then written into the keyword window data records of the "HBase keyword window calculation result table".
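The merge of S301 can be sketched as a per-keyword addition, using the assumed 24-hour window counts and the increments from the example. The function name `merge_window` is hypothetical, and a plain dictionary stands in for the window calculation result table.

```python
def merge_window(previous_window, increments):
    """Add each keyword's increment to its previous window count."""
    merged = dict(previous_window)  # copy so the stored history is not mutated
    for keyword, delta in increments.items():
        merged[keyword] = merged.get(keyword, 0) + delta
    return merged


# 24-hour window counts as of 2016-3-13 10:19 (assumed in the example).
previous_window = {
    "mobile phone": 15000,
    "home appliance": 25000,
    "iPhone SE": 50000,
}
# Keyword incremental data at 2016-3-13 10:20 (from the example).
increments = {"mobile phone": 25, "home appliance": -5, "iPhone SE": 50}

merged = merge_window(previous_window, increments)
# merged == {"mobile phone": 15025, "home appliance": 24995, "iPhone SE": 50050}
```

In a deployment the merged dictionary would be written back to the result table so that it serves as the historical record for the next minute.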
Of course, in some other application scenarios, a TOP N ranking may be applied to the window statistics to obtain the final window statistical result. For example, the TOP100 keywords may be recalculated every minute according to their search counts, so as to pick out the hot keywords that users are focusing on. In an embodiment of the method for window statistics of data described in the present application, the TOP N ranking result of the current unit time may be calculated from the service dimension incremental data of the current unit time and the TOP N result data of the previous unit time. Therefore, in another embodiment of the method according to the present application, determining the window statistical result of the service dimension at the current time based on the historical window statistical result and the service dimension incremental data of the current time includes:
S302: obtaining the service dimension statistical ranking result of the previous unit time, merging it with the service dimension incremental data of the current time, and then sorting to obtain the service dimension statistical ranking result of the current time.
FIG. 5 is a flow chart of another embodiment of a method for window statistics of data provided herein. For example, in the keyword statistics scenario of the search application, with one minute as the unit time (in other application scenarios, hours, days, weeks, and so on may also be used), the TOP100 calculation result of the previous minute may be retrieved from the HBase database, merged with the keyword incremental data calculated at the current time, and re-sorted to obtain the latest TOP100 keyword search ranking at the current time, which may be stored in the "keyword TOP100 minute-level calculation result table". Specifically, suppose the TOP2 result at time 09:55 is: "home appliance" with a search statistic of 200 and "mobile phone" with a search statistic of 100. At time 09:56, the 24-hour window keyword incremental data for "mobile phone" becomes 150 (the 200 searches counted at the current time 09:56 plus the periodic window negative data of -50 generated at 09:56 24 hours earlier). At the current time 09:56, the TOP2 of the previous minute 09:55 and the incremental data of 09:56 are therefore merged from three records: "mobile phone" 100, "home appliance" 200, and "mobile phone" 150. Merging the three records gives, for the current time 09:56, a search statistic of 150 for "mobile phone" and 200 for "home appliance". Sorting these two results gives the TOP N at the current time 09:56: "home appliance" with a search statistic of 200 and "mobile phone" with a search statistic of 150.
In the prior art, window statistics in addition mode or subtraction mode read, request, and calculate a large amount of data and typically consume up to several gigabytes of memory. By contrast, the present application performs the TOP N ranking at the current time on the basis of the service dimension incremental data, which can greatly reduce system memory consumption and network overhead when calculating unit-time-level TOP N results and significantly improve the efficiency of window statistics.
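One possible reading of S302 is sketched below. It assumes, as in S301, that the increment of each keyword is added to its stored window count before sorting, and it reuses the 24-hour counts and increments of the S301 example rather than the 09:55/09:56 figures. The function name `rank_top_n` is hypothetical; `heapq.nlargest` is a standard library call used here for the ranking step.

```python
import heapq


def rank_top_n(previous_ranking, increments, n):
    """Merge the previous ranking with the increments, then keep the top n.

    Assumption: merging means adding each increment to the stored count.
    """
    merged = dict(previous_ranking)
    for keyword, delta in increments.items():
        merged[keyword] = merged.get(keyword, 0) + delta
    # nlargest returns (keyword, count) pairs sorted by count, largest first.
    return heapq.nlargest(n, merged.items(), key=lambda item: item[1])


previous_ranking = {
    "iPhone SE": 50000,
    "home appliance": 25000,
    "mobile phone": 15000,
}
increments = {"mobile phone": 25, "home appliance": -5, "iPhone SE": 50}

top2 = rank_top_n(previous_ranking, increments, n=2)
# top2 == [("iPhone SE", 50050), ("home appliance", 24995)]
```

The work done per minute is proportional to the number of ranked keywords plus the number of increments, not to the volume of data inside the 24-hour window.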
In the application scenario of the above embodiment, when window statistics are computed in the search system, each one-minute window calculation only needs to query two sets of minute-level data, giving a QPS of 2/60. Assuming there are 2000 search keywords per minute, the maximum amount of data queried is 2000 × 2 × 0.4 = 1.6 MB, the QPS for querying the "HBase keyword window calculation result table" is 2000 × 2/60 ≈ 66, and the load for updating and storing the results to HBase is likewise 2000 × 2/60 ≈ 66 TPS. Because the incremental data is orders of magnitude smaller than the data processed by existing approaches, the window statistics method can greatly reduce system memory overhead: an existing minute-level statistics method may load 1 GB or more of data into memory, so that the system either cannot finish within one minute or incurs a huge overhead, whereas the present method loads only megabytes of data into memory and can complete the real-time window statistics within one minute. The window statistics method for data provided by the embodiment of the application thus performs window statistics on the basis of incremental data, which is much smaller in volume per unit time than the data that prior-art window statistics must process, so the memory overhead and network overhead of the system can be greatly reduced and its overall processing performance improved. Accordingly, the present application further provides a method applicable to real-time window statistics on keyword data, which may include:
acquiring unit time level data of a keyword at the current time, and a historical window statistical result of the keyword at the unit time preceding the current time;
querying the periodic window negative data of the keyword at the current time from the stored historical unit time level data, and calculating the keyword incremental data of the current time from the periodic window negative data and the unit time level data of the keyword at the current time;
and determining the window statistical result of the keyword at the current time based on the historical window statistical result and the keyword incremental data of the current time.
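The three steps above, together with the generation of negative data, can be checked end to end against a direct recomputation of the window. The sketch below is an illustration under simplifying assumptions: a single keyword, integer time indices in place of timestamps, a window of `window_length` unit times with `window_length >= 1`, and hypothetical function names.

```python
def incremental_window_sums(per_unit_counts, window_length):
    """Window totals computed from increments only."""
    negatives = {}  # time index -> negative value scheduled for that time
    total = 0
    results = []
    for t, count in enumerate(per_unit_counts):
        # Schedule the cancellation of this count at time t + L.
        negatives[t + window_length] = -count
        # Increment = current count plus the negative value due now (if any).
        increment = count + negatives.pop(t, 0)
        # New window total = previous window total plus the increment.
        total += increment
        results.append(total)
    return results


def brute_force_window_sums(per_unit_counts, window_length):
    """Window totals recomputed from every unit inside the window."""
    return [
        sum(per_unit_counts[max(0, t - window_length + 1): t + 1])
        for t in range(len(per_unit_counts))
    ]


counts = [10, 25, 0, 35, 20, 50]  # hypothetical per-unit counts
incremental = incremental_window_sums(counts, window_length=3)
brute_force = brute_force_window_sums(counts, window_length=3)
# Both lists are [10, 35, 35, 60, 55, 105].
```

The incremental version touches one count and at most one negative value per step, whereas the direct version re-reads every unit in the window each time.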
Of course, the embodiments described above may be applied to application scenarios including, but not limited to, window statistics on search keyword data in a search system. The incremental-data-based window statistics described in this application may also be used in other application scenarios, for example window statistics on transaction records by user dimension, or traffic window statistics on service data stored in a database. In some application scenarios, window statistics on service data and search keyword window statistics for a search service may both be included.
The window statistics method for data can be applied to various server systems that perform data window statistics, and can effectively address the high TPS and QPS demands on data storage and the high server memory overhead encountered in window statistics such as windowed counting or TOP N ranking. Based on the method described above, the application therefore also provides a window statistics apparatus for data. FIG. 6 is a schematic block diagram of an embodiment of a window statistics apparatus for data provided in the present application. As shown in FIG. 6, the apparatus may include:
the data acquisition module 101 may be configured to acquire unit time level data of the service dimension at the current time and a historical window statistical result of the service dimension at the unit time preceding the current time;
the incremental data calculation module 102 may be configured to query the periodic window negative data of the current time from the stored historical unit time level data, and to calculate the service dimension incremental data of the current time from the periodic window negative data and the unit time level data of the service dimension at the current time;
the window statistic result module 103 may be configured to determine the window statistical result of the service dimension at the current time based on the historical window statistical result and the service dimension incremental data of the current time.
The window statistics apparatus for data provided by the embodiment of the application performs window statistics on the basis of incremental data. The incremental data of one unit time is much smaller in volume than the data that must be processed for window statistics in the prior art, so completing the window statistics on the basis of incremental data can greatly reduce the memory overhead and network overhead of the system and improve its overall processing performance. In the embodiment of the application, when the service dimension statistics of a unit time are processed, a negative statistical value is generated for the periodic time, that is, the current statistical time plus the statistical period, and is stored in the database. When the per-unit-time window statistics are later performed at that periodic time, the expired service dimension statistical data from one statistical period earlier is thereby cancelled out, which ensures that the service dimension incremental data is calculated accurately.
The periodic window negative data may include the negative data of the corresponding periodic time that is generated when the system performs the unit-time statistics of a service dimension. The periodic window negative data may be generated by the apparatus itself, or may be generated by another module or device of the system and stored in a database for the apparatus to read. Each time unit time level data is generated by the statistics of a unit time, the corresponding periodic window negative data may be generated as well. Therefore, in an embodiment of the apparatus of the present application, the periodic window negative data of the current time may include:
a negative statistical value of the service dimension for the current time, generated at the previous window statistics time of the current time (that is, one window length earlier) by taking the negative of the service dimension statistical value in the unit time level data of the service dimension at that previous window statistics time.
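The relationship between the unit time level data, the periodic window negative data and the window statistical result may be written out as follows. The symbols are introduced here only for illustration: $U_t(k)$ denotes the unit time level statistical value of key $k$ at time $t$, $T$ the window length measured in unit times, $N_t(k)$ the periodic window negative data, $\Delta_t(k)$ the increment data, and $W_t(k)$ the window statistical result.

```latex
\begin{aligned}
N_t(k)      &= -\,U_{t-T}(k) \\
\Delta_t(k) &= U_t(k) + N_t(k) = U_t(k) - U_{t-T}(k) \\
W_t(k)      &= W_{t-1}(k) + \Delta_t(k)
             = \sum_{i=t-T}^{t-1} U_i(k) + U_t(k) - U_{t-T}(k)
             = \sum_{i=t-T+1}^{t} U_i(k)
\end{aligned}
```

Under these assumptions, adding the increment data to the previous window result yields exactly the sum over the most recent $T$ unit times, without re-reading the $T$ unit time level data sets.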
As described above, when the window statistics at the current time are completed, periodic window negative data may be generated for the periodic time obtained by adding the set window length to the current time, so that the expired data can be eliminated during the window statistics at that later time and the incremental data can be calculated accurately. Fig. 7 is a schematic block diagram of another embodiment of a data window statistics apparatus provided in the present application. In another embodiment of the apparatus of the present application, the apparatus may further include:
the window negative data generating module 104 may be configured to take the negative of the service dimension statistical value in the unit time level data of the service dimension at the current time, and to use that negative value as the periodic window negative data of the periodic time obtained by adding the set window length to the current time.
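One possible form of the window negative data generating module is sketched below as a non-limiting example. The function name `generate_negative_data`, the dictionary `store` standing in for the database, and the use of minutes as the unit time are assumptions of this sketch.

```python
def generate_negative_data(store, now, window_length, unit_counts):
    """Negate the unit time level data of the current time and file the
    result under the periodic time (current time + window length)."""
    periodic_time = now + window_length
    store[periodic_time] = {key: -count for key, count in unit_counts.items()}
    return periodic_time


# Hypothetical example: unit time = 1 minute, window length = 24 hours.
store = {}
periodic_time = generate_negative_data(store, now=100, window_length=1440,
                                       unit_counts={"phone": 3, "book": 1})
print(periodic_time)         # 1540
print(store[periodic_time])  # {'phone': -3, 'book': -1}
```

When the window statistics reach minute 1540 in this example, the stored negative values can be read back and used to cancel the counts that were added at minute 100.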
Fig. 8 is a schematic block structure diagram of another embodiment of a data window statistics apparatus provided in the present application, and as shown in fig. 8, in an implementation of the apparatus, the window statistics module 103 may include:
the service dimension window calculation result unit 1301 may be configured to store the calculated historical service dimension window result data, query the service dimension window result data in the window statistics time period at the current time from the stored historical service dimension window result data, and update the historical service dimension window result data after performing a merge operation on the service dimension window result data and the service dimension increment data.
The historical service dimension window result data described in this embodiment may take the form of the aforementioned HBase keyword window calculation result table, or any other form that stores the calculated service dimension statistical results for the statistical window length. The HBase keyword window calculation result table may record and update in real time the keyword search statistics for the 24-hour window ending at the current time, and may specifically include information such as which keywords users have searched in the last 24 hours and the number of searches for each of these keywords.
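By way of a non-limiting sketch, a window calculation result table with the query, merge and update behavior described above might look as follows. The class `WindowResultTable` is hypothetical, and an in-memory dictionary stands in for the HBase table; no HBase client API is used or implied.

```python
class WindowResultTable:
    """In-memory stand-in for a keyword window calculation result table."""

    def __init__(self):
        self._rows = {}  # keyword -> search count within the window

    def query(self, keys):
        """Read the current window counts for the given keywords only."""
        return {key: self._rows.get(key, 0) for key in keys}

    def merge(self, delta):
        """Merge the increment data into the stored window result data."""
        current = self.query(delta.keys())
        for key, change in delta.items():
            updated = current[key] + change
            if updated > 0:
                self._rows[key] = updated
            else:
                self._rows.pop(key, None)  # no searches left in the window
        return dict(self._rows)


table = WindowResultTable()
table.merge({"phone": 5, "shoes": 2})                 # first unit time
print(table.merge({"phone": -1, "book": 3, "shoes": -2}))
# {'phone': 4, 'book': 3}
```

The sketch reads and writes only the rows named in the increment data, which reflects the reduced read volume that the incremental approach is intended to provide.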
In other application scenarios, TOP sorting may be performed on the window statistical data to generate the final real-time window statistical result at the current time. For example, a minute-level TOP 100 calculation result of the keywords may be updated every minute according to the number of keyword searches, so as to screen out the TOP N hot keywords that users are currently concerned with. In an embodiment of the data window statistics apparatus described herein, the TOPN sorting result of the current unit time may be calculated based on the service dimension statistical incremental data counted in the current unit time and the TOPN result data of the previous unit time. Fig. 9 is a schematic block structure diagram of another embodiment of a data window statistics apparatus provided in the present application, and as shown in fig. 9, in an implementation of the apparatus, the window statistics module 103 may include:
the service dimension window sorting result unit 1302 may be configured to obtain a service dimension statistical search sorting result of a previous unit time, merge the service dimension statistical search sorting result of the previous unit time and the service dimension statistical incremental data of the current time, and sort the result to obtain a service dimension statistical search sorting result of the current time.
Compared with the prior art, in which window statistics in an addition mode or a subtraction mode read, request and calculate a large amount of data and typically consume several gigabytes of memory, the present application performs the TOPN sorting at the current time based on the service dimension statistical incremental data. When calculating unit-time-level TOPN results of the service dimension statistics, this can greatly reduce system memory consumption and network overhead and significantly improve the efficiency of window statistics.
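The merge-then-sort step for the TOPN result may be illustrated by the following non-limiting sketch. The function name `update_top_n` and the representation of a sorting result as a list of (keyword, count) pairs are assumptions of this sketch. It further assumes that the previous sorting result retains counts for enough candidate keywords; a keyword absent from the previous result enters with only its increment.

```python
def update_top_n(previous_ranking, delta, n):
    """Merge the sorting result of the previous unit time with the increment
    data of the current time, then sort and keep the first n entries."""
    merged = dict(previous_ranking)
    for key, change in delta.items():
        merged[key] = merged.get(key, 0) + change
    candidates = [(key, count) for key, count in merged.items() if count > 0]
    # Sort by descending count; ties are broken alphabetically for determinism.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return candidates[:n]


previous = [("phone", 5), ("shoes", 4), ("book", 2)]
delta = {"book": 3, "shoes": -3}
print(update_top_n(previous, delta, n=2))  # [('book', 5), ('phone', 5)]
```

In practice the retained candidate list may be kept longer than N (for example TOP 100 when TOP 10 is displayed) so that keywords rising from below the cut-off are ranked with their full window counts; this is a design choice of the sketch rather than a requirement of the present application.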
In the embodiment of the present application, the keywords may arise in application scenarios where the searched keywords are updated in real time and are unpredictable. For example, the keywords searched by users in each minute of a search service are decided by the users and cannot be predicted by the system. In other application fields, service data may likewise be obtained in real time, and a key field specified in each piece of service data may then be counted under a specified service dimension to obtain a window statistical result, for example the total amount paid in the shopping records of all users in the last month, or the number of items purchased by all buyers in the last month, counted and updated every day. When window statistics are performed on such service data, a key field may be specified in advance, and window statistics of the specified key field may then be performed on each piece of updated service data. The unit-time incremental data described in the embodiment of the present application may likewise be used for these window statistics, which greatly reduces the amount of data processed, reduces system memory overhead and network overhead, and improves server performance.
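For service data with a pre-specified key field, the unit time level data may be produced as in the following non-limiting sketch. The function name `unit_time_level_data` and the record fields `buyer` and `amount_cents` are hypothetical and chosen only to mirror the shopping-record example above.

```python
def unit_time_level_data(records, dimension_field, key_field):
    """Sum the specified key field of each record under the specified
    service dimension for one unit time (for example, one day)."""
    totals = {}
    for record in records:
        dimension = record[dimension_field]
        totals[dimension] = totals.get(dimension, 0) + record[key_field]
    return totals


# Hypothetical shopping records received during one day.
records_today = [
    {"buyer": "u1", "amount_cents": 1200},
    {"buyer": "u2", "amount_cents": 350},
    {"buyer": "u1", "amount_cents": 800},
]
print(unit_time_level_data(records_today, "buyer", "amount_cents"))
# {'u1': 2000, 'u2': 350}
```

The resulting per-day totals can then play the role of the unit time level data, with the negated totals stored for the day one window length (for example, one month) later.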
By adopting the embodiment of the present application, the system load and the network overhead can be greatly reduced, and the data processing efficiency and server performance of window statistics are improved.
As described above, the data window statistics method or apparatus provided in the present application may be used for window statistics performed by a server in a search system. When the search system performs window statistics on search data based on the service dimension statistical incremental data, for example to obtain a service dimension statistical calculation result table for the window length or unit-time-level service dimension TOPN calculation results, the memory overhead and network overhead of the system can be greatly reduced, the system load lowered, the overall performance of the system (including the processing server and the database) improved, and network resources and hardware implementation costs correspondingly saved. Therefore, based on the above data window statistics method or apparatus, the present application provides a data window statistics system. Fig. 10 is a schematic structural diagram of an embodiment of the data window statistics system provided in the present application. The system may include, for example, a search service system or a window statistics processing system for service data. Specifically, as shown in fig. 10, the data window statistics system may include:
the first data processing unit 201 may be configured to obtain statistical data, perform unit-time statistics of service dimensions on the statistical data, generate unit time level data of the service dimensions at the statistical time, and generate periodic window negative data of the service dimensions for the time obtained by adding the window length to the statistical time; the unit may also be configured to calculate the service dimension statistical incremental data of the window length at the current time based on the unit time level data and the periodic window negative data;
the database 202 may be configured to store the generated unit time level data and the corresponding periodic window negative data, and to store the window statistical result calculated by the second data processing unit 203;
the second data processing unit 203 may be configured to obtain the unit time level data of a service dimension at the current time and the historical window statistical result of the service dimension at the unit time preceding the current time; the unit may also be configured to query the periodic window negative data of the current time from the stored historical unit time level data, and to calculate the service dimension increment data of the current time according to the periodic window negative data and the unit time level data of the service dimension at the current time; the unit may also be configured to determine the window statistical result of the service dimension at the current time based on the historical window statistical result and the service dimension increment data of the current time.
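The cooperation of the three parts listed above may be sketched, purely as a non-limiting example, with the following classes. The class names, the in-memory dictionaries standing in for the database 202, and the use of integer time steps are all assumptions of this sketch.

```python
class Database:
    """Stand-in for the database 202."""

    def __init__(self):
        self.unit_data = {}       # time -> {key: count}
        self.negative_data = {}   # time -> {key: -count}
        self.window_results = {}  # time -> {key: count within the window}


class FirstDataProcessingUnit:
    """Generates unit time level data and periodic window negative data."""

    def __init__(self, db, window_length):
        self.db = db
        self.window_length = window_length

    def process(self, now, events):
        counts = {}
        for key in events:
            counts[key] = counts.get(key, 0) + 1
        self.db.unit_data[now] = counts
        # Negative data is filed under the time at which these counts expire.
        self.db.negative_data[now + self.window_length] = {
            key: -count for key, count in counts.items()
        }


class SecondDataProcessingUnit:
    """Computes the window result from the previous result and the increment."""

    def __init__(self, db):
        self.db = db

    def update(self, now):
        result = dict(self.db.window_results.get(now - 1, {}))
        sources = (self.db.unit_data.get(now, {}),
                   self.db.negative_data.get(now, {}))
        for source in sources:
            for key, change in source.items():
                result[key] = result.get(key, 0) + change
        result = {key: value for key, value in result.items() if value != 0}
        self.db.window_results[now] = result
        return result


db = Database()
first = FirstDataProcessingUnit(db, window_length=2)
second = SecondDataProcessingUnit(db)
for t, events in enumerate([["a", "a", "b"], ["a"], ["b"], []], start=1):
    first.process(t, events)
    print(t, second.update(t))
# 1 {'a': 2, 'b': 1}
# 2 {'a': 3, 'b': 1}
# 3 {'a': 1, 'b': 1}
# 4 {'b': 1}
```

With a window length of two unit times, the result at each step equals the counts of the two most recent unit times, although each update reads only one unit time of data plus its matching negative data.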
Of course, as mentioned above, in some specific application scenarios, the service dimension statistical window calculation result or the service dimension TOPN search sorting result may be obtained by calculation based on the service dimension statistical incremental data. Fig. 11 is a schematic structural diagram of an application scenario of an embodiment of the system for window statistics of data provided by the present application. In a specific embodiment of the system according to the present application, the second data processing unit 203 is configured to perform at least one of the following:
querying the service dimension window result data within the window statistics time period of the current time from the stored historical service dimension window result data, and updating the historical service dimension window result data after performing a merge operation on the service dimension window result data and the service dimension increment data;
and obtaining a service dimension statistic search sequencing result of the last unit time, combining the service dimension statistic search sequencing result of the last unit time and the service dimension statistic incremental data of the current moment, and then sequencing to obtain a service dimension statistic search sequencing result of the current moment.
For example, the second data processing unit 203 may obtain the service dimension statistical search sorting result of the previous unit time from the service dimension statistical search sorting results stored in the database 202, merge it with the service dimension statistical incremental data of the statistical window at the current time, sort the merged data to obtain the service dimension statistical search sorting result of the current time, and store that result in the database 202.
According to the data window statistics method, apparatus and system provided by the present application, window statistics on data are performed based on service dimension statistical incremental data, which can greatly reduce the memory overhead of the system, improve system performance, reduce network overhead, and improve the data processing efficiency of the window statistics system.
Although the present application refers to descriptions of data storage and information interaction, such as service dimension statistics of keywords, data storage in the form of data tables and HBase databases, the data computation involved in window statistics, and window statistics in an addition mode or a subtraction mode, the present application is not limited to cases that fully comply with industry standards, information interaction or computation standards, or the described embodiments. Embodiments that slightly modify the described window statistics method, data storage or information interaction can also achieve the same, equivalent or similar effects, or effects that are predictable after the modification. The use of such modified or varied calculation methods, information interaction and information judgment feedback manners, data storage manners and the like may still fall within the scope of the alternative embodiments of the present application.
Although the present application provides method steps as described in an embodiment or flowchart, more or fewer steps may be included based on conventional or non-inventive means. The order of steps recited in the embodiments is merely one manner of performing the steps in a multitude of orders and does not represent the only order of execution. When an actual apparatus or client product executes, it may execute sequentially or in parallel (e.g., in the context of parallel processors or multi-threaded processing) according to the embodiments or methods shown in the figures.
The units, devices or modules illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. For convenience of description, the above devices are described as being divided into various modules by function, and the modules are described separately. Of course, in implementing the present application, the functions of the modules may be implemented in one or more pieces of software and/or hardware, or a module implementing a given function may be implemented by a combination of a plurality of sub-modules or sub-units.
Those skilled in the art will also appreciate that, in addition to implementing a controller as pure computer readable program code, the same functionality can be implemented by logically programming the method steps such that the controller takes the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be considered a hardware component, and the means included therein for performing the various functions may also be considered structures within the hardware component. The means for performing the functions may even be regarded both as software modules for performing the method and as structures within the hardware component.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, or the like, and includes several instructions for enabling a computer device (which may be a personal computer, a mobile terminal, a server, or a network device) to execute the method according to the embodiments or some parts of the embodiments of the present application.
The embodiments in the present specification are described in a progressive manner, and the same or similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
While the present application has been described with examples, those of ordinary skill in the art will appreciate that there are numerous variations and permutations of the present application without departing from the spirit of the application, and it is intended that the appended claims encompass such variations and permutations without departing from the spirit of the application.