HK1154967A1

HK1154967A1 - A query method, system and device for vertical search

Info

Publication number: HK1154967A1
Application number: HK11109117.6A
Authority: HK
Inventors: 何杰
Original assignee: 阿里巴巴集团控股有限公司
Priority date: 2009-11-02
Filing date: 2011-08-29
Publication date: 2012-05-04
Also published as: CN102053983A; CN102053983B

Abstract

The embodiments of the invention disclose a query method, system and device for vertical search. The method comprises the following steps: a query server acquires the query information of a user; the query server acquires a query result from a lookup dictionary of a commodity category server according to the query information, wherein the query result comprises the commodity sub-categories under the commodity category matched with the query information and the corresponding weights of the commodity sub-categories; and the query server sorts the commodity sub-categories in the query result according to their corresponding weights and sends the sorted result to the user for viewing. A log server generates a log according to the commodity category viewed by the user and the query information, and sends the log to an analysis server, which carries out statistical analysis to obtain a statistical analysis result; the statistical analysis result is used for updating the lookup dictionary of the commodity category server for subsequent queries. With the method, system and device provided by the embodiments of the invention, query results are returned according to users' click records, which improves the correlation between the query result and the user's query.

Description

Query method, system and device for vertical search

Technical Field

The present application relates to the field of network technologies, and in particular, to a vertical search query method, system, and apparatus.

Background

With the continuing development of the internet, the amount of information stored on it has become enormous. When people need specific information on a certain subject, they search for it through a search engine. However, because the amount of information on the internet is so large, the accuracy of results obtained through general-purpose search is poor, and vertical search has therefore developed rapidly. A vertical search engine is a specialized search engine for a particular industry and is a subdivision and extension of the general search engine: it integrates a certain class of specialized information in a web page library, extracts the required data from targeted fields, processes the data, and returns it to the user in a certain form. It is a new search engine service model proposed in response to the shortcomings of general search engines, such as excessive information volume, imprecise queries and insufficient depth, and it provides valuable information and related services for a specific field, a specific group of people or a specific need. Vertical search is characterized by being specialized, precise and deep, and has an industry focus; compared with the disordered mass of information in a general search engine, it is more focused, specific and in-depth.

The application directions of the vertical search engine are many, such as enterprise library search, supply and demand information search engine, shopping search, house property search, talent search, map search, mp3 search, picture search and the like, and various kinds of information of almost all industries can be further refined into various kinds of vertical search engines.

When vertical search is used for shopping search, a user enters a query term on a B2C (Business to Customer) or C2C (Customer to Customer) shopping website, and two kinds of results are typically returned: (1) navigation information for commodity classification, and (2) search results related to the query. The commodity category names in the navigation are organized in a tree structure, so the user can conveniently locate more accurate search results by following the category information from top to bottom along a path of the tree.

The commodity category tree structure is stored in a corresponding data table of the database, and its data must be entered and maintained manually; each commodity displayed on a B2C or C2C website must belong to one or more nodes of the commodity category tree.

Current e-commerce websites often carry so many commodities that the commodity classification becomes excessive. With a billion-scale quantity of commodities, the commodity category tree usually approaches ten thousand nodes, and each level often has dozens of category nodes. When the user queries, too much commodity classification information is displayed, and the user cannot tell which commodity categories are more important to the query. The current mainstream solution to this problem is to count, at query time, the number of returned results in each category one by one. The commodity categories are then sorted from largest to smallest by number of commodities, and a threshold is set; categories whose number of commodities is below this threshold are hidden, which reduces the number of categories displayed.
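The count-based approach described above can be sketched as follows. This is only a minimal illustration, assuming the search results are available as a list holding one category name per returned commodity; the function name `visible_categories_by_count` and the sample data are hypothetical and do not come from the source.

```python
from collections import Counter


def visible_categories_by_count(result_categories, min_count):
    """Rank categories by the number of returned commodities and hide small ones.

    result_categories: one category name per returned commodity (assumed input).
    min_count: categories with fewer commodities than this are hidden.
    """
    counts = Counter(result_categories)
    # Sort from the largest commodity count to the smallest (ties broken by name).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(category, n) for category, n in ranked if n >= min_count]


results = ["men's clothing"] * 5 + ["women's clothing"] * 3 + ["sportswear"]
print(visible_categories_by_count(results, min_count=2))
```

In this sketch "sportswear" is hidden purely because few commodities fall under it, regardless of how relevant it is to the query.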

In the process of implementing the present application, the inventor finds that the prior art has at least the following problems:

(1) The displayed categories have low relevance to the user's query.

(2) There is no mechanism between the categories of goods to decide which category of goods is more important.

(3) The number of categories displayed for the goods is controlled only by a threshold value, so that the categories with high relevance are hidden.

Disclosure of Invention

The embodiment of the application provides a query method, a query system and a query device for vertical search, which are used for improving the correlation between a query result and a user query.

The embodiment of the application provides a query method for vertical search, which is applied to a system comprising a query server, an analysis server and a log server, and is characterized by comprising the following steps:

the query server acquires query information of a user;

the query server acquires a query result from a query dictionary of the commodity category server according to the query information, wherein the query result is a sub commodity category under the commodity category matched with the query information and a corresponding weight of the sub commodity category;

and the query server sorts the sub-commodity categories in the query result according to the corresponding weights, sends the sorting result to the user, enables the user to check, enables a log server to generate a log according to the commodity categories checked by the user and the query information, and sends the log to an analysis server for statistical analysis to obtain a statistical analysis result, wherein the statistical analysis result is used for updating a query dictionary of the commodity category server and is used for subsequent query.

Before the query server sorts the sub-commodity categories in the query result according to the corresponding weights, the method further includes: setting a weight threshold value, and sorting the sub-commodity categories with the weights larger than the weight threshold value in the query result according to the corresponding weights.

Before the query server obtains the query information of the user, the method further includes:

a front-end server obtains query information of a user, wherein the query information comprises a query word and a commodity category of the user;

the front-end server performs normalization processing on the query word and acquires the commodity category ID corresponding to the commodity category;

and the front-end server forwards the normalized query word and the commodity category ID to the commodity category server.

Wherein, before the log server generates a log according to the commodity category viewed by the user and the query information, the method further comprises the following step:

acquiring query information of the user forwarded by a front-end server;

the analysis server performing statistical analysis to obtain a statistical analysis result, which is used for updating the query dictionary of the commodity category server for subsequent queries, specifically comprises the following steps:

receiving logs in preset time sent by the log server at regular time;

performing statistical analysis according to the logs within a preset time to obtain a statistical analysis result, wherein the statistical analysis result is a commodity category checked by the user and a corresponding weight; the weight comprises the click times corresponding to the commodity categories viewed by the user and the click probabilities corresponding to the commodity categories in the same level;

generating a query file according to the commodity category tree and the statistical analysis result;

and sending the query file to the query server so that the query server updates the query dictionary of the commodity category server according to the query file, and the user can perform subsequent query.

Before the query server sorts the sub-commodity categories in the query result according to the corresponding weights thereof and sends the sorted result to the user, the method further includes:

and the query server splices the query result, wherein the splicing comprises the acquisition of the commodity category corresponding to the commodity category ID in the query result.

The embodiment of the present application provides a query system for vertical search, which is characterized by comprising:

the query server is used for acquiring query information of a user; acquiring a query result in a query dictionary of the commodity category server according to the query information, wherein the query result is a sub commodity category under the commodity category matched with the query information and a corresponding weight of the sub commodity category; sorting the sub-commodity categories in the query result according to the corresponding weights, and sending the sorting result to the user to enable the user to check the sorting result; acquiring a statistical analysis result sent by an analysis server, and updating a query dictionary of the commodity category server according to the statistical analysis result for subsequent query;

and the log server is used for generating a log according to the commodity category viewed by the user and the query information and sending the log to the analysis server.

The analysis server is used for receiving the log sent by the log server; performing statistical analysis on the log to obtain a statistical analysis result; and sending the statistical analysis result to the query server.

Wherein the query information includes a query word and a commodity category, and the log server includes:

the acquisition module is used for acquiring the query information of the user forwarded by the front-end server;

the generating module is used for generating a log according to the commodity category viewed by the user and the query information;

and the sending module is used for sending the log generated by the generating module to the analysis server for statistical analysis to obtain a statistical analysis result, and the statistical analysis result is used for updating the query dictionary of the commodity category server and is used for subsequent query.

Wherein the analysis server comprises:

the receiving module is used for receiving the log sent by the log server;

the statistical analysis module is used for performing statistical analysis on the logs received by the receiving module to obtain a statistical analysis result;

and the sending module is used for sending the statistical analysis result obtained by the statistical analysis module to a query server, so that the query server updates a query dictionary of the query server for subsequent query.

Wherein the statistical analysis module comprises:

the statistical analysis submodule is used for performing statistical analysis according to the logs received by the receiving module within a preset time to obtain a statistical analysis result, wherein the statistical analysis result is the commodity category viewed by the user and the corresponding weight; the weight comprises the click times corresponding to the commodity categories viewed by the user and the click probabilities corresponding to the commodity categories in the same level;

the generating submodule is used for generating a query file according to the commodity category tree and the statistical analysis result obtained by the statistical analysis submodule.

The embodiment of the present application provides a server, which is used as a query server and applied to a system including the query server, an analysis server and a log server, and is characterized in that the server includes:

the acquisition module is used for acquiring the query information of a user;

the query module is used for acquiring a query result in a query dictionary of the commodity category server according to the query information acquired by the acquisition module, wherein the query result is a sub-commodity category under the commodity category matched with the query information and a corresponding weight of the sub-commodity category;

the sending module is used for sequencing the sub commodity categories in the query result acquired by the query module according to the corresponding weights of the sub commodity categories, sending the sequencing result to the user, enabling the user to check the sequencing result, enabling the log server to generate a log according to the commodity categories checked by the user and the query information, and sending the log to an analysis server for statistical analysis to obtain a statistical analysis result, wherein the statistical analysis result is used for updating a query dictionary of the commodity category server for subsequent query;

and the updating module is used for updating the query dictionary of the commodity category server according to the statistical analysis result obtained by the obtaining module, and sending the updated query dictionary to the query module for subsequent query.

Wherein the sending module is further configured to: setting a weight threshold value, and sorting the sub-commodity categories with the weights larger than the weight threshold value in the query result according to the corresponding weights.

The query information comprises the query words subjected to normalization processing and commodity category IDs corresponding to the commodity categories;

the system further comprises a splicing module used for splicing the query result, wherein the splicing comprises the step of obtaining the commodity category corresponding to the commodity category ID in the query result.

According to the method and the device, the query result of the user is returned according to the user click record, so that the correlation between the query result and the user query is improved. Of course, it is not necessary for any product to achieve all of the above-described advantages at the same time for the practice of the present application.

Drawings

In order to more clearly illustrate the technical solutions in the present application or the prior art, the drawings needed to be used in the description of the present application or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a flowchart of a query method for vertical search according to an embodiment of the present disclosure;

FIG. 2 is an interaction diagram of a query method for vertical search in an embodiment of the present application;

FIG. 3 is an interaction diagram of a query method for vertical search in an embodiment of the present application;

FIG. 4 is a flowchart of a query method for vertical search according to an embodiment of the present application;

FIG. 5 is a clicked category tree generated according to a commodity category clicked for viewing in the embodiment of the present application;

FIG. 6 is a click category tree generated according to the commodity categories and the number of times clicked to view in the embodiment of the present application;

FIG. 7 is a flowchart of a query method for vertical search according to an embodiment of the present application;

FIG. 8 is a flowchart of a query method for vertical search according to an embodiment of the present application;

FIG. 9 is a schematic structural diagram of a log server in an embodiment of the present application;

FIG. 10 is a schematic structural diagram of an analysis server according to an embodiment of the present application;

FIG. 11 is a schematic structural diagram of an analysis server according to an embodiment of the present application;

FIG. 12 is a schematic structural diagram of a query server in an embodiment of the present application;

FIG. 13 is a schematic structural diagram of a query server in an embodiment of the present application.

Detailed Description

The embodiment of the application provides that: the query server acquires query information of a user; the query server acquires a query result from a query dictionary of the commodity category server according to the query information, wherein the query result is a sub commodity category under the commodity category matched with the query information and a corresponding weight of the sub commodity category; and the query server sorts the sub-commodity categories in the query result according to the corresponding weights, sends the sorting result to the user, enables the user to check, enables a log server to generate a log according to the commodity categories checked by the user and the query information, and sends the log to an analysis server for statistical analysis to obtain a statistical analysis result, wherein the statistical analysis result is used for updating a query dictionary of the commodity category server and is used for subsequent query.

The technical solutions in the present application will be described clearly and completely with reference to the accompanying drawings in the present application, and it is obvious that the described embodiments are some, not all embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

As described in the background, vertical search engines are widely used, for example in enterprise library search, supply and demand information search, shopping search, property search, talent search, map search, mp3 search and picture search. Taking a shopping search engine as an example, the overall process is roughly as follows: web pages are crawled according to the user's search requirement, and the commodity information in them is extracted, such as commodity name, price and description; the description of, say, a notebook computer can even be further subdivided into brand, model, CPU, memory, hard disk and display screen; the search results are then returned to the user. In order to improve the relevance between the returned information and the information the user is searching for, the present application provides a query method for vertical search.

The embodiment of the application provides a query method for vertical search, as shown in fig. 1, including the following steps:

Step 101, the query server obtains query information of a user.

The query information may include the query terms input by the user and the commodity categories input or selected by the user.

Step 102, the query server obtains a query result from a query dictionary of the commodity category server according to the query information, wherein the query result is a sub-commodity category under the commodity category matched with the query information and a corresponding weight thereof.

Step 103, the query server sorts the sub-commodity categories in the query result according to the corresponding weights, sends the sorting result to the user, enables the user to check the sub-commodity categories, enables a log server to generate a log according to the commodity categories checked by the user and the query information, and sends the log to an analysis server to perform statistical analysis to obtain a statistical analysis result, wherein the statistical analysis result is used for updating a query dictionary of the commodity category server and is used for subsequent query.

Before the query server sorts the sub-commodity categories in the query result according to their corresponding weights, a weight threshold may be set, and only the sub-commodity categories in the query result whose weights are larger than the weight threshold are sorted. That is, once the matched commodity category has been obtained by looking up the query information in the query dictionary, either all of the sub-commodity categories under that commodity category may be sorted by weight, or only some of them. When only some of the sub-commodity categories are sorted, a weight threshold is preset, and only the sub-commodity categories whose weights are larger than the threshold are selected and sorted according to their corresponding weights. Sorting only part of the sub-commodity categories and sending that sorted result to the user reduces the amount of sorting computation and increases sorting speed.
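The optional weight threshold can be illustrated with a short sketch. It assumes the query result is held as a mapping from sub-commodity category name to weight; the function name `rank_sub_categories` and the sample weights are hypothetical.

```python
def rank_sub_categories(weights, weight_threshold=None):
    """Sort sub-commodity categories by weight, highest first.

    weights: dict mapping sub-category name -> weight (e.g. a click count).
    weight_threshold: if given, only sub-categories whose weight is strictly
        greater than the threshold are kept and sorted.
    """
    items = list(weights.items())
    if weight_threshold is not None:
        items = [(c, w) for c, w in items if w > weight_threshold]
    return sorted(items, key=lambda cw: cw[1], reverse=True)


query_result = {"men's clothing": 120, "women's clothing": 80, "casual wear": 3}
print(rank_sub_categories(query_result))                       # all sub-categories
print(rank_sub_categories(query_result, weight_threshold=10))  # only weights > 10
```

Filtering before sorting means fewer items are sorted, which is the computational saving the paragraph above refers to.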

The query method for vertical search provided in the embodiment of the present application is shown in fig. 2, where the query server is specifically a commodity category query server, and the analysis server is specifically a distributed file storage and parallel operation platform, and as shown in fig. 3, the query method specifically includes the following steps:

(1) The front-end server receives query information input by a user, the query information comprising query words and commodity categories. The front-end server forwards the query words to the log server, which later uses them to generate logs; it also forwards the query information to the commodity category query server, which performs the query according to the query information and returns the query result to the user for the user to click and view.

(2) The user clicks and views commodity categories and commodities as needed through the front-end server. For each click-and-view action, the log server generates a corresponding log record, which includes the commodity clicked and viewed by the user and the commodity category to which the commodity belongs. After a period of time, the log server imports all logs from that period into the distributed file storage and parallel operation platform for log storage and statistical analysis.

(3) The click analysis statistical program on the distributed file storage and parallel operation platform performs statistical analysis on the logs from that period. The analysis includes obtaining a weight for the clicks on each commodity category; the weight reflects the degree of correlation between the commodity category and the user's query information, and may preferably be the click count or click probability corresponding to the query information. Because each log records the query word the user entered and the commodity category of the commodity clicked and viewed, the commodity categories viewed by users for the same query word, and their weights, can be obtained from a large number of logs. From this, the degree of correlation between each commodity category and the query word, that is, the degree of user attention, is known for each query word a user may input. The click analysis statistical program outputs the statistical analysis result to the commodity category query server.

(4) The commodity category query server compiles the statistical analysis result into a query dictionary in Key-Value form using a hash algorithm, which improves query speed. The Key is a query word input by users, and the Value is the commodity categories corresponding to the query word together with their weights, so the query dictionary captures the relevance between the query word and each commodity category. When the query word and commodity category of a user are obtained, they can be looked up in the query dictionary, and the commodity categories related to the query word are returned to the user, arranged by degree of relevance, that is, by weight, for the user to choose from.
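Steps (3) and (4) can be sketched together as follows. This is a minimal illustration under several assumptions: each log is reduced to a (query word, commodity category) pair, the click probability is computed over all categories clicked for the same query word rather than per tree level, and the function names `analyze_click_logs` and `build_query_dictionary` are hypothetical.

```python
from collections import Counter, defaultdict


def analyze_click_logs(logs):
    """Count clicks per (query word, category) and derive click probabilities.

    logs: iterable of (query_term, commodity_category) pairs, one per click.
    Returns {query_term: {category: (click_count, click_probability)}}.
    """
    counts = defaultdict(Counter)
    for query_term, category in logs:
        counts[query_term][category] += 1

    stats = {}
    for query_term, per_category in counts.items():
        total = sum(per_category.values())
        stats[query_term] = {c: (n, n / total) for c, n in per_category.items()}
    return stats


def build_query_dictionary(stats):
    """Compile the statistics into a Key-Value structure.

    Key: query word. Value: list of (category, click_count) pairs sorted by
    click count, highest first. A Python dict is a hash table, so a lookup
    by query word takes constant time on average.
    """
    return {
        term: sorted(
            ((c, n) for c, (n, _p) in per_category.items()),
            key=lambda cn: cn[1],
            reverse=True,
        )
        for term, per_category in stats.items()
    }


logs = [
    ("T-shirt", "women's clothing"),
    ("T-shirt", "men's clothing"),
    ("T-shirt", "men's clothing"),
    ("T-shirt", "men's clothing"),
]
stats = analyze_click_logs(logs)
query_dictionary = build_query_dictionary(stats)
print(stats["T-shirt"]["men's clothing"])   # click count and click probability
print(query_dictionary.get("T-shirt", []))  # ranked categories for the query word
```

Storing the Value already sorted by weight is one possible design choice: it moves the sorting cost from query time to dictionary build time.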

It should be noted that updating the query dictionary according to the current statistical analysis result may mean generating a query dictionary solely from the current statistical analysis result, or adding the current statistical analysis result to the original query dictionary to generate a new query dictionary; data in the query dictionary corresponding to statistical analysis results within or before a specified time period may also be deleted as needed. For example, for clothing commodities, when the season changes, the earlier statistical analysis results are no longer suitable for subsequent queries, so the data corresponding to those results is deleted to ensure query accuracy.
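The three update options can be sketched as below. The sketch assumes the weights are click counts, so that merging is a simple addition, and that each analysis batch is tagged with a date; the function names and the sample dates are hypothetical.

```python
from datetime import date


def replace_dictionary(old, new):
    """Option 1: use only the latest statistical analysis result."""
    return {term: dict(weights) for term, weights in new.items()}


def merge_dictionary(old, new):
    """Option 2: add the latest click counts to the existing ones."""
    merged = {term: dict(weights) for term, weights in old.items()}
    for term, weights in new.items():
        target = merged.setdefault(term, {})
        for category, count in weights.items():
            target[category] = target.get(category, 0) + count
    return merged


def drop_before(batches, cutoff):
    """Option 3: discard analysis batches dated before the cutoff.

    batches: list of (batch_date, {term: {category: count}}) pairs.
    """
    return [(d, stats) for d, stats in batches if d >= cutoff]


old = {"T-shirt": {"men's clothing": 100}}
new = {"T-shirt": {"men's clothing": 20, "women's clothing": 5}}
print(merge_dictionary(old, new))

batches = [(date(2009, 6, 1), old), (date(2009, 10, 1), new)]
print(drop_before(batches, date(2009, 9, 1)))
```

If the weights were click probabilities instead of counts, merging would require recomputing the probabilities from the combined counts rather than adding them.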

As can be seen from the above description, the query method is a cyclic process. When a user queries, the returned result is obtained from the query dictionary that the commodity category query server generated from the click-and-view behavior of the previous time period; the logs generated from the user's click-and-view behavior during the current query are statistically analyzed in the next time period, and the result is sent to the commodity category query server for subsequent commodity category queries.

The embodiment of the application provides a query method for vertical search, as shown in fig. 4, including the following steps:

Step 401, the user inputs query information.

Here, the query information required for a query comprises a query term and may also include a commodity category.

The front-end server provides a window for interacting with the user through a query page. The user accesses the query page through a browser and enters the query information in the query page to perform the query.

For example, when a user wants to buy a T-shirt, the user logs in to an online trading website and enters the query information "T-shirt" through the website's query page. If the user only wants to view information about men's T-shirts, the user can select a commodity category from the drop-down box that the system pops up while "T-shirt" is being typed, thereby entering the commodity category together with the query word and limiting the query range. For example, the user selects "T-shirt men's clothing" in the drop-down box, where "men's clothing" is the commodity category and the query information is "T-shirt men's clothing".

The user can also enter only the query word "T-shirt"; when the page returned by the system for "T-shirt" offers "men's clothing", "women's clothing" and "casual clothing", the user selects "men's clothing", where "men's clothing" is the commodity category and the query information is "T-shirt men's clothing".

Step 402, the front-end server obtains the query information input by the user.

Specifically, this step includes the following:

(1) The front-end server acquires the query information input by the user through the query page.

(2) And the front-end server acquires the commodity category ID corresponding to the commodity category in the query information.

To facilitate querying by the back-end commodity category query server, the front-end server does not send the commodity category in the obtained query information directly to the commodity category query server; instead it sends the commodity category ID corresponding to the commodity category, so the front-end server needs to obtain that ID.

If the query information contains only a query word and no commodity category, no commodity category ID needs to be acquired.

Of course, the front-end server may instead forward the commodity category itself to the back-end commodity category query server, which then obtains the corresponding commodity category ID and performs the query with it.

Step 403, the front-end server forwards the query word and the commodity category ID in the query information to the back-end commodity category query server.

The front-end server and the back-end commodity category query server exchange data through an interface. The interface is accessed, and its results returned, over HTTP: the input parameters are submitted to the commodity category query server with the GET method. There are two main input parameters: the query term and the commodity category ID.

The format of the request URL is as follows:

http://host?query=[query term]&[category ID parameter]=[commodity category ID]
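A GET request of this kind could be assembled as in the sketch below. The host name and the parameter name `cat_id` are purely illustrative assumptions, since the source does not give a usable name for the category ID parameter; `query` is likewise treated here as an assumed name.

```python
from urllib.parse import urlencode


def build_query_url(host, query_term, category_id=None):
    """Build the GET URL sent to the commodity category query server.

    The parameter names "query" and "cat_id" and the host are illustrative
    assumptions, not values confirmed by the source.
    """
    params = {"query": query_term}
    if category_id is not None:
        # The category ID is omitted when the user supplied only a query word.
        params["cat_id"] = category_id
    # urlencode percent-encodes the values so they are safe inside a URL.
    return f"http://{host}/?{urlencode(params)}"


print(build_query_url("category-server.example", "T-shirt", 1001))
print(build_query_url("category-server.example", "T-shirt"))
```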

The returned result adopts the format of XML, and the specific format is as follows:

<?xml version="1.0" encoding="GBK"?>

<conf>

</module>

</conf>

Step 404, the back-end commodity category query server performs the query according to the received query information and sends the query result to the front-end server.

Step 405, the front-end server displays the query result sent by the back-end commodity category query server to the user.

When the query information is only a query word, the query result is all the commodity categories matched with the query word and the corresponding weights (namely the click times or click probabilities of the commodity categories) of the commodity categories, and the commodity categories are arranged in the order of the weights from high to low; when the query information is the query word and the commodity category, the query result is all the sub commodity categories under the commodity category matched with the query information and the corresponding weights (namely the click times or click probabilities of the commodity categories), and the sub commodity categories are arranged in the order of the weights from high to low.
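The two cases can be sketched as a single lookup function. The sketch assumes the query dictionary maps a query word to category weights and that the category tree is available as a mapping from a category to its direct sub-categories; the function name `query_categories` and the sample data are hypothetical.

```python
def query_categories(dictionary, children, query_term, category=None):
    """Return (category, weight) pairs ordered from highest to lowest weight.

    dictionary: {query_term: {category: weight}} (assumed lookup structure).
    children: {category: set of direct sub-categories} (assumed category tree).
    If category is None, all categories matched by the query word are returned;
    otherwise only the sub-categories of the given category are returned.
    """
    weights = dictionary.get(query_term, {})
    if category is not None:
        allowed = children.get(category, set())
        weights = {c: w for c, w in weights.items() if c in allowed}
    return sorted(weights.items(), key=lambda cw: cw[1], reverse=True)


dictionary = {
    "T-shirt": {
        "men's clothing": 120,
        "women's clothing": 80,
        "long sleeve": 90,
        "short sleeve": 70,
    }
}
children = {"men's clothing": {"long sleeve", "short sleeve"}}

print(query_categories(dictionary, children, "T-shirt"))
print(query_categories(dictionary, children, "T-shirt", "men's clothing"))
```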

Preferably, the query result page displayed to the user includes both all sub-commodity categories under the commodity category matched with the query information and the detailed list of each item of commodity information under the sub-commodity categories, so that the user can directly select a specific commodity in the query result for viewing.

Step 406, the user clicks a commodity category in the query result to view it, and the log server generates a log from the click.

The front-end server displays the query result returned by the rear-end commodity category query server on the query result page. The user selects a commodity category from the displayed result and clicks it; the commodity category query server then returns the sub-categories of that category according to the query method above, and the user can click one of them in turn. By repeating these steps, the user works through the commodity categories level by level until the desired commodity is found and clicked for viewing.

Each item displayed above carries a link to the log server. Each time a commodity is clicked for viewing, a log is generated from the click and stored in the log server. Each click corresponds to one click log, whose format is shown in Table 1:

Table 1 Log format

Query term	Query category	Commodity ID	Commodity category	Commodity attributes

The query term is the query word in the query information entered by the user. The query category is the commodity category that the user selects and clicks from the commodity categories returned on the page; the user then selects a commodity from the page returned by that click and clicks it for viewing. A user may click several query categories in succession, but the log stores only the commodity category clicked last before the user clicks the commodity. After a query category is clicked, the query result returned to the user includes all sub-commodity categories under the query category and a detailed list of the commodity information under those sub-categories. The commodity ID is an ID number that uniquely identifies each commodity. The commodity category is the category to which the clicked commodity directly belongs; it may be a sub-category of the query category. The commodity attributes are additional information about the commodity, for example the brand name.

For example, the user enters the query word "T-shirt" and clicks the commodity category "long-sleeved T-shirt". From the detailed commodity list in the returned query result, the user selects a T-shirt of the brand POLO, whose commodity ID is 12200021. The commodity category to which this T-shirt directly belongs is "men's T-shirt", which is a sub-category of "long-sleeved T-shirt". The log record shown in Table 2 is generated from this information.

Table 2 Log record

Query term	Query category	Commodity ID	Commodity category	Commodity attributes
T-shirt	Long-sleeved T-shirt	12200021	Men's T-shirt	Brand name: POLO
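The log record of Table 2 could be represented in code roughly as follows. This is only an illustrative sketch: the class name `ClickLog` and its field names are assumptions, and the source does not specify how logs are serialized.

```python
from dataclasses import dataclass, field, asdict

@dataclass(frozen=True)
class ClickLog:
    """One click log with the five fields of Table 1 (names are hypothetical)."""
    query_term: str
    query_category: str
    commodity_id: int
    commodity_category: str
    commodity_attributes: dict = field(default_factory=dict)

# The record from Table 2.
record = ClickLog(
    query_term="T-shirt",
    query_category="long-sleeved T-shirt",
    commodity_id=12200021,
    commodity_category="men's T-shirt",
    commodity_attributes={"Brand name": "POLO"},
)
record_as_dict = asdict(record)
```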

Step 407, the log server periodically imports the generated log into the distributed file storage and parallel operation platform.

The distributed file storage and parallel operation platform stores the generated logs and performs the computation when the logs are analyzed.

A log record is generated each time a user clicks to view an item, and the log server periodically imports all logs generated in the preceding time period into the distributed file storage and parallel operation platform. The import may run, for example, once a day or once every 12 hours.

Step 408, the click analysis statistical program periodically performs statistical analysis on the stored logs to obtain a statistical analysis result.

After the log server has imported the generated logs into the distributed file storage and parallel operation platform, the click analysis statistical program periodically analyzes the stored logs. The logs analyzed may be, for example, those of the last ten days or the last two weeks; the window can be adjusted according to experience or statistical requirements.

Specifically, the statistical analysis performed by the click analysis statistical program includes the following steps:

(1) Acquire the logs for statistical analysis.

Because the log server periodically updates the logs in the distributed file storage and parallel operation platform, the click analysis statistical program must fetch the updated logs each time it runs, so that the analysis is based on the latest logs and the statistics are more accurate.

(2) Normalize the query words.

Query words entered by users do not necessarily meet the statistical standard of the click analysis statistical program, so they are normalized before the statistics are computed. Normalization includes removing unnecessary words and redundant spaces, and converting between uppercase and lowercase letters, full-width and half-width characters, Simplified and Traditional Chinese, punctuation forms, and Chinese numerals. The normalized query words can be used directly for statistics by the click analysis statistical program.
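A partial sketch of this normalization is shown below. It covers only full-width to half-width folding (via Unicode NFKC), lowercasing, and whitespace cleanup; removal of unnecessary words, Simplified/Traditional conversion, and numeral conversion are not shown, because they would need word lists or mapping tables that the source does not describe. The function name `normalize_query` is an assumption.

```python
import re
import unicodedata

def normalize_query(term: str) -> str:
    """Apply a subset of the normalization steps to a query word."""
    # NFKC folds full-width letters, digits and the ideographic space
    # into their half-width equivalents.
    term = unicodedata.normalize("NFKC", term)
    # Fold letter case so that "POLO" and "polo" are counted together.
    term = term.lower()
    # Collapse runs of whitespace and trim both ends.
    return re.sub(r"\s+", " ", term).strip()

normalized = normalize_query("  Ｔ恤　ＰＯＬＯ  ")
```

Choosing lowercase as the canonical case is an arbitrary decision of this sketch; the source only says that case is converted.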

(3) Summarize the log data to generate click distribution data.

When thousands of users run queries, many of them enter the same query word and click the same query category. The logs of a given period are therefore aggregated by query word and query category to obtain, for each query word, the number of times each commodity category was clicked.

For example, there are 400 items clicked to view by the query word "T-shirt", 200 of which belong to the category of men's wear items, 100 of which belong to the category of women's wear items, and 100 of which belong to the category of sports and leisure items. Of the 200 items belonging to the category of men's clothing, 200 items belong to the category of short-sleeved T-shirt items, and 0 item belongs to the category of long-sleeved T-shirt items. Of the 100 items belonging to the women's clothing category, 100 items belong to the short-sleeve T-shirt item category, and 0 item belongs to the long-sleeve T-shirt item category. Of the 100 items belonging to the sport and leisure items category, 60 items belong to the lovers' clothing item category, and 40 items belong to the sport T-shirt item category.

Each click viewing process generates a corresponding log, and click distribution data shown in table 3 is obtained according to the summary of the logs:

Table 3 Click distribution data

Query term	Clicked category: count	Clicked category: count	Clicked category: count	Clicked category: count
T-shirt	Men's T-shirt: 200	Women's T-shirt: 100	Lovers' clothing: 60	Sports T-shirt: 40
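The aggregation that produces such click distribution data can be sketched as a simple counting pass over the logs. The function name and the reduction of each log to a (query term, commodity category) pair are assumptions for illustration; the log entries below are synthetic and merely reproduce the counts of Table 3.

```python
from collections import Counter, defaultdict

def click_distribution(logs):
    """Count clicks per query term and clicked commodity category."""
    distribution = defaultdict(Counter)
    for query_term, commodity_category in logs:
        distribution[query_term][commodity_category] += 1
    return distribution

# Synthetic log entries reproducing the counts of Table 3.
logs = (
    [("T-shirt", "men's T-shirt")] * 200
    + [("T-shirt", "women's T-shirt")] * 100
    + [("T-shirt", "lovers' clothing")] * 60
    + [("T-shirt", "sports T-shirt")] * 40
)
distribution = click_distribution(logs)
```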

(4) Acquire the commodity category tree.

In order to facilitate the query of a user and the management of a system, all commodities are classified according to attributes, each commodity has a commodity category to which the commodity belongs, and a commodity category tree is generated by all the commodity categories according to a logic sequence.

In order to generate the click tree according to the click distribution data, the click analysis statistical program must know the positions of the commodity category clicked by the user in all the commodity categories, namely the positions of the commodity category in the commodity category tree, so that the commodity category tree needs to be acquired.

(5) Generate the click category tree from the click distribution data in combination with the commodity category tree.

The commodity category tree includes all commodity categories, and the relationship between the commodity categories is visually represented in the form of a tree. The click distribution data comprises all commodity categories clicked by the user, and the click times of each commodity category are embodied in a text mode. The click category tree combines the information in the commodity category tree and the click distribution data, takes the commodity category tree as a representation form, adds each piece of data in the click distribution data to a corresponding position in the commodity category tree, associates the logical relationship between the commodity categories represented by the commodity category tree and the click times represented by the click distribution data, and jointly represents the commodity categories.

First, the click category tree is constructed from the commodity categories in the click distribution data and their positions in the commodity category tree. The click category tree generated from the click distribution data in table 3 is shown in fig. 5.

Second, the click counts are added to the click category tree. The click count of each commodity category in the click distribution data is attached to the corresponding commodity category in the click category tree, which completes the tree. The click category tree with click counts, generated from the click distribution data in table 3, is shown in fig. 6.
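One way to picture the two steps is the sketch below, which locates each clicked category in a commodity category tree and adds its click count to the category itself and to every ancestor. The child-to-parent representation, the category names, and the assumption that a parent's count is the sum of its children's counts are illustrative choices; they are consistent with the T-shirt example above but are not stated as the actual implementation.

```python
from collections import Counter

# Hypothetical commodity category tree, stored as child -> parent.
# None marks a category directly under the root category.
PARENT = {
    "men's clothing": None,
    "women's clothing": None,
    "sports and leisure": None,
    "men's short-sleeved T-shirt": "men's clothing",
    "women's short-sleeved T-shirt": "women's clothing",
    "lovers' clothing": "sports and leisure",
    "sports T-shirt": "sports and leisure",
}

def build_click_tree(parent, clicks):
    """Attach click counts to each clicked category and all its ancestors."""
    totals = Counter()
    for category, count in clicks.items():
        node = category
        while node is not None:
            totals[node] += count
            node = parent[node]
    return totals

clicks = {
    "men's short-sleeved T-shirt": 200,
    "women's short-sleeved T-shirt": 100,
    "lovers' clothing": 60,
    "sports T-shirt": 40,
}
click_tree = build_click_tree(PARENT, clicks)
```

Categories that were never clicked simply do not appear in `click_tree`, which matches the rule that the query file contains no entry for them.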

(6) Generate the query file corresponding to the click category tree.

The click information of the user is simply and clearly embodied in the form of the click category tree, and in order to facilitate the query of the commodity category query server at the rear end, the click information of the user included in the click category tree needs to be expressed in the form of text, so that a query file corresponding to the click category tree is generated according to the number of the click categories and is used for updating a query dictionary of the commodity category query server.

It should be noted that, for the click category tree, when the commodity category in the click category tree is not clicked and viewed by the user, the click number of the commodity category in the click category tree is zero; for the query file, when the commodity category in the click category tree is not clicked and viewed by the user, the query file corresponding to the commodity category number will not be generated, that is, the query file only includes the data information corresponding to the commodity category clicked and queried by the user in the click category tree.

According to the click category tree including the number of clicks shown in fig. 6, the generated query file is:

T-shirt root category men's clothing: 200 women's clothing: 100 sports and leisure: 100

T-shirt men's clothing short-sleeved T-shirt: 200

T-shirt women's clothing short-sleeved T-shirt: 100

T-shirt sports and leisure lovers' clothing: 60 sports T-shirt: 40

Through the above steps 408(1) to 408(6), the statistical analysis of the log is completed, and the statistical analysis result, that is, the clicked item category, the logical relationship between the clicked item categories, and the click times of each item category, is obtained.

It should be noted that the number of clicks of a commodity category represents its weight among the commodity categories at the same level, that is, among all sub-categories of the parent category to which it belongs. The weight may also be expressed as a click probability among all sub-categories under the parent category, which is computed from the corresponding click counts. When the weights are expressed as click probabilities, the query file is:

T-shirt root category men's clothing: 50% women's clothing: 25% sports and leisure: 25%

T-shirt men's clothing short-sleeved T-shirt: 100%

T-shirt women's clothing short-sleeved T-shirt: 100%

T-shirt sports and leisure lovers' clothing: 60% sports T-shirt: 40%
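The conversion from click counts to click probabilities among sibling categories amounts to dividing each count by the total of its siblings. A minimal sketch, with a hypothetical function name:

```python
def click_probabilities(sibling_counts):
    """Convert click counts of sibling categories into click probabilities."""
    total = sum(sibling_counts.values())
    if total == 0:
        # No clicks under this parent, so no entry would be written.
        return {}
    return {name: count / total for name, count in sibling_counts.items()}

root_level = click_probabilities(
    {"men's clothing": 200, "women's clothing": 100, "sports and leisure": 100}
)
sports_level = click_probabilities({"lovers' clothing": 60, "sports T-shirt": 40})
```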

Step 409, the click analysis statistical program outputs the query file as its result. The output query file is sent to the rear-end commodity category query server, which generates a corresponding query dictionary from it for subsequent queries.

The embodiment of the present application provides a query method for vertical search, as shown in fig. 7, including the following steps:

Step 701, the commodity category query server at the back end compiles the query file into a query dictionary.

When receiving the query information sent by the front-end server, the rear-end commodity category query server needs to query the commodity categories according to the query information and sends the query results meeting the conditions to the front-end server. Therefore, when the backend commodity category server performs query, the commodity category for query needs to be compiled into the query dictionary first, so that the backend commodity category query server performs query.

The query dictionary is obtained by compiling a query dictionary compiler, and specifically, the query dictionary compiler compiles an output result (namely, a query file) of the click analysis statistical program into a corresponding memory mapping file in a Key-Value form through a hash algorithm for subsequent query. Wherein, Key is the query word input by the user, and Value is a plurality of commodity categories and corresponding weights corresponding to the query word.

For example, the query file generated in step 408(6) is as follows:

T-shirt root category men's clothing: 50% women's clothing: 25% sports and leisure: 25%

T-shirt men's clothing short-sleeved T-shirt: 100%

T-shirt women's clothing short-sleeved T-shirt: 100%

T-shirt sports and leisure lovers' clothing: 60% sports T-shirt: 40%

The query dictionary compiled by the hash algorithm in Key-Value form is as follows:

Key: T-shirt root category Value: men's clothing 50%; women's clothing 25%; sports and leisure 25%;

Key: T-shirt men's clothing Value: short-sleeved T-shirt 100%;

Key: T-shirt women's clothing Value: short-sleeved T-shirt 100%;

Key: T-shirt sports and leisure Value: lovers' clothing 60%; sports T-shirt 40%.
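The compilation step can be sketched with an in-memory Python `dict`, which is itself a hash table, standing in for the hash-based memory-mapped file described in the text. The tab-separated layout assumed for each query file line (query term, parent category, then `name: weight%` entries) and the function name are assumptions; the source does not specify the field delimiters.

```python
def compile_dictionary(lines):
    """Compile query file lines into a Key-Value query dictionary."""
    dictionary = {}
    for line in lines:
        query_term, parent, *entries = line.rstrip("\n").split("\t")
        values = []
        for entry in entries:
            # Each entry looks like "men's clothing: 50%".
            name, weight = entry.rsplit(":", 1)
            values.append((name.strip(), float(weight.strip().rstrip("%")) / 100))
        # Key: query term plus parent category, as in the example above.
        dictionary[f"{query_term} {parent}"] = values
    return dictionary

# Assumed tab-separated rendering of the example query file.
query_file = [
    "T-shirt\troot category\tmen's clothing: 50%\twomen's clothing: 25%\tsports and leisure: 25%",
    "T-shirt\tmen's clothing\tshort-sleeved T-shirt: 100%",
    "T-shirt\twomen's clothing\tshort-sleeved T-shirt: 100%",
    "T-shirt\tsports and leisure\tlovers' clothing: 60%\tsports T-shirt: 40%",
]
query_dictionary = compile_dictionary(query_file)
```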

The query dictionary compiler compiles the query file output by the click analysis statistical program into a hash-indexed memory-mapped file in Key-Value form for subsequent queries. This memory-mapped file can be loaded directly into memory during program initialization, which improves system initialization efficiency. In addition, for convenience of description, commodity categories in the embodiments of the present application are written as commodity category names; in actual operation a commodity category may be represented either by its name or by its commodity category ID, and the ID form is more convenient for querying.

Step 702, the rear-end commodity category query server loads the query dictionary at regular time.

Loading the query dictionary means that, when the server starts, the query dictionary file compiled as a hash-indexed memory-mapped file is mapped directly into memory by memory mapping.

Step 703, the user inputs the query information.

The front-end server provides a query page as the window for interacting with the user. The user accesses the query page through a browser and enters the query information there. The query information is a query word and may also include a commodity category.

For example, a user who wants to buy a T-shirt logs in to an online trading website and enters the query information "T-shirt" on the website's query page. If the user only wants to see men's T-shirts, the query range can be limited by entering a commodity category together with the query word, that is, by entering "T-shirt men's clothing" on the query page, where "men's clothing" is the commodity category.

Step 704, the front-end server obtains the query information input by the user and forwards the query information to the commodity category query server at the back end.

Specifically, the front-end server obtains the query information entered by the user through the query page and forwards it to the commodity category query server at the rear end as follows:

It should be noted that, if the query information is only a query term and there is no commodity category, it is not necessary to obtain the commodity category ID.

Step 705, the front-end server forwards the query word and the commodity category ID in the query information to the rear-end commodity category server.

The format of the request URL is as follows:

http://host?query=<query term>&<category parameter>=<commodity category ID>

The returned result is in XML format:

<?xml version="1.0" encoding="GBK"?>

<conf>

</module>

</conf>

Step 706, the rear-end commodity category query server obtains the query information and normalizes the query word in it.

Query words entered by users do not necessarily meet the query standard of the rear-end commodity category query server, so they are normalized before the query is performed. Normalization includes removing unnecessary words and redundant spaces, and converting between uppercase and lowercase letters, full-width and half-width characters, Simplified and Traditional Chinese, punctuation forms, and Chinese numerals. The normalized query words can be used directly for querying by the rear-end commodity category query server.

Step 707, the rear-end commodity category query server looks up the query information in the query dictionary to obtain a query result.

The rear-end commodity category query server performs the query in the query dictionary loaded in step 702. Because the query dictionary file, compiled as a hash-indexed memory-mapped file, is mapped directly into memory, the whole query is carried out in memory by hash lookup, which ensures query efficiency.

For example, when the user enters the query word "T-shirt", the rear-end commodity category query server looks up the Key "T-shirt" in the query dictionary generated in step 701 and obtains the corresponding Value: men's clothing 50%; women's clothing 25%; sports and leisure 25%. When the user enters "T-shirt sports and leisure", or clicks the commodity category "sports and leisure", the Key is "T-shirt sports and leisure" and the corresponding Value is: lovers' clothing 60%; sports T-shirt 40%.

Through the method, the rear-end commodity category query server obtains the query result of the query information, the query result comprises the commodity categories relevant to the query information and the weights corresponding to the commodity categories, and the query results are arranged according to the sequence of the weights from high to low.
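The lookup and ordering just described might look like the sketch below. The dictionary contents are the example values from step 701; keying on a (query term, category) pair, using "root category" as the default when no category is given, and the function name are assumptions of this sketch.

```python
# Example query dictionary with the values from step 701.
QUERY_DICTIONARY = {
    ("T-shirt", "root category"): [
        ("women's clothing", 0.25),
        ("men's clothing", 0.5),
        ("sports and leisure", 0.25),
    ],
    ("T-shirt", "sports and leisure"): [
        ("sports T-shirt", 0.4),
        ("lovers' clothing", 0.6),
    ],
}

def lookup(dictionary, query_term, category="root category"):
    """Hash lookup by query term and category, sorted by weight, high to low."""
    values = dictionary.get((query_term, category), [])
    # sorted() is stable, so equal weights keep their stored order.
    return sorted(values, key=lambda item: item[1], reverse=True)

top_level = lookup(QUERY_DICTIONARY, "T-shirt")
sports = lookup(QUERY_DICTIONARY, "T-shirt", "sports and leisure")
```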

Step 708, the rear-end commodity category query server splices the queried commodity category information into the query result and returns it to the front-end server.

When the product category query server at the back end queries in the query dictionary according to the query information, the query word and the product category ID acquired in step 705 are used. When the rear-end commodity category query server sends the query result to the front-end server, if the commodity category ID queried in the query dictionary is directly sent to the front-end server and is displayed to the user through the front-end server, the user cannot know the commodity category corresponding to the displayed commodity category ID, and therefore the user cannot click to view the commodity category. Therefore, the rear-end product category query server needs to add the product category information corresponding to the queried product category ID to the query result, or replace the corresponding product category ID, and then send the query result containing the product category information to the front-end server, so that the user can query through the front-end server.

It should be noted that the operation of adding the information of the commodity category corresponding to the queried commodity category ID to the query result or replacing the corresponding commodity category ID may also be completed by the front-end server.
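As a rough illustration of this splicing, the sketch below attaches a category name to each queried category ID. The ID values, the `id`/`name`/`weight` field names, and the choice to leave the name empty for an unknown ID are all hypothetical.

```python
def splice_category_names(results, id_to_name):
    """Add the commodity category name for each category ID in the result."""
    spliced = []
    for category_id, weight in results:
        spliced.append(
            {
                "id": category_id,
                # Unknown IDs get an empty name in this sketch.
                "name": id_to_name.get(category_id, ""),
                "weight": weight,
            }
        )
    return spliced

# Hypothetical category IDs, for illustration only.
ID_TO_NAME = {101: "lovers' clothing", 102: "sports T-shirt"}
spliced = splice_category_names([(101, 0.6), (102, 0.4)], ID_TO_NAME)
```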

Step 709, the front-end server receives the query result sent by the commodity category query server at the rear end and screens it.

The log server imports logs into the distributed file storage and parallel operation platform periodically, the click analysis statistical program analyzes the stored logs periodically, and the rear-end commodity category query server loads the query dictionary periodically. The query result sent by the rear-end commodity category query server therefore does not reflect the commodity information on the front-end server in real time. For example, suppose the periodic operations run once a day at 8:00 and a user queries at 20:00. The data on which the rear-end commodity category query server relies is then the data as of 8:00 that morning. If the commodity information on the front end has changed between 8:00 and 20:00, for example because a commodity ID has changed or the commodities under a certain category have been taken off the shelves with the change of season, the user cannot correctly find the commodity being searched for.

Therefore, the front-end server needs to screen the received query result and compare the obtained information of the commodity category with all the current commodity categories which accord with the query word.

When an obtained commodity category matches the current commodity category information, the front-end server displays it; the commodity categories are displayed from left to right in descending order of weight.

When an obtained commodity category does not match the current commodity category information, the front-end server hides it and does not display it.
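The screening on the front-end server can be pictured as a filter against the set of currently valid categories, followed by ordering by weight. The function name and the scenario in which "lovers' clothing" has been taken off the shelves are illustrative assumptions.

```python
def screen_results(results, current_categories):
    """Keep only categories that still exist, ordered by weight, high to low."""
    visible = [(name, weight) for name, weight in results if name in current_categories]
    return sorted(visible, key=lambda item: item[1], reverse=True)

# Query result computed from the morning's dictionary.
stale_results = [("sports T-shirt", 0.4), ("lovers' clothing", 0.6)]
# Hypothetical current state: "lovers' clothing" is no longer available.
current = {"sports T-shirt", "men's clothing"}
displayed = screen_results(stale_results, current)
```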

The embodiment of the present application provides a query method for vertical search, as shown in fig. 8, specifically including the following steps:

Step 801, the user inputs query information through the front-end server.

Step 802, the front-end server forwards the query information input by the user to the commodity category query server, and the commodity category query server queries according to the query information.

The forwarding of the query information by the front-end server to the commodity category query server includes the following steps:

(3) The front-end server forwards the query word and the commodity category ID in the query information to the rear-end commodity category query server.

The commodity category query server performs the query against the statistical analysis data obtained in the previous time period. Before the query, the following steps are performed:

(1) The commodity category query server compiles the query file obtained from the statistical analysis result into a query dictionary.

(2) The commodity category query server loads the query dictionary periodically.

After the commodity category query server obtains the query information of the user, the query is carried out according to the data for query, namely the query dictionary.

Step 803, the commodity category query server sends the query result to the front-end server, and the user clicks the commodity category or commodity to be viewed on the display page of the front-end server.

Step 804, the log server generates a log from the user's click, where each click corresponds to one log record.

After a commodity is clicked, a log is generated from the click and stored in the log server. Each click corresponds to one click log, whose format is shown in table 1. The query term is the query word in the query information entered by the user. The user clicks a commodity category and then selects a commodity from the page returned by that click; the query category is this commodity category, that is, the one clicked last before the user clicks the commodity. The commodity ID is an ID number that uniquely identifies each commodity. The commodity attributes are additional information about the commodity, for example the brand name.

Step 805, the scheduled time for importing logs into the distributed file storage and parallel operation platform arrives.

Step 806, the log server imports the logs generated within the preset time period into the distributed file storage and parallel operation platform.

Step 807, the scheduled statistical analysis time arrives.

Step 808, the click analysis statistical program performs statistical analysis on the stored logs to obtain a statistical analysis result and sends it to the commodity category query server to update the data used for querying.

The click analysis statistical program resides in the distributed file storage and parallel operation platform and performs statistical analysis on the stored logs. Specifically, the statistical analysis includes the following steps:

(1) Acquire the logs for statistical analysis.

(2) Normalize the query words.

(3) Summarize the log data to generate click distribution data.

(4) Acquire the commodity category tree.

(5) Generate the click category tree from the click distribution data in combination with the commodity category tree.

(6) Generate the query file corresponding to the click category tree.

(7) The click analysis statistical program outputs the query file as its result. The output query file is sent to the rear-end commodity category query server, which generates a corresponding query dictionary from it for subsequent queries.

The embodiment of the present application provides a query system for vertical search, including:

the query server is used for acquiring query information of a user; acquiring a query result in a query dictionary of the commodity category server according to the query information, wherein the query result is a sub commodity category under the commodity category matched with the query information and a corresponding weight of the sub commodity category; sending the query result to the user to enable the user to check; acquiring a statistical analysis result sent by an analysis server, and updating a query dictionary of the commodity category server according to the statistical analysis result for subsequent query;

The query information includes query terms and categories of goods, and the log server 900, as shown in fig. 9, includes:

an obtaining module 910, configured to obtain query information of a user forwarded by a front-end server;

a generating module 920, configured to generate a log according to the category of the commodity viewed by the user and the query information;

a sending module 930, configured to send the log generated by the generating module 920 to the analysis server for statistical analysis to obtain a statistical analysis result, where the statistical analysis result is used to update the query dictionary of the commodity category server for subsequent queries.

As shown in fig. 10, the analysis server 1000 includes:

a receiving module 1010, configured to receive a log sent by the log server;

a statistical analysis module 1020, configured to perform statistical analysis on the log received by the receiving module 1010 to obtain a statistical analysis result;

a sending module 1030, configured to send the statistical analysis result obtained by the statistical analysis module 1020 to a query server, so that the query server updates a query dictionary of the query server for subsequent queries.

As shown in fig. 11, the statistical analysis module 1020 includes:

a statistical analysis submodule 1021, configured to perform statistical analysis according to the log obtained by the obtaining module within a preset time to obtain a statistical analysis result, where the statistical analysis result is a commodity category and a corresponding weight viewed by the user; the weight comprises the click times corresponding to the commodity categories viewed by the user and the click probabilities corresponding to the commodity categories in the same level;

a generation sub-module 1022, configured to generate a query file from the statistical analysis result obtained by the statistical analysis sub-module 1021 according to the commodity category tree.

An embodiment of the present application provides a server 1200, which is used as a query server and applied to a system including the query server, an analysis server, and a log server, as shown in fig. 12, and includes:

an obtaining module 1210, configured to obtain query information of a user, and further configured to obtain the statistical analysis result sent by the analysis server;

the query module 1220 is configured to obtain a query result from the query dictionary of the commodity category server according to the query information obtained by the obtaining module 1210, where the query result is a sub-commodity category under the commodity category matched with the query information and a weight corresponding to the sub-commodity category;

a sending module 1230, configured to sort the sub-commodity categories in the query result obtained by the query module 1220 according to the corresponding weights of the sub-commodity categories, and send the sorted result to the user, so that the user views the sub-commodity categories, and a log server generates a log according to the commodity categories viewed by the user and the query information, and sends the log to an analysis server for statistical analysis to obtain a statistical analysis result, where the statistical analysis result is used to update a query dictionary of the commodity category server for subsequent queries;

the updating module 1240 is configured to update the query dictionary of the commodity category server according to the statistical analysis result obtained by the obtaining module 1210, and send the updated query dictionary to the querying module 1220 for subsequent querying.
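As a rough sketch of what the updating module 1240 does, the following Python function merges a statistical analysis result into a query dictionary. The dictionary layout (`{query_word: {category_id: weight}}`), the function name `update_query_dictionary`, and the choice to overwrite weights per category rather than replace whole entries are assumptions; the embodiment does not specify the merge policy.

```python
def update_query_dictionary(query_dictionary, analysis_result):
    """Return a new query dictionary with the analysis result merged in.

    Both arguments use the assumed layout {query_word: {category_id: weight}}.
    The input dictionary is left unmodified, so queries that are in progress
    can keep reading the old version.
    """
    # Copy each inner mapping so the original dictionary is not mutated.
    updated = {word: dict(weights) for word, weights in query_dictionary.items()}
    for query_word, weights in analysis_result.items():
        # New weights overwrite old ones; unseen query words are added.
        updated.setdefault(query_word, {}).update(weights)
    return updated
```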

The sending module 1230 is further configured to set a weight threshold and to sort only those sub-commodity categories in the query result whose weights are greater than the weight threshold, according to their corresponding weights.
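The threshold-then-sort behaviour of the sending module 1230 can be sketched as follows. The function name `rank_subcategories`, the `{category_id: weight}` layout of the query result, and the descending sort order are illustrative assumptions.

```python
def rank_subcategories(query_result, weight_threshold=0.0):
    """Keep sub-categories whose weight exceeds the threshold, then sort them.

    query_result: dict mapping sub-category ID -> weight (assumed layout).
    Returns a list of (category_id, weight) pairs, highest weight first.
    """
    # Strictly greater than the threshold, as stated in the description.
    kept = [(cid, w) for cid, w in query_result.items() if w > weight_threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

For example, with weights `{11: 0.6, 12: 0.05, 13: 0.35}` and a threshold of `0.1`, category 12 is dropped and the remaining categories are returned in the order 11, 13.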

As shown in fig. 13, the server 1200 further includes a splicing module 1250 configured to splice the query result, where the splicing includes obtaining the commodity category corresponding to each commodity category ID in the query result.
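The splicing step amounts to resolving each commodity category ID in the query result to its commodity category. A minimal sketch, assuming the sorted result is a list of `(category_id, weight)` pairs and that a hypothetical `category_names` lookup table is available:

```python
def splice_result(ranked, category_names):
    """Attach the commodity category name to each ranked category ID.

    ranked:         list of (category_id, weight) pairs (assumed layout).
    category_names: dict mapping category_id -> category name (hypothetical).
    IDs missing from the lookup table get a name of None rather than an error.
    """
    return [
        {"category_id": cid, "category_name": category_names.get(cid), "weight": weight}
        for cid, weight in ranked
    ]
```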

Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better embodiment. Based on such understanding, the technical solutions of the present application may be substantially or partially embodied in the form of a software product stored in a storage medium, and including instructions for causing a terminal device (which may be a mobile phone, a personal computer, a server, or a network device) to execute the method according to the embodiments of the present application.

The foregoing is only a preferred embodiment of the present application. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present application, and these modifications and refinements should also be considered to be within the protection scope of the present application.

Claims

1. A vertical search query method, applied to a system comprising a query server, an analysis server, a log server, a front-end server and a commodity category server, characterized by comprising the following steps:

the front-end server acquires query information of a user and forwards the query information to the query server, wherein the query information comprises the query word and the commodity category of the user; the front-end server performs normalization processing on the query word and acquires the commodity category ID corresponding to the commodity category; the front-end server forwards the normalized query word and the commodity category ID to the commodity category server;

the query server acquires the query information of the user sent by the front-end server;

2. The method of claim 1, before the query server ranks the sub-commodity categories in the query result according to their corresponding weights, further comprising: setting a weight threshold value, and sorting the sub-commodity categories with the weights larger than the weight threshold value in the query result according to the corresponding weights.

3. The method of claim 1,

acquiring query information of the user forwarded by a front-end server;

receiving logs in preset time sent by the log server at regular time;

4. The method of claim 1, wherein before the query server sorts the sub-commodity categories in the query result according to their corresponding weights and sends the sorted result to the user, the method further comprises:

5. A vertical search query system, comprising:

the analysis server is used for receiving the log sent by the log server; performing statistical analysis on the log to obtain a statistical analysis result; and sending the statistical analysis result to the query server;

the front-end server is used for acquiring the query information of the user and forwarding the query information to the query server before the query server acquires the query information of the user, wherein the query information comprises the query word and the commodity category of the user; the front-end server performs normalization processing on the query word and acquires the commodity category ID corresponding to the commodity category;

and the commodity category server is used for receiving the query words and the commodity category IDs which are sent by the front-end server and are subjected to normalization processing.

6. The system of claim 5, wherein the query information includes query terms and categories of goods, the log server comprising:

7. The system of claim 5, wherein the analytics server comprises:

the receiving module is used for receiving the log sent by the log server;

8. The system of claim 7, wherein the statistical analysis module comprises:

9. A server, as a query server, applied to a system including a query server, an analysis server, a log server, a front-end server, and a commodity category server, comprising:

the acquisition module is used for acquiring the query information of the user forwarded by the front-end server, and for acquiring the statistical analysis result sent by the analysis server; wherein, before the acquisition module acquires the query information of the user: the front-end server acquires the query information of the user, wherein the query information comprises the query word and the commodity category of the user; the front-end server performs normalization processing on the query word and acquires the commodity category ID corresponding to the commodity category; the front-end server forwards the normalized query word and the commodity category ID to the commodity category server;

10. The server of claim 9, wherein the sending module is further used for: setting a weight threshold, and sorting the sub-commodity categories in the query result whose weights are greater than the weight threshold according to their corresponding weights.

11. The server according to claim 9,

the query information comprises the normalized query words and commodity category IDs corresponding to the commodity categories;