
US10242112B2 - Search result filters from resource content - Google Patents

Info

Publication number
US10242112B2
Authority
US
United States
Prior art keywords
query
candidate
filter
filters
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related, expires
Application number
US15/183,455
Other versions
US20170017724A1 (en
Inventor
Ian MacGillivray
Kaylin Spitz
Selena Sunling Yang
Varun Jasjit Singh
Emma S. Persky
Yonatan Erez
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google LLC
Original Assignee
Google LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google LLC
Priority to US15/183,455 (US10242112B2)
Assigned to GOOGLE INC. Assignment of assignors interest (see document for details). Assignors: PERSKY, EMMA S., EREZ, YONATAN, SINGH, Varun Jasjit, YANG, Selena Sunling, MACGILLIVRAY, Ian, SPITZ, Kaylin
Publication of US20170017724A1
Assigned to GOOGLE LLC. Change of name (see document for details). Assignors: GOOGLE INC.
Priority to US16/265,714 (US11372941B2)
Application granted
Publication of US10242112B2
Priority to US17/850,655 (US11797626B2)
Priority to US18/244,158 (US20240143679A1)
Legal status: Expired - Fee Related
Adjusted expiration

Classifications

    • G06F17/30867
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3322 Query formulation using system suggestions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9538 Presentation of query results
    • G06F17/3064

Definitions

  • the Internet provides access to a wide variety of resources, for example, video files, image files, audio files, or Web pages, including content for particular subjects, book articles, or news articles.
  • a search system can select one or more resources in response to receiving a search query.
  • a search query is data that a user submits to a search engine to satisfy the user's informational needs.
  • the search queries are usually in the form of text, e.g., one or more query terms.
  • the search system selects and scores resources based on their relevance to the search query and on their importance relative to other resources to provide search results that link to the selected resources.
  • the search results are typically ordered according to the scores and presented according to this order.
  • a search query is often an incomplete expression of a user's informational need.
  • a user may often refine a search query after reviewing search results, or may select a “suggested query” that is provided by a search engine to conduct another search.
  • a user may also attempt to filter within a set of search results.
  • the user may need to generate a filter term or operation, or rely on “hardcoded” filters that require expert knowledge and programming ahead of time, together with manual internationalization, in order to be effective.
  • new filtering terms may be emergent and escape the notice of both the user and resource curators.
  • a user can request information by inputting a query to a search engine.
  • the search engine can process the query and can provide information including query filters for output to the user in response to the query.
  • the query filters are dynamically determined, in part, from the content of the resources that are responsive to the query.
  • one innovative aspect of the subject matter described in this specification can be embodied in methods that include the actions of receiving, for a first query, data identifying a set of resources that are determined to be responsive to the first query; extracting, from the set of resources, a first set of keywords from the contents of the resources; determining, from the first set of keywords, a set of candidate filters from the keywords, each candidate filter derived from one or more keywords in the set of keywords, and wherein the set of candidate filters are a proper subset of the first set of keywords; determining, from the set of candidate filters, a set of query filters, each query filter in the set of query filters meeting a diversity threshold that is indicative of a filtered set of content resulting from applying the query filter to the set of resources and a filtered set of content resulting from applying another query filter to the set of resources meeting a difference threshold; and providing, in response to the first query, for display on a user device and with content results that identify content in the set of resources, the set of query filters for the first query.
  • Search query filters can be automatically learned offline and/or generated at serving time, improving the search engine system performance and saving users a large degree of human effort.
  • the filters can be learned from any relevant metadata or text. For example, in the context of an application that is used to provide reviews for certain businesses, e.g., restaurants, learned filters from item reviews and descriptions may be used to narrow a user's search query and lead a user closer towards their end goal.
  • Learning filters from item reviews and descriptions enables presented filters to be more tailored to both the specific user need at the time and the available results to be filtered.
  • Learning filters from item reviews and descriptions enables a search engine system to provide search results in specific domains which vary not just with the categorical query but also with the results available at the time of the search.
  • FIG. 1 is a block diagram of an example environment in which filters from item reviews and descriptions are provided.
  • FIG. 2 is a block diagram of an example process for generating query filters.
  • FIG. 3 is a flow diagram of an example process for providing query filters.
  • FIG. 4 is a flow diagram of an example process for determining a set of candidate filters from a set of keywords.
  • FIG. 5 is a flow diagram of an example process for determining a set of query filters from a set of candidate filters.
  • a search engine system provides user-selectable search query result filters for display on a user device in response to a user-input search query.
  • the system receives data identifying a set of resources that are determined to be responsive to the search query and extracts a set of keywords from the contents of the resources.
  • the keywords are processed according to candidate selection criteria, and a set of candidate query filters are determined.
  • the set of candidate query filters is trimmed using diversity criteria, ensuring that remaining candidate query filters have a reasonable degree of diversity in the sets of search query results that they represent. For example, in some implementations, pairs of candidate query filters are grouped into a single candidate filter if the filtered sets of search query results resulting from applying both candidate query filters are substantially similar.
  • the diversified set of candidate query filters are provided for display on the user device in response to the search query, together with search query results.
  • the features are described in the context of a general search engine.
  • the features can be applied to any system or application that searches a data store.
  • the features described below can be applied to an application that searches a corpus specific to the application.
  • An example of the latter is a mobile phone application that is used to search for, provide reviews for, and make reservations at restaurants. Alternatively, the features can be applied to search a large web corpus.
  • FIG. 1 is a block diagram of an example environment 100 in which filters from item reviews and descriptions are provided.
  • a computer network 102 such as a local area network (LAN), wide area network (WAN), the Internet, or a combination thereof, connects publisher web sites 104 , user devices 106 , and the search engine 110 .
  • the online environment 100 may include many thousands of publisher web sites 104 and user devices 106 .
  • a publisher website 104 includes one or more resources 105 associated with a domain and hosted by one or more servers in one or more locations.
  • a website is a collection of web pages formatted in hypertext markup language (HTML) that can contain text, images, multimedia content, and programming elements, for example, scripts.
  • Each web site 104 is maintained by a content publisher, which is an entity that controls, manages and/or owns the website 104 .
  • a resource is any data that can be provided by a publisher website 104 over the network 102 and that has a resource address, e.g., a uniform resource locator (URL).
  • Resources may be HTML pages, electronic documents, image files, video files, audio files, and feed sources, to name just a few.
  • the resources may include embedded information, e.g., meta information and hyperlinks, and/or embedded instructions, e.g., client-side scripts.
  • a user device 106 is an electronic device that is under the control of a user and is capable of requesting and receiving resources over the network 102 .
  • Example user devices 106 include personal computers, mobile communication devices, and other devices that can send and receive data over the network 102 .
  • a user device 106 typically includes a user application, e.g., a web browser, to facilitate the sending and receiving of data over the network 102 .
  • the web browser can enable a user to display and interact with text, images, videos, music and other information typically located on a web page at a website on the world wide web or a local area network.
  • the search engine 110 identifies the resources by crawling the publisher web sites 104 and indexing the resources provided by the publisher web sites 104 .
  • the resources are indexed and the index data are stored in an index 112 .
  • the user devices 106 submit search queries to the search engine 110 .
  • the search queries are submitted in the form of a search request that includes the search query and, optionally, a unique identifier that identifies the user device 106 that submits the request.
  • the unique identifier can be data from a cookie stored at the user device, or a user account identifier if the user maintains an account with the search engine 110 , or some other identifier that identifies the user device 106 or the user using the user device.
  • the search engine 110 uses the index 112 to identify resources that are relevant to the queries.
  • the search engine 110 identifies the resources in the form of search results and returns the search results to the user devices 106 in a search results page resource.
  • a search result is data generated by the search engine 110 that identifies a resource or provides information that satisfies a particular search query.
  • a search result for a resource can include a web page title, a snippet of text extracted from the web page, and a resource locator for the resource, e.g., the URL of a web page.
  • the search results are ranked based on scores related to the resources identified by the search results, such as information retrieval (“IR”) scores, and optionally a separate ranking of each resource relative to other resources (e.g., an authority score).
  • the search results are ordered according to these scores and provided to the user device according to the order.
  • the filter subsystem 108 identifies search query filters that are relevant for the identified resources.
  • the filter subsystem 108 identifies the search query filters in the form of search query filter results and returns the search query filter results to the user devices 106 in the search results page resource.
  • a search query filter result is data generated by the filter subsystem 108 that can be used to filter the search results that satisfy the search query to a set of filtered search results that satisfies the search query and the selected filter.
  • the user devices 106 receive the search results pages, including the search query filter results, and render the pages for presentation to users.
  • In response to the user selecting a search result at a user device 106, the user device 106 requests the resource identified by the resource locator included in the selected search result.
  • the publisher of the web site 104 hosting the resource receives the request for the resource from the user device 106 and provides the resource to the requesting user device 106 .
  • In response to the user selecting a search query filter at a user device 106, the user device 106 requests a set of filtered search results identified by the resource locators included in the selected search query filter.
  • the search engine system 110 receives the request for the subset of search results from the user device 106 and provides the subset of search results to the requesting user device 106 .
  • a set of search results {SR1 . . . SRN} is shown in the search results page 107a, along with a set of filters {F1 . . . F4}.
  • the filters F1 and F2 are selected by the user on the user device, resulting in the filtered set of search results {SR1, SR3, . . . SRM}.
  • the filtered set of search results {SR1, SR3, . . . SRM} is a proper subset of the search results {SR1 . . . SRN}.
  • the queries submitted from user devices are stored in query logs 114 .
  • the query logs 114 define search history data that include data from and related to previous search requests associated with unique identifiers.
  • the query logs 114 can be used to map queries submitted by user devices to resources that were identified in search results and the actions taken by users when presented with the search results in response to the queries.
  • data are associated with the identifiers from the search requests so that a search history for each identifier can be accessed.
  • the query logs 114 can also include selection data that can be used by the search engine to determine the respective sequences of queries submitted by the user devices, the actions taken in response to the queries, and how often the queries have been submitted. Likewise, the selection data can also be used to determine for each particular resource the queries for which users find the resource to be most useful.
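
As a rough illustration of how selection data in a query log might map each resource to the queries for which users select it, here is a minimal sketch. The log row layout `(query, resource_id, was_selected)`, the function name `queries_for_resource`, and the `min_rate` parameter are assumptions made for this example and are not taken from the specification.

```python
from collections import defaultdict

# Hypothetical log rows: (query, resource_id, was_selected).
log = [
    ("burgers", "r1", True),
    ("guac burgers", "r1", True),
    ("guac burgers", "r1", False),
    ("pizza", "r1", False),
    ("burgers", "r2", True),
]


def queries_for_resource(log, min_rate):
    """Map each resource to the queries for which it was selected at or above min_rate."""
    shown = defaultdict(int)
    selected = defaultdict(int)
    for query, resource, was_selected in log:
        shown[(resource, query)] += 1
        if was_selected:
            selected[(resource, query)] += 1

    by_resource = defaultdict(set)
    for (resource, query), impressions in shown.items():
        if selected[(resource, query)] / impressions >= min_rate:
            by_resource[resource].add(query)
    return dict(by_resource)


by_resource = queries_for_resource(log, min_rate=0.5)
```

Here `by_resource["r1"]` holds the queries "burgers" and "guac burgers", while "pizza" is dropped because the resource was never selected for it.
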
  • FIG. 2 is a block diagram 200 of an example process for generating query filters.
  • the process 200 can be performed by the system 100 in response to receiving a search query input by a user.
  • the process 200 can be implemented, for example, in a data processing apparatus that is used to realize the filter subsystem 108 .
  • the system receives a search query input by a user at a user device, such as the user device 106 of FIG. 1 ( 202 ).
  • the search query may include one or more terms, e.g., words, numbers or symbols.
  • the process is invoked only when the search query is a categorical query, i.e. a query for which search results are highly indicative of a particular category, e.g., food, entertainment, etc.
  • the query “burgers” may be a categorical query related to one or more of the categories of “dining,” “food,” and “restaurants,” for example.
  • Categorical queries may be predefined by the search engine 110 , or may be identified at query time based on, for example, a dominant intent derived from the content of responsive resources.
  • the system performs a corpus search in order to determine a set of resources that are responsive to the received search query ( 204 ).
  • the corpus may be a collection of available resources and text found at a number of publisher websites, for example the publisher websites 104 and resources 105 of FIG. 1 .
  • the system identifies responsive resources ( 206 ).
  • the responsive resources are those resources determined to be responsive to the received search query by at least a threshold measure, e.g., the top 1,000 ranked resources. For example, in response to receiving the search query “burgers,” the identified set of responsive resources may include restaurant menus, restaurant reviews and descriptions.
  • the system mines the responsive resources corpus to determine an associated set of keywords ( 208 ).
  • Each keyword may include one or more words, numbers or symbols.
  • the associated set of keywords mined from the responsive set of resources may include several thousands of nearby food items available on food menus.
  • the reviews, descriptions and other metadata can be mined to find the most frequently used keywords in the corpus of responsive resources.
  • the system generates a keyword corpus from the mining ( 210 ).
  • the keyword corpus includes keywords, for example, the most frequently used keywords in the responsive resources 206 , such as keywords that meet a frequency threshold relative to the frequencies of other keywords in the responsive resources.
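
The mining step and the relative frequency threshold could be sketched as follows. This is only an illustration under simplifying assumptions: keywords are single lowercase tokens, and the threshold is measured against the most frequent token. The function name `mine_keywords` and the parameter `min_relative_frequency` are hypothetical.

```python
import re
from collections import Counter


def mine_keywords(documents, min_relative_frequency):
    """Count tokens across documents and keep those frequent relative to the top token."""
    counts = Counter()
    for doc in documents:
        counts.update(re.findall(r"[a-z0-9]+", doc.lower()))
    if not counts:
        return {}
    top = max(counts.values())
    return {
        word: count
        for word, count in counts.items()
        if count / top >= min_relative_frequency
    }


docs = [
    "Guac burger with fresh guac",
    "Cheese burger, extra cheese",
    "Heart healthy veggie burger",
]
keyword_corpus = mine_keywords(docs, min_relative_frequency=0.5)
```

With these sample documents, only "burger", "guac", and "cheese" survive; a real system would also need to handle multi-word keywords, which this sketch ignores.
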
  • the keyword corpus can be filtered to generate a set of candidate keywords according to candidate criteria ( 211 ).
  • candidate criteria can include queries to which the resources 206 are responsive. For example, for the resources responsive to the query “burgers,” the query logs 114 are processed by the filter subsystem 108 to identify other queries for which one or more of the resources are selected at or above a threshold rate. In the example above, for the query “burgers,” the resources, based on the query logs 114, may be responsive to the other queries “guac burgers,” “barbeque burger restaurants,” etc. Likewise, queries that are determined to be related to the query “burgers” can also be used.
  • the candidate criteria 211 may include additional keywords corresponding to categorical search queries related to the search query input by the user 202 .
  • a language model 116 may facilitate query-similarity findings. Similarities may be based on stemming, synonyms, and even behavioral indicators, such as similar click patterns for different terms. For example, the term “guac” may be determined to be similar to “California style” in the context of restaurants.
  • the filter subsystem 108 can also implement stop word filtering in order to remove keywords that are not useful or not related to the search query received from the user and/or the queries from the resources.
  • the keywords of these queries are compared to the keywords in the corpus 210 to determine which keywords should be discarded.
  • the corpus may include the term “heart healthy.” However, this keyword may not be in queries, or may be in the queries but at a very low frequency relative to other keywords. Accordingly, the term “heart healthy” will not be selected as a candidate keyword.
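
One way to picture the comparison of corpus keywords against query keywords is the sketch below, which keeps a keyword only if it appears as a whole phrase in a sufficient share of the related queries. The function name `select_candidates` and the `min_query_share` parameter are assumptions for illustration; the specification does not fix a particular frequency measure.

```python
def select_candidates(keyword_corpus, related_queries, min_query_share):
    """Keep keywords that occur, as whole phrases, in enough of the related queries."""
    if not related_queries:
        return []
    # Pad with spaces so that phrase matches respect word boundaries.
    padded = [f" {query.lower()} " for query in related_queries]
    candidates = []
    for keyword in keyword_corpus:
        hits = sum(1 for query in padded if f" {keyword.lower()} " in query)
        if hits / len(padded) >= min_query_share:
            candidates.append(keyword)
    return candidates


keywords = ["guac", "barbeque", "heart healthy"]
queries = [
    "guac burgers",
    "barbeque burger restaurants",
    "guac burger near me",
    "whiskey barbeque burgers",
]
candidates = select_candidates(keywords, queries, min_query_share=0.25)
```

In this example "heart healthy" never occurs in the queries, so it is not selected as a candidate keyword, mirroring the case described above.
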
  • the system generates a candidate keyword corpus ( 212 ).
  • the candidate keyword corpus includes the set of keywords generated according to the candidate criteria 211 .
  • the candidate keyword corpus can be filtered to generate a set of filter terms according to filter criteria ( 213 ).
  • the system may apply a diversity filter to the candidate keywords in the candidate keyword corpus.
  • the diversity filter enables the system to determine filter terms that have a high degree of diversity in the sets of search results that they represent.
  • the system may also apply a term-prominence filter to the candidate keyword corpus in order to remove candidate keywords that appear only in metadata, or in inconspicuous locations in the corresponding responsive resource 206 .
  • the system generates a filter term corpus using the filtered candidate keyword corpus ( 214 ).
  • the filter terms in the filter term corpus may be provided to a user device.
  • the filter terms may be shown on the user device in some user interface or interactive format, and used to narrow a search query in order to lead a user closer towards their end goal.
  • FIG. 3 is a flow diagram 300 of another example process for providing query filters.
  • the process 300 can be implemented in a data processing apparatus that is used to realize the filter subsystem 108 .
  • the filter subsystem 108 receives data identifying a set of resources that are determined to be responsive to a search query ( 302 ).
  • the search query may be a categorical query.
  • the set of resources can include HTML pages, electronic documents, image files, video files, audio files, and feed sources which may include embedded information, e.g., meta information and hyperlinks.
  • a user may have input the query “burgers” and the filter subsystem 108 may in turn receive data identifying a set of HTML pages or electronic documents including reviews, descriptions and other meta information pertaining to nearby food items available on food menus.
  • the filter subsystem 108 extracts a first set of keywords from the contents of the set of resources ( 304 ).
  • a keyword can include one or more words, symbols or numbers that are associated with the search query.
  • the first set of keywords may include a set of words, symbols or numbers that occur most often in the contents of the set of resources that are determined to be responsive to the search query.
  • the filter subsystem 108 determines a set of candidate filters from the first set of keywords ( 306 ). Each candidate filter is derived from one or more of the keywords in the first set of keywords. The set of candidate filters is a proper subset of the first set of keywords.
  • the filter subsystem 108 may determine a set of candidate filters from the first set of keywords by determining a set of queries from the resources in the set of resources, where each query in the set of queries is a query for which at least one of the resources has been selected by a user. For example, a top-ranked resource may be highly relevant to the queries “guac burgers” and “whiskey barbeque burgers.” Thus, the queries “guac burgers” and “whiskey barbeque burgers” may be used as candidate selection criteria.
  • the filter subsystem may determine a set of candidate filters from the first set of keywords by determining a set of queries from the first query where each query in the set of queries is a query that is determined to be related to the first query. For example, a user may have input the search query “burgers,” and the filter subsystem 108 may determine that the search query “hotdogs” is related to the search query “burgers” and include the search query “hotdogs” in the set of candidate selection criteria. Processing related queries to identify candidate filters is described in more detail with reference to FIG. 4 below.
  • the candidate filters are determined by removing, from the first set of keywords, keywords that are determined to not be relevant to the candidate set of queries from the resources and/or queries related to the received query.
  • the keywords may be determined to be relevant to the query keywords based on an exact match, or based on meeting a similarity threshold to the query terms. For example, a keyword “guacamole” will be relevant to the query keyword “guac,” as the two keywords are determined to be similar.
  • a language model 116 may facilitate query-similarity findings based on stemming, synonyms, behavioral indicators, and other semantic and/or behavioral data that indicate a similarity of terms or concepts.
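
The exact-match-or-similarity test can be sketched with a plain string similarity measure. Note that this swaps in Python's `difflib.SequenceMatcher` as a stand-in for the language model 116, which in the specification may also draw on stemming, synonyms, and behavioral data. The function name `is_relevant` and the threshold value of 0.6 are assumptions chosen for this example.

```python
from difflib import SequenceMatcher


def is_relevant(keyword, query_terms, threshold=0.6):
    """Return True if keyword exactly matches, or is similar enough to, any query term."""
    kw = keyword.lower()
    for term in query_terms:
        t = term.lower()
        if kw == t:
            return True  # exact match
        # Character-level similarity; a crude substitute for a language model.
        if SequenceMatcher(None, kw, t).ratio() >= threshold:
            return True
    return False


keep_guacamole = is_relevant("guacamole", ["guac"])
keep_beef = is_relevant("beef", ["guac", "burgers"])
```

With this threshold, "guacamole" is treated as relevant to the query keyword "guac", while "beef" is not relevant to either "guac" or "burgers". A character-based measure would miss purely semantic pairs such as "guac" and "California style", which is why the specification refers to a language model.
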
  • the filter subsystem 108 determines a set of query filters from the set of candidate filters ( 308 ).
  • each query filter in the set of query filters meets a diversity threshold that is indicative of a filtered set of content resulting from applying a query filter to the set of resources and a filtered set of content resulting from applying another query filter to the set of resources meeting a difference threshold.
  • the set of candidate filters may include the keywords “guacamole” and “guac.”
  • the system may determine that the set of content resulting from applying the query filter “guacamole” to the set of resources for the search query “burgers” may be similar, if not identical, to the set of content resulting from applying the query filter “guac” to the set of resources for the search query “burgers.”
  • the set of query filters will not include both query filters “guacamole” and “guac.”
  • the filter subsystem 108 provides the set of query filters for display on a user device and with content results that identify content in the set of resources in response to the first query ( 310 ).
  • the set of query filters may be displayed in a user interface such as the user interface 107a described with reference to FIG. 1 .
  • the user interface may be presented to users in response to a user-input query, in a web browser or other application that is capable of providing users with a query feature, e.g., in search results pages provided by a search engine that is accessible to users via a web browser.
  • the user interface includes a query input, one or more user-selectable query filters, e.g., filters F1-F4, and a list of content results or search results, e.g., SR1-SRN.
  • the query input may be a textual field if text queries are input, or may be a drop location if an image query is input, or may be any other input that supports a user interaction for a given input media.
  • each content result in the list of content results is a search result that identifies a corresponding resource in the set of resources.
  • each content result in the list of content results is a subset of content included in a resource in the set of resources.
  • the filter subsystem 108 receives a selection of one or more of the query filters from the user device ( 312 ). For example, the filter subsystem 108 may receive information identifying a selection of the filters F1 and F2, as described with reference to user interface 107b of FIG. 1 .
  • the filter subsystem 108 provides a filtered set of content that identifies a set of content results that is different from an unfiltered set of content results for display on the user device ( 314 ).
  • the filtered set of content that identifies a set of content results is a proper subset of the unfiltered set of content results. For example, as described with reference to FIG. 1 , the filter subsystem may determine that the query filters F1 and F2 have been selected, and in response to determining that the query filters F1 and F2 have been selected, may provide a different listing of content results SR1′-SRM′.
  • the user device 106 may filter results locally on the user device.
  • the user device may receive a set of N search results, e.g., N being 100, and display subsets of M search results, e.g., M being 10.
  • the query filters may be used to filter the N search results stored at the user device to modify the displayed search results.
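
Local filtering on the user device might look like the following sketch, in which N results are cached and at most M matching results are displayed. The function name `displayed_results`, the substring matching rule, and the choice to require every selected filter to match are assumptions for illustration.

```python
def displayed_results(cached_results, selected_filters, page_size):
    """Filter the locally cached results and return the first page_size matches."""
    terms = [f.lower() for f in selected_filters]
    matching = [
        result
        for result in cached_results
        if all(term in result.lower() for term in terms)
    ]
    return matching[:page_size]


# N = 100 cached results; every fourth one mentions "guac".
cached = [f"burger place {i}" + (" guac" if i % 4 == 0 else "") for i in range(100)]
page = displayed_results(cached, ["guac"], page_size=10)  # M = 10
```

Because the filtering runs over results already held on the device, selecting or clearing a filter changes the displayed subset without another request to the search engine.
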
  • FIG. 4 is a flow diagram of an example process 400 for determining a set of candidate filters from a set of keywords.
  • the process 400 can be implemented in a data processing apparatus that is used to realize the filter subsystem 108 .
  • the filter subsystem 108 determines a set of queries from the resources in the set of resources that are determined to be responsive to a first search query ( 402 ). Each query in the set of queries is a query for which at least one of the resources has been selected by a user.
  • the filter subsystem 108 determines query stop terms from the set of queries ( 404 ). Each query stop term is a term in the set of queries having a frequency that meets a query stop term frequency threshold. In some implementations, the filter subsystem 108 may use a grammar learned from common, related, or specified queries to calculate a query stop term frequency for each term in the set of queries. Each term whose frequency meets or exceeds a predetermined query stop term threshold may be deemed useless for a query search in this domain and classified as a query stop term.
  • a user may input the query “find me cheese and guac burgers” and the filter subsystem may extract the keywords “Find me,” “cheese,” “and,” “guac.”
  • the keywords “cheese” and “guac” may occur in other food-related searches, whereas the keywords “Find me” and “and,” which do not identify any types of food, have a higher frequency of occurrence, e.g., in many cases unrelated to food searches.
  • the filter subsystem could therefore determine that the keywords “Find me” and “and” are query stop terms.
  • the filter subsystem 108 excludes the query stop terms from the set of candidate filters ( 406 ). For example, continuing with the above example, the filter subsystem may exclude the terms “Find me” and “and” from the set of candidate filters.
  • the system determines informational terms from the set of queries ( 408 ).
  • Each informational term is a term having a frequency in the set of queries that is less than or equal to an informational term threshold.
  • Each term whose frequency is at or below a predetermined informational term threshold may be considered useful for a query search in this domain and classified as an informational term. For example, continuing the example above, a user may input the query “find me cheese and guac burgers” and the filter subsystem may extract the keywords “Find me,” “cheese,” “and,” “guac.” The keywords “cheese” and “guac” may have a lower frequency of occurrence in other query searches than the keywords “Find me” and “and,” which do not identify any types of food. The filter subsystem could therefore determine that the keywords “cheese” and “guac” are informational terms.
  • the system includes the informational terms in the set of candidate filters ( 410 ).
  • the filter subsystem may include the terms “cheese” and “guac” in the set of candidate filters.
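
The split between query stop terms and informational terms can be sketched with a simple per-query frequency count. This is an illustration under stated assumptions: terms are single whitespace-separated tokens (so “Find me” becomes two tokens), frequency is the share of queries containing the term, and the names `classify_terms`, `stop_threshold`, and `informational_threshold` are hypothetical. The learned grammar mentioned in the text is not modeled.

```python
from collections import Counter


def classify_terms(queries, stop_threshold, informational_threshold):
    """Split query terms into stop terms (frequent) and informational terms (rare)."""
    counts = Counter()
    for query in queries:
        counts.update(set(query.lower().split()))  # count each term once per query

    stop_terms, informational_terms = set(), set()
    for term, count in counts.items():
        frequency = count / len(queries)
        if frequency >= stop_threshold:
            stop_terms.add(term)
        elif frequency <= informational_threshold:
            informational_terms.add(term)
    return stop_terms, informational_terms


queries = [
    "find me cheese and guac burgers",
    "find me hotels and parking",
    "find me flights",
    "pizza and pasta",
    "find me shoes",
]
stop, info = classify_terms(queries, stop_threshold=0.5, informational_threshold=0.25)
```

In this sample, "find", "me", and "and" are classified as stop terms and excluded, while "cheese" and "guac" fall below the informational threshold and would be included in the set of candidate filters.
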
  • the candidate filters found by the processes of FIGS. 3 and 4 may optionally be rated based on the frequency of the keywords in the keyword corpus 210 , and based on term prominence in the resources, and on other criteria.
  • the term “Guacamole” may appear often in the corpus and in title sections. However, the term “beef,” while also appearing often, may only appear in body sections subordinate to the titles. Thus the term “Guacamole” may be rated higher as a candidate filter than the term “beef.”
  • FIG. 5 is a flow diagram of an example process 500 for determining a set of query filters from a set of candidate filters.
  • the process 500 can be implemented in a data processing apparatus that is used to realize the filter subsystem 108 .
  • For each candidate query filter in the set of candidate filters, the filter subsystem applies the candidate query filter to the set of resources to obtain a corresponding filtered set of content results ( 502 ).
  • the set of candidate filters may include the candidate query filters “guacamole” and “guac,” and the filter subsystem may apply both the candidate query filter “guacamole” and the candidate query filter “guac” to obtain two corresponding filtered sets of content results.
  • the filter subsystem groups a pair of candidate query filters for which respective filtered sets of content results meet a similarity threshold that is indicative of the respective filtered sets of content results being substantially similar ( 504 ). For example, the filter subsystem may determine that the filtered set of content results resulting from applying the query filter “guacamole” meets or exceeds a similarity threshold to the filtered set of content results resulting from applying the query filter “guac.” The filter subsystem may therefore group the candidate query filters “guacamole” and “guac.” In some implementations, the filter subsystem may select a representative candidate query filter for the group of candidate query filters.
  • the filter subsystem determines quality scores for the candidate query filters based on the locations of the candidate query filters in the resources ( 506 ). For example, a candidate query filter that appears in a prominent position of a resource, such as in the title of a resource, may be assigned a higher quality score than a different candidate query filter that appears in meta data associated with the resource.
  • the filter subsystem determines a set of query filters from the set of candidate filters ( 508 ).
  • the set of query filters is selected from the set of candidate filters based on the candidate filters' determined quality scores and diversity. As described above with reference to step 308 of FIG. 3 , each query filter in the set of query filters meets the diversity threshold when respective filtered sets of content resulting from applying a respective query filter to the set of resources are sufficiently different from each other.
  • the set of candidate filters may include the keywords “guacamole” and “guac.”
  • the system may determine that the set of content resulting from applying the query filter “guacamole” to the set of resources for the search query “burgers” may be similar, if not identical, to the set of content resulting from applying the query filter “guac” to the set of resources for the search query “burgers”. Thus, only one of the keywords “guacamole” and “guac” would be selected.
  • Embodiments of the subject matter and the operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.
  • Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on computer storage medium for execution by, or to control the operation of, data processing apparatus.
  • the program instructions can be encoded on an artificially-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.
  • a computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate, a random or serial access memory array or device, or a combination of one or more of them.
  • While a computer storage medium is not a propagated signal, a computer storage medium can be a source or destination of computer program instructions encoded in an artificially-generated propagated signal.
  • the computer storage medium can also be, or be included in, one or more separate physical components or media (e.g., multiple CDs, disks, or other storage devices).
  • the operations described in this specification can be implemented as operations performed by a data processing apparatus on data stored on one or more computer-readable storage devices or received from other sources.
  • the term “data processing apparatus” encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, a system on a chip, or multiple ones, or combinations, of the foregoing.
  • the apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
  • the apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, a cross-platform runtime environment, a virtual machine, or a combination of one or more of them.
  • the apparatus and execution environment can realize various different computing model infrastructures, such as web services, distributed computing and grid computing infrastructures.
  • a computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, object, or other unit suitable for use in a computing environment.
  • a computer program may, but need not, correspond to a file in a file system.
  • a program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
  • a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
  • the processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform actions by operating on input data and generating output.
  • the processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
  • processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer.
  • a processor will receive instructions and data from a read-only memory or a random access memory or both.
  • the essential elements of a computer are a processor for performing actions in accordance with instructions and one or more memory devices for storing instructions and data.
  • a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks.
  • a computer need not have such devices.
  • a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device (e.g., a universal serial bus (USB) flash drive), to name just a few.
  • Devices suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
  • the processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
  • To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer.
  • Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
  • a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a
  • Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a user computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components.
  • the components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks).
  • the computing system can include users and servers.
  • a user and server are generally remote from each other and typically interact through a communication network. The relationship of user and server arises by virtue of computer programs running on the respective computers and having a user-server relationship to each other.
  • a server transmits data (e.g., an HTML page) to a user device (e.g., for purposes of displaying data to and receiving user input from a user interacting with the user device).
  • Data generated at the user device e.g., a result of the user interaction

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for providing filters from resource content. In one aspect, a system receives data identifying a set of resources that are determined to be responsive to a search query and extracts a set of keywords from the contents of the resources and related queries. The keywords are processed according to candidate selection criteria, and a set of candidate query filters are determined. The candidate filters may be used to filter the resources that are responsive to the query.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit under 35 U.S.C. § 119(e) of U.S. Patent Application No. 62/192,713, entitled “SEARCH RESULT FILTERS FROM RESOURCE CONTENT,” filed Jul. 15, 2015. The disclosure of the foregoing application is incorporated herein by reference in its entirety for all purposes.
BACKGROUND
The Internet provides access to a wide variety of resources, for example, video files, image files, audio files, or Web pages, including content for particular subjects, book articles, or news articles. A search system can select one or more resources in response to receiving a search query. A search query is data that a user submits to a search engine to satisfy the user's informational needs. The search queries are usually in the form of text, e.g., one or more query terms. The search system selects and scores resources based on their relevance to the search query and on their importance relative to other resources to provide search results that link to the selected resources. The search results are typically ordered according to the scores and presented according to this order.
A search query, however, is often an incomplete expression of a user's informational need. Thus, a user may often refine a search query after reviewing search results, or may select a “suggested query” that is provided by a search engine to conduct another search. A user may also attempt to filter within a set of search results. However, the user may need to generate a filter term or operation, or rely on “hardcoded” filters that require expert knowledge and programming ahead of time, together with manual internationalization, in order to be effective. Furthermore, given the dynamic nature of the corpus of resources available over the Internet, new filtering terms may be emergent and escape the notice of both the user and resource curators.
SUMMARY
This specification describes technologies relating to search engines. In general, a user can request information by inputting a query to a search engine. The search engine can process the query and can provide information including query filters for output to the user in response to the query. The query filters are dynamically determined, in part, from the content of the resources that are responsive to the query.
In general, one innovative aspect of the subject matter described in this specification can be embodied in methods that include the actions of receiving, for a first query, data identifying a set of resources that are determined to be responsive to the first query; extracting, from the set of resources, a first set of keywords from the contents of the resources; determining, from the first set of keywords, a set of candidate filters from the keywords, each candidate filter derived from one or more keywords in the set of keywords, and wherein the set of candidate filters are a proper subset of the first set of keywords; determining, from the set of candidate filters, a set of query filters, each query filter in the set of query filters meeting a diversity threshold that is indicative of a filtered set of content resulting from applying the query filter to the set of resources and a filtered set of content resulting from applying another query filter to the set of resources meeting a difference threshold; and providing, in response to the first query, for display on a user device and with content results that identify content in the set of resources, the set of query filters for the first query. Other embodiments of this aspect include corresponding systems, apparatus, and computer programs, configured to perform the actions of the methods, encoded on computer storage devices.
Particular embodiments of the subject matter described in this specification can be implemented so as to realize one or more of the following advantages. Search query filters can be automatically learned offline and/or generated at serving time, improving the search engine system performance and saving users a large degree of human effort. Generally, the filters can be learned from any relevant metadata or text. For example, in the context of an application that is used to provide reviews for certain businesses, e.g., restaurants, learned filters from item reviews and descriptions may be used to narrow a user's search query and lead a user closer towards their end goal. Furthermore, learning filters from item reviews and descriptions enables presented filters to be more tailored to both the specific user need at the time and the available results to be filtered. Learning filters from item reviews and descriptions also enables a search engine system to provide search results in specific domains which vary not just with the categorical query but also with the results available at the time of the search.
The details of one or more embodiments of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of an example environment in which filters from item reviews and descriptions are provided.
FIG. 2 is a block diagram of an example process for generating query filters.
FIG. 3 is a flow diagram of an example process for providing query filters.
FIG. 4 is a flow diagram of an example process for determining a set of candidate filters from a set of keywords.
FIG. 5 is a flow diagram of an example process for determining a set of query filters from a set of candidate filters.
Like reference numbers and designations in the various drawings indicate like elements.
DETAILED DESCRIPTION
Overview
A search engine system provides user-selectable search query result filters for display on a user device in response to a user-input search query. The system receives data identifying a set of resources that are determined to be responsive to the search query and extracts a set of keywords from the contents of the resources. The keywords are processed according to candidate selection criteria, and a set of candidate query filters are determined. The set of candidate query filters is trimmed using diversity criteria, ensuring that remaining candidate query filters have a reasonable degree of diversity in the sets of search query results that they represent. For example, in some implementations, pairs of candidate query filters are grouped into a single candidate filter if the filtered sets of search query results resulting from applying both candidate query filters are substantially similar. The diversified set of candidate query filters are provided for display on the user device in response to the search query, together with search query results.
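The grouping of candidate query filters with substantially similar filtered result sets can be illustrated with a minimal Python sketch. It is not the claimed implementation: the use of Jaccard overlap as the similarity measure, the 0.8 threshold, the function names `jaccard` and `group_similar_filters`, and the result identifiers are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two result sets: |a & b| / |a | b| (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_similar_filters(
    filtered_results: Dict[str, Set[str]], similarity_threshold: float = 0.8
) -> List[List[str]]:
    """Group candidate filters whose filtered result sets are substantially similar.

    Each filter joins the first group whose representative (the group's first
    member) yields a result set meeting the threshold; otherwise it starts a
    new group. The threshold value is an assumption, not taken from the source.
    """
    groups: List[List[str]] = []
    for name, results in filtered_results.items():
        for group in groups:
            representative = group[0]
            if jaccard(results, filtered_results[representative]) >= similarity_threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Hypothetical result identifiers for the query "burgers".
filtered = {
    "guacamole": {"r1", "r2", "r3", "r4", "r5"},
    "guac": {"r1", "r2", "r3", "r4", "r5"},
    "bacon": {"r2", "r6", "r7"},
}
groups = group_similar_filters(filtered)
print(groups)
```

In this sketch the first member of each group acts as the representative candidate query filter, so only “guacamole” and “bacon” would be shown to the user.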
These features and additional features are described in more detail below. In the examples provided below, the features are described in the context of a general search engine. However, the features can be applied to any system or application that searches a data store. For example, the features described below can be applied to an application that searches a corpus specific to the application, such as a mobile phone application that is used to search, provide reviews for, and make reservations at restaurants. Alternatively, the features can be applied to search a large web corpus.
Example Operating Environment
FIG. 1 is a block diagram of an example environment 100 in which filters from item reviews and descriptions are provided. A computer network 102, such as a local area network (LAN), wide area network (WAN), the Internet, or a combination thereof, connects publisher web sites 104, user devices 106, and the search engine 110. The online environment 100 may include many thousands of publisher web sites 104 and user devices 106.
A publisher website 104 includes one or more resources 105 associated with a domain and hosted by one or more servers in one or more locations. Generally, a website is a collection of web pages formatted in hypertext markup language (HTML) that can contain text, images, multimedia content, and programming elements, for example, scripts. Each web site 104 is maintained by a content publisher, which is an entity that controls, manages and/or owns the website 104.
A resource is any data that can be provided by a publisher website 104 over the network 102 and that has a resource address, e.g., a uniform resource locator (URL). Resources may be HTML pages, electronic documents, image files, video files, audio files, and feed sources, to name just a few. The resources may include embedded information, e.g., meta information and hyperlinks, and/or embedded instructions, e.g., client-side scripts.
A user device 106 is an electronic device that is under the control of a user and is capable of requesting and receiving resources over the network 102. Example user devices 106 include personal computers, mobile communication devices, and other devices that can send and receive data over the network 102. A user device 106 typically includes a user application, e.g., a web browser, to facilitate the sending and receiving of data over the network 102. The web browser can enable a user to display and interact with text, images, videos, music and other information typically located on a web page at a website on the world wide web or a local area network.
To facilitate searching of these resources 105, the search engine 110 identifies the resources by crawling the publisher web sites 104 and indexing the resources provided by the publisher web sites 104. The resources are indexed and the index data are stored in an index 112.
The user devices 106 submit search queries to the search engine 110. The search queries are submitted in the form of a search request that includes the search query and, optionally, a unique identifier that identifies the user device 106 that submits the request. The unique identifier can be data from a cookie stored at the user device, or a user account identifier if the user maintains an account with the search engine 110, or some other identifier that identifies the user device 106 or the user using the user device.
In response to the search request, the search engine 110 uses the index 112 to identify resources that are relevant to the queries. The search engine 110 identifies the resources in the form of search results and returns the search results to the user devices 106 in a search results page resource. A search result is data generated by the search engine 110 that identifies a resource or provides information that satisfies a particular search query. A search result for a resource can include a web page title, a snippet of text extracted from the web page, and a resource locator for the resource, e.g., the URL of a web page.
The search results are ranked based on scores related to the resources identified by the search results, such as information retrieval (“IR”) scores, and optionally a separate ranking of each resource relative to other resources (e.g., an authority score). The search results are ordered according to these scores and provided to the user device according to the order.
In addition, in response to the search request, the filter subsystem 108 identifies search query filters that are relevant for the identified resources. The filter subsystem 108 identifies the search query filters in the form of search query filter results and returns the search query filter results to the user devices 106 in the search results page resource. A search query filter result is data generated by the filter subsystem 108 that can be used to filter the search results that satisfy the search query to a set of filtered search results that satisfies the search query and the selected filter.
The user devices 106 receive the search results pages, including the search query filter results, and render the pages for presentation to users. In response to the user selecting a search result at a user device 106, the user device 106 requests the resource identified by the resource locator included in the selected search result. The publisher of the web site 104 hosting the resource receives the request for the resource from the user device 106 and provides the resource to the requesting user device 106.
In response to the user selecting a search query filter at a user device 106, the user device 106 requests a set of filtered search results identified by the resource locators included in the selected search query filter. The search engine system 110 receives the request for the subset of search results from the user device 106 and provides the subset of search results to the requesting user device 106. For example, in FIG. 1, a set of search results {SR1 . . . SRN} are shown in the search results page 107 a, along with a set of filters {F1 . . . F4}. However, in the search results page 107 b, the filters F1 and F2 are selected by the user on the user device, resulting in the filtered set of search results {SR1, SR3, . . . SRM}. The filtered set of search results {SR1, SR3, . . . SRM} are a proper subset of the search results {SR1 . . . SRN}.
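The effect of selecting filters on a results page can be sketched as follows. This is an illustrative sketch only: the source does not state how multiple selected filters combine, so the conjunctive (AND) semantics, the `filter_index` mapping, and the function name `apply_filters` are assumptions.

```python
from typing import Dict, List, Set


def apply_filters(
    results: List[str], filter_index: Dict[str, Set[str]], selected: List[str]
) -> List[str]:
    """Keep results matched by every selected filter, preserving ranked order.

    filter_index maps a filter label to the identifiers of the results it
    matches (a hypothetical structure for this sketch).
    """
    return [
        r for r in results
        if all(r in filter_index.get(f, set()) for f in selected)
    ]


# Ranked search results and the results matched by each filter (made-up data).
results = ["SR1", "SR2", "SR3", "SR4"]
filter_index = {
    "F1": {"SR1", "SR2", "SR3"},
    "F2": {"SR1", "SR3", "SR4"},
}
filtered = apply_filters(results, filter_index, ["F1", "F2"])
print(filtered)
```

With both F1 and F2 selected, only the results matched by both filters remain, and they keep their original ranking order.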
In some implementations, the queries submitted from user devices are stored in query logs 114. The query logs 114 define search history data that include data from and related to previous search requests associated with unique identifiers. The query logs 114 can be used to map queries submitted by user devices to resources that were identified in search results and the actions taken by users when presented with the search results in response to the queries. In some implementations, data are associated with the identifiers from the search requests so that a search history for each identifier can be accessed. The query logs 114 can also include selection data that can be used by the search engine to determine the respective sequences of queries submitted by the user devices, the actions taken in response to the queries, and how often the queries have been submitted. Likewise, the selection data can also be used to determine for each particular resource the queries for which users find the resource to be most useful.
Generating Filters from Resource Content
Operation of the system 100 is described with reference to FIG. 2 below, which is a block diagram 200 of an example process for generating query filters. For example, the process 200 can be performed by the system 100 in response to receiving a search query input by a user. The process 200 can be implemented, for example, in a data processing apparatus that is used to realize the filter subsystem 108.
The system receives a search query input by a user at a user device, such as the user device 106 of FIG. 1 (202). The search query may include one or more terms, e.g., words, numbers or symbols. In some implementations the process is invoked only when the search query is a categorical query, i.e., a query for which search results are highly indicative of a particular category, e.g., food, entertainment, etc. For example, the query “burgers” may be a categorical query related to one or more of the categories of “dining,” “food,” and “restaurants.” Categorical queries may be predefined by the search engine 110, or may be identified at query time based on, for example, a dominant intent derived from the content of responsive resources.
The system performs a corpus search in order to determine a set of resources that are responsive to the received search query (204). The corpus may be a collection of available resources and text found at a number of publisher websites, for example the publisher websites 104 and resources 105 of FIG. 1.
The system identifies responsive resources (206). The responsive resources are those resources determined to be responsive to the received search query by at least a threshold measure, e.g., the top 1,000 ranked resources. For example, in response to receiving the search query “burgers,” the identified set of responsive resources may include restaurant menus, restaurant reviews and descriptions.
The system mines the responsive resources corpus to determine an associated set of keywords (208). Each keyword may include one or more words, numbers or symbols. For example, upon receiving the search query “burgers,” the associated set of keywords mined from the responsive set of resources may include several thousands of nearby food items available on food menus. In some implementations, the reviews, descriptions and other metadata can be mined to find the most frequently used keywords in the corpus of responsive resources.
The system generates a keyword corpus from the mining (210). The keyword corpus includes keywords, for example, the most frequently used keywords in the responsive resources 206, such as keywords that meet a frequency threshold relative to the frequencies of other keywords in the responsive resources.
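One way to picture the mining and keyword-corpus steps (208, 210) is the sketch below. It assumes document frequency as the frequency measure and a cutoff expressed as a fraction of the most frequent keyword's count; the tokenizer, the 0.5 ratio, and the function name `build_keyword_corpus` are hypothetical choices, not details from the specification.

```python
import re
from collections import Counter
from typing import Dict, List


def build_keyword_corpus(
    documents: List[str], min_relative_frequency: float = 0.5
) -> Dict[str, int]:
    """Count the documents each word appears in and keep the frequent words.

    A word is kept when its document count is at least
    min_relative_frequency times the count of the most frequent word.
    """
    counts: Counter = Counter()
    for text in documents:
        # Count each word once per document (document frequency).
        counts.update(set(re.findall(r"[a-z]+", text.lower())))
    if not counts:
        return {}
    cutoff = min_relative_frequency * max(counts.values())
    return {word: n for word, n in counts.items() if n >= cutoff}


# Made-up menu and review snippets responsive to the query "burgers".
docs = [
    "Guac burger with cheese",
    "Cheese burger and fries",
    "Veggie burger with guac",
]
corpus = build_keyword_corpus(docs)
print(sorted(corpus))
```

Note that a common word such as “with” survives this step; removing such terms is left to the later candidate and stop-word filtering.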
The keyword corpus can be filtered to generate a set of candidate keywords according to candidate criteria (211). Candidate criteria can include queries to which the resources 206 are responsive. For example, for the resources responsive to the query “burgers,” the query logs 114 are processed by the filter subsystem 108 to identify other queries for which one or more of the resources are selected at or above a threshold rate. In the example above, for the query “burgers,” the resources, based on the query logs 114, may be responsive to the other queries “guac burgers,” “barbeque burger restaurants,” etc. Likewise, queries that are determined to be related to the query “burgers” can also be used. In yet further examples, the candidate criteria 211 may include additional keywords corresponding to categorical search queries related to the search query input by the user (202).
The use of a language model 116, for example, may facilitate query-similarity findings. Similarities may be based on stemming, synonyms, and even behavioral indicators, such as similar click patterns for different terms. For example, the term “guac” may be determined to be similar to “California style” in the context of restaurants.
The filter subsystem 108 can also implement stop word filtering in order to remove keywords that are not useful or are unrelated to the search query received from the user and/or the queries from the resources.
The keywords of these queries are compared to the keywords in the corpus 210 to determine which keywords should be discarded. For example, the corpus may include the term “heart healthy.” However, this keyword may not be in queries, or may be in the queries but at a very low frequency relative to other keywords. Accordingly, the term “heart healthy” will not be selected as a candidate keyword.
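The comparison of corpus keywords against query keywords, together with stop word filtering, can be sketched as a pair of frequency thresholds. The sketch below is an assumption-laden illustration: `min_share` and `max_share` are hypothetical parameters (keywords that occur in too few related queries are discarded, and keywords that occur in too many are treated as stop terms), and whole-phrase matching on whitespace-separated tokens is a simplification.

```python
from typing import Iterable, List, Set


def query_share(keyword: str, queries: List[str]) -> float:
    """Fraction of queries that contain the keyword as a whole word or phrase."""
    if not queries:
        return 0.0
    needle = f" {keyword.lower()} "
    hits = sum(1 for q in queries if needle in f" {' '.join(q.lower().split())} ")
    return hits / len(queries)


def select_candidate_keywords(
    corpus_keywords: Iterable[str],
    related_queries: List[str],
    min_share: float = 0.25,
    max_share: float = 0.75,
) -> Set[str]:
    """Keep keywords whose share of related queries lies in [min_share, max_share]."""
    return {
        k for k in corpus_keywords
        if min_share <= query_share(k, related_queries) <= max_share
    }


# Made-up related queries drawn from a query log.
queries = [
    "find me guac burgers",
    "find me cheese burgers",
    "find me cheese and guac burgers",
    "find me bacon burgers",
]
keywords = ["guac", "cheese", "find me", "heart healthy"]
candidates = select_candidate_keywords(keywords, queries)
print(sorted(candidates))
```

Here “find me” occurs in every query and is dropped as a stop term, while “heart healthy” never occurs in the queries and is dropped for low frequency.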
The system generates a candidate keyword corpus (212). The candidate keyword corpus includes the set of keywords generated according to the candidate criteria 211. The candidate keyword corpus can be filtered to generate a set of filter terms according to filter criteria (213). For example, the system may apply a diversity filter to the candidate keywords. The diversity filter enables the system to determine filter terms that have a high degree of diversity in the sets of search results that they represent. In other examples, the system may apply a term-prominence filter to remove candidate keywords that appear only in metadata, or in inconspicuous locations in the corresponding responsive resources 206.
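A term-prominence filter of the kind described above might be sketched as follows. The section names, the numeric weights in `SECTION_WEIGHTS`, the `min_score` cutoff, and the substring matching are all hypothetical; the specification only indicates that prominent locations such as titles count for more than metadata.

```python
from typing import Dict, List

# Hypothetical per-section weights: titles count most, metadata not at all.
SECTION_WEIGHTS = {"title": 3.0, "body": 1.0, "metadata": 0.0}


def prominence_score(keyword: str, resources: List[Dict[str, str]]) -> float:
    """Sum the weights of every resource section that mentions the keyword."""
    needle = keyword.lower()
    score = 0.0
    for resource in resources:
        for section, text in resource.items():
            if needle in text.lower():
                score += SECTION_WEIGHTS.get(section, 0.0)
    return score


def prominence_filter(
    keywords: List[str], resources: List[Dict[str, str]], min_score: float = 1.0
) -> List[str]:
    """Drop keywords below min_score and order the rest by descending score."""
    scored = [(prominence_score(k, resources), k) for k in keywords]
    kept = [(s, k) for s, k in scored if s >= min_score]
    return [k for s, k in sorted(kept, key=lambda pair: -pair[0])]


# Made-up resources, each split into sections.
resources = [
    {"title": "Best Guacamole Burgers", "body": "Fresh beef patties daily",
     "metadata": "gluten free"},
    {"title": "Guacamole Grill", "body": "Beef and guacamole burgers",
     "metadata": "gluten free"},
]
ranked = prominence_filter(["beef", "gluten free", "guacamole"], resources)
print(ranked)
```

In this example “gluten free” appears only in metadata and is removed, and “guacamole” outranks “beef” because it appears in titles.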
The system generates a filter term corpus using the filtered candidate keyword corpus (214). The filter terms in the filter term corpus may be provided to a user device. The filter terms may be shown on the user device in some user interface or interactive format, and used to narrow a search query in order to lead a user closer towards their end goal.
FIG. 3 is a flow diagram 300 of another example process for providing query filters. The process 300 can be implemented in a data processing apparatus that is used to realize the filter system 108.
The filter subsystem 108 receives data identifying a set of resources that are determined to be responsive to a search query (302). In some implementations, the search query may be a categorical query. The set of resources can include HTML pages, electronic documents, image files, video files, audio files, and feed sources which may include embedded information, e.g., meta information and hyperlinks. For example, a user may have input the query “burgers” and the filter subsystem 108 may in turn receive data identifying a set of HTML pages or electronic documents including reviews, descriptions and other meta information pertaining to nearby food items available on food menus.
The filter subsystem 108 extracts a first set of keywords from the contents of the set of resources (304). A keyword can include one or more words, symbols or numbers that are associated with the search query. For example, the first set of keywords may include a set of words, symbols or numbers that occur most often in the contents of the set of resources that are determined to be responsive to the search query.
The filter subsystem 108 determines a set of candidate filters from the first set of keywords (306). Each candidate filter is derived from one or more of the keywords in the first set of keywords. The set of candidate filters is a proper subset of the first set of keywords.
For example, in some implementations, the filter subsystem 108 may determine a set of candidate filters from the first set of keywords by determining a set of queries from the resources in the set of resources, where each query in the set of queries is a query for which at least one of the resources has been selected by a user. For example, a top-ranked resource may be highly relevant to the queries “guac burgers” and “whiskey barbeque burgers.” Thus, the queries “guac burgers” and “whiskey barbeque burgers” may be used as candidate selection criteria.
In other implementations, the filter subsystem may determine a set of candidate filters from the first set of keywords by determining a set of queries from the first query where each query in the set of queries is a query that is determined to be related to the first query. For example, a user may have input the search query “burgers,” and the filter subsystem 108 may determine that the search query “hotdogs” is related to the search query “burgers” and include the search query “hotdogs” in the set of candidate selection criteria. Processing related queries to identify candidate filters is described in more detail with reference to FIG. 4 below.
The candidate filters are determined by removing, from the first set of keywords, keywords that are determined to not be relevant to the candidate set of queries from the resources and/or queries related to the received query. The keywords may be determined to be relevant to the query keywords based on an exact match, or based on meeting a similarity threshold to the query terms. For example, a keyword “guacamole” will be relevant to the query keyword “guac,” as the two keywords are determined to be similar. Again, as described above, the use of a language model 116 may facilitate query-similarity findings based on stemming, synonyms, behavioral indicators, and other semantic and/or behavioral data that indicate a similarity of terms or concepts.
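The relevance test described above (exact match, or meeting a similarity threshold) might look like the sketch below. Character-level similarity from Python's standard `difflib` module stands in for the language model 116 purely for illustration; the function name `is_relevant` and the threshold of 0.6 are assumptions, and a real system would likely use stemming, synonyms, or behavioral signals instead.

```python
from difflib import SequenceMatcher


def is_relevant(keyword, query_terms, threshold=0.6):
    """Return True if keyword matches a query term exactly or is similar enough.

    SequenceMatcher is only a stand-in for a language model; the threshold
    value is a hypothetical choice.
    """
    keyword = keyword.lower()
    for term in query_terms:
        term = term.lower()
        if keyword == term:
            return True
        if SequenceMatcher(None, keyword, term).ratio() >= threshold:
            return True
    return False


query_terms = ["guac", "burgers"]
first_keywords = ["guacamole", "burger", "heart healthy"]
# Remove keywords that are not relevant to any query term.
candidate_filters = [k for k in first_keywords if is_relevant(k, query_terms)]
print(candidate_filters)
```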
The filter subsystem 108 determines a set of query filters from the set of candidate filters (308). In some implementations, each query filter in the set of query filters meets a diversity threshold that is indicative of a filtered set of content resulting from applying a query filter to the set of resources and a filtered set of content resulting from applying another query filter to the set of resources meeting a difference threshold. For example, the set of candidate filters may include the keywords “guacamole” and “guac.” The system may determine that the set of content resulting from applying the query filter “guacamole” to the set of resources for the search query “burgers” may be similar, if not identical, to the set of content resulting from applying the query filter “guac” to the set of resources for the search query “burgers.” Upon determining that the filtered sets of content resulting from applying the query filters “guacamole” and “guac” do not meet a difference threshold, the set of query filters will not include both query filters “guacamole” and “guac.”
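One way to express the difference between two filtered sets of content is the share of results that appear in only one of the two sets. The sketch below uses that measure (the Jaccard distance); the specification does not name a particular metric, so the measure, the function names, and the threshold of 0.5 are assumptions for illustration.

```python
def content_difference(results_a, results_b):
    """Fraction of results found in exactly one of the two filtered sets."""
    a, b = set(results_a), set(results_b)
    union = a | b
    if not union:
        return 0.0
    return len(a ^ b) / len(union)


def meets_difference_threshold(results_a, results_b, threshold=0.5):
    """True when the two filtered sets are sufficiently different."""
    return content_difference(results_a, results_b) >= threshold


guacamole = {"r1", "r2", "r3"}
guac = {"r1", "r2", "r3", "r4"}
bacon = {"r1", "r5", "r6"}

print(meets_difference_threshold(guacamole, guac))   # nearly identical sets
print(meets_difference_threshold(guacamole, bacon))  # mostly distinct sets
```

Under these assumed values, “guacamole” and “guac” fail the difference threshold, so the resulting set of query filters would not include both.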
The filter subsystem 108 provides the set of query filters for display on a user device and with content results that identify content in the set of resources in response to the first query (310). For example, the set of query filters may be displayed in a user interface such as the user interface 107 a described with reference to FIG. 1. The user interface may be presented to users in response to a user-input query, in a web browser or other application that is capable of providing users with a query feature, e.g., in search results pages provided by a search engine that is accessible to users via a web browser. The user interface includes a query input, one or more user-selectable query filters, e.g., filters F1-F4, and a list of content results or search results, e.g., SR1-SRN. The query input may be a textual field if text queries are input, or may be a drop location if an image query is input, or may be any other input that supports a user interaction for a given input media. In some implementations, each content result in the list of content results is a search result that identifies a corresponding resource in the set of resources. In other implementations, each content result in the list of content results is a subset of content included in a resource in the set of resources.
The filter subsystem 108 receives a selection of one or more of the query filters from the user device (312). For example, the filter subsystem 108 may receive information identifying a selection of the filters F1 and F2, as described with reference to user interface 107 b of FIG. 1.
The filter subsystem 108 provides a filtered set of content that identifies a set of content results that is different from an unfiltered set of content results for display on the user device (314). The filtered set of content that identifies a set of content results is a proper subset of the unfiltered set of content results. For example, as described with reference to FIG. 1, the filter subsystem may determine that the query filters F1 and F2 have been selected, and in response to determining that the query filters F1 and F2 have been selected, may provide a different listing of content results SR1′-SRM′.
In other implementations, the user device 106 may filter results locally on the user device. For example, the user device may receive a set of N search results, e.g., N being 100, and display subsets of M search results, e.g., M being 10. When a user selects a particular filter, the selected filter may be used to filter the N search results stored at the user device to modify the displayed search results.
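Local filtering on the user device can be sketched as below. The result structure (a dictionary with `id` and `terms` fields), the function name `filter_locally`, and the choice to require every selected filter to match are assumptions for illustration only.

```python
def filter_locally(results, selected_filters, page_size=10):
    """Filter the N stored results on the device and return the first M.

    Each result is assumed to carry the terms it matches. A result is kept
    only if it matches every selected filter (an assumed AND semantics).
    """
    selected = {f.lower() for f in selected_filters}
    matching = [
        r for r in results
        if selected <= {t.lower() for t in r["terms"]}
    ]
    return matching[:page_size]


stored_results = [
    {"id": "SR1", "terms": ["guac", "cheese"]},
    {"id": "SR2", "terms": ["bacon"]},
    {"id": "SR3", "terms": ["guac"]},
]
print([r["id"] for r in filter_locally(stored_results, ["guac"])])
```

With no filter selected, the sketch simply returns the first page of the unfiltered results.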
FIG. 4 is a flow diagram of an example process 400 for determining a set of candidate filters from a set of keywords. The process 400 can be implemented in a data processing apparatus that is used to realize the filter subsystem 108.
The filter subsystem 108 determines a set of queries from the resources in the set of resources that are determined to be responsive to a first search query (402). Each query in the set of queries is a query for which at least one of the resources has been selected by a user.
The filter subsystem 108 determines query stop terms from the set of queries (404). Each query stop term is a term in the set of queries having a frequency that meets a query stop term frequency threshold. In some implementations, the filter subsystem 108 may use a grammar learned from common, related, or specified queries to calculate a query stop term frequency for each term in the set of queries. Each term whose frequency meets or exceeds a predetermined query stop term threshold may be deemed not useful for a query search in this domain and classified as a query stop term. For example, a user may input the query “find me cheese and guac burgers” and the filter subsystem may extract the keywords “Find me,” “cheese,” “and,” “guac.” The keywords “cheese” and “guac” may occur in other food-related searches, whereas the keywords “Find me” and “and,” which do not identify any types of food, have a higher frequency of occurrence, e.g., in many cases unrelated to food searches. The filter subsystem could therefore determine that the keywords “Find me” and “and” are query stop terms.
The filter subsystem 108 excludes the query stop terms from the set of candidate filters (406). For example, continuing with the above example, the filter subsystem may exclude the terms “Find me” and “and” from the set of candidate filters.
The system determines informational terms from the set of queries (408). Each informational term is a term having a frequency in the set of queries that is less than or equal to an informational term threshold. Each term whose frequency does not exceed a predetermined informational term threshold may be considered useful for a query search in this domain and classified as an informational term. For example, continuing the example above, a user may input the query “find me cheese and guac burgers” and the filter subsystem may extract the keywords “Find me,” “cheese,” “and,” “guac.” The keywords “cheese” and “guac” may have a lower frequency of occurrence in other query searches than the keywords “Find me” and “and,” which do not identify any types of food. The filter subsystem could therefore determine that the keywords “cheese” and “guac” are informational terms.
The system includes the informational terms in the set of candidate filters (410). For example, continuing with the above example, the filter subsystem may include the terms “cheese” and “guac” in the set of candidate filters.
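Steps 404 through 410 can be sketched as a frequency-based classification. In this illustration a term's frequency is taken to be the fraction of queries that contain it, terms are single whitespace-separated words, and the two threshold values are arbitrary; all of these are assumptions rather than details from the specification.

```python
from collections import Counter


def term_frequencies(queries):
    """Fraction of queries in which each term appears."""
    counts = Counter()
    for query in queries:
        counts.update(set(query.lower().split()))
    total = len(queries)
    return {term: n / total for term, n in counts.items()} if total else {}


def classify_terms(queries, stop_threshold=0.75, info_threshold=0.25):
    """Return (query stop terms, informational terms) for a set of queries."""
    freqs = term_frequencies(queries)
    stop_terms = {t for t, f in freqs.items() if f >= stop_threshold}
    info_terms = {t for t, f in freqs.items() if f <= info_threshold}
    return stop_terms, info_terms


queries = [
    "find me cheese and guac burgers",
    "find me tacos and salsa",
    "find me pizza and wings",
    "find me sushi and ramen",
]
stop_terms, info_terms = classify_terms(queries)

keywords = ["find", "cheese", "and", "guac"]
# Exclude stop terms (406) and include informational terms (410).
candidate_filters = [
    k for k in keywords if k not in stop_terms and k in info_terms
]
print(candidate_filters)
```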
The candidate filters found by the processes of FIGS. 3 and 4 may optionally be rated based on the frequency of the keywords in the keyword corpus 210, and based on term prominence in the resources, and on other criteria. For example, the term “Guacamole” may appear often in the corpus and in title sections. However, the term “beef,” while also appearing often, may only appear in body sections subordinate to the titles. Thus the term “Guacamole” may be rated higher as a candidate filter than the term “beef.”
FIG. 5 is a flow diagram of an example process 500 for determining a set of query filters from a set of candidate filters. The process 500 can be implemented in a data processing apparatus that is used to realize the filter subsystem 108.
For each candidate query filter in the set of candidate filters, the filter subsystem applies the candidate query filter to the set of resources to obtain a corresponding filtered set of content results (502). For example, the set of candidate filters may include the candidate query filters “guacamole” and “guac,” and the filter subsystem may apply both the candidate query filter “guacamole” and the candidate query filter “guac” to obtain two corresponding filtered sets of content results.
The filter subsystem groups a pair of candidate query filters for which respective filtered sets of content results meet a similarity threshold that is indicative of the respective filtered sets of content results being substantially similar (504). For example, the filter subsystem may determine that the filtered set of content results resulting from applying the query filter “guacamole” meets or exceeds a similarity threshold to the filtered set of content results resulting from applying the query filter “guac.” The filter subsystem may therefore group the candidate query filters “guacamole” and “guac.” In some implementations, the filter subsystem may select a representative candidate query filter for the group of candidate query filters.
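The grouping step (504) could be sketched with a greedy pass over the candidates. The overlap measure (Jaccard similarity), the threshold of 0.8, and the choice of the first member of each group as its representative are assumptions for illustration.

```python
def jaccard_similarity(a, b):
    """Overlap between two result sets, from 0.0 (disjoint) to 1.0 (equal)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_similar_filters(filtered_results, threshold=0.8):
    """Group candidate filters whose filtered result sets are similar.

    filtered_results maps each candidate filter to its filtered result set.
    Each candidate joins the first group whose representative (its first
    member) is similar enough; otherwise it starts a new group.
    """
    groups = []
    for name, results in filtered_results.items():
        for group in groups:
            representative = group[0]
            if jaccard_similarity(results, filtered_results[representative]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


filtered_results = {
    "guacamole": {1, 2, 3, 4, 5},
    "guac": {1, 2, 3, 4},
    "bacon": {6, 7},
}
print(group_similar_filters(filtered_results))
```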
The filter subsystem determines quality scores for the candidate query filters based on the locations of the candidate query filters in the resources (506). For example, a candidate query filter that appears in a prominent position of a resource, such as in the title of a resource, may be assigned a higher quality score than a different candidate query filter that appears in meta data associated with the resource.
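A location-based quality score (506) might be computed as in the sketch below. The location names, the weights, and the representation of a resource as a dictionary of location to text are hypothetical; the specification only states that prominent locations such as titles may score higher than metadata.

```python
# Hypothetical weights: more prominent locations contribute more.
LOCATION_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0, "metadata": 0.25}


def quality_score(term, resources):
    """Sum location weights over every resource section that mentions term."""
    term = term.lower()
    score = 0.0
    for resource in resources:
        for location, text in resource.items():
            if term in text.lower():
                score += LOCATION_WEIGHTS.get(location, 0.0)
    return score


resources = [
    {"title": "Best Guacamole Burgers", "body": "Our beef patties are grilled."},
    {"title": "Burger Menu", "body": "beef, guacamole, cheese", "metadata": "beef"},
]
print(quality_score("guacamole", resources), quality_score("beef", resources))
```

Here “guacamole” outscores “beef” because it appears in a title, while “beef” appears only in body text and metadata.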
The filter subsystem determines a set of query filters from the set of candidate filters (508). The set of query filters is selected from the set of candidate filters based on the candidate filters' determined quality scores and diversity. As described above with reference to step 308 of FIG. 3, each query filter in the set of query filters meets the diversity threshold when respective filtered sets of content resulting from applying a respective query filter to the set of resources are sufficiently different from each other. Again, the set of candidate filters may include the keywords “guacamole” and “guac.” The system may determine that the set of content resulting from applying the query filter “guacamole” to the set of resources for the search query “burgers” may be similar, if not identical, to the set of content resulting from applying the query filter “guac” to the set of resources for the search query “burgers.” Thus, only one of the keywords “guacamole” and “guac” would be selected.
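The final selection (508) can be sketched as picking one representative per group of similar candidates and then keeping the highest-scoring representatives. The inputs are assumed to be already computed (groups of similar candidates and a quality score per candidate); the function name and the `max_filters` limit are hypothetical.

```python
def select_query_filters(groups, quality_scores, max_filters=4):
    """Pick the best-scoring candidate from each group, then the top few.

    groups: lists of candidate filters whose filtered results are similar.
    quality_scores: mapping from candidate filter to its quality score.
    """
    def score(name):
        return quality_scores.get(name, 0.0)

    # One representative per group keeps the selected filters diverse.
    representatives = [max(group, key=score) for group in groups]
    representatives.sort(key=score, reverse=True)
    return representatives[:max_filters]


groups = [["guacamole", "guac"], ["bacon"], ["cheese"]]
quality_scores = {"guacamole": 0.9, "guac": 0.4, "bacon": 0.7, "cheese": 0.8}
print(select_query_filters(groups, quality_scores, max_filters=2))
```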
Additional Implementation Details
Embodiments of the subject matter and the operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on computer storage medium for execution by, or to control the operation of, data processing apparatus. Alternatively or in addition, the program instructions can be encoded on an artificially-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus. A computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate, a random or serial access memory array or device, or a combination of one or more of them. Moreover, while a computer storage medium is not a propagated signal, a computer storage medium can be a source or destination of computer program instructions encoded in an artificially-generated propagated signal. The computer storage medium can also be, or be included in, one or more separate physical components or media (e.g., multiple CDs, disks, or other storage devices).
The operations described in this specification can be implemented as operations performed by a data processing apparatus on data stored on one or more computer-readable storage devices or received from other sources.
The term “data processing apparatus” encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, a system on a chip, or multiple ones, or combinations, of the foregoing. The apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). The apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, a cross-platform runtime environment, a virtual machine, or a combination of one or more of them. The apparatus and execution environment can realize various different computing model infrastructures, such as web services, distributed computing and grid computing infrastructures.
A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, object, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform actions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for performing actions in accordance with instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device (e.g., a universal serial bus (USB) flash drive), to name just a few. Devices suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user's user device in response to requests received from the web browser.
Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a user computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks).
The computing system can include users and servers. A user and server are generally remote from each other and typically interact through a communication network. The relationship of user and server arises by virtue of computer programs running on the respective computers and having a user-server relationship to each other. In some embodiments, a server transmits data (e.g., an HTML page) to a user device (e.g., for purposes of displaying data to and receiving user input from a user interacting with the user device). Data generated at the user device (e.g., a result of the user interaction) can be received from the user device at the server.
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any inventions or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous.

Claims (19)

What is claimed is:
1. A computer-implemented method, comprising:
receiving, for a first query, data identifying a set of resources that are determined to be responsive to the first query;
extracting, from the set of resources, a first set of keywords from the contents of the resources that have been identified as responsive to the first query;
determining, from the first set of keywords, a set of candidate filters from the keywords, each candidate filter derived from one or more keywords in the set of keywords, and wherein the set of candidate filters are a proper subset of the first set of keywords;
determining, from the set of candidate filters, a set of query filters for the first query, the determining comprising:
for each candidate filter, determining a respective filtered set of content resulting from applying the candidate filter to the set of resources;
for each candidate filter, determining a difference between the respective filtered set of content for the candidate filter to another respective filtered set of content for another, different candidate filter;
for each candidate filter that meets a diversity threshold that indicates the difference of the filtered set of content for the candidate filter and the filtered set of content for another respective filtered set of content for another different candidate filter meets a difference threshold, selecting the candidate filter as a query filter;
providing, in response to the first query, for display on a user device and with content results that identify content in the set of resources, the set of query filters for the first query;
receiving a selection of a particular query filter of the set of query filters for the first query; and
in response to receiving a selection of the particular query filter of the set of query filters, providing, for display on the user device, a filtered set of content that identifies a set of content results for the particular query filter that is different than an unfiltered set of content results, and that is a proper subset of the unfiltered set of content results.
2. The computer-implemented method of claim 1, wherein each content result is a search result that identifies a corresponding resource in the set of resources.
3. The computer-implemented method of claim 1, wherein each content result is a subset of content included in a resource in the set of resources.
4. The method of claim 1, wherein determining, from the first set of keywords, a set of candidate filters from the keywords comprises:
determining, from the resources in the set of resources, a set of queries, each query in the set of queries being a query for which at least one of the resources has been selected by a user in response to the query.
5. The method of claim 4, further comprising:
determining, from the set of queries, query stop terms, each query stop term being a term having a frequency in the set of queries that meets a query stop term frequency threshold; and
excluding the query stop terms in the first set of keywords from the set of candidate filters.
6. The method of claim 4, further comprising:
determining, from the set of queries, informational terms, each information term being a term having a frequency in the set of queries that is less than or equal to an informational term threshold;
including the informational terms in the first set of keywords in the set of candidate filters.
7. The method of claim 1, wherein determining, from the first set of keywords, a set of candidate filters from the keywords comprises:
determining, from the first query, a set of queries, each query in the set of queries being a query that is determined to be related to the first query.
8. The method of claim 1, wherein determining, from the set of candidate filters, a set of query filters, each query filter in the set of query filters meeting a diversity threshold comprises:
for each candidate query filter, applying the candidate query filter to the set of resources to obtain a corresponding filtered set of content results; and
grouping, into a single candidate query filter, a pair of candidate query filters for which respective filtered sets of content results meet a similarity threshold that is indicative of the respective filtered sets of content results being substantially similar.
9. The method of claim 1, wherein determining, from the set of candidate filters, a set of query filters, each query filter in the set of query filters meeting a diversity threshold comprises:
determining a quality score for each candidate query filter based on locations of the candidate query filter in the resources.
10. A system, comprising:
a data processing apparatus; and
a non-transitory computer readable medium storing instructions executable by the data processing apparatus and that upon such execution cause the data processing apparatus to perform operations comprising:
receiving, for a first query, data identifying a set of resources that are determined to be responsive to the first query;
extracting, from the set of resources, a first set of keywords from the contents of the resources that have been identified as responsive to the first query;
determining, from the first set of keywords, a set of candidate filters from the keywords, each candidate filter derived from one or more keywords in the set of keywords, and wherein the set of candidate filters are a proper subset of the first set of keywords;
determining, from the set of candidate filters, a set of query filters for the first query, the determining comprising:
for each candidate filter, determining a respective filtered set of content resulting from applying the candidate filter to the set of resources;
for each candidate filter, determining a difference between the respective filtered set of content for the candidate filter to another respective filtered set of content for another, different candidate filter;
for each candidate filter that meets a diversity threshold that indicates the difference of the filtered set of content for the candidate filter and the filtered set of content for another respective filtered set of content for another different candidate filter meets a difference threshold, selecting the candidate filter as a query filter;
providing, in response to the first query, for display on a user device and with content results that identify content in the set of resources, the set of query filters for the first query;
receiving a selection of a particular query filter of the set of query filters for the first query; and
in response to receiving a selection of the particular query filter of the set of query filters, providing, for display on the user device, a filtered set of content that identifies a set of content results for the particular query filter that is different than an unfiltered set of content results, and that is a proper subset of the unfiltered set of content results.
11. The system of claim 10, wherein each content result is a search result that identifies a corresponding resource in the set of resources.
12. The system of claim 10, wherein each content result is a subset of content included in a resource in the set of resources.
13. The system of claim 10, wherein determining, from the first set of keywords, a set of candidate filters from the keywords comprises:
determining, from the resources in the set of resources, a set of queries, each query in the set of queries being a query for which at least one of the resources has been selected by a user in response to the query.
14. The system of claim 13, further comprising:
determining, from the set of queries, query stop terms, each query stop term being a term having a frequency in the set of queries that meets a query stop term frequency threshold; and
excluding the query stop terms in the first set of keywords from the set of candidate filters.
15. The system of claim 13, further comprising:
determining, from the set of queries, informational terms, each informational term being a term having a frequency in the set of queries that is less than or equal to an informational term threshold;
including the informational terms in the first set of keywords in the set of candidate filters.
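By way of non-limiting illustration only, the frequency-based handling of query stop terms and informational terms recited in claims 14 and 15 might be sketched as follows. The sketch assumes that a term's frequency is the fraction of queries containing it, that queries are split on whitespace, and that keywords never seen in any query are left out. The threshold values and the names `term_frequencies`, `classify_terms`, and `candidate_filters` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def term_frequencies(queries: Iterable[str]) -> Dict[str, float]:
    """Fraction of queries in which each whitespace-delimited term appears (assumed definition)."""
    counts: Counter = Counter()
    total = 0
    for query in queries:
        total += 1
        counts.update(set(query.lower().split()))  # count each term once per query
    return {term: n / total for term, n in counts.items()} if total else {}


def classify_terms(
    queries: Iterable[str],
    stop_threshold: float = 0.8,
    informational_threshold: float = 0.25,
) -> Tuple[Set[str], Set[str]]:
    """Split query terms into stop terms (frequent) and informational terms (infrequent)."""
    freqs = term_frequencies(queries)
    stop_terms = {t for t, f in freqs.items() if f >= stop_threshold}
    informational = {t for t, f in freqs.items() if f <= informational_threshold}
    return stop_terms, informational


def candidate_filters(
    keywords: List[str],
    queries: Iterable[str],
    stop_threshold: float = 0.8,
    informational_threshold: float = 0.25,
) -> List[str]:
    """Keep keywords that are informational terms and drop those that are stop terms."""
    stop_terms, informational = classify_terms(
        queries, stop_threshold, informational_threshold
    )
    return [k for k in keywords
            if k.lower() in informational and k.lower() not in stop_terms]


if __name__ == "__main__":
    related = ["paris hotels", "paris cheap hotels", "paris museums", "paris luxury hotels"]
    print(candidate_filters(["Paris", "hotels", "cheap", "luxury", "river"], related))
```

A term such as "paris" that appears in nearly every query carries little filtering value and is treated as a stop term, whereas rarer terms such as "cheap" are kept as candidate filters.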
16. The system of claim 10, wherein determining, from the first set of keywords, a set of candidate filters from the keywords comprises:
determining, from the first query, a set of queries, each query in the set of queries being a query that is determined to be related to the first query.
17. The system of claim 10, wherein determining, from the set of candidate filters, a set of query filters, each query filter in the set of query filters meeting a diversity threshold comprises:
for each candidate query filter, applying the candidate query filter to the set of resources to obtain a corresponding filtered set of content results; and
grouping, into a single candidate query filter, a pair of candidate query filters for which respective filtered sets of content results meet a similarity threshold that is indicative of the respective filtered sets of content results being substantially similar.
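By way of non-limiting illustration only, the grouping of candidate query filters with substantially similar filtered results recited in claim 17 might be sketched as follows. The sketch assumes that the filtered sets have already been computed, that similarity is a Jaccard similarity, and that each new filter is compared with the first member of each existing group. The names `jaccard_similarity` and `group_similar_filters` and the default threshold are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard_similarity(a: Set[str], b: Set[str]) -> float:
    """Assumed similarity measure: |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_similar_filters(
    filtered_sets: Dict[str, Set[str]],
    similarity_threshold: float = 0.75,
) -> List[List[str]]:
    """Group filter names whose filtered result sets meet the similarity threshold."""
    groups: List[Tuple[List[str], Set[str]]] = []  # (member names, representative result set)
    for name, results in filtered_sets.items():
        for names, representative in groups:
            if jaccard_similarity(results, representative) >= similarity_threshold:
                names.append(name)  # fold into the existing group
                break
        else:
            groups.append(([name], results))  # no similar group found: start a new one
    return [names for names, _ in groups]


if __name__ == "__main__":
    sets = {
        "cheap": {"r1", "r2", "r3", "r4"},
        "budget": {"r1", "r2", "r3", "r4", "r5"},
        "luxury": {"r6"},
    }
    print(group_similar_filters(sets))
```

Each resulting group could then be presented as a single query filter, for example under the label of its first member, so that near-synonymous filters are not shown separately.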
18. The system of claim 10, wherein determining, from the set of candidate filters, a set of query filters, each query filter in the set of query filters meeting a diversity threshold comprises:
determining a quality score for each candidate query filter based on locations of the candidate query filter in the resources.
19. A non-transitory computer readable medium storing instructions executable by a data processing apparatus and that upon such execution cause the data processing apparatus to perform operations comprising:
receiving, for a first query, data identifying a set of resources that are determined to be responsive to the first query;
extracting, from the set of resources, a first set of keywords from the contents of the resources that have been identified as responsive to the first query;
determining, from the first set of keywords, a set of candidate filters from the keywords, each candidate filter derived from one or more keywords in the set of keywords, and wherein the set of candidate filters are a proper subset of the first set of keywords;
determining, from the set of candidate filters, a set of query filters for the first query, the determining comprising:
for each candidate filter, determining a respective filtered set of content resulting from applying the candidate filter to the set of resources;
for each candidate filter, determining a difference between the respective filtered set of content for the candidate filter to another respective filtered set of content for another, different candidate filter;
for each candidate filter that meets a diversity threshold that indicates the difference of the filtered set of content for the candidate filter and the filtered set of content for another respective filtered set of content for another different candidate filter meets a difference threshold, selecting the candidate filter as a query filter;
providing, in response to the first query, for display on a user device and with content results that identify content in the set of resources, the set of query filters for the first query;
receiving a selection of a particular query filter of the set of query filters for the first query; and
in response to receiving a selection of the particular query filter of the set of query filters, providing, for display on the user device, a filtered set of content that identifies a set of content results for the particular query filter that is different than an unfiltered set of content results, and that is a proper subset of the unfiltered set of content results.
US15/183,455 2015-07-15 2016-06-15 Search result filters from resource content Expired - Fee Related US10242112B2 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
US15/183,455 US10242112B2 (en) 2015-07-15 2016-06-15 Search result filters from resource content
US16/265,714 US11372941B2 (en) 2015-07-15 2019-02-01 Search result filters from resource content
US17/850,655 US11797626B2 (en) 2015-07-15 2022-06-27 Search result filters from resource content
US18/244,158 US20240143679A1 (en) 2015-07-15 2023-09-08 Search result filters from resource content

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201562192713P 2015-07-15 2015-07-15
US15/183,455 US10242112B2 (en) 2015-07-15 2016-06-15 Search result filters from resource content

Related Child Applications (1)

Application Number Priority Date Filing Date Title
US16/265,714 Continuation US11372941B2 (en) 2015-07-15 2019-02-01 Search result filters from resource content

Publications (2)

Publication Number Publication Date
US20170017724A1 US20170017724A1 (en) 2017-01-19
US10242112B2 US10242112B2 (en) 2019-03-26

Family

ID=57758303

Family Applications (4)

Application Number Priority Date Filing Date Title
US15/183,455 Expired - Fee Related US10242112B2 (en) 2015-07-15 2016-06-15 Search result filters from resource content
US16/265,714 Active 2037-10-09 US11372941B2 (en) 2015-07-15 2019-02-01 Search result filters from resource content
US17/850,655 Active US11797626B2 (en) 2015-07-15 2022-06-27 Search result filters from resource content
US18/244,158 Pending US20240143679A1 (en) 2015-07-15 2023-09-08 Search result filters from resource content

Country Status (5)

Country Link
US (4) US10242112B2 (en)
EP (1) EP3274879A4 (en)
CN (1) CN107924402A (en)
RU (2) RU2691840C1 (en)
WO (1) WO2017011661A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11797626B2 (en) 2015-07-15 2023-10-24 Google Llc Search result filters from resource content

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10445371B2 (en) * 2011-06-23 2019-10-15 FullContact, Inc. Relationship graph
US11748416B2 (en) * 2017-06-19 2023-09-05 Equifax Inc. Machine-learning system for servicing queries for digital content
JP7196393B2 (en) * 2017-12-01 2022-12-27 株式会社リコー Information presentation device, information presentation system, information presentation method and program
CN110674387B (en) * 2018-06-15 2023-09-22 伊姆西Ip控股有限责任公司 Method, apparatus and computer storage medium for data search
US11537672B2 (en) * 2018-09-17 2022-12-27 Yahoo Assets Llc Method and system for filtering content
KR102605448B1 (en) * 2018-10-30 2023-11-22 삼성에스디에스 주식회사 Search method and apparatus thereof
WO2020132693A1 (en) * 2018-12-21 2020-06-25 Waymo Llc Searching an autonomous vehicle sensor data repository
CN110781392B (en) * 2019-10-22 2022-08-12 深圳墨世科技有限公司 Dynamically scalable filtering method and device, computer equipment and storage medium
US11594213B2 (en) 2020-03-03 2023-02-28 Rovi Guides, Inc. Systems and methods for interpreting natural language search queries
US11914561B2 (en) * 2020-03-03 2024-02-27 Rovi Guides, Inc. Systems and methods for interpreting natural language search queries using training data
US11507572B2 (en) 2020-09-30 2022-11-22 Rovi Guides, Inc. Systems and methods for interpreting natural language search queries
CN112541362B (en) * 2020-12-08 2022-08-23 北京百度网讯科技有限公司 Generalization processing method, device, equipment and computer storage medium
US11663279B2 (en) * 2021-05-05 2023-05-30 Capital One Services, Llc Filter list generation system
CN113761426B (en) * 2021-09-24 2024-02-13 南方电网数字平台科技(广东)有限公司 System, method, device, equipment and medium for page service authentication access center
JP2024063484A (en) * 2022-10-26 2024-05-13 キヤノン株式会社 Program, information processing apparatus, method for controlling information processing apparatus, server, and method for controlling server

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020194166A1 (en) 2001-05-01 2002-12-19 Fowler Abraham Michael Mechanism to sift through search results using keywords from the results
US20090259646A1 (en) 2008-04-09 2009-10-15 Yahoo!, Inc. Method for Calculating Score for Search Query
US20100114928A1 (en) 2008-11-06 2010-05-06 Yahoo! Inc. Diverse query recommendations using weighted set cover methodology
US20110179021A1 (en) 2010-01-21 2011-07-21 Microsoft Corporation Dynamic keyword suggestion and image-search re-ranking
US8280900B2 (en) 2010-08-19 2012-10-02 Fuji Xerox Co., Ltd. Speculative query expansion for relevance feedback
US20120317141A1 (en) 2007-10-12 2012-12-13 Lexxe Pty Ltd System and method for ordering of semantic sub-keys
CN103150409A (en) 2013-04-08 2013-06-12 深圳市宜搜科技发展有限公司 Method and system for recommending user search word
US20130159348A1 (en) * 2011-12-16 2013-06-20 Sas Institute, Inc. Computer-Implemented Systems and Methods for Taxonomy Development
RU2487404C2 (en) 2006-12-19 2013-07-10 Молдтэк Онтверпен Б.В. Method of classifying web pages and organising corresponding information content
US20130238587A1 (en) 2010-10-30 2013-09-12 Blekko, Inc. Search Query Transformations
CN103577595A (en) 2013-11-15 2014-02-12 北京奇虎科技有限公司 Keyword pushing method and device based on current browse webpage
CN104090963A (en) 2014-07-14 2014-10-08 百度在线网络技术(北京)有限公司 Search information recommendation method and device
US20140330813A1 (en) 2013-05-03 2014-11-06 Samsung Electronics Co., Ltd. Display apparatus and searching method
US8930356B2 (en) 2007-09-20 2015-01-06 Yahoo! Inc. Techniques for modifying a query based on query associations
US20150026155A1 (en) 2013-07-19 2015-01-22 Ebay Inc. Methods, systems, and apparatus for generating search results
RU2542936C2 (en) 2009-01-15 2015-02-27 Майкрософт Корпорейшн Indexing and searching dynamically changing search corpora
US20150081656A1 (en) 2013-09-13 2015-03-19 Sap Ag Provision of search refinement suggestions based on multiple queries
US20160012052A1 (en) * 2014-07-08 2016-01-14 Microsoft Corporation Ranking tables for keyword search

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090024467A1 (en) * 2007-07-20 2009-01-22 Marcus Felipe Fontoura Serving Advertisements with a Webpage Based on a Referrer Address of the Webpage
US8458171B2 (en) * 2009-01-30 2013-06-04 Google Inc. Identifying query aspects
US20100306249A1 (en) * 2009-05-27 2010-12-02 James Hill Social network systems and methods
US8589399B1 (en) * 2011-03-25 2013-11-19 Google Inc. Assigning terms of interest to an entity
CN103544190A (en) * 2012-07-17 2014-01-29 祁勇 Method and system for acquiring personalized features of users and documents
CN103294815B (en) * 2013-06-08 2017-06-06 北京邮电大学 Based on key class and there are a search engine device and method of various presentation modes
US10242112B2 (en) 2015-07-15 2019-03-26 Google Llc Search result filters from resource content

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
EP Extended European Search Report issued in European Application No. 16825180.9, dated Sep. 19, 2018, 8 pages.
International Search Report and Written Opinion in Application No. PCT/US2016/042289, dated Oct. 3, 2016, 12 pages.
RU Office Action issued in Russian Application No. 2017137752/08(065914) dated Dec. 6, 2018, 10 pages (with English translation).

Also Published As

Publication number Publication date
RU2019114229A (en) 2019-06-27
EP3274879A4 (en) 2018-10-17
US20190163713A1 (en) 2019-05-30
EP3274879A1 (en) 2018-01-31
RU2019114229A3 (en) 2019-12-16
US11372941B2 (en) 2022-06-28
WO2017011661A1 (en) 2017-01-19
US20240143679A1 (en) 2024-05-02
US11797626B2 (en) 2023-10-24
RU2691840C1 (en) 2019-06-18
US20220327175A1 (en) 2022-10-13
CN107924402A (en) 2018-04-17
RU2719443C2 (en) 2020-04-17
US20170017724A1 (en) 2017-01-19

Similar Documents

Publication Publication Date Title
US11797626B2 (en) Search result filters from resource content
US9336277B2 (en) Query suggestions based on search data
US9336318B2 (en) Rich content for query answers
US9727603B1 (en) Query refinements using search data
US9842167B2 (en) Search suggestion and display environment
CA2732733C (en) Providing posts to discussion threads in response to a search query
US9183277B1 (en) Providing intent sensitive search results
US8332426B2 (en) Indentifying referring expressions for concepts
US9213748B1 (en) Generating related questions for search queries
US9916384B2 (en) Related entities
CN109952571B (en) Context-based image search results
US10691746B2 (en) Images for query answers
US9251202B1 (en) Corpus specific queries for corpora from search query
US20140365466A1 (en) Search result claiming
US9811592B1 (en) Query modification based on textual resource context
CN107851114B (en) Method, system, and medium for automatic information retrieval
US9390183B1 (en) Identifying navigational resources for informational queries
US10055463B1 (en) Feature based ranking adjustment
US20160019226A1 (en) Identifying video files of a video file storage system

Legal Events

Date Code Title Description
AS Assignment

Owner name: GOOGLE INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MACGILLIVRAY, IAN;SPITZ, KAYLIN;YANG, SELENA SUNLING;AND OTHERS;SIGNING DATES FROM 20150901 TO 20150910;REEL/FRAME:039064/0777

AS Assignment

Owner name: GOOGLE LLC, CALIFORNIA

Free format text: CHANGE OF NAME;ASSIGNOR:GOOGLE INC.;REEL/FRAME:044129/0001

Effective date: 20170929

STCF Information on status: patent grant

Free format text: PATENTED CASE

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20230326