DOI: 10.1145/956863.956925

Categorizing web queries according to geographical locality

Published: 03 November 2003

Abstract

Web pages (and resources, in general) can be characterized according to their geographical locality. For example, a web page with general information about wildflowers could be considered a global page, likely to be of interest to a geographically broad audience. In contrast, a web page with listings on houses for sale in a specific city could be regarded as a local page, likely to be of interest only to an audience in a relatively narrow region. Similarly, some search engine queries (implicitly) target global pages, while other queries are after local pages. For example, the best results for query [wildflowers] are probably global pages about wildflowers such as the one discussed above. However, local pages that are relevant to, say, San Francisco are likely to be good matches for a query [houses for sale] that was issued by a San Francisco resident or by somebody moving to that city. Unfortunately, search engines do not analyze the geographical locality of queries and users, and hence often produce sub-optimal results. Thus query [wildflowers] might return pages that discuss wildflowers in specific U.S. states (and not general information about wildflowers), while query [houses for sale] might return pages with real estate listings for locations other than that of interest to the person who issued the query. Deciding whether an unseen query should produce mostly local or global pages, without placing this burden on the search engine users, is an important and challenging problem, because queries are often ambiguous or underspecify the information they are after. In this paper, we address this problem by first defining how to categorize queries according to their (often implicit) geographical locality. We then introduce several alternatives for automatically and efficiently categorizing queries in our scheme, using a variety of state-of-the-art machine learning tools. We report a thorough evaluation of our classifiers using a large sample of queries from a real web search engine, and conclude by discussing how our query categorization approach can help improve query result quality.

      Published In

      CIKM '03: Proceedings of the Twelfth International Conference on Information and Knowledge Management
      November 2003
      592 pages
      ISBN: 1581137230
      DOI: 10.1145/956863

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. information retrieval
      2. query classification
      3. query modification
      4. search engines
      5. web search

      Qualifiers

      • Article

      Conference

      CIKM03

      Acceptance Rates

      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
