DOI: 10.1145/1183614.1183711
Article

Coupling feature selection and machine learning methods for navigational query identification

Published: 06 November 2006

Abstract

Identifying navigational queries in Web search is important but difficult, because Web queries are typically very short and carry little information. In this paper we study several machine learning methods for navigational query identification in Web search, including the naive Bayes model, the maximum entropy model, the support vector machine (SVM), and the stochastic gradient boosting tree (SGBT). To boost the performance of these machine learning techniques, we exploit several feature selection methods and propose coupling feature selection with classification approaches to achieve the best performance. Unlike most prior work, which uses a small number of features, we study the problem of identifying navigational queries with thousands of available features, extracted from major commercial search engine results, Web search user click data, query logs, and the whole Web's relational content. A multi-level feature extraction system is constructed. Our results on real search data show that (1) among all the features we tested, user click distribution features are the most important set of features for identifying navigational queries; (2) to achieve good performance, machine learning approaches have to be coupled with good feature selection methods, and we find that the gradient boosting tree coupled with linear SVM feature selection is the most effective; and (3) with carefully coupled feature selection and classification approaches, navigational queries can be accurately identified with an 88.1% F1 score, which is a 33% error rate reduction compared to the best uncoupled system and a 40% error rate reduction compared to a well-tuned system without feature selection.
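The abstract names user click distribution features as the most important feature set but does not define them. As a minimal sketch of what one such feature might look like, the Python below computes the entropy of a query's clicks over result URLs and the share of clicks on the most-clicked URL. The function names, the choice of these two statistics, and the toy click logs are all assumptions made for illustration; they are not taken from the paper.

```python
import math
from collections import Counter


def click_entropy(clicked_urls):
    """Shannon entropy (in bits) of one query's clicks over result URLs.

    Hypothetical feature: low entropy means clicks concentrate on few URLs.
    """
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no clicks observed, so no spread to measure
    # p * log2(1/p) summed over URLs, with p = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def top_click_share(clicked_urls):
    """Fraction of a query's clicks that land on its most-clicked URL."""
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts.most_common(1)[0][1] / total


# Toy, made-up click logs (assumed for illustration only).
navigational_clicks = ["a.example"] * 9 + ["b.example"]
informational_clicks = ["a.example", "b.example", "c.example", "d.example"] * 2

for name, clicks in [("navigational-like", navigational_clicks),
                     ("informational-like", informational_clicks)]:
    print(name, round(click_entropy(clicks), 3), round(top_click_share(clicks), 3))
```

In this toy data the query whose clicks concentrate on a single URL has a much lower entropy and a much higher top-click share than the query whose clicks are spread evenly, which is the kind of signal a classifier could use. Whether the paper's features take this form is not stated in the abstract.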


    Published In

    CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge management
    November 2006
    916 pages
    ISBN:1595934332
    DOI:10.1145/1183614

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. machine learning
    2. navigational query classification

    Qualifiers

    • Article

    Conference

    CIKM06
    CIKM06: Conference on Information and Knowledge Management
    November 6 - 11, 2006
    Arlington, Virginia, USA

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
