DOI: 10.1145/1367497.1367505
research-article

Mining the search trails of surfing crowds: identifying relevant websites from user activity

Published: 21 April 2008

Abstract

The paper proposes identifying relevant information sources from the history of combined searching and browsing behavior of many Web users. While it has been previously shown that user interactions with search engines can be employed to improve document ranking, browsing behavior that occurs beyond search result pages has been largely overlooked in prior work. The paper demonstrates that users' post-search browsing activity strongly reflects implicit endorsement of visited pages, which allows estimating topical relevance of Web resources by mining large-scale datasets of search trails. We present heuristic and probabilistic algorithms that rely on such datasets for suggesting authoritative websites for search queries. Experimental evaluation shows that exploiting complete post-search browsing trails outperforms alternatives in isolation (e.g., clickthrough logs), and yields accuracy improvements when employed as a feature in learning to rank for Web search.
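As a rough illustration of the idea in the abstract, the sketch below ranks websites for a query by how many post-search trails visit them. It is not the paper's heuristic or probabilistic algorithm: the function name `rank_sites_by_trail_visits`, the `(query, visited sites)` trail format, and the toy data are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def rank_sites_by_trail_visits(trails, top_k=3):
    """Rank sites per query by the number of trails that visit them.

    `trails` is an iterable of (query, [visited site, ...]) pairs.
    This input format is a hypothetical simplification of a search trail.
    """
    counts = defaultdict(Counter)
    for query, sites in trails:
        # Count each site once per trail so one long session cannot dominate.
        for site in set(sites):
            counts[query][site] += 1
    return {
        # Sort by descending count, then by name for a deterministic order.
        query: [site for site, _ in
                sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]]
        for query, c in counts.items()
    }


# Toy trails (invented data): three users issue the same query.
toy_trails = [
    ("python csv", ["docs.python.org", "stackoverflow.com"]),
    ("python csv", ["docs.python.org", "docs.python.org", "realpython.com"]),
    ("python csv", ["stackoverflow.com", "docs.python.org"]),
]
ranking = rank_sites_by_trail_visits(toy_trails)
print(ranking["python csv"])
```

Counting a site once per trail is a design choice for this sketch only; a real system would likely also weigh dwell time, trail position, and query frequency.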

      Published In

      WWW '08: Proceedings of the 17th international conference on World Wide Web
      April 2008
      1326 pages
      ISBN:9781605580852
      DOI:10.1145/1367497
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. implicit feedback
      2. learning from user behavior
      3. mining search and browsing logs
      4. web search

      Qualifiers

      • Research-article

      Conference

      WWW '08

      Acceptance Rates

      Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

      Cited By

      • (2024) AI-Driven Contextual Advertising: Toward Relevant Messaging Without Personal Data. Journal of Current Issues & Research in Advertising 45:3 (301-319). DOI: 10.1080/10641734.2024.2334939. Online publication date: 29-Apr-2024
      • (2023) A Large-Scale Characterization of How Readers Browse Wikipedia. ACM Transactions on the Web 17:2 (1-22). DOI: 10.1145/3580318. Online publication date: 3-Apr-2023
      • (2022) InterWeave: Presenting Search Suggestions in Context Scaffolds Information Search and Synthesis. Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology (1-16). DOI: 10.1145/3526113.3545696. Online publication date: 29-Oct-2022
      • (2022) User’s Emotion Profiling in Web Browsing Behavior. Advances in Intelligent Networking and Collaborative Systems (1-8). DOI: 10.1007/978-3-031-14627-5_1. Online publication date: 17-Aug-2022
      • (2021) The Living Lab on Media Content and Platforms: Results from six months of web browser tracking (O Living Lab on Media Content and Platforms: Resultados de seis meses de web browser tracking). Comunicação pública. DOI: 10.4000/cp.12665. Online publication date: 30-Jun-2021
      • (2021) Information Search Trail Recommendation Based on Markov Chain Model and Case-based Reasoning. Data and Information Management 5:1 (228-241). DOI: 10.2478/dim-2020-0047. Online publication date: Jan-2021
      • (2021) CoNotate: Suggesting Queries Based on Notes Promotes Knowledge Discovery. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (1-14). DOI: 10.1145/3411764.3445618. Online publication date: 6-May-2021
      • (2021) Directing and Combining Multiple Queries for Exploratory Search by Visual Interactive Intent Modeling. Human-Computer Interaction – INTERACT 2021 (514-535). DOI: 10.1007/978-3-030-85613-7_34. Online publication date: 26-Aug-2021
      • (2020) Search Support Tools. Understanding and Improving Information Search (139-160). DOI: 10.1007/978-3-030-38825-6_8. Online publication date: 30-May-2020
      • (2019) In Search of a Stochastic Model for the E-News Reader. ACM Transactions on Knowledge Discovery from Data 13:6 (1-27). DOI: 10.1145/3362695. Online publication date: 13-Nov-2019
