research-article

Mining the search trails of surfing crowds: identifying relevant websites from user activity

Authors:

Mikhail Bilenko,

Ryen W. White

WWW '08: Proceedings of the 17th international conference on World Wide Web

Pages 51 - 60

https://doi.org/10.1145/1367497.1367505

Published: 21 April 2008

Abstract

The paper proposes identifying relevant information sources from the history of combined searching and browsing behavior of many Web users. While it has been previously shown that user interactions with search engines can be employed to improve document ranking, browsing behavior that occurs beyond search result pages has been largely overlooked in prior work. The paper demonstrates that users' post-search browsing activity strongly reflects implicit endorsement of visited pages, which allows estimating topical relevance of Web resources by mining large-scale datasets of search trails. We present heuristic and probabilistic algorithms that rely on such datasets for suggesting authoritative websites for search queries. Experimental evaluation shows that exploiting complete post-search browsing trails outperforms alternatives in isolation (e.g., clickthrough logs), and yields accuracy improvements when employed as a feature in learning to rank for Web search.
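The abstract does not spell out the heuristic or probabilistic algorithms, so the following is only a minimal illustrative sketch of the general idea of mining search trails, not the authors' method. It assumes a hypothetical input format of (query, visited URLs) pairs and a simple scoring rule (counting the trails on which a website appears for each query term); the function names `build_term_site_counts` and `suggest_sites` are invented for this example.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def build_term_site_counts(trails):
    """Count, for each query term, the trails on which each website appears.

    `trails` is an iterable of (query, visited_urls) pairs (assumed format).
    A website is counted at most once per trail, so many page views on one
    site within a single trail do not dominate the counts.
    """
    counts = defaultdict(Counter)
    for query, visited_urls in trails:
        # Reduce each visited URL to its host name and drop unparsable ones.
        sites = {urlparse(url).netloc for url in visited_urls}
        sites.discard("")
        for term in set(query.lower().split()):
            counts[term].update(sites)
    return counts


def suggest_sites(query, counts, k=3):
    """Return up to k websites with the highest summed per-term trail counts."""
    scores = Counter()
    for term in set(query.lower().split()):
        scores.update(counts.get(term, Counter()))
    return [site for site, _ in scores.most_common(k)]


# Toy trails with made-up URLs, for illustration only.
trails = [
    ("python tutorial", ["https://docs.example.org/tutorial",
                         "https://docs.example.org/intro",
                         "https://blog.example.com/post"]),
    ("python download", ["https://docs.example.org/download"]),
    ("java tutorial", ["https://java.example.net/learn"]),
]
counts = build_term_site_counts(trails)
print(suggest_sites("python", counts, k=1))
```

Raw counts favor globally popular sites; a practical system would likely normalize them (for example, with an inverse-frequency weight per site), which is left out here to keep the sketch short.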

Index Terms

Mining the search trails of surfing crowds: identifying relevant websites from user activity
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
  2. Information systems applications
    1. Data mining

Published In

WWW '08: Proceedings of the 17th international conference on World Wide Web

April 2008

1326 pages

ISBN:9781605580852

DOI:10.1145/1367497

General Chairs:
Jinpeng Huai (Beihang University, China)
Robin Chen (AT&T Labs, USA)
Hsiao-Wuen Hon (Microsoft Research Asia, China)
Yunhao Liu (HK University of Science and Technology, Hong Kong)

Program Chairs:
Wei-Ying Ma (Microsoft Research Asia, China)
Andrew Tomkins (Yahoo! Research, USA)
Xiaodong Zhang (The Ohio State University, USA)

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

ACM: Association for Computing Machinery

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

WWW '08

Sponsor:

ACM

WWW '08: The 17th International World Wide Web Conference

April 21 - 25, 2008

Beijing, China

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics

Total Citations: 116
Total Downloads: 1,356
Downloads (last 12 months): 9
Downloads (last 6 weeks): 0

Reflects downloads up to 03 Mar 2025
