Article

Essential Pages

Authors: Ashwin Swaminathan, Cherian V. Mathew, Darko Kirovski

WI-IAT '09: Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01

Pages 173 - 182

https://doi.org/10.1109/WI-IAT.2009.33

Published: 15 September 2009

Abstract

Results to Web search queries are ranked using heuristics that typically analyze the global link topology, user behavior, and content relevance. We point to a particular inefficiency of such methods: information redundancy. In queries where learning about a subject is an objective, modern search engines return relatively unsatisfactory results as they consider the query coverage by each page individually, not a set of pages as a whole. We address this problem using essential pages. If we denote as $\mathbb{S}_Q$ the total knowledge that exists on the Web about a given query $Q$, we want to build a search engine that returns a set of essential pages $E_Q$ that maximizes the information covered over $\mathbb{S}_Q$. We present a preliminary prototype that optimizes the selection of essential pages; we draw some informal comparisons with respect to existing search engines; and finally, we evaluate our prototype using a blind-test user study.
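The set-level objective in the abstract can be illustrated with a greedy maximum-coverage heuristic. This is a minimal sketch under stated assumptions, not the authors' prototype: each page is reduced to a set of terms standing in for the information it carries, $\mathbb{S}_Q$ is taken to be the union of those sets, and the function name `select_essential_pages` and the sample data are hypothetical.

```python
from typing import Dict, List, Set


def select_essential_pages(pages: Dict[str, Set[str]], k: int) -> List[str]:
    """Greedily pick up to k pages whose combined term sets cover the most terms.

    Assumption: a page's "information" is modeled as a plain set of terms.
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = dict(pages)
    while remaining and len(chosen) < k:
        # Choose the page that adds the most not-yet-covered terms.
        # Iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        if not remaining[best] - covered:
            break  # no remaining page adds anything new
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


# Hypothetical term sets for three pages returned for one query.
pages = {
    "a": {"history", "rules", "players", "scoring"},
    "b": {"history", "rules", "players"},
    "c": {"equipment", "venues"},
}
print(select_essential_pages(pages, 2))
```

Scoring each page on its own would rank "a" and "b" first, even though "b" adds nothing beyond "a". Scoring the set as a whole prefers "a" and "c", which together cover all six terms.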



Index Terms

Essential Pages
1. Computing methodologies
  1. Machine learning
2. Information systems
  1. Information retrieval
    1. Document representation
    2. Evaluation of retrieval results

Information & Contributors

Information

Published In

WI-IAT '09: Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01

September 2009

726 pages

ISBN: 9780769538013

Sponsors

SIGAI: ACM Special Interest Group on Artificial Intelligence

Publisher

IEEE Computer Society

United States

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 12
Total Downloads: 97
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 24 Sep 2024

Cited By

E. Balkanski, A. Rubinstein, and Y. Singer. The Limitations of Optimization from Samples. Journal of the ACM 69:3, pp. 1-33, 2022. Online publication date: 11-Jun-2022.
https://dl.acm.org/doi/10.1145/3511018

E. Balkanski, A. Rubinstein, and Y. Singer. The limitations of optimization from samples. Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (H. Hatami, P. McKenzie, and V. King, eds.), pp. 1016-1027, 2017. Online publication date: 19-Jun-2017.
https://dl.acm.org/doi/10.1145/3055399.3055406

R. Baeza-Yates, P. Boldi, and F. Chierichetti. Essential Web Pages Are Easy to Find. Proceedings of the 24th International Conference on World Wide Web (A. Gangemi, S. Leonardi, and A. Panconesi, eds.), pp. 97-107, 2015. Online publication date: 18-May-2015.
https://dl.acm.org/doi/10.1145/2736277.2741100

K. Raman, P. Bennett, and K. Collins-Thompson. Understanding Intrinsic Diversity in Web Search. ACM Transactions on Information Systems 32:4, pp. 1-45, 2014. Online publication date: 28-Oct-2014.
https://dl.acm.org/doi/10.1145/2629553

R. Cattelan and D. Kirovski. Towards improving the online shopping experience. Web Intelligence and Agent Systems 10:2, pp. 209-231, 2012. Online publication date: 1-Apr-2012.
https://dl.acm.org/doi/10.5555/2589968.2589974

R. Sipos, P. Shivaswamy, and T. Joachims. Large-margin learning of submodular summarization models. Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (W. Daelemans, ed.), pp. 224-233, 2012. Online publication date: 23-Apr-2012.
https://dl.acm.org/doi/10.5555/2380816.2380846

W. Zheng, H. Fang, H. Cheng, and X. Wang. Diversifying Search Results through Pattern-Based Subtopic Modeling. International Journal on Semantic Web & Information Systems 8:4, pp. 37-56, 2012. Online publication date: 1-Oct-2012.
https://dl.acm.org/doi/10.4018/jswis.2012100103

A. Di Marco and R. Navigli. Clustering web search results with maximum spanning trees. Proceedings of the 12th international conference on Artificial intelligence around man and beyond, pp. 201-212, 2011. Online publication date: 15-Sep-2011.
https://dl.acm.org/doi/10.5555/2041977.2042002

B. Stein, T. Gollub, and D. Hoppe. Beyond precision@10. Proceedings of the 20th ACM international conference on Information and knowledge management, pp. 2141-2144, 2011. Online publication date: 24-Oct-2011.
https://dl.acm.org/doi/10.1145/2063576.2063910

A. Marco and R. Navigli. Clustering Web Search Results with Maximum Spanning Trees. Proceedings of the XIIth International Conference on AI*IA 2011: Artificial Intelligence Around Man and Beyond - Volume 6934, pp. 201-212, 2011. Online publication date: 15-Sep-2011.
https://dl.acm.org/doi/10.1007/978-3-642-23954-0_20
