DOI: 10.1109/WI-IAT.2009.33
Article

Essential Pages

Published: 15 September 2009

Abstract

Results for Web search queries are ranked using heuristics that typically analyze the global link topology, user behavior, and content relevance. We point to a particular inefficiency of such methods: information redundancy. In queries where learning about a subject is an objective, modern search engines return relatively unsatisfactory results because they consider how well each page covers the query individually, rather than how well a set of pages covers it as a whole. We address this problem using essential pages. If we denote as $\mathbb{S}_Q$ the total knowledge that exists on the Web about a given query $Q$, we want to build a search engine that returns a set of essential pages $E_Q$ that maximizes the information covered over $\mathbb{S}_Q$. We present a preliminary prototype that optimizes the selection of essential pages; we draw some informal comparisons with respect to existing search engines; and finally, we evaluate our prototype using a blind-test user study.
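
The abstract does not say how the prototype optimizes the selection of $E_Q$. As a minimal sketch of the set-level objective only, the Python below assumes that the knowledge on each page can be modeled as a set of terms, that $\mathbb{S}_Q$ is the union of those sets, and that a greedy heuristic is an acceptable stand-in for the optimizer. The names `select_essential_pages` and `coverage`, the page identifiers, and the term sets are all hypothetical and do not come from the paper.

```python
from typing import Dict, List, Set


def select_essential_pages(pages: Dict[str, Set[str]], k: int) -> List[str]:
    """Greedily pick up to k pages that together cover the most terms.

    Assumption: each page is represented by the set of terms it covers.
    This is a generic greedy max-coverage heuristic, not the paper's method.
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = dict(pages)
    while remaining and len(chosen) < k:
        # Score each candidate by how many not-yet-covered terms it adds.
        best = max(remaining, key=lambda p: len(remaining[p] - covered))
        if not remaining[best] - covered:
            break  # every remaining page is fully redundant
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


def coverage(pages: Dict[str, Set[str]], chosen: List[str]) -> float:
    """Fraction of all known terms (the stand-in for S_Q) covered by chosen."""
    universe = set().union(*pages.values())
    covered = set().union(*(pages[p] for p in chosen))
    return len(covered) / len(universe) if universe else 0.0


# Hypothetical toy data: page "b" is redundant given page "a".
toy_pages = {
    "a": {"history", "rules", "equipment"},
    "b": {"history", "rules"},
    "c": {"training"},
}
essential = select_essential_pages(toy_pages, k=2)
print(essential, coverage(toy_pages, essential))
```

In this toy data, ranking pages individually would place "b" ahead of "c" because "b" covers more terms on its own, while the set-level selection skips "b" because it adds nothing beyond "a".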

Published In

WI-IAT '09: Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
September 2009
726 pages
ISBN:9780769538013

Publisher

IEEE Computer Society, United States

Author Tags

  1. Web page ranking
  2. Web search
  3. coverage
  4. learning queries
  5. redundancy elimination

Qualifiers

  • Article

Cited By

  • (2022) The Limitations of Optimization from Samples. Journal of the ACM 69:3 (1-33). DOI: 10.1145/3511018. Online publication date: 11-Jun-2022
  • (2017) The limitations of optimization from samples. Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (1016-1027). DOI: 10.1145/3055399.3055406. Online publication date: 19-Jun-2017
  • (2015) Essential Web Pages Are Easy to Find. Proceedings of the 24th International Conference on World Wide Web (97-107). DOI: 10.1145/2736277.2741100. Online publication date: 18-May-2015
  • (2014) Understanding Intrinsic Diversity in Web Search. ACM Transactions on Information Systems 32:4 (1-45). DOI: 10.1145/2629553. Online publication date: 28-Oct-2014
  • (2012) Towards improving the online shopping experience. Web Intelligence and Agent Systems 10:2 (209-231). DOI: 10.5555/2589968.2589974. Online publication date: 1-Apr-2012
  • (2012) Large-margin learning of submodular summarization models. Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (224-233). DOI: 10.5555/2380816.2380846. Online publication date: 23-Apr-2012
  • (2012) Diversifying Search Results through Pattern-Based Subtopic Modeling. International Journal on Semantic Web & Information Systems 8:4 (37-56). DOI: 10.4018/jswis.2012100103. Online publication date: 1-Oct-2012
  • (2011) Clustering web search results with maximum spanning trees. Proceedings of the 12th international conference on Artificial intelligence around man and beyond (201-212). DOI: 10.5555/2041977.2042002. Online publication date: 15-Sep-2011
  • (2011) Beyond precision@10. Proceedings of the 20th ACM international conference on Information and knowledge management (2141-2144). DOI: 10.1145/2063576.2063910. Online publication date: 24-Oct-2011
  • (2011) Clustering Web Search Results with Maximum Spanning Trees. Proceedings of the XIIth International Conference on AI*IA 2011: Artificial Intelligence Around Man and Beyond - Volume 6934 (201-212). DOI: 10.1007/978-3-642-23954-0_20. Online publication date: 15-Sep-2011
