DOI: 10.1145/1183614.1183713

Document re-ranking using cluster validation and label propagation

Published: 06 November 2006

Abstract

This paper proposes a novel document re-ranking approach for information retrieval that uses a label propagation-based semi-supervised learning algorithm to exploit the intrinsic structure of large document collections. Because labeled relevant or irrelevant documents are generally unavailable in IR, the approach extracts pseudo-labeled documents from the ranked list returned by the initial retrieval. For pseudo-relevant documents, it selects a cluster of documents from the top of the list via cluster validation-based k-means clustering; for pseudo-irrelevant documents, it picks a set of documents from the bottom of the list. The documents are then re-ranked via label propagation. Evaluation on benchmark corpora shows that the approach achieves significant improvement over standard baselines and outperforms other related approaches.
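To make the pipeline in the abstract concrete, here is a minimal Python sketch of re-ranking by label propagation. It is not the authors' implementation, and several details are assumptions made for illustration: the function name `rerank_by_label_propagation`, cosine similarity as the edge weight, the iteration count, and the toy vectors. It also deliberately simplifies one step: instead of choosing the pseudo-relevant cluster by cluster validation-based k-means, it takes a fixed number of top-ranked documents as pseudo-relevant seeds.

```python
import numpy as np


def rerank_by_label_propagation(doc_vectors, n_pos=2, n_neg=2, n_iter=200):
    """Re-rank documents listed in initial-retrieval order (best first).

    Returns (order, scores): document indices sorted by propagated
    relevance, and the propagated score of each document.
    """
    X = np.asarray(doc_vectors, dtype=float)
    n = X.shape[0]

    # Edge weights: cosine similarity (an assumption of this sketch).
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.maximum(norms, 1e-12)
    W = np.clip(unit @ unit.T, 0.0, None)
    np.fill_diagonal(W, 0.0)  # no self-loops

    # Row-normalise the weights into a transition matrix.
    T = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)

    # Pseudo labels: top documents are relevant (1), bottom ones irrelevant (0).
    # Simplification: fixed top-k seeds, not a validated k-means cluster.
    pos = np.arange(n_pos)
    neg = np.arange(n - n_neg, n)
    scores = np.full(n, 0.5)
    scores[pos], scores[neg] = 1.0, 0.0

    for _ in range(n_iter):
        scores = T @ scores                  # propagate along the graph
        scores[pos], scores[neg] = 1.0, 0.0  # clamp the pseudo labels

    return np.argsort(-scores, kind="stable"), scores


# Toy example: documents 0, 1, 3 share one topic; documents 2, 4, 5 another.
docs = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [1.0, 0.15], [0.2, 0.9], [0.0, 1.0]]
order, scores = rerank_by_label_propagation(docs)
print(order.tolist())
```

In this toy input, document 2 sits above document 3 in the initial list but resembles the bottom-ranked documents, so propagation moves document 3 ahead of it. Clamping the seeds on every iteration keeps the pseudo labels fixed while the unlabeled documents settle to a weighted average of their neighbours' scores.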


Published In

CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge management
November 2006
916 pages
ISBN: 1595934332
DOI: 10.1145/1183614
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data manifold structure
  2. document re-ranking
  3. information retrieval
  4. label propagation

Qualifiers

  • Article

Conference

CIKM06: Conference on Information and Knowledge Management
November 6 - 11, 2006
Arlington, Virginia, USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
