DOI: 10.1145/2808194.2809454
research-article

Learning Asymmetric Co-Relevance

Published: 27 September 2015

Abstract

Several applications in information retrieval rely on asymmetric co-relevance estimation; that is, estimating the relevance of a document to a query under the assumption that another document is relevant. We present a supervised model for learning an asymmetric co-relevance estimate. The model uses different types of similarities with the assumed relevant document and the query, as well as document-quality measures. Empirical evaluation demonstrates the merits of using the co-relevance estimate in various applications, including cluster-based and graph-based document retrieval. Specifically, the resultant performance transcends that of using a wide variety of alternative estimates, mostly symmetric inter-document similarity measures that dominate past work.
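To make the notion of asymmetry concrete, here is a minimal sketch of how a feature vector for the pair (candidate document, assumed relevant document) might be assembled. Everything in it is an assumption made for illustration: the function names, the additive smoothing, the use of negative KL divergence as a directional similarity, and the type-token ratio as a stand-in quality measure. It is not the feature set or the learning method of the paper; it only shows that swapping the roles of the two documents changes the estimate, which a symmetric measure such as cosine similarity cannot capture.

```python
import math
from collections import Counter


def unigram_lm(tokens, vocab, mu=1.0):
    """Additively smoothed unigram language model over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + mu * len(vocab)
    return {w: (counts[w] + mu) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same vocabulary (asymmetric)."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def co_relevance_features(doc, assumed_relevant, query):
    """Hypothetical features for the relevance of `doc` to `query`,
    given that `assumed_relevant` is relevant. Inputs are non-empty token lists."""
    vocab = set(doc) | set(assumed_relevant) | set(query)
    lm_doc = unigram_lm(doc, vocab)
    lm_rel = unigram_lm(assumed_relevant, vocab)
    lm_query = unigram_lm(query, vocab)
    return [
        -kl_divergence(lm_rel, lm_doc),    # directional similarity to the assumed relevant document
        -kl_divergence(lm_query, lm_doc),  # similarity to the query
        len(set(doc)) / len(doc),          # toy quality proxy: type-token ratio (assumption)
    ]


# Swapping the candidate and the assumed relevant document changes the features.
doc_a = ["cat", "cat", "cat", "dog"]
doc_b = ["cat", "dog"]
query = ["cat"]
print(co_relevance_features(doc_a, doc_b, query))
print(co_relevance_features(doc_b, doc_a, query))
```

In a supervised setting, vectors like these would be paired with relevance labels and passed to a learner of one's choice; which learner and which features work well is an empirical question that this sketch does not address.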

    Published In

    ICTIR '15: Proceedings of the 2015 International Conference on The Theory of Information Retrieval
    September 2015
    402 pages
    ISBN:9781450338332
    DOI:10.1145/2808194

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tag

    1. asymmetric co-relevance

    Qualifiers

    • Research-article

    Funding Sources

    • Microsoft Research

    Conference

    ICTIR '15

    Acceptance Rates

    ICTIR '15 Paper Acceptance Rate 29 of 57 submissions, 51%;
    Overall Acceptance Rate 235 of 527 submissions, 45%

    Article Metrics

    • Downloads (Last 12 months): 2
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 16 Nov 2024

    Cited By

    • (2020) Evaluating the Effectiveness of Query-Document Clustering Using the QDSM Measure. Advances in Science, Technology and Engineering Systems Journal, 5:6 (883-893). DOI: 10.25046/aj0506105. Online publication date: Dec-2020
    • (2019) Query Performance Prediction for Pseudo-Feedback-Based Retrieval. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (1261-1264). DOI: 10.1145/3331184.3331369. Online publication date: 18-Jul-2019
    • (2019) Comparing the Effectiveness of Query-Document Clusterings Using the QDSM and Cosine Similarity. 2019 38th International Conference of the Chilean Computer Science Society (SCCC) (1-8). DOI: 10.1109/SCCC49216.2019.8966432. Online publication date: Nov-2019
    • (2018) Enhanced Performance Prediction of Fusion-based Retrieval. Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval (195-198). DOI: 10.1145/3234944.3234950. Online publication date: 10-Sep-2018
    • (2018) Utilizing Inter-Passage Similarities for Focused Retrieval. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (1453-1453). DOI: 10.1145/3209978.3210222. Online publication date: 27-Jun-2018
    • (2018) Testing the Cluster Hypothesis with Focused and Graded Relevance Judgments. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (1173-1176). DOI: 10.1145/3209978.3210120. Online publication date: 27-Jun-2018
    • (2017) Clustering small-sized collections of short texts. Information Retrieval Journal, 21:4 (273-306). DOI: 10.1007/s10791-017-9324-8. Online publication date: 30-Nov-2017
