Learning Asymmetric Co-Relevance

Authors:

Filip Radlinski,

Milad Shokouhi

ICTIR '15: Proceedings of the 2015 International Conference on The Theory of Information Retrieval

Pages 281 - 290

https://doi.org/10.1145/2808194.2809454

Published: 27 September 2015

Abstract

Several applications in information retrieval rely on asymmetric co-relevance estimation; that is, estimating the relevance of a document to a query under the assumption that another document is relevant. We present a supervised model for learning an asymmetric co-relevance estimate. The model uses different types of similarities with the assumed relevant document and the query, as well as document-quality measures. Empirical evaluation demonstrates the merits of using the co-relevance estimate in various applications, including cluster-based and graph-based document retrieval. Specifically, the resultant performance transcends that of using a wide variety of alternative estimates, mostly symmetric inter-document similarity measures that dominate past work.
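As a rough illustration of the idea in the abstract, the sketch below scores a candidate document given a query and a document assumed to be relevant. It is not the authors' model: the containment similarity, the stopword-ratio quality measure, the function names, and the weights are all assumptions chosen for brevity. In a supervised setting the weights would be learned from relevance judgments rather than set by hand.

```python
from collections import Counter

# Hypothetical stopword list, used only for the toy quality measure below.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is"}


def tokens(text):
    """Lowercase whitespace tokenization (an assumption, not the paper's)."""
    return text.lower().split()


def containment(source, target):
    """Fraction of source token occurrences that also occur in target.

    Asymmetric by construction: containment(a, b) != containment(b, a)
    in general.
    """
    src = Counter(tokens(source))
    tgt = set(tokens(target))
    total = sum(src.values())
    if total == 0:
        return 0.0
    return sum(count for term, count in src.items() if term in tgt) / total


def stopword_ratio(doc):
    """Toy document-quality measure: share of tokens that are stopwords."""
    toks = tokens(doc)
    return sum(t in STOPWORDS for t in toks) / len(toks) if toks else 0.0


def features(query, assumed_relevant, candidate):
    """Feature vector for 'candidate is relevant given assumed_relevant is'."""
    return [
        containment(candidate, assumed_relevant),  # candidate -> assumed
        containment(assumed_relevant, candidate),  # assumed -> candidate
        containment(query, candidate),             # query -> candidate
        stopword_ratio(candidate),                 # quality of candidate
    ]


def co_relevance(query, assumed_relevant, candidate, weights):
    """Linear score; the weights stand in for a learned model."""
    feats = features(query, assumed_relevant, candidate)
    return sum(w * f for w, f in zip(weights, feats))


if __name__ == "__main__":
    d1 = "jaguar speed"
    d2 = "jaguar speed in the wild habitat"
    w = [0.5, 0.2, 0.2, 0.1]  # hypothetical, hand-set weights
    print(co_relevance("jaguar", d1, d2, w))  # d2 given d1
    print(co_relevance("jaguar", d2, d1, w))  # d1 given d2
```

Swapping the roles of the two documents changes the score, which is the asymmetry the abstract refers to; a symmetric measure such as cosine similarity would return the same value in both directions.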



Index Terms

Learning Asymmetric Co-Relevance
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking


Published In

ICTIR '15: Proceedings of the 2015 International Conference on The Theory of Information Retrieval

September 2015

402 pages

ISBN: 9781450338332

DOI: 10.1145/2808194

General Chairs:
James Allan (University of Massachusetts Amherst, USA),
Bruce Croft (University of Massachusetts Amherst, USA)

Program Chairs:
Arjen de Vries (CWI Amsterdam, The Netherlands),
Chengxiang Zhai (University of Illinois at Urbana-Champaign, USA)

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tag

asymmetric co-relevance

Qualifiers

Research-article

Funding Sources

Microsoft Research

Conference

ICTIR '15

Sponsor:

SIGIR

ICTIR '15: ACM SIGIR International Conference on the Theory of Information Retrieval

September 27 - 30, 2015

Northampton, Massachusetts, USA

Acceptance Rates

ICTIR '15 Paper Acceptance Rate 29 of 57 submissions, 51%;

Overall Acceptance Rate 235 of 527 submissions, 45%

Cited By

Gutiérrez-Soto C, Palomino M, Curiel A, Cerda H, Rain F (2020). Evaluating the Effectiveness of Query-Document Clustering Using the QDSM Measure. Advances in Science, Technology and Engineering Systems Journal 5:6 (883-893). Online publication date: Dec-2020.
https://doi.org/10.25046/aj0506105

Roitman H, Kurland O, Piwowarski B, Chevalier M, Gaussier E, Maarek Y, Nie J, Scholer F (2019). Query Performance Prediction for Pseudo-Feedback-Based Retrieval. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (1261-1264). Online publication date: 18-Jul-2019.
https://dl.acm.org/doi/10.1145/3331184.3331369

Gutierrez-Soto C, Diaz A, Hubert G (2019). Comparing the Effectiveness of Query-Document Clusterings Using the QDSM and Cosine Similarity. 2019 38th International Conference of the Chilean Computer Science Society (SCCC) (1-8). Online publication date: Nov-2019.
https://doi.org/10.1109/SCCC49216.2019.8966432

Roitman H, Song D, Liu T, Sun L, Bruza P, Melucci M, Sebastiani F, Yang G (2018). Enhanced Performance Prediction of Fusion-based Retrieval. Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval (195-198). Online publication date: 10-Sep-2018.
https://dl.acm.org/doi/10.1145/3234944.3234950

Sheetrit E, Collins-Thompson K, Mei Q, Davison B, Liu Y, Yilmaz E (2018). Utilizing Inter-Passage Similarities for Focused Retrieval. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (1453-1453). Online publication date: 27-Jun-2018.
https://dl.acm.org/doi/10.1145/3209978.3210222

Sheetrit E, Shtok A, Kurland O, Shprincis I, Collins-Thompson K, Mei Q, Davison B, Liu Y, Yilmaz E (2018). Testing the Cluster Hypothesis with Focused and Graded Relevance Judgments. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (1173-1176). Online publication date: 27-Jun-2018.
https://dl.acm.org/doi/10.1145/3209978.3210120

Kotlerman L, Dagan I, Kurland O (2017). Clustering small-sized collections of short texts. Information Retrieval Journal 21:4 (273-306). Online publication date: 30-Nov-2017.
https://doi.org/10.1007/s10791-017-9324-8
