Supervised semantic indexing

B Bai, J Weston, D Grangier, R Collobert, K Sadamasa, Y Qi, O Chapelle, K Weinberger

Proceedings of the 18th ACM conference on Information and knowledge management, 2009 • dl.acm.org

In this article we propose Supervised Semantic Indexing (SSI), an algorithm that is trained on (query, document) pairs of text documents to predict the quality of their match. Like Latent Semantic Indexing (LSI), our models take account of correlations between words (synonymy, polysemy). However, unlike LSI, our models are trained with a supervised signal directly on the ranking task of interest, which we argue is the reason for our superior results. As the query and target texts are modeled separately, our approach is easily generalized to different retrieval tasks, such as online advertising placement. Dealing with models over features for all pairs of words is computationally challenging. We propose several improvements to our basic model for addressing this issue, including low-rank (but diagonal-preserving) representations and correlated feature hashing (CFH). We provide an empirical study of all these methods on retrieval tasks based on Wikipedia documents as well as an Internet advertisement task. We obtain state-of-the-art performance while providing realistically scalable methods.
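
The low-rank, diagonal-preserving idea mentioned in the abstract can be illustrated with a small sketch. It assumes a bilinear score of the form f(q, d) = q^T (U^T V + I) d over bag-of-words vectors and a pairwise margin ranking (hinge) loss trained by SGD; the abstract itself does not spell out the loss or the update rule, so the function names (`ssi_score`, `margin_rank_step`), the learning rate, and the margin below are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def ssi_score(q, d, U, V):
    """Assumed low-rank, diagonal-preserving score: q^T (U^T V + I) d."""
    # (U q) . (V d) models correlations between different words;
    # q . d is the identity (diagonal) term that rewards exact word overlap.
    return float((U @ q) @ (V @ d) + q @ d)

def margin_rank_step(q, d_pos, d_neg, U, V, lr=0.01, margin=1.0):
    """One SGD step on the hinge loss max(0, margin - f(q, d+) + f(q, d-)).

    U and V are float arrays updated in place; returns the loss before the update.
    """
    loss = margin - ssi_score(q, d_pos, U, V) + ssi_score(q, d_neg, U, V)
    if loss > 0:
        diff = d_pos - d_neg
        Uq, Vdiff = U @ q, V @ diff  # computed before either matrix changes
        # Ascend the gradient of f(q, d+) - f(q, d-) with respect to U and V.
        U += lr * np.outer(Vdiff, q)
        V += lr * np.outer(Uq, diff)
    return max(0.0, loss)
```

With a vocabulary of size D and rank n, U and V are n x D matrices, so storage grows as 2nD rather than D^2 for a full word-pair matrix, which is the scalability motivation the abstract points to.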
