
Diversifying Search Results through Pattern-Based Subtopic Modeling

Authors:

Xuanhui Wang

International Journal on Semantic Web & Information Systems, Volume 8, Issue 4

Pages 37 - 56

https://doi.org/10.4018/jswis.2012100103

Published: 01 October 2012

Abstract

Traditional information retrieval models do not necessarily give users an optimal search experience, because the top-ranked documents may contain excessively redundant information. Satisfactory search results should therefore be not only relevant to the query but also diversified to cover its different subtopics. In this paper, the authors propose a novel pattern-based framework for diversifying search results, where each pattern is a set of semantically related terms covering the same subtopic. They first apply a maximal frequent pattern mining algorithm to extract patterns from the retrieval results of the query. They then model each subtopic with either a single pattern or a group of similar patterns, adapting a profile-based clustering method to group similar patterns based on their context information. The search results are then diversified using the extracted subtopics. Experimental results show that the proposed pattern-based methods are effective at diversifying search results.
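The pipeline outlined in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the mining algorithm's details, the clustering step, or the diversification scoring function, so everything below is an assumption made for illustration. Documents are reduced to toy term sets for the query "java", patterns are mined with a simple level-wise (Apriori-style) search and filtered to the maximal ones, each maximal pattern is treated as one subtopic (the profile-based grouping of similar patterns is omitted), and a plain greedy coverage rule stands in for the paper's diversification step. The function names `frequent_itemsets`, `maximal_patterns`, and `diversify` are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(docs, min_support):
    """Level-wise search for term sets contained in at least min_support documents."""
    def support(itemset):
        return sum(1 for d in docs if itemset <= d)

    terms = sorted({t for d in docs for t in d})
    level = [frozenset([t]) for t in terms if support(frozenset([t])) >= min_support]
    frequent = []
    while level:
        frequent.extend(level)
        # Join pairs of frequent k-term sets into (k+1)-term candidates.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1}
        level = [c for c in candidates if support(c) >= min_support]
    return frequent


def maximal_patterns(frequent):
    """Keep only frequent term sets that have no frequent proper superset."""
    return [p for p in frequent if not any(p < q for q in frequent)]


def diversify(docs, patterns, k):
    """Greedy stand-in: repeatedly pick the document covering the most uncovered patterns."""
    selected, covered = [], set()
    remaining = list(range(len(docs)))  # assumed to be in relevance order already

    def gain(i):
        return sum(1 for j, p in enumerate(patterns)
                   if j not in covered and p <= docs[i])

    while remaining and len(selected) < k:
        best = max(remaining, key=gain)  # ties keep the earlier (more relevant) document
        selected.append(best)
        covered |= {j for j, p in enumerate(patterns) if p <= docs[best]}
        remaining.remove(best)
    return selected


# Toy retrieval results for the ambiguous query "java" (invented data).
docs = [
    {"java", "programming", "language"},
    {"java", "programming", "language", "compiler"},
    {"java", "island", "indonesia"},
    {"java", "coffee", "beans"},
    {"java", "island", "indonesia", "travel"},
    {"java", "coffee", "beans", "roast"},
]

patterns = maximal_patterns(frequent_itemsets(docs, min_support=2))
ranking = diversify(docs, patterns, k=3)
print(ranking)  # [0, 2, 3]
```

On this toy input the three maximal patterns correspond to the programming, island, and coffee senses of the query, and the greedy step places one document for each sense in the top three instead of returning the two programming documents first.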

Cited By

Wu, B., Wei, B., Liu, J., Guo, Z., Zheng, Y., & Chen, Y. (2018). Facet annotation by extending CNN with a matching strategy. Neural Computation, 30(6), 1647-1672. Online publication date: 1-Jun-2018.
https://dl.acm.org/doi/10.1162/neco_a_01077

Index Terms: Computing methodologies; Information systems

Published In

International Journal on Semantic Web & Information Systems Volume 8, Issue 4

October 2012

116 pages

ISSN: 1552-6283

EISSN: 1552-6291

Publisher

IGI Global

United States
