Diversifying Search Results through Pattern-Based Subtopic Modeling

Published: 01 October 2012

Abstract

Traditional information retrieval models do not necessarily provide users with an optimal search experience, because the top-ranked documents may contain excessively redundant information. Satisfying search results should therefore be not only relevant to the query but also diversified to cover its different subtopics. In this paper, the authors propose a novel pattern-based framework for diversifying search results, where each pattern is a set of semantically related terms covering the same subtopic. They first apply a maximal frequent pattern mining algorithm to extract the patterns from the retrieval results of the query. The authors then propose to model a subtopic with either a single pattern or a group of similar patterns. A profile-based clustering method is adapted to group similar patterns based on their context information. The search results are then diversified using the extracted subtopics. Experimental results show that the proposed pattern-based methods are effective at diversifying the search results.
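The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the mining algorithm's details, the clustering step, or the diversification function. The sketch assumes that each retrieved document is reduced to a set of terms, uses a brute-force level-wise search in place of a dedicated maximal frequent pattern miner, treats each maximal pattern as one subtopic (skipping the profile-based grouping of similar patterns), and re-ranks with a simple greedy coverage rule. The function names, the `min_support` parameter, and the toy "jaguar" documents are all hypothetical.

```python
from itertools import combinations

def frequent_itemsets(docs, min_support):
    """Level-wise (Apriori-style) search for term sets found in >= min_support docs."""
    doc_sets = [frozenset(d) for d in docs]
    candidates = {frozenset([t]) for d in doc_sets for t in d}
    frequent = {}
    while candidates:
        support = {c: sum(1 for d in doc_sets if c <= d) for c in candidates}
        level = {c: n for c, n in support.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-sets that differ in one term to form (k+1)-set candidates.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1}
    return frequent

def maximal_patterns(frequent):
    """Keep only frequent sets that have no frequent proper superset."""
    return [p for p in frequent if not any(p < q for q in frequent)]

def diversify(docs, patterns, k):
    """Greedy re-ranking (an assumption, not the paper's method): pick the
    document covering the most not-yet-covered patterns; ties keep the
    original retrieval order."""
    doc_sets = [frozenset(d) for d in docs]
    covered, ranking = set(), []
    remaining = list(range(len(docs)))
    while remaining and len(ranking) < k:
        def gain(i):
            return sum(1 for p in patterns
                       if p not in covered and p <= doc_sets[i])
        best = max(remaining, key=lambda i: (gain(i), -i))
        ranking.append(best)
        remaining.remove(best)
        covered.update(p for p in patterns if p <= doc_sets[best])
    return ranking

# Hypothetical retrieval results for the ambiguous query "jaguar",
# listed in their original rank order.
docs = [
    ["jaguar", "car", "dealer"],
    ["jaguar", "car", "price"],
    ["jaguar", "cat", "habitat"],
    ["jaguar", "cat", "zoo"],
    ["jaguar", "car", "dealer"],
]
patterns = maximal_patterns(frequent_itemsets(docs, min_support=2))
ranking = diversify(docs, patterns, k=2)
print(ranking)  # [0, 2]
```

With these toy documents the two maximal patterns are {jaguar, car, dealer} and {jaguar, cat}, so the top two results cover both the car and the animal subtopics instead of returning two car pages. A real system would likely replace the brute-force search with a dedicated maximal-pattern miner and weight documents by relevance as well as coverage.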

Cited By

  • (2018) Facet annotation by extending CNN with a matching strategy. Neural Computation, 30(6), 1647-1672. doi:10.1162/neco_a_01077. Online publication date: 1-Jun-2018
      Published In

      International Journal on Semantic Web & Information Systems  Volume 8, Issue 4
      October 2012
      116 pages
      ISSN:1552-6283
      EISSN:1552-6291

      Publisher

      IGI Global

      United States

      Author Tags

      1. Clustering
      2. Diversity
      3. Frequent Pattern Mining
      4. Information Retrieval
      5. Subtopics

      Qualifiers

      • Article
