
Leveraging integrated information to extract query subtopics for search result diversification

Published: 01 February 2014

Abstract

Search result diversification aims to return results that cover different query subtopics, i.e., distinct pieces of relevant information. State-of-the-art diversification methods often model diversity explicitly based on query subtopics, so their performance is closely tied to the quality of those subtopics. Most existing studies extract query subtopics only from unstructured data such as document collections. However, a large amount of information is also available in structured data, and it complements the information in unstructured data. Structured data can provide valuable domain knowledge but is currently under-utilized. In this article, we study how to leverage integrated information from both structured and unstructured data to extract high-quality subtopics for search result diversification. We first discuss how to extract subtopics from structured data. We then propose three methods to integrate structured and unstructured data: the first uses the structured data to guide subtopic extraction from the unstructured data, the second uses the unstructured data to guide the extraction, and the third extracts subtopics separately from the two data sources and then combines them. Experimental results in both Enterprise and Web search domains show that the proposed methods are effective in extracting high-quality subtopics from the integrated information, which can lead to better diversification performance.
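As a rough illustration of the third integration strategy (extract subtopics separately, then combine them), the following minimal sketch merges two subtopic lists by term overlap. It is not the article's algorithm: representing a subtopic as a set of terms, the Jaccard similarity measure, the 0.5 threshold, and the names `jaccard` and `combine_subtopics` are all assumptions made for this example.

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two term sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combine_subtopics(
    structured: List[Set[str]],
    unstructured: List[Set[str]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the article
) -> List[Set[str]]:
    """Merge subtopics extracted independently from two data sources.

    Each unstructured subtopic is folded into its most similar existing
    subtopic when the overlap reaches the threshold; otherwise it is kept
    as a new subtopic of its own.
    """
    merged = [set(s) for s in structured]  # copy so the inputs stay unchanged
    for candidate in unstructured:
        best_idx, best_sim = -1, 0.0
        for i, existing in enumerate(merged):
            sim = jaccard(existing, candidate)
            if sim > best_sim:
                best_idx, best_sim = i, sim
        if best_idx >= 0 and best_sim >= threshold:
            merged[best_idx] |= candidate
        else:
            merged.append(set(candidate))
    return merged


if __name__ == "__main__":
    # Toy subtopics for the ambiguous query "java" (illustrative data only).
    from_structured = [{"java", "programming", "language"},
                       {"java", "island", "indonesia"}]
    from_unstructured = [{"java", "programming", "tutorial"},
                         {"java", "coffee"}]
    for subtopic in combine_subtopics(from_structured, from_unstructured):
        print(sorted(subtopic))
```

A greedy merge like this depends on the order of the inputs and on the threshold; it is meant only to make the "extract separately, then combine" idea concrete.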


Cited By

  • (2015) Search Result Diversification. Foundations and Trends in Information Retrieval 9:1 (1-90). DOI: 10.1561/1500000040. Online publication date: 1-Mar-2015



      Published In

      Information Retrieval  Volume 17, Issue 1
      Feb 2014
      108 pages

      Publisher

      Kluwer Academic Publishers

      United States

      Publication History

      Published: 01 February 2014
      Accepted: 21 February 2013
      Received: 08 October 2012

      Author Tags

      1. Web search
      2. Enterprise search
      3. Diversification
      4. Query subtopics
      5. Structured data
      6. Unstructured data

      Qualifiers

      • Research-article
