Abstract
Multimodal retrieval provides new paradigms and methods for searching effectively through enormous volumes of data. It is a well-studied problem, often applied to image retrieval. Most existing work on multimodal image retrieval focuses on bridging the semantic gap by using both textual and visual features. In this paper, we use relevance feedback from the user-generated documents associated with the images to expand the textual query, and we study its effect on both image and text retrieval. We employ a topic decomposition based keyphrase extraction technique to expand the textual queries. Our results show that an insightful textual query expansion always improves retrieval performance for both text and image retrieval. We also adopt an optimum weight learning scheme to combine the modalities in a privileged way. We compare against two well-established keyphrase extraction techniques used for textual query expansion, and we carry out a detailed set of experiments on a standard real-world dataset.
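One simple way to picture a weighted combination of modalities is linear late fusion, where a single weight balances text and image scores and is chosen on held-out queries. The sketch below is illustrative only: the function names, the score dictionaries, the grid of candidate weights, and the `evaluate` callback are all assumptions, and it is not presented as the weight learning scheme used in the paper. It also assumes the two score sets have already been normalized to comparable ranges.

```python
from typing import Callable, Dict, List, Sequence


def fuse(text: Dict[str, float], image: Dict[str, float], alpha: float) -> List[str]:
    """Rank documents by alpha * text score + (1 - alpha) * image score.

    Hypothetical helper: a document missing from one modality scores 0.0 there.
    """
    docs = set(text) | set(image)
    fused = {
        d: alpha * text.get(d, 0.0) + (1.0 - alpha) * image.get(d, 0.0)
        for d in docs
    }
    # Sort by descending fused score; break ties by document id for determinism.
    return sorted(docs, key=lambda d: (-fused[d], d))


def pick_alpha(
    text: Dict[str, float],
    image: Dict[str, float],
    evaluate: Callable[[List[str]], float],
    grid: Sequence[float] = (0.0, 0.25, 0.5, 0.75, 1.0),
) -> float:
    """Return the first grid weight whose fused ranking scores highest under `evaluate`."""
    return max(grid, key=lambda a: evaluate(fuse(text, image, a)))
```

Here `evaluate` stands in for whatever retrieval metric is computed against relevance judgments. A grid search is used only because it is easy to read; any other one-dimensional optimizer could take its place.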
Notes
Henceforth we refer to R-Precision simply as Precision.
The presence or absence of a subscript x on p_x denotes the significance when compared to the previous x method(s).
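For readers unfamiliar with the metric named in the first note, R-Precision is the fraction of relevant documents found in the top R positions of a ranking, where R is the total number of documents relevant to the query. A minimal sketch follows. The function name and the convention of returning 0.0 when a query has no relevant documents are assumptions made for illustration.

```python
from typing import Sequence, Set


def r_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Fraction of the top-R ranked documents that are relevant, with R = len(relevant)."""
    r = len(relevant)
    if r == 0:
        # Assumed convention: a query with no relevant documents scores 0.0.
        return 0.0
    hits = sum(1 for doc in ranking[:r] if doc in relevant)
    return hits / r
```

For example, with three relevant documents and a ranking whose top three positions contain two of them, the score is 2/3.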
Cite this article
Datta, D., Singh, S.K. & Chowdary, C.R. Bridging the gap: effect of text query reformulation in multimodal retrieval. Multimed Tools Appl 76, 22871–22888 (2017). https://doi.org/10.1007/s11042-016-4262-9
DOI: https://doi.org/10.1007/s11042-016-4262-9