Bridging the gap: effect of text query reformulation in multimodal retrieval

Multimedia Tools and Applications

Abstract

Multimodal retrieval provides new paradigms and methods for searching effectively through enormous volumes of data. It is a well-studied problem, often applied to image retrieval. Most existing work on image retrieval under the banner of multimodality emphasizes bridging the semantic gap by using both textual and visual features. In this paper, we use relevance feedback from the user-generated documents associated with the images to expand the textual query, and we study its effect on both image and text retrieval. We employ a topic decomposition based keyphrase extraction technique to expand the textual queries. Our results show that an insightful textual query expansion always improves retrieval performance for both text and image retrieval. We also adopt an optimum weight learning scheme to combine the modalities in a privileged way. We perform a comparative study with two well-established keyphrase extraction techniques used for textual query expansion. A detailed set of experiments is carried out on a standard real-world dataset.
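The two ideas named in the abstract, expanding a text query from feedback documents and combining text and image scores with a weight, can be illustrated with a minimal sketch. This is not the paper's method: plain term frequency stands in for the topic decomposition based keyphrase extraction, and a hand-picked `text_weight` stands in for the learned optimum weight. All function names, the stopword list, and the sample data are hypothetical.

```python
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "at", "and", "or", "is", "to", "with"}

def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    tokens = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return [t for t in tokens if t and t not in STOPWORDS]

def expand_query(query, feedback_docs, num_terms=2):
    """Append the most frequent new terms found in the feedback documents.

    Term frequency is a stand-in here for a real keyphrase extractor.
    """
    query_terms = tokenize(query)
    counts = Counter()
    for doc in feedback_docs:
        counts.update(t for t in tokenize(doc) if t not in query_terms)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return query_terms + [term for term, _ in ranked[:num_terms]]

def fuse_scores(text_scores, image_scores, text_weight):
    """Late fusion: weighted linear combination of per-document scores."""
    doc_ids = set(text_scores) | set(image_scores)
    return {
        d: text_weight * text_scores.get(d, 0.0)
        + (1.0 - text_weight) * image_scores.get(d, 0.0)
        for d in doc_ids
    }

# Hypothetical text associated with top-ranked images for the query "red car".
feedback = [
    "A red sports car parked on the street",
    "Vintage sports car at a racing event",
    "Racing car on the track",
]
expanded = expand_query("red car", feedback, num_terms=2)

# Hypothetical per-document similarity scores from each modality.
fused = fuse_scores({"d1": 0.8, "d2": 0.4}, {"d1": 0.2, "d2": 0.9}, text_weight=0.5)
ranking = sorted(fused, key=fused.get, reverse=True)
```

In a learned-weight setting, `text_weight` would be chosen to maximize a retrieval metric on held-out queries rather than fixed by hand.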

Notes

  1. http://nlp.stanford.edu/IR-book/html/htmledition/pseudo-relevance-feedback-1.html

  2. http://www.imageclef.org/2011/Wikipedia

  3. http://medgift.hevs.ch/wikipediaMM/2010-2011/wikipedia_topics_2011.zip

  4. Henceforth, we refer to R-Precision simply as Precision.

  5. The presence or absence of a subscript x on p_x denotes the significance when compared to the previous x method(s).

Author information

Corresponding author

Correspondence to Deepanwita Datta.

Cite this article

Datta, D., Singh, S.K. & Chowdary, C.R. Bridging the gap: effect of text query reformulation in multimodal retrieval. Multimed Tools Appl 76, 22871–22888 (2017). https://doi.org/10.1007/s11042-016-4262-9

  • DOI: https://doi.org/10.1007/s11042-016-4262-9
