Abstract
Natural Language Processing and multimodal analyses are key elements in many applications. However, the semantic gap is a persistent problem, leading to unnatural results disconnected from the user’s perception. To understand semantics in multimedia applications, human perception needs to be taken into consideration. Imageability is a measure originating from psycholinguistics that quantifies the human perception of words. Research shows a relationship between language usage and the imageability of words, making it useful for multimodal applications. However, the creation of imageability datasets is often manual and labor-intensive. In this paper, we propose a method that estimates the imageability of words by mining a variety of visual features from image data. The main assumption is a relationship between the imageability of concepts, human perception, and the contents of Web-crawled images. Using a set of low- and high-level visual features extracted from Web-crawled images, a model is trained to predict imageability. The evaluations show that imageability can be predicted with both a sufficiently low error and a high correlation to the ground-truth annotations. The proposed method can be used to expand the corpus of imageability dictionaries.
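The pipeline described above — aggregating visual features over each word's crawled images and training a regressor against ground-truth imageability ratings — can be sketched as follows. This is an illustrative sketch only, with synthetic feature vectors and scores: it uses scikit-learn [40] and a random forest [11], which the paper cites, but the actual features, model, and data are those detailed in the full paper.

```python
# Illustrative sketch (not the authors' pipeline): regress imageability
# scores from per-word visual feature vectors with a random forest.
# All feature values and ratings below are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Each row stands in for visual features aggregated over one word's
# crawled images (e.g., color statistics, local-feature codebook
# histograms, object-detector counts).
n_words, n_features = 200, 32
X = rng.normal(size=(n_words, n_features))
# Synthetic "ground-truth" imageability on an MRC-style 100-700 scale,
# made to depend on a couple of the features plus noise.
y = 400 + 150 * np.tanh(X[:, 0] + 0.5 * X[:, 1]) \
    + rng.normal(scale=20, size=n_words)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

pred = model.predict(X_test)
rmse = mean_squared_error(y_test, pred) ** 0.5
corr = np.corrcoef(y_test, pred)[0, 1]
print(f"RMSE: {rmse:.1f}  Pearson r: {corr:.2f}")
```

Evaluating with both an error metric (RMSE) and correlation against the held-out ratings mirrors the two criteria reported in the abstract.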
Notes
Parts-of-speech are obtained using NLTK [29] and may thus contain slight errors due to ambiguities.
References
Bai S, An S (2018) A survey on automatic image caption generation. Neurocomputing 311:291–304. https://doi.org/10.1016/j.neucom.2018.05.080
Balahur A, Mohammad S M, Hoste V, Klinger R (eds) (2018) Proc. 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. ACL, Stroudsburg, PA, USA
Bay H, Ess A, Tuytelaars T, Van Gool L (2008) Speeded-Up Robust Features (SURF). Comput Vis Image Underst 110(3):346–359. https://doi.org/10.1016/j.cviu.2007.09.014
Breiman L (2001) Random forests. Mach Learn 45(1):5–32. https://doi.org/10.1023/A:1010933404324
Charbonnier J, Wartena C (2019) Predicting word concreteness and imagery. In: Proc. 13th Int. Conf. on Computational Semantics, pp 176–187. https://www.aclweb.org/anthology/W19-0415
Chollet F, et al. (2015) Keras. https://github.com/fchollet/keras/
Coltheart M (1981) The MRC psycholinguistic database. Q J Exp Psychol A 33(4):497–505. https://doi.org/10.1080/14640748108400805
Coltheart V, Laxon V J, Keating C (1988) Effects of word imageability and age of acquisition on children’s reading. Br J Psychol 79(1):1–12. https://doi.org/10.1111/j.2044-8295.1988.tb02270.x
Comaniciu D, Meer P (2002) Mean Shift: A robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 24(5):603–619. https://doi.org/10.1109/34.1000236
Cortese M J, Fugett A (2004) Imageability ratings for 3,000 monosyllabic words. Behav Res Methods Instrum Comput 36(3):384–387. https://doi.org/10.3758/BF03195585
Csurka G, Dance CR, Fan L, Willamowski J, Bray C (2004) Visual categorization with bags of keypoints. In: Proc. ECCV 2004 Workshop on Statistical Learning in Computer Vision, pp 1–22
Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) ImageNet: A large-scale hierarchical image database. In: Proc. 2009 IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, pp 2–9. https://doi.org/10.1109/CVPR.2009.5206848
Divvala SK, Farhadi A, Guestrin C (2014) Learning everything about anything: Webly-supervised visual concept learning. In: Proc. 2014 IEEE Conf. on Computer Vision and Pattern Recognition, pp 3270–3277, https://doi.org/10.1109/CVPR.2014.412
Douze M, Jégou H, Sandhawalia H, Amsaleg L, Schmid C (2009) Evaluation of GIST descriptors for Web-scale image search. In: Proc. ACM Int. Conf. on Image and Video Retrieval 2009, pp 19:1–19:8. https://doi.org/10.1145/1646396.1646421
Fast E, Chen B, Bernstein M S (2016) Empath: Understanding topic signals in large-scale text. Computing Research Repository. arXiv:1602.06979
Giesbrecht B, Camblin C C, Swaab T Y (2004) Separable effects of semantic priming and imageability on word processing in human cortex. Cereb Cortex 14(5):521–529
Hessel J, Mimno D, Lee L (2018) Quantifying the visual concreteness of words and topics in multimodal datasets. In: Proc. 2018 Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol 1, pp 2194–2205. https://doi.org/10.18653/v1/N18-1199
Hewitt J, Ippolito D, Callahan B, Kriz R, Wijaya D T, Callison-Burch C (2018) Learning translations via images with a massively multilingual image dataset. In: Proc. 56th Annual Meeting of the Association for Computational Linguistics, vol 1, pp 2566–2576. https://doi.org/10.18653/v1/P18-1239
Holzinger A, Biemann C, Pattichis C S, Kell DB (2017a) What do we need to build explainable AI systems for the medical domain. Computing Research Repository. arXiv:1712.09923
Holzinger A, Malle B, Kieseberg P, Roth P M, Müller H, Reihs R, Zatloukal K (2017b) Towards the augmented pathologist: Challenges of explainable-AI in digital pathology. Computing Research Repository. arXiv:1712.06657
Inoue N, Shinoda K (2016) Adaptation of word vectors using tree structure for visual semantics. In: Proc. 24th ACM Multimedia Conf., pp 277–281. https://doi.org/10.1145/2964284.2967226
Itseez (2015) Open source computer vision library. https://opencv.org/
Jones G V (1985) Deep dyslexia, imageability, and ease of predication. Brain Lang 24(1):1–19. https://doi.org/10.1016/0093-934X(85)90094-X
Kastner M A, Ide I, Kawanishi Y, Hirayama T, Deguchi D, Murase H (2019) Estimating the visual variety of concepts by referring to Web popularity. Multimed Tools Appl 78(7):9463–9488. https://doi.org/10.1007/s11042-018-6528-x
Kawakubo H, Akima Y, Yanai K (2010) Automatic construction of a folksonomy-based visual ontology. In: Proc. 2010 IEEE Int. Symposium on Multimedia, pp 330–335. https://doi.org/10.1109/ISM.2010.57
Kohara Y, Yanai K (2013) Visual analysis of tag co-occurrence on nouns and adjectives. In: Li S, El Saddik A, Wang M, Mei T, Sebe N, Yan S, Hong R, Gurrin C (eds) Advances in Multimedia Modeling: 19th Int. Conf. on Multimedia Modeling Procs., Springer, Lecture Notes in Computer Science, vol 7732, pp 47–57. https://doi.org/10.1007/978-3-642-35725-1_5
Li JJ, Nenkova A (2015) Fast and accurate prediction of sentence specificity. In: Proc. 29th AAAI Conf. on Artificial Intelligence, pp 2281–2287
Ljubešić N, Fišer D, Peti-Stantić A (2018) Predicting concreteness and imageability of words within and across languages via word embeddings. In: Proc. 3rd Workshop on Representation Learning for NLP, pp 217–222. https://doi.org/10.18653/v1/W18-3028
Loper E, Bird S (2002) NLTK: The Natural Language Toolkit. In: Proc. ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics Vol. 1, pp 63–70. https://doi.org/10.3115/1118108.1118117
Ma W, Golinkoff R M, Hirsh-Pasek K, McDonough C, Tardif T (2009) Imageability predicts the age of acquisition of verbs in Chinese children. J Child Lang 36:405–423. https://doi.org/10.1017/S0305000908009008
Miller GA (1995) WordNet: A lexical database for English. Comm ACM 38 (11):39–41. https://doi.org/10.1145/219717.219748
Paivio A, Yuille J C, Madigan S A (1968) Concreteness, imagery, and meaningfulness values for 925 nouns. J Exp Psychol 76(1):1–25
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: Machine learning in Python. J Mach Learn Res 12:2825–2830
Pennebaker J W, Francis M E, Booth R J (2001) Linguistic Inquiry and Word Count: LIWC 2001. Erlbaum, Mahwah, NJ, USA
Redmon J, Farhadi A (2016) YOLO9000: Better, faster, stronger. Computing Research Repository. arXiv:1612.08242
Reilly J, Kean J (2010) Formal distinctiveness of high- and low-imageability nouns: Analyses and theoretical implications. J Cogn Sci 31(1):157–168. https://doi.org/10.1080/03640210709336988
Ringeval F, Schuller B, Valstar M, Cowie R, Kaya H, Schmitt M, Amiriparian S, Cummins N, Lalanne D, Michaud A, Ciftçi E, Güleç H, Salah A A, Pantic M (eds) (2018) Proc. 2018 Audio/Visual Emotion Challenge and Workshop. ACM, New York, NY, USA
Samek W, Wiegand T, Müller K (2017) Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models. Computing Research Repository. arXiv:1708.08296
Schwanenflugel P J (2013) Why are abstract concepts hard to understand? In: The Psychology of Word Meanings, Psychology Press, New York, NY, USA, pp 235–262
Shu X, Qi GJ, Tang J, Wang J (2015) Weakly-shared deep transfer networks for heterogeneous-domain knowledge propagation. In: Proc. 23rd ACM Multimedia Conf., pp 35–44. https://doi.org/10.1145/2733373.2806216
Sianipar A, van Groenestijn P, Dijkstra T (2016) Affective meaning, concreteness, and subjective frequency norms for Indonesian words. Front Psychol 7:1907. https://doi.org/10.3389/fpsyg.2016.01907
Smolik F, Kriz A (2015) The power of imageability: How the acquisition of inflected forms is facilitated in highly imageable verbs and nouns in Czech children. J First Lang 35(6):446–465. https://doi.org/10.1177/0142723715609228
Sun C, Shrivastava A, Singh S, Gupta A (2017) Revisiting unreasonable effectiveness of data in deep learning era. Computing Research Repository. arXiv:1707.02968
Tanaka S, Jatowt A, Kato MP, Tanaka K (2013) Estimating content concreteness for finding comprehensible documents. In: Proc. 6th ACM Int. Conf. on Web Search and Data Mining, pp 475–484. https://doi.org/10.1145/2433396.2433455
Tang J, Shu X, Li Z, Qi G J, Wang J (2016) Generalized deep transfer networks for knowledge propagation in heterogeneous domains. ACM Trans Multimed Comput Commun Appl 12(4s):1–22. https://doi.org/10.1145/2998574
Tang J, Shu X, Qi G, Li Z, Wang M, Yan S, Jain R (2017) Tri-clustered tensor completion for social-aware image tag refinement. IEEE Trans Pattern Anal Mach Intell 39(8):1662–1674. https://doi.org/10.1109/TPAMI.2016.2608882
Tang J, Shu X, Li Z, Jiang Y, Tian Q (2019) Social anchor-unit graph regularized tensor completion for large-scale image retagging. IEEE Trans Pattern Anal Mach Intell 41(8):2027–2034. https://doi.org/10.1109/TPAMI.2019.2906603
Thomee B, Shamma DA, Friedland G, Elizalde B, Ni K, Poland D, Borth D, Li LJ (2016) YFCC100M: The new data in multimedia research. Comm ACM 59(2):64–73. https://doi.org/10.1145/2812802
Vidanapathirana M (2018) YOLO3-4-Py. https://github.com/madhawav/YOLO3-4-Py
Yanai K, Barnard K (2005) Image region entropy: A measure of “visualness” of Web images associated with one concept. In: Proc. 13th ACM Multimedia Conf., pp 419–422. https://doi.org/10.1145/1101149.1101241
Yee L T (2017) Valence, arousal, familiarity, concreteness, and imageability ratings for 292 two-character Chinese nouns in Cantonese speakers in Hong Kong. PLoS ONE 12(3):e0174569. https://doi.org/10.1371/journal.pone.0174569
Zhang M, Hwa R, Kovashka A (2018) Equal but not the same: Understanding the implicit relationship between persuasive images and text. In: Proc. British Machine Vision Conference 2018, no. 8
Parts of this research were supported by JSPS KAKENHI 16H02846, and a joint research project with NII, Japan.
Kastner, M.A., Ide, I., Nack, F. et al. Estimating the imageability of words by mining visual characteristics from crawled image data. Multimed Tools Appl 79, 18167–18199 (2020). https://doi.org/10.1007/s11042-019-08571-4