research-article

Multilingual Visual Sentiment Concept Matching

Authors:

Nikolaos Pappas,

Mercan Topkara,

Shih-Fu Chang

ICMR '16: Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval

Pages 151 - 158

https://doi.org/10.1145/2911996.2912016

Published: 06 June 2016

Abstract

The impact of culture on visual emotion perception has recently captured the attention of multimedia research. In this study, we provide computational linguistics tools to explore, retrieve, and browse a dataset of 16K multilingual affective visual concepts and 7.3M Flickr images. First, we design a crowdsourcing experiment to collect human judgements of the sentiment connected to the visual concepts. We then use word embeddings to represent these concepts in a low-dimensional vector space, which lets us expand the meaning around each concept and gives insight into commonalities and differences across languages. We compare a variety of concept representations through a novel evaluation task based on the notion of visual semantic relatedness. Based on these representations, we design clustering schemes to group multilingual visual concepts, and evaluate them with novel metrics based on the crowdsourced sentiment annotations as well as on visual semantic relatedness. The proposed clustering framework enables an in-depth analysis of the full multilingual dataset; we also show an application on a subset of the data containing faces, exploring cultural insights into portrait-related affective visual concepts.
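The pipeline summarized above (embed concepts, find related ones, cluster them, and score the clusters against crowdsourced sentiment) can be illustrated with a small Python sketch. This is not the authors' implementation: averaging word vectors into a concept vector, Lloyd's k-means, and the `sentiment_spread` score are generic stand-ins chosen for illustration. The toy vectors (`WORD_VECS`), the adjective-noun style concepts, and the sentiment scores are all hypothetical; a multilingual setup would load pretrained embeddings for each language instead.

```python
import math
import random
from statistics import mean

# Hypothetical toy 3-d word vectors; a real setup would load pretrained
# (possibly per-language or cross-lingual) word embeddings instead.
WORD_VECS = {
    "happy": [0.9, 0.1, 0.0],
    "smiling": [0.8, 0.2, 0.1],
    "sad": [-0.9, 0.1, 0.0],
    "lonely": [-0.7, 0.3, 0.1],
    "dog": [0.1, 0.9, 0.2],
    "face": [0.0, 0.2, 0.9],
    "child": [0.1, 0.3, 0.8],
}


def concept_vector(concept):
    """Represent a concept phrase as the average of its known word vectors."""
    vecs = [WORD_VECS[w] for w in concept.split() if w in WORD_VECS]
    if not vecs:
        raise KeyError(f"no known words in {concept!r}")
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(query, candidates, n=2):
    """Return the n candidate concepts most cosine-similar to the query."""
    q = concept_vector(query)
    ranked = sorted(candidates, key=lambda c: cosine(q, concept_vector(c)), reverse=True)
    return ranked[:n]


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's k-means with Euclidean distance; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its closest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def sentiment_spread(labels, sentiment):
    """Hypothetical cluster-quality score: mean within-cluster absolute deviation
    of crowdsourced sentiment scores (lower means more sentiment-consistent)."""
    spreads = []
    for c in set(labels):
        scores = [s for s, lab in zip(sentiment, labels) if lab == c]
        centre = mean(scores)
        spreads.append(mean(abs(s - centre) for s in scores))
    return mean(spreads)


if __name__ == "__main__":
    concepts = ["happy dog", "smiling child", "sad face", "lonely child"]
    sentiment = [0.8, 0.7, -0.6, -0.5]  # hypothetical scores in [-1, 1]
    print(nearest("happy dog", concepts[1:], n=1))  # ['smiling child']
    labels = kmeans([concept_vector(c) for c in concepts], k=2)
    print(round(sentiment_spread(labels, sentiment), 3))  # 0.05
```

Averaging word vectors is only the simplest composition choice; since several concept representations are compared, any of them could replace `concept_vector` without changing the clustering or scoring steps.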


Index Terms

1. Human-centered computing
  1. Collaborative and social computing
    1. Collaborative and social computing theory, concepts and paradigms
      1. Social content sharing
      2. Social tagging
  2. Human computer interaction (HCI)
    1. Empirical studies in HCI
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Sentiment analysis
    2. Specialized information retrieval
      1. Multimedia and multimodal retrieval
  2. Information systems applications
    1. Multimedia information systems
      1. Multimedia databases



Published In

ICMR '16: Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval

June 2016

452 pages

ISBN: 9781450343596

DOI: 10.1145/2911996

General Chairs:
John R. Kender (Columbia University, USA)
John R. Smith (IBM Research, USA)

Program Chairs:
Jiebo Luo (University of Rochester, USA)
Susanne Boll (University of Oldenburg, Germany)
Winston Hsu (National Taiwan University, Taiwan)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States


Badges

Best Multimodal paper


Conference

ICMR '16: International Conference on Multimedia Retrieval

June 6-9, 2016

New York, New York, USA

Acceptance Rates

ICMR '16 Paper Acceptance Rate: 20 of 120 submissions (17%)

Overall Acceptance Rate: 254 of 830 submissions (31%)

Article Metrics

Total Citations: 7

Total Downloads: 209

Downloads (last 12 months): 12

Downloads (last 6 weeks): 0

Reflects downloads up to 25 Nov 2024

Cited By

N. Abdullah and N. Rusli, "Multilingual Sentiment Analysis: A Systematic Literature Review," Pertanika Journal of Science and Technology, vol. 29, no. 1, 2021. https://doi.org/10.47836/pjst.29.1.25

A. Cowen, D. Keltner, F. Schroff, B. Jou, H. Adam, and G. Prasad, "Sixteen facial expressions occur in similar contexts worldwide," Nature, vol. 589, no. 7841, pp. 251-257, 2020, published online 16 Dec. 2020. https://doi.org/10.1038/s41586-020-3037-7

X. Wang, J. Wu, J. Chen, L. Li, Y. Wang, and W. Wang, "VaTeX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research," in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 4580-4590, Oct. 2019. https://doi.org/10.1109/ICCV.2019.00468

A. Ceroni, C. Ma, and R. Ewerth, "Mining exoticism from visual content with fusion-based deep neural networks," International Journal of Multimedia Information Retrieval, vol. 8, no. 1, pp. 19-33, 2019, published online 23 Jan. 2019. https://doi.org/10.1007/s13735-018-00165-4

A. Ceroni, C. Ma, and R. Ewerth, "Mining Exoticism from Visual Content with Fusion-based Deep Neural Networks," in Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval (K. Aizawa, M. Lew, and S. Satoh, eds.), pp. 37-45, 2018, published online 5 June 2018. https://dl.acm.org/doi/10.1145/3206025.3206044

S. Hamano, T. Ogawa, and M. Haseyama, "A Language-Independent Ontology Construction Method Using Tagged Images in Folksonomy," IEEE Access, vol. 6, pp. 2930-2942, 2018. https://doi.org/10.1109/ACCESS.2017.2786218

N. Pappas, M. Redi, M. Topkara, H. Liu, B. Jou, T. Chen, and S. Chang, "Multilingual visual sentiment concept clustering and analysis," International Journal of Multimedia Information Retrieval, vol. 6, no. 1, pp. 51-70, 2017, published online 20 Feb. 2017. https://doi.org/10.1007/s13735-017-0120-4
