DOI: 10.1145/2964284.2967239

Multimodal Learning via Exploring Deep Semantic Similarity

Published: 01 October 2016

Abstract

Deep learning is skilled at learning representations from raw data that are embedded in a semantic space. Traditional multimodal networks take advantage of this and maximize the joint distribution over the representations of different modalities. However, the similarity among these representations, an important property of multimodal data, is not emphasized. In this paper, we introduce a novel learning method for multimodal networks, named Semantic Similarity Learning (SSL), which trains the model by enhancing the similarity between the high-level features of different modalities. Experiments are conducted to evaluate the method on different multimodal networks and multiple tasks. The results demonstrate the effectiveness of SSL in preserving shared information and improving discrimination. In particular, SSL encourages each modality to learn transferred knowledge from the other when faced with missing data.
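To make the idea concrete, below is a minimal sketch (not the paper's implementation) of how a similarity term between the high-level features of two modality branches can be combined with a task loss. It assumes PyTorch, a simple two-branch MLP architecture, and cosine similarity as the similarity measure; the encoder class, dimensions, and the weighting factor alpha are all hypothetical choices for illustration.

# Hypothetical sketch of a semantic-similarity objective for two modality branches.
# The encoder architecture and the exact loss combination are assumptions, not the
# paper's reported method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityEncoder(nn.Module):
    """Simple MLP mapping one modality's raw input to a high-level feature vector."""
    def __init__(self, in_dim, hid_dim, feat_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hid_dim), nn.ReLU(),
            nn.Linear(hid_dim, feat_dim),
        )

    def forward(self, x):
        return self.net(x)

def ssl_style_loss(feat_a, feat_b, logits, labels, alpha=0.5):
    """Task loss plus a term that pulls the two modalities' high-level features
    toward each other (here: cosine similarity; the paper's similarity measure
    may differ)."""
    task = F.cross_entropy(logits, labels)
    sim = F.cosine_similarity(feat_a, feat_b, dim=1).mean()
    return task + alpha * (1.0 - sim)

# Usage sketch: an audio branch and a visual branch sharing a classifier head.
audio_enc = ModalityEncoder(in_dim=100, hid_dim=256, feat_dim=64)
visual_enc = ModalityEncoder(in_dim=500, hid_dim=256, feat_dim=64)
classifier = nn.Linear(2 * 64, 10)

a = torch.randn(8, 100)            # audio batch (dummy data)
v = torch.randn(8, 500)            # visual batch (dummy data)
y = torch.randint(0, 10, (8,))     # class labels

fa, fv = audio_enc(a), visual_enc(v)
logits = classifier(torch.cat([fa, fv], dim=1))
loss = ssl_style_loss(fa, fv, logits, y)
loss.backward()

The similarity term rewards the two branches for producing aligned features for the same sample, which is one way to realize the "enhancing the similarity between high-level features" objective the abstract describes; the networks and objective used in the paper may differ.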




    Published In

    MM '16: Proceedings of the 24th ACM international conference on Multimedia
    October 2016
    1542 pages
    ISBN:9781450336031
    DOI:10.1145/2964284
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. deep learning
    2. multimodal learning
    3. semantic similarity

    Qualifiers

    • Short-paper

    Funding Sources

    • National Basic Research Program of China (973 Program)
    • National Natural Science Foundation of China
    • Fundamental Research Funds for the Central Universities
    • Open Research Fund of Key Laboratory of Spectral Imaging Technology, Chinese Academy of Sciences
    • State Key Program of National Natural Science of China

    Conference

MM '16: ACM Multimedia Conference
    October 15 - 19, 2016
    Amsterdam, The Netherlands

    Acceptance Rates

    MM '16 Paper Acceptance Rate 52 of 237 submissions, 22%;
    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

