DOI: 10.1145/3206025.3206035
research-article

Multimodal Network Embedding via Attention based Multi-view Variational Autoencoder

Published: 05 June 2018

Abstract

Learning embeddings for social media data has attracted extensive research interest and driven many applications, such as classification and link prediction. In this paper, we examine the scenario of a multimodal network whose nodes contain multimodal content and are connected by heterogeneous relationships, for example social images that carry both visual content and text descriptions and are linked in various ways (e.g., by belonging to the same album or sharing the same tag). Given such a network, learning the embedding from the network structure alone, or from only a subset of the content, yields sub-optimal representations. We propose a novel deep embedding method, the Attention-based Multi-view Variational Auto-Encoder (AMVAE), which incorporates both the link information and the multimodal content for more effective and efficient embedding. Specifically, we adopt an LSTM with an attention model to learn the correlation between different data modalities, such as the correlation between visual regions and specific words, to obtain a semantic embedding of the multimodal content. The link information and the semantic embedding are then treated as two correlated views. A multi-view correlation learning based Variational Auto-Encoder (VAE) is proposed to learn the representation of each node, in which the embeddings of the link information and the multimodal content are integrated and mutually reinforced. Experiments on three real-world datasets demonstrate the superiority of the proposed model in two applications: multi-label classification and link prediction.
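The two building blocks named in the abstract, soft attention over visual regions and a VAE-style latent code computed from two views, can be illustrated with a small NumPy sketch. This is not the AMVAE architecture: the dot-product scoring, the concatenation of the two views, the random linear encoders, and every name and dimension below are assumptions chosen only to keep the example self-contained.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend(regions, query):
    """Soft attention: score each region against the query (dot product,
    an assumption), then return the weighted sum of region features."""
    weights = softmax(regions @ query)   # shape (num_regions,)
    context = weights @ regions          # shape (dim,)
    return context, weights

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) for a diagonal Gaussian."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))

rng = np.random.default_rng(0)

# Hypothetical inputs: 4 visual regions and one word vector, all 8-dimensional.
regions = rng.standard_normal((4, 8))
word = rng.standard_normal(8)
content_view, weights = attend(regions, word)

# Hypothetical link view, e.g. a row of an adjacency-derived feature matrix.
link_view = rng.standard_normal(8)

# Fuse the two views by concatenation (an assumption) and encode them
# with random linear maps standing in for trained encoder layers.
joint = np.concatenate([link_view, content_view])
w_mu = 0.1 * rng.standard_normal((3, joint.size))
w_log_var = 0.1 * rng.standard_normal((3, joint.size))
mu, log_var = w_mu @ joint, w_log_var @ joint

z = reparameterize(mu, log_var, rng)     # 3-dimensional node embedding
kl = kl_to_standard_normal(mu, log_var)  # regularization term of the VAE loss
```

In a trained model the linear maps would be learned, and a reconstruction term for each view would be added to the KL term to form the training objective.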



Published In

ICMR '18: Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval
June 2018
550 pages
ISBN: 9781450350464
DOI: 10.1145/3206025
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. attention
  2. multi-view
  3. multimodal
  4. network embedding
  5. vae

Qualifiers

  • Research-article

Conference

ICMR '18

Acceptance Rates

ICMR '18 Paper Acceptance Rate 44 of 136 submissions, 32%;
Overall Acceptance Rate 254 of 830 submissions, 31%

Article Metrics

  • Downloads (last 12 months): 96
  • Downloads (last 6 weeks): 13
Reflects downloads up to 18 Nov 2024

Cited By

  • (2024) Enhancing Vulnerability Prioritization in Cloud Computing Using Multi-View Representation Learning. Journal of Management Information Systems 41:3 (708-743). DOI: 10.1080/07421222.2024.2376384. Online publication date: 4-Sep-2024
  • (2023) Differentiable Hierarchical Optimal Transport for Robust Multi-View Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 45:6 (7293-7307). DOI: 10.1109/TPAMI.2022.3222569. Online publication date: 1-Jun-2023
  • (2023) Graph Neural Networks with Deep Mutual Learning for Designing Multi-modal Recommendation Systems. Information Sciences (119815). DOI: 10.1016/j.ins.2023.119815. Online publication date: Oct-2023
  • (2023) Unveiling hierarchical relationships for social image representation learning. Applied Soft Computing 147:C. DOI: 10.1016/j.asoc.2023.110792. Online publication date: 1-Nov-2023
  • (2022) Node Pair Information Preserving Network Embedding Based on Adversarial Networks. IEEE Transactions on Cybernetics 52:7 (5908-5922). DOI: 10.1109/TCYB.2020.3035066. Online publication date: Jul-2022
  • (2022) DeepComp: A Hybrid Framework for Data Compression Using Attention Coupled Autoencoder. Arabian Journal for Science and Engineering 47:8 (10395-10410). DOI: 10.1007/s13369-022-06587-x. Online publication date: 7-Feb-2022
  • (2022) Semisupervised anomaly detection of multivariate time series based on a variational autoencoder. Applied Intelligence 53:5 (6074-6098). DOI: 10.1007/s10489-022-03829-1. Online publication date: 5-Jul-2022
  • (2021) Deep Attentive Multimodal Network Representation Learning for Social Media Images. ACM Transactions on Internet Technology 21:3 (1-17). DOI: 10.1145/3417295. Online publication date: 16-Jun-2021
  • (2021) LP-UIT: A Multimodal Framework for Link Prediction in Social Networks. 2021 IEEE 20th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom) (742-749). DOI: 10.1109/TrustCom53373.2021.00108. Online publication date: Oct-2021
  • (2021) Dynamic network embedding via structural attention. Expert Systems with Applications: An International Journal 176:C. DOI: 10.1016/j.eswa.2021.114895. Online publication date: 15-Aug-2021
