research-article

Multimodal Network Embedding via Attention based Multi-view Variational Autoencoder

Authors: Xiaoming Zhang, Zhonghua Zhao

ICMR '18: Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval

Pages 108 - 116

https://doi.org/10.1145/3206025.3206035

Published: 05 June 2018

Abstract

Learning embeddings for social media data has attracted extensive research interest and enabled many applications, such as classification and link prediction. In this paper, we examine the scenario of a multimodal network whose nodes contain multimodal content and are connected by heterogeneous relationships. Social images are one example: each image carries multimodal content (e.g., visual content and a text description) and is linked to other images in various ways (e.g., by belonging to the same album or sharing the same tag). In such a network, learning the embedding from the network structure alone, or from only a subset of the content, yields a sub-optimal representation. We propose a novel deep embedding method, the Attention-based Multi-view Variational Auto-Encoder (AMVAE), which incorporates both the link information and the multimodal content for more effective and efficient embedding. Specifically, we adopt an LSTM with an attention model to learn the correlation between different data modalities, such as the correlation between visual regions and specific words, to obtain the semantic embedding of the multimodal content. The link information and the semantic embedding are then treated as two correlated views. A multi-view correlation learning based Variational Auto-Encoder (VAE) is proposed to learn the representation of each node, in which the embeddings of the link information and the multimodal content are integrated and mutually reinforced. Experiments on three real-world datasets demonstrate the superiority of the proposed model in two applications: multi-label classification and link prediction.
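The abstract names two building blocks: attention that relates a word to visual regions, and a VAE that combines two views of each node. The sketch below illustrates generic textbook versions of both in plain Python; it is not the authors' AMVAE. Several choices are assumptions not stated in the abstract: scaled dot-product scoring (the paper's attention form may differ), the omission of the LSTM and all learned projections (a word's hidden state is simply given), and a product-of-Gaussians rule for fusing the per-view posteriors (the abstract only says the views are "integrated and mutually reinforced"). All function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, regions: List[Vector]) -> Tuple[Vector, Vector]:
    """Scaled dot-product attention of one word's hidden state over image regions.

    Assumes the query and region features already share a dimension (the learned
    projections a real model would apply are omitted). Returns the attention
    weights and the attended context vector (weighted sum of regions).
    """
    dim = len(regions[0])
    scores = [sum(q * r for q, r in zip(query, reg)) / math.sqrt(dim) for reg in regions]
    weights = softmax(scores)
    context = [sum(w * reg[k] for w, reg in zip(weights, regions)) for k in range(dim)]
    return weights, context


def product_of_gaussians(mu1: Vector, logvar1: Vector,
                         mu2: Vector, logvar2: Vector) -> Tuple[Vector, Vector]:
    """Fuse two diagonal Gaussians (one per view) by multiplying their densities.

    Assumed fusion rule: the product is Gaussian with precision p1 + p2 and a
    precision-weighted mean.
    """
    mu, logvar = [], []
    for m1, lv1, m2, lv2 in zip(mu1, logvar1, mu2, logvar2):
        p1, p2 = math.exp(-lv1), math.exp(-lv2)  # per-dimension precisions
        var = 1.0 / (p1 + p2)
        mu.append(var * (m1 * p1 + m2 * p2))
        logvar.append(math.log(var))
    return mu, logvar


def reparameterize(mu: Vector, logvar: Vector, rng: random.Random) -> Vector:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (the reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: Vector, logvar: Vector) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), the usual VAE regularizer."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


if __name__ == "__main__":
    # Hypothetical toy inputs: a word state aligned with the first of two regions.
    weights, context = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
    print(weights)  # the first region receives the larger weight
    # Hypothetical per-view posteriors for a 1-D latent: link view and content view.
    mu, logvar = product_of_gaussians([1.0], [0.0], [3.0], [0.0])
    print(mu, logvar)  # equal variances: mean 2.0, variance halved to 0.5
    z = reparameterize(mu, logvar, random.Random(0))
    print(z, kl_to_standard_normal(mu, logvar))
```

In a trained model, the per-view means and log-variances would come from learned encoders over the link structure and over the attended multimodal embedding, and a typical multi-view VAE loss would add per-view reconstruction terms to the KL term; those parts are left out of this sketch.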

Index Terms

Multimodal Network Embedding via Attention based Multi-view Variational Autoencoder
1. Information systems
  1. Information retrieval
    1. Specialized information retrieval
      1. Multimedia and multimodal retrieval
  2. World Wide Web
    1. Web applications
      1. Social networks

Recommendations

Intra-view and Inter-view Attention for Multi-view Network Embedding
Advances in Multimedia Information Processing – PCM 2018
Abstract
Network Embedding, which represents nodes in networks with efficient low-dimensional vectors, has been proved useful in a variety of applications. However, most existing approaches study single-view networks but not the multi-view networks with ...
AAANE: Attention-Based Adversarial Autoencoder for Multi-scale Network Embedding
Advances in Knowledge Discovery and Data Mining
Abstract
Network embedding represents nodes in a continuous vector space and preserves structure information from a network. Existing methods usually adopt a “one-size-fits-all” approach when concerning multi-scale structure information, such as first- and ...
Multi-view Heterogeneous Network Embedding
Knowledge Science, Engineering and Management
Abstract
In the real world, the complex and diverse relations among different objects can be described in the form of networks. At the same time, with the emergence and development of network embedding, it has become an effective tool for processing ...

Published In

ICMR '18: Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval

June 2018

550 pages

ISBN: 9781450350464

DOI: 10.1145/3206025

Conference Chairs:
Kiyoharu Aizawa, The Univ. of Tokyo, Japan
Michael Lew, Leiden Univ., Netherlands
Shin'ichi Satoh, National Inst. of Informatics, Japan

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National High Technology Research and Development Program of China
National Natural Science Foundation of China
State Key Laboratory of Software Development Environment

Conference

ICMR '18

Sponsor:

SIGMM

ICMR '18: International Conference on Multimedia Retrieval

June 11 - 14, 2018

Yokohama, Japan

Acceptance Rates

ICMR '18 Paper Acceptance Rate 44 of 136 submissions, 32%;

Overall Acceptance Rate 254 of 830 submissions, 31%

Bibliometrics & Citations

Article Metrics

Total Citations: 19
Total Downloads: 1,375
Downloads (last 12 months): 96
Downloads (last 6 weeks): 13

Reflects downloads up to 18 Nov 2024

Cited By

Ullman S, Samtani S, Zhu H, Lazarine B, Chen H, Nunamaker J (2024). Enhancing Vulnerability Prioritization in Cloud Computing Using Multi-View Representation Learning. Journal of Management Information Systems 41:3 (708-743). Online publication date: 4-Sep-2024.
https://doi.org/10.1080/07421222.2024.2376384

Luo D, Xu H, Carin L (2023). Differentiable Hierarchical Optimal Transport for Robust Multi-View Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 45:6 (7293-7307). Online publication date: 1-Jun-2023.
https://dl.acm.org/doi/10.1109/TPAMI.2022.3222569

Li J, Yang C, Ye G, Nguyen Q (2023). Graph Neural Networks with Deep Mutual Learning for Designing Multi-modal Recommendation Systems. Information Sciences (119815). Online publication date: Oct-2023.
https://doi.org/10.1016/j.ins.2023.119815

Han L, Zhang X, Zhang L, Lu M, Huang F, Liu Y (2023). Unveiling hierarchical relationships for social image representation learning. Applied Soft Computing 147:C. Online publication date: 1-Nov-2023.
https://dl.acm.org/doi/10.1016/j.asoc.2023.110792

Wang C, Shi W, Huang L, Lin K, Huang D, Yu P (2022). Node Pair Information Preserving Network Embedding Based on Adversarial Networks. IEEE Transactions on Cybernetics 52:7 (5908-5922). Online publication date: Jul-2022.
https://doi.org/10.1109/TCYB.2020.3035066

Sriram S, Dwivedi A, Chitra P, Sankar V, Abirami S, Durai S, Pandey D, Khare M (2022). DeepComp: A Hybrid Framework for Data Compression Using Attention Coupled Autoencoder. Arabian Journal for Science and Engineering 47:8 (10395-10410). Online publication date: 7-Feb-2022.
https://doi.org/10.1007/s13369-022-06587-x

Chen N, Tu H, Duan X, Hu L, Guo C (2022). Semisupervised anomaly detection of multivariate time series based on a variational autoencoder. Applied Intelligence 53:5 (6074-6098). Online publication date: 5-Jul-2022.
https://dl.acm.org/doi/10.1007/s10489-022-03829-1

Huang F, Li C, Gao B, Liu Y, Alotaibi S, Chen H (2021). Deep Attentive Multimodal Network Representation Learning for Social Media Images. ACM Transactions on Internet Technology 21:3 (1-17). Online publication date: 16-Jun-2021.
https://dl.acm.org/doi/10.1145/3417295

Wu H, Wang S, Fang H (2021). LP-UIT: A Multimodal Framework for Link Prediction in Social Networks. 2021 IEEE 20th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), 742-749. Online publication date: Oct-2021.
https://doi.org/10.1109/TrustCom53373.2021.00108

Zhang C, Fan Y, Xie Y, Yu B, Li C, Pan K (2021). Dynamic network embedding via structural attention. Expert Systems with Applications: An International Journal 176:C. Online publication date: 15-Aug-2021.
https://dl.acm.org/doi/10.1016/j.eswa.2021.114895
