DOI: 10.1145/3178876.3186026

User-guided Hierarchical Attention Network for Multi-modal Social Image Popularity Prediction

Published: 10 April 2018

Abstract

Predicting the popularity of the growing volume of social images has opened unprecedented opportunities for a wide range of commercial applications, such as precision advertising and recommender systems. Although a few studies have explored this task, little research has addressed the unstructured nature of the visual and textual modalities, or considered how to learn effective representations from multiple modalities for popularity prediction. To this end, we propose the User-guided Hierarchical Attention Network (UHAN), a model with two novel user-guided attention mechanisms that hierarchically attend to both the visual and textual modalities. UHAN not only learns an effective representation for each modality, but also fuses them into an integrated multi-modal representation under the guidance of a user embedding. Because no benchmark dataset exists, we extend a publicly available social image dataset by adding image descriptions. Comprehensive experiments demonstrate the soundness of the proposed UHAN and show that it outperforms several strong alternatives.
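
The abstract gives no equations, so the following is only a minimal sketch of what user-guided attention and user-guided fusion could look like, not the UHAN formulation itself. It assumes that the user embedding, image-region features, and word features already live in one shared vector space, and that relevance is scored with a plain dot product followed by a softmax. The function names (`user_guided_attention`, `fuse`) and the toy vectors are hypothetical; a trained model would presumably use learned projections instead.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def user_guided_attention(features, user):
    """Pool `features` with attention weights driven by the user embedding.

    Assumption: relevance is the dot product between each feature and the
    user embedding (a stand-in for whatever scoring function the paper uses).
    """
    weights = softmax([dot(f, user) for f in features])
    dim = len(features[0])
    pooled = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return pooled, weights


def fuse(visual_regions, text_words, user):
    """Two-level (hierarchical) attention: within each modality, then across."""
    visual, _ = user_guided_attention(visual_regions, user)
    textual, _ = user_guided_attention(text_words, user)
    # Second level: the user embedding also decides how much each modality counts.
    return user_guided_attention([visual, textual], user)


# Toy inputs (hypothetical): one user, two image regions, two words.
user = [1.0, 0.0]
regions = [[2.0, 0.0], [0.0, 2.0]]
words = [[0.5, 0.5], [0.5, 0.5]]

_, region_weights = user_guided_attention(regions, user)
fused, modality_weights = fuse(regions, words, user)
```

In this toy setup the first region aligns with the user embedding and therefore receives the larger attention weight, and the visual modality in turn receives more weight than the textual one during fusion. A popularity predictor would take `fused` as its input representation.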

Published In

WWW '18: Proceedings of the 2018 World Wide Web Conference
April 2018
2000 pages
ISBN: 9781450356398
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • IW3C2: International World Wide Web Conference Committee

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Author Tags

  1. attention network
  2. multi-modal analysis
  3. social image popularity

Qualifiers

  • Research-article

Funding Sources

  • NSFC
  • Shanghai Sailing Program
  • SHMEC
  • NSFC-Zhejiang

Conference

WWW '18
Sponsor:
  • IW3C2
WWW '18: The Web Conference 2018
April 23 - 27, 2018
Lyon, France

Acceptance Rates

WWW '18 Paper Acceptance Rate 170 of 1,155 submissions, 15%;
Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Article Metrics

  • Downloads (Last 12 months): 221
  • Downloads (Last 6 weeks): 33
Reflects downloads up to 14 Feb 2025

Cited By

  • (2024) A Survey of Deep Learning-Based Information Cascade Prediction. Symmetry 16:11 (1436). DOI: 10.3390/sym16111436. Online publication date: 29-Oct-2024
  • (2024) Retrieval-Augmented Hypergraph for Multimodal Social Media Popularity Prediction. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (445-455). DOI: 10.1145/3637528.3672041. Online publication date: 25-Aug-2024
  • (2024) Sentiment and hashtag-aware attentive deep neural network for multimodal post popularity prediction. Neural Computing and Applications 37:4 (2799-2824). DOI: 10.1007/s00521-024-10755-5. Online publication date: 9-Dec-2024
  • (2023) Neural Image Popularity Assessment with Retrieval-augmented Transformer. Proceedings of the 31st ACM International Conference on Multimedia (2427-2436). DOI: 10.1145/3581783.3611918. Online publication date: 26-Oct-2023
  • (2023) Micro-Video Popularity Prediction Via Multimodal Variational Information Bottleneck. IEEE Transactions on Multimedia 25 (24-37). DOI: 10.1109/TMM.2021.3120537. Online publication date: 2023
  • (2023) CasFlow: Exploring Hierarchical Structures and Propagation Uncertainty for Cascade Prediction. IEEE Transactions on Knowledge and Data Engineering 35:4 (3484-3499). DOI: 10.1109/TKDE.2021.3126475. Online publication date: 1-Apr-2023
  • (2023) ViViD: View Prediction of Online Video Through Deep Neural Network-Based Analysis of Subjective Video Attributes. IEEE Transactions on Broadcasting 69:1 (191-200). DOI: 10.1109/TBC.2022.3231100. Online publication date: Mar-2023
  • (2023) Cascade Prediction with Recurrent Neural Networks and Diffusion Depth Distributions. 2023 3rd International Conference on Neural Networks, Information and Communication Engineering (NNICE) (70-77). DOI: 10.1109/NNICE58320.2023.10105676. Online publication date: 24-Feb-2023
  • (2023) Who is the Rising Star? Demystifying the Promising Streamers in Crowdsourced Live Streaming. IEEE INFOCOM 2023 - IEEE Conference on Computer Communications (1-10). DOI: 10.1109/INFOCOM53939.2023.10228881. Online publication date: 17-May-2023
  • (2023) Hierarchical attention neural network for information cascade prediction. Information Sciences 622 (1109-1127). DOI: 10.1016/j.ins.2022.11.163. Online publication date: Apr-2023
