Bilateral Correspondence Model for Words-and-Pictures Association in Multimedia-Rich Microblogs

Published: 04 July 2014

Abstract

The amount of multimedia content in microblogs is growing significantly. In some large systems, more than 20% of microblogs link to a picture or video. The rich semantics of microblogs provide an opportunity to endow images with higher-level semantics beyond object labels. However, this raises new challenges for understanding the association between multimodal contents in multimedia-rich microblogs. Contrary to the fundamental assumptions of traditional annotation, tagging, and retrieval systems, pictures and words in multimedia-rich microblogs are only loosely associated, and a correspondence between them cannot be readily established. To address these challenges, we present the first study that analyzes and models the associations between multimodal contents in microblog streams, aiming to discover multimodal topics by establishing correspondences between pictures and words. We first use a data-driven approach to analyze the new characteristics of the words, the pictures, and their association types in microblogs. We then propose a novel generative model called the Bilateral Correspondence Latent Dirichlet Allocation (BC-LDA) model. BC-LDA assigns flexible associations between pictures and words: it allows not only picture-word co-occurrence in both directions but also single-modality association. This flexibility lets the model better fit the data distribution, so that it can discover various types of joint topics and generate pictures and words from those topics accordingly. We evaluate the model extensively on a large-scale, real-world dataset of multimedia-rich microblogs and demonstrate its advantages in several application scenarios, including image tagging, text illustration, and topic discovery. The experimental results show that the proposed model significantly and consistently outperforms traditional approaches.
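
The idea of a per-microblog association type can be illustrated with a toy generative sketch. This is not the BC-LDA model from the paper, whose exact generative process is not given in the abstract. It is a simplified illustration under stated assumptions: the three association types, their names, the function `generate_microblog`, and the rule that the dependent modality copies topics from the other modality are all hypothetical choices made here for clarity.

```python
import random

# Hypothetical association types, named here for illustration only.
ASSOC_TYPES = ("picture_to_word", "word_to_picture", "independent")


def sample_categorical(probs, rng):
    """Draw an index from a discrete distribution given as a list of probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_dirichlet(alpha, size, rng):
    """Draw from a symmetric Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_microblog(n_words, n_visual, word_topics, visual_topics,
                       assoc_probs, alpha, rng):
    """Generate one toy microblog with n_words text tokens and n_visual visual tokens.

    word_topics[k] and visual_topics[k] are the per-topic distributions over the
    text vocabulary and the visual-word vocabulary. Both counts must be positive.
    """
    n_topics = len(word_topics)
    assoc = ASSOC_TYPES[sample_categorical(assoc_probs, rng)]
    theta = sample_dirichlet(alpha, n_topics, rng)

    if assoc == "picture_to_word":
        # Picture drives the text: each word reuses the topic of a random visual token.
        z_visual = [sample_categorical(theta, rng) for _ in range(n_visual)]
        z_words = [rng.choice(z_visual) for _ in range(n_words)]
    elif assoc == "word_to_picture":
        # Text drives the picture: each visual token reuses the topic of a random word.
        z_words = [sample_categorical(theta, rng) for _ in range(n_words)]
        z_visual = [rng.choice(z_words) for _ in range(n_visual)]
    else:
        # Single-modality case (assumption): each modality gets its own topic mixture.
        theta_visual = sample_dirichlet(alpha, n_topics, rng)
        z_words = [sample_categorical(theta, rng) for _ in range(n_words)]
        z_visual = [sample_categorical(theta_visual, rng) for _ in range(n_visual)]

    words = [sample_categorical(word_topics[z], rng) for z in z_words]
    visual_words = [sample_categorical(visual_topics[z], rng) for z in z_visual]
    return {
        "association": assoc,
        "word_topics": z_words,
        "visual_topics": z_visual,
        "words": words,
        "visual_words": visual_words,
    }
```

In this sketch, setting `assoc_probs` to `[1.0, 0.0, 0.0]` reduces the process to a one-directional, correspondence-style generator in which words can only use topics already present in the picture. Spreading probability over all three types is what makes the association flexible.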

    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 10, Issue 4
    June 2014
    132 pages
    ISSN:1551-6857
    EISSN:1551-6865
    DOI:10.1145/2656131

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 04 July 2014
    Accepted: 01 February 2014
    Revised: 01 January 2014
    Received: 01 August 2013
    Published in TOMM Volume 10, Issue 4

    Author Tags

    1. Social media
    2. image analysis
    3. topic models

    Qualifiers

    • Research-article
    • Research
    • Refereed
