Visual Story Ordering with a Bidirectional Writer

Authors:

Hen-Hsen Huang,

Hsin-Hsi Chen

ICMR '20: Proceedings of the 2020 International Conference on Multimedia Retrieval

Pages 326 - 330

https://doi.org/10.1145/3372278.3390735

Published: 08 June 2020

Abstract

This paper introduces visual story ordering, a challenging task in which the images and text of a visual story are ordered jointly. We propose a neural network model based on the reader-processor-writer architecture with a self-attention mechanism. We further propose a novel bidirectional decoder that uses bidirectional beam search. Experimental results show the effectiveness of the approach. We present and discuss the information gained from multimodal learning. We also find that the proposed embedding narrows the distance between images and their corresponding story sentences, even though we do not align the two modalities explicitly. Because it addresses a general issue in generative models, the proposed bidirectional inference mechanism applies to a variety of applications.
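The abstract gives no implementation details, so the following is only a toy sketch of the general idea of ordering items from both ends with a beam. It is not the paper's model: the score tables `start`, `end`, and `follow` and the function `bidirectional_beam_order` are hypothetical stand-ins for whatever a learned decoder would produce, and the alternating head/tail schedule is an assumption made for illustration.

```python
from typing import List, Sequence, Tuple

# A hypothesis is (score, prefix grown from the head, suffix grown from the tail).
Hypothesis = Tuple[float, Tuple[int, ...], Tuple[int, ...]]


def bidirectional_beam_order(
    start: Sequence[float],
    end: Sequence[float],
    follow: Sequence[Sequence[float]],
    beam_width: int = 2,
) -> List[int]:
    """Order n items by alternately extending a prefix and a suffix.

    start[i]     : hypothetical score for item i being first
    end[i]       : hypothetical score for item i being last
    follow[i][j] : hypothetical score for item j directly following item i
    """
    n = len(follow)
    if n == 0:
        return []
    beam: List[Hypothesis] = [(0.0, (), ())]
    for step in range(n):
        from_head = step % 2 == 0  # even steps grow the prefix, odd steps the suffix
        candidates: List[Hypothesis] = []
        for score, prefix, suffix in beam:
            used = set(prefix) | set(suffix)
            for item in range(n):
                if item in used:
                    continue
                if from_head:
                    gain = follow[prefix[-1]][item] if prefix else start[item]
                    candidates.append((score + gain, prefix + (item,), suffix))
                else:
                    gain = follow[item][suffix[0]] if suffix else end[item]
                    candidates.append((score + gain, prefix, (item,) + suffix))
        # Keep only the best beam_width partial orders.
        candidates.sort(key=lambda h: h[0], reverse=True)
        beam = candidates[:beam_width]

    def joined_score(h: Hypothesis) -> float:
        # Add the score of the link where the prefix meets the suffix.
        score, prefix, suffix = h
        if prefix and suffix:
            score += follow[prefix[-1]][suffix[0]]
        return score

    best = max(beam, key=joined_score)
    return list(best[1] + best[2])


# Toy scores that favour the order 2, 0, 3, 1.
start = [0.0, 0.0, 1.0, 0.0]
end = [0.0, 1.0, 0.0, 0.0]
follow = [
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]
order = bidirectional_beam_order(start, end, follow, beam_width=2)
```

In this sketch the items could be images or sentences alike, since only their indices and scores are used. Growing the sequence from both ends means an early mistake at the head does not have to propagate through every later position, which is one plausible reading of the "general issue in generative models" the abstract mentions.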


Index Terms

Visual Story Ordering with a Bidirectional Writer
1. Computing methodologies
  1. Artificial intelligence


Published In

ICMR '20: Proceedings of the 2020 International Conference on Multimedia Retrieval

June 2020

605 pages

ISBN:9781450370875

DOI:10.1145/3372278

General Chairs:
Cathal Gurrin (Dublin City University, Ireland)
Björn Þór Jónsson (IT University of Copenhagen, Denmark)
Noriko Kando (National Institute of Informatics, Tokyo)

Program Chairs:
Klaus Schoeffmann (Klagenfurt University, Austria)
Phoebe Chen (La Trobe University, Australia)
Noel E. O'Connor (Dublin City University, Ireland)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Short-paper

Funding Sources

Academia Sinica, Taiwan
Ministry of Science and Technology, Taiwan

Conference

ICMR '20

Sponsor:

SIGMM

ICMR '20: International Conference on Multimedia Retrieval

June 8 - 11, 2020

Dublin, Ireland

Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%
