Abstract
One of the primary challenges of visual storytelling is developing techniques that can maintain the context of the story over long event sequences to generate human-like stories. In this paper, we propose a hierarchical deep learning architecture based on encoder-decoder networks to address this problem. To help our network maintain this context while still generating long and diverse sentences, we incorporate natural language image descriptions along with the images themselves when generating each story sentence. We evaluate our system on the Visual Storytelling (VIST) dataset [7] and show that our method outperforms state-of-the-art techniques on a suite of automatic evaluation metrics. The empirical results demonstrate the necessity of each component of our proposed architecture and show its effectiveness for visual storytelling.
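To make the architecture described in the abstract concrete, the following is a minimal PyTorch sketch of a hierarchical encoder-decoder that fuses a precomputed image feature with an encoding of that image's natural language description, carries story context across images with a sequence-level LSTM, and decodes one sentence per image. All module names, layer sizes, and the fusion scheme are illustrative assumptions, not the authors' implementation; the paper's exact architecture appears in the full chapter.

```python
import torch
import torch.nn as nn

class HierarchicalStoryteller(nn.Module):
    """Sketch of a hierarchical encoder-decoder for visual storytelling.

    Per image: fuse a CNN image feature with an LSTM encoding of its
    description. A story-level LSTM runs over the fused sequence to
    maintain context; a word-level LSTM decodes one sentence per image.
    Layer sizes and fusion are assumptions made for this illustration.
    """

    def __init__(self, vocab_size, img_dim=2048, desc_dim=512, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, desc_dim)
        self.desc_encoder = nn.LSTM(desc_dim, desc_dim, batch_first=True)
        self.fuse = nn.Linear(img_dim + desc_dim, hid_dim)
        # Story-level encoder: carries context across the image sequence.
        self.story_encoder = nn.LSTM(hid_dim, hid_dim, batch_first=True)
        # Word-level decoder: generates one sentence per story position.
        self.decoder = nn.LSTM(desc_dim + hid_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, img_feats, desc_tokens, sent_tokens):
        # img_feats:   (B, N, img_dim)  precomputed CNN features, N images
        # desc_tokens: (B, N, Ld)       description token ids per image
        # sent_tokens: (B, N, Ls)       gold sentence ids (teacher forcing)
        B, N, Ld = desc_tokens.shape
        Ls = sent_tokens.size(2)
        # Encode each description; take the final hidden state as its vector.
        _, (d_h, _) = self.desc_encoder(self.embed(desc_tokens.view(B * N, Ld)))
        desc_vec = d_h[-1].view(B, N, -1)                    # (B, N, desc_dim)
        fused = torch.tanh(self.fuse(torch.cat([img_feats, desc_vec], dim=-1)))
        ctx, _ = self.story_encoder(fused)                   # (B, N, hid_dim)
        logits = []
        for i in range(N):                                   # one sentence per image
            words = self.embed(sent_tokens[:, i])            # (B, Ls, desc_dim)
            step_ctx = ctx[:, i:i + 1].expand(-1, Ls, -1)    # context at every step
            dec_out, _ = self.decoder(torch.cat([words, step_ctx], dim=-1))
            logits.append(self.out(dec_out))
        return torch.stack(logits, dim=1)                    # (B, N, Ls, vocab)
```

Training would minimize cross-entropy between the returned logits and the gold story sentences; at inference the decoder would instead be unrolled greedily or with beam search from a start token.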
References
Banerjee, S., Lavie, A.: METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pp. 65–72 (2005)
Cardona-Rivera, R.E., Li, B.: PLOTSHOT: generating discourse-constrained stories around photos. In: AIIDE (2016)
Gonzalez-Rico, D., Fuentes-Pineda, G.: Contextualize, show and tell: a neural visual storyteller. arXiv preprint arXiv:1806.00738 (2018)
Harrison, B., Purdy, C., Riedl, M.O.: Toward automated story generation with Markov chain Monte Carlo methods and deep neural networks. In: Thirteenth Artificial Intelligence and Interactive Digital Entertainment Conference (2017)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Huang, T.H.K., et al.: Visual storytelling. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1233–1239 (2016)
Johnson, J., Karpathy, A., Fei-Fei, L.: DenseCap: fully convolutional localization networks for dense captioning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4565–4574 (2016)
Kim, T., Heo, M.O., Son, S., Park, K.W., Zhang, B.T.: GLAC Net: glocal attention cascading networks for multi-image cued story generation. arXiv preprint arXiv:1805.10973 (2018)
Krause, J., Johnson, J., Krishna, R., Fei-Fei, L.: A hierarchical approach for generating descriptive image paragraphs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 317–325 (2017)
Lin, C.Y.: ROUGE: a package for automatic evaluation of summaries. In: Text Summarization Branches Out, pp. 74–81 (2004)
Lukin, S.M., Hobbs, R., Voss, C.R.: A pipeline for creative visual storytelling. arXiv preprint arXiv:1807.08077 (2018)
Martin, L.J., et al.: Event representations for automated story generation with deep neural nets. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)
Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 311–318. Association for Computational Linguistics (2002)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
Smilevski, M., Lalkovski, I., Madjarov, G.: Stories for images-in-sequence by using visual and narrative components. In: Kalajdziski, S., Ackovska, N. (eds.) ICT 2018. CCIS, vol. 940, pp. 148–159. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00825-3_13
Vedantam, R., Lawrence Zitnick, C., Parikh, D.: CIDEr: consensus-based image description evaluation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4566–4575 (2015)
Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: a neural image caption generator. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3156–3164 (2015)
Wang, X., Chen, W., Wang, Y.F., Wang, W.Y.: No metrics are perfect: adversarial reward learning for visual storytelling. arXiv preprint arXiv:1804.09160 (2018)
Yao, L., Peng, N., Weischedel, R.M., Knight, K., Zhao, D., Yan, R.: Plan-and-write: towards better automatic storytelling. arXiv preprint arXiv:1811.05701 (2018)
Young, R.M., Ware, S.G., Cassell, B.A., Robertson, J.: Plans and planning in narrative generation: a review of plan-based approaches to the generation of story, discourse and interactivity in narratives. Sprache und Datenverarbeitung, Special Issue on Formal and Computational Models of Narrative 37(1–2), 41–64 (2013)
Yu, L., Bansal, M., Berg, T.L.: Hierarchically-attentive RNN for album summarization and storytelling. arXiv preprint arXiv:1708.02977 (2017)
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Nahian, M.S.A., Tasrin, T., Gandhi, S., Gaines, R., Harrison, B. (2019). A Hierarchical Approach for Visual Storytelling Using Image Description. In: Cardona-Rivera, R., Sullivan, A., Young, R. (eds.) Interactive Storytelling. ICIDS 2019. Lecture Notes in Computer Science, vol. 11869. Springer, Cham. https://doi.org/10.1007/978-3-030-33894-7_30
DOI: https://doi.org/10.1007/978-3-030-33894-7_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33893-0
Online ISBN: 978-3-030-33894-7
eBook Packages: Computer Science, Computer Science (R0)