Story Generation from Images Using Deep Learning

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1417)

Included in the following conference series:

International Conference on Information, Communication and Computing Technology

Abstract

Generating descriptive captions for images has recently become a significant research problem. However, the expressivity of human languages has been one of the challenges that has kept researchers from experimenting widely with linguistically rich captions for images. This motivated us to use advanced deep learning algorithms to generate captions for images. We propose an AI model based on deep learning and natural language processing algorithms with two main components: an image-feature extractor and a story generator. The first component (image-feature extractor) was trained to predict the names of objects in images. The second component (story generator) was trained on a custom set of short descriptive sentences, which were treated as short stories. The output of the first component (a list of words) is then fed into the second component to generate stories about the input images. Thus, at test time, the list of object names produced by the first component is passed to the story generator, which arranges them into a short story. The proposed model could generate a short story expressive of an input image, as indicated by a BLEU score of 0.59, which further research is planned to improve.
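
As an illustration of how a BLEU score such as the one reported above is computed, the following minimal Python sketch implements sentence-level BLEU: the geometric mean of clipped n-gram precisions multiplied by a brevity penalty. The abstract does not state the n-gram order, smoothing, or whether scoring was done per sentence or per corpus, so the function name `bleu`, the `max_n` parameter, and the example sentences are assumptions made for illustration only and are not the authors' evaluation code.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing (illustrative)."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


# Hypothetical generated story and reference sentence, already tokenized.
generated = "a dog runs on the beach".split()
reference = "a dog runs along the beach".split()
score = bleu(generated, [reference], max_n=2)
print(round(score, 4))
```

In this toy example the unigram precision is 5/6 and the bigram precision is 3/5, and the two sentences have equal length, so the brevity penalty is 1. Scores range from 0 to 1, with 1 meaning the generated text matches a reference exactly at every n-gram order considered.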

Author information

Authors and Affiliations

King Abdulaziz University, Jeddah, Saudi Arabia
Abrar Alnami, Miada Almasre & Norah Al-Malki

Authors

Abrar Alnami
Miada Almasre
Norah Al-Malki

Corresponding authors

Correspondence to Abrar Alnami, Miada Almasre or Norah Al-Malki.

Editor information

Editors and Affiliations

ABV Indian Institute of Information Technology and Management, Gwalior, India
Mahua Bhattacharya
Jagan Institute of Management Studies, Delhi, India
Latika Kharb
Jagan Institute of Management Studies, Delhi, India
Deepak Chahal

About this paper

Cite this paper

Alnami, A., Almasre, M., Al-Malki, N. (2021). Story Generation from Images Using Deep Learning. In: Bhattacharya, M., Kharb, L., Chahal, D. (eds) Information, Communication and Computing Technology. ICICCT 2021. Communications in Computer and Information Science, vol 1417. Springer, Cham. https://doi.org/10.1007/978-3-030-88378-2_16

DOI: https://doi.org/10.1007/978-3-030-88378-2_16
Published: 08 October 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88377-5
Online ISBN: 978-3-030-88378-2
eBook Packages: Computer Science (R0)
