Story Generation from Images Using Deep Learning

  • Conference paper
Information, Communication and Computing Technology (ICICCT 2021)

Abstract

Generating descriptive captions for images has recently become a significant research problem. However, the expressivity of human languages has been among the challenges that have hindered researchers from experimenting widely with linguistically rich captions for images. That motivated us to use advanced deep learning algorithms to generate such captions. We propose an AI model based on deep learning and natural language processing algorithms that has two main components: an image-feature extractor and a story generator. We trained the first component (the image-feature extractor) to predict the names of objects in images. The second component (the story generator) was trained on a custom set of short descriptive sentences, which were treated as short stories. At test time, the output of the first component (a list of object names) is fed into the second component, which arranges the words and generates a short story about the input image. The proposed model could generate a short story expressive of an input image, achieving a BLEU score of 0.59, which we plan to improve in further research.
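
For readers unfamiliar with the metric, the following is a minimal sketch of how a sentence-level BLEU score can be computed, following the general definition in Papineni et al. (2002): clipped n-gram precisions combined by a geometric mean and scaled by a brevity penalty. The abstract does not say which n-gram order, smoothing, or tooling was used to obtain the reported 0.59, so the function names, the default `max_n=4`, and the example sentences below are illustrative assumptions and not the authors' evaluation code.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it appears in any single reference."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def brevity_penalty(candidate, references):
    """Penalize candidates shorter than the closest reference length."""
    c = len(candidate)
    if c == 0:
        return 0.0
    # Closest reference length; ties are broken toward the shorter reference.
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    return 1.0 if c > r else math.exp(1 - r / c)


def bleu(candidate, references, max_n=4):
    """Unsmoothed BLEU with uniform weights over 1..max_n grams.
    Returns 0.0 if any n-gram precision is zero."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_mean)


if __name__ == "__main__":
    # Hypothetical generated story and reference, for illustration only.
    generated = "a dog runs on the beach".split()
    reference = "a dog runs along the beach".split()
    print(bleu(generated, [reference], max_n=2))
```

Because this version is unsmoothed, short texts with no matching 4-grams score zero; published evaluations often apply smoothing or report lower-order BLEU, which is one reason scores are only comparable when the exact configuration is known.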

Author information

Corresponding authors

Correspondence to Abrar Alnami , Miada Almasre or Norah Al-Malki .

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Alnami, A., Almasre, M., Al-Malki, N. (2021). Story Generation from Images Using Deep Learning. In: Bhattacharya, M., Kharb, L., Chahal, D. (eds) Information, Communication and Computing Technology. ICICCT 2021. Communications in Computer and Information Science, vol 1417. Springer, Cham. https://doi.org/10.1007/978-3-030-88378-2_16

  • DOI: https://doi.org/10.1007/978-3-030-88378-2_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88377-5

  • Online ISBN: 978-3-030-88378-2

  • eBook Packages: Computer Science, Computer Science (R0)
