Abstract
Generating descriptive captions for images has recently become a significant research problem. The expressivity of human language, however, has been one of the challenges that kept researchers from experimenting widely with linguistically rich image captions, and this motivated us to apply advanced deep learning algorithms to the task. We propose a model that combines deep learning and natural language processing and consists of two main components: an image-feature extractor and a story generator. The first component is trained to predict the names of objects appearing in an image; the second is trained on a custom corpus of short descriptive sentences treated as short stories. The list of object names produced by the first component is fed into the second component, which arranges the words and generates a short story describing the input image; the same pipeline is used when testing the model's performance. The proposed model can generate a short story expressive of an input image, achieving a BLEU score of 0.59, which we plan to improve in further research.
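The two-stage pipeline described in the abstract can be summarized in a minimal sketch. The object detector and story generator below are illustrative placeholders only (the function names, example labels, and sentences are assumptions, not the paper's actual models or training data); the BLEU evaluation uses the standard sentence-level scorer from NLTK.

```python
# Minimal sketch of the two-stage pipeline: image -> object names -> short story,
# scored with BLEU. The detector and generator are stand-in placeholders.

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction


def extract_object_names(image_path):
    # Placeholder for component 1: an object detector (e.g. a CNN)
    # that returns the names of objects found in the image.
    return ["dog", "ball", "park"]


def generate_story(object_names):
    # Placeholder for component 2: a sequence model (e.g. an LSTM)
    # that arranges the object names into a short descriptive story.
    return "a dog plays with a ball in the park"


# Run the pipeline on a hypothetical image and score the generated story
# against a hypothetical human-written reference.
candidate = generate_story(extract_object_names("example.jpg")).split()
reference = "the dog chased a ball across the park".split()

score = sentence_bleu(
    [reference], candidate,
    smoothing_function=SmoothingFunction().method1,
)
print(f"BLEU: {score:.2f}")
```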
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Alnami, A., Almasre, M., Al-Malki, N. (2021). Story Generation from Images Using Deep Learning. In: Bhattacharya, M., Kharb, L., Chahal, D. (eds) Information, Communication and Computing Technology. ICICCT 2021. Communications in Computer and Information Science, vol 1417. Springer, Cham. https://doi.org/10.1007/978-3-030-88378-2_16
DOI: https://doi.org/10.1007/978-3-030-88378-2_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88377-5
Online ISBN: 978-3-030-88378-2
eBook Packages: Computer Science, Computer Science (R0)