Image Aesthetics Assessment Based on Visual Perception and Textual Semantic Understanding

  • Conference paper
  • First Online:
Digital Multimedia Communications (IFTC 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 2067)

Abstract

Our goal is to develop an effective image aesthetics assessment (IAA) model. In the current Internet era, the text description of an image has become easy to obtain. With the dual-modal support of image and text, an image aesthetics assessment model can better demonstrate its advantages. To this end, we design a multimodal feature-driven image aesthetics assessment model (MFD). First, multi-modal features are extracted through a feature extraction sub-network, including image-driven aesthetic features and content features as well as text-driven semantic features. Each feature captures implicit characteristics corresponding to a different level of object analysis in the human brain. Second, these multi-modal features are combined into multi-modal combination features that encode multiple characteristics. Finally, the combined multi-modal features are used for aesthetic assessment prediction. Experimental results on public image aesthetics assessment databases demonstrate the superiority of our model.
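
As a rough illustration of the pipeline described above, the sketch below wires three feature branches (image-driven aesthetic features, image-driven content features, and text-driven semantic features) into concatenated multi-modal combination features followed by a small prediction head. All module names, layer choices, and feature dimensions here are illustrative assumptions; the abstract does not specify the actual backbones or fusion design used in MFD.

```python
# Minimal sketch (not the authors' code): placeholder encoders stand in
# for the paper's pretrained image and text backbones.
import torch
import torch.nn as nn


class MFDSketch(nn.Module):
    def __init__(self, aes_dim=512, content_dim=512, text_dim=768):
        super().__init__()
        # Image branches: hypothetical stand-ins for visual backbones that
        # would produce aesthetic and content features from an image.
        self.aesthetic_branch = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.LazyLinear(aes_dim), nn.ReLU()
        )
        self.content_branch = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.LazyLinear(content_dim), nn.ReLU()
        )
        # Text branch: hypothetical stand-in for a pretrained text encoder
        # that maps a caption embedding to semantic features.
        self.text_branch = nn.Sequential(nn.LazyLinear(text_dim), nn.ReLU())
        # Fusion head: concatenated multi-modal combination features -> score.
        self.head = nn.Sequential(
            nn.Linear(aes_dim + content_dim + text_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, image_feat_map, text_embedding):
        f_aes = self.aesthetic_branch(image_feat_map)     # image-driven aesthetic features
        f_con = self.content_branch(image_feat_map)       # image-driven content features
        f_txt = self.text_branch(text_embedding)          # text-driven semantic features
        fused = torch.cat([f_aes, f_con, f_txt], dim=-1)  # multi-modal combination features
        return self.head(fused)                           # predicted aesthetic score


if __name__ == "__main__":
    model = MFDSketch()
    dummy_image_features = torch.randn(2, 64, 16, 16)  # stand-in backbone feature map
    dummy_text_embedding = torch.randn(2, 300)         # stand-in caption embedding
    print(model(dummy_image_features, dummy_text_embedding).shape)  # torch.Size([2, 1])
```

Concatenation followed by an MLP is only the simplest possible fusion; the combination step in MFD may be more elaborate than this sketch suggests.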

Acknowledgments

This work is supported by the Liaoning Province Natural Science Foundation under Grant 2023-MS-139, the Shenyang Science and Technology Plan Project under Grant 23-407-3-32, and the National Natural Science Foundation of China under Grant 61901205.

Author information

Corresponding authors

Correspondence to Zhipeng Wen, Sifan Li, or Daoxin Fan.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Liu, Y., Wen, Z., Li, S., Fan, D., Zhai, G. (2024). Image Aesthetics Assessment Based on Visual Perception and Textual Semantic Understanding. In: Zhai, G., Zhou, J., Ye, L., Yang, H., An, P., Yang, X. (eds) Digital Multimedia Communications. IFTC 2023. Communications in Computer and Information Science, vol 2067. Springer, Singapore. https://doi.org/10.1007/978-981-97-3626-3_4
