Abstract
Our goal is to develop an effective image aesthetics assessment (IAA) model. In the current Internet era, it has become easier to obtain a text description of an image, and an IAA model supported by both the image and text modalities can perform better than one that relies on images alone. To this end, we design a multimodal feature-driven guided image aesthetics assessment model (MFD). First, a feature extraction sub-network extracts multi-modal features: image-driven aesthetic features and content features, as well as text-driven semantic features. Each feature captures implicit characteristics at a different level of how the human brain analyzes objects. Second, these multi-modal features are combined to form multi-modal combination features that contain multiple characteristics. Finally, the resulting multi-modal combination features are used for aesthetic assessment prediction. Experimental results on public image aesthetics assessment databases demonstrate the superiority of our model.
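The three-stage pipeline described in the abstract (extract, combine, predict) can be illustrated with a minimal sketch. The abstract does not say how the features are combined or what form the prediction takes, so everything here is an assumption made for illustration: fusion is plain concatenation, the prediction head is a single linear layer with softmax over ten score levels, and the feature vectors and weights are random placeholders rather than outputs of the MFD sub-networks. All function names are hypothetical.

```python
import math
import random


def fuse_features(aesthetic, content, semantic):
    """Combine the three feature vectors by concatenation (assumed fusion)."""
    return list(aesthetic) + list(content) + list(semantic)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_distribution(fused, weights, biases):
    """Hypothetical head: one linear layer followed by softmax."""
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


def mean_score(distribution):
    """Expected score when level i (1-based) has probability distribution[i-1]."""
    return sum(level * p for level, p in enumerate(distribution, start=1))


# Placeholder features standing in for the sub-network outputs.
rng = random.Random(0)
aesthetic = [rng.uniform(-1, 1) for _ in range(4)]  # image-driven aesthetic
content = [rng.uniform(-1, 1) for _ in range(4)]    # image-driven content
semantic = [rng.uniform(-1, 1) for _ in range(3)]   # text-driven semantic

fused = fuse_features(aesthetic, content, semantic)

# Untrained random weights; a real model would learn these.
num_levels = 10  # assumed number of score levels
weights = [[rng.uniform(-0.5, 0.5) for _ in fused] for _ in range(num_levels)]
biases = [0.0] * num_levels

distribution = predict_distribution(fused, weights, biases)
score = mean_score(distribution)
```

Concatenation is the simplest way to keep each modality's characteristics side by side in one vector. The paper's actual combination step may differ.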
Acknowledgments
This work is supported by the Liaoning Province Natural Science Foundation under Grant 2023-MS-139, the Shenyang Science and Technology Plan Project under Grant 23-407-3-32, and the National Natural Science Foundation of China under Grant 61901205.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liu, Y., Wen, Z., Li, S., Fan, D., Zhai, G. (2024). Image Aesthetics Assessment Based on Visual Perception and Textual Semantic Understanding. In: Zhai, G., Zhou, J., Ye, L., Yang, H., An, P., Yang, X. (eds) Digital Multimedia Communications. IFTC 2023. Communications in Computer and Information Science, vol 2067. Springer, Singapore. https://doi.org/10.1007/978-981-97-3626-3_4
DOI: https://doi.org/10.1007/978-981-97-3626-3_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-3625-6
Online ISBN: 978-981-97-3626-3
eBook Packages: Behavioral Science and Psychology (R0)