Deep Aesthetic Quality Assessment With Semantic Information

Published: 01 March 2017

Abstract

Human beings often assess the aesthetic quality of an image in concert with identifying its semantic content. This paper addresses the correlation between automatic aesthetic quality assessment and semantic recognition. We cast assessment as the main task within a multi-task deep model and argue that the semantic recognition task offers the key to addressing this problem. Based on convolutional neural networks, we employ a single, simple multi-task framework to efficiently exploit the supervision of aesthetic and semantic labels. A correlation term between the two tasks is further introduced into the framework by incorporating inter-task relationship learning. This term not only provides useful insight into the correlation but also improves the assessment accuracy of the aesthetic task. In particular, an effective strategy is developed to keep the two tasks balanced, which facilitates optimizing the parameters of the framework. Extensive experiments on the challenging Aesthetic Visual Analysis (AVA) dataset and the Photo.net dataset validate the importance of semantic recognition in aesthetic quality assessment, and demonstrate that multi-task deep models can discover an effective aesthetic representation that achieves state-of-the-art results.
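For a concrete picture of the setup the abstract describes, below is a minimal PyTorch sketch of a two-head multi-task CNN: a shared backbone feeds an aesthetic-classification head (the main task) and a semantic multi-label head (the auxiliary task), trained with a weighted joint loss. This is not the authors' exact model: the backbone choice, the tag-vocabulary size `num_semantic_tags`, and the balance weight `lambda_sem` are illustrative assumptions, and the paper's learned inter-task correlation term is reduced here to a fixed scalar weight.

```python
# Minimal sketch of a multi-task aesthetic/semantic CNN (illustrative, not
# the paper's exact architecture or correlation-learning formulation).
import torch
import torch.nn as nn
import torchvision.models as models


class MultiTaskAestheticNet(nn.Module):
    def __init__(self, num_semantic_tags: int = 29):  # tag count is an assumption
        super().__init__()
        backbone = models.alexnet(weights=None)       # shared feature extractor
        self.features = backbone.features
        self.pool = nn.AdaptiveAvgPool2d((6, 6))
        self.shared_fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
        )
        self.aesthetic_head = nn.Linear(4096, 2)                 # high/low quality
        self.semantic_head = nn.Linear(4096, num_semantic_tags)  # multi-label tags

    def forward(self, x):
        h = self.shared_fc(self.pool(self.features(x)))
        return self.aesthetic_head(h), self.semantic_head(h)


def joint_loss(aes_logits, sem_logits, aes_labels, sem_labels, lambda_sem=0.1):
    """Weighted sum of the two task losses; lambda_sem keeps the auxiliary
    semantic task from dominating the main aesthetic task (the paper's
    learned correlation term is approximated by this fixed scalar)."""
    aes_loss = nn.functional.cross_entropy(aes_logits, aes_labels)
    sem_loss = nn.functional.binary_cross_entropy_with_logits(sem_logits, sem_labels)
    return aes_loss + lambda_sem * sem_loss


# Usage on a dummy batch:
model = MultiTaskAestheticNet()
images = torch.randn(4, 3, 224, 224)
aes_labels = torch.randint(0, 2, (4,))
sem_labels = torch.randint(0, 2, (4, 29)).float()
loss = joint_loss(*model(images), aes_labels, sem_labels)
loss.backward()
```

Under this kind of joint objective, the gradients from the semantic head shape the shared representation, which is the mechanism the abstract credits for the improved aesthetic accuracy.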



Information & Contributors

Information

Published In

IEEE Transactions on Image Processing, Volume 26, Issue 3
March 2017
473 pages

Publisher

IEEE Press

Publication History

Published: 01 March 2017

Qualifiers

  • Research-article

Cited By

  • (2024) "Towards Robust Evaluation of Aesthetic and Photographic Quality Metrics," Complexity, doi: 10.1155/2024/8223586. Online publication date: 1-Jan-2024.
  • (2024) "Text-guided Multi-Task Image Aesthetic Quality Assessment," Proc. 2nd Int. Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice, pp. 11-19, doi: 10.1145/3688867.3690176. Online publication date: 28-Oct-2024.
  • (2024) "Predicting Scores of Various Aesthetic Attribute Sets by Learning from Overall Score Labels," Proc. 2nd Int. Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice, pp. 63-71, doi: 10.1145/3688867.3690174. Online publication date: 28-Oct-2024.
  • (2024) "Improving Image Aesthetic Assessment via Multiple Image Joint Learning," ACM Trans. Multimedia Comput. Commun. Appl., vol. 20, no. 11, pp. 1-24, doi: 10.1145/3687128. Online publication date: 21-Aug-2024.
  • (2024) "Attribute-Driven Multimodal Hierarchical Prompts for Image Aesthetic Quality Assessment," Proc. 32nd ACM Int. Conf. on Multimedia, pp. 2399-2408, doi: 10.1145/3664647.3681175. Online publication date: 28-Oct-2024.
  • (2024) "Coarse-to-Fine Image Aesthetics Assessment With Dynamic Attribute Selection," IEEE Trans. Multimedia, vol. 26, pp. 9316-9329, doi: 10.1109/TMM.2024.3389452. Online publication date: 16-Apr-2024.
  • (2024) "Semi-Supervised Adversarial Learning for Attribute-Aware Photo Aesthetic Assessment," IEEE Trans. Multimedia, vol. 26, pp. 4086-4096, doi: 10.1109/TMM.2021.3117709. Online publication date: 1-Jan-2024.
  • (2024) "Emotion-aware hierarchical interaction network for multimodal image aesthetics assessment," Pattern Recognition, vol. 154, doi: 10.1016/j.patcog.2024.110584. Online publication date: 1-Oct-2024.
  • (2024) "Confidence-based dynamic cross-modal memory network for image aesthetic assessment," Pattern Recognition, vol. 149, doi: 10.1016/j.patcog.2023.110227. Online publication date: 1-May-2024.
  • (2024) "A novel approach using deep convolutional neural network to classify the photographs based on leading-line by fine-tuning the pre-trained VGG16 neural network," Multimedia Tools and Applications, vol. 83, no. 1, pp. 3189-3214, doi: 10.1007/s11042-022-13338-5. Online publication date: 1-Jan-2024.
