Deep Aesthetic Quality Assessment With Semantic Information

Published: 01 March 2017

Abstract

Human beings often assess the aesthetic quality of an image in concert with identifying its semantic content. This paper addresses the correlation between automatic aesthetic quality assessment and semantic recognition. We cast assessment as the main task within a multi-task deep model and argue that the semantic recognition task offers the key to addressing this problem. Based on convolutional neural networks, we employ a single, simple multi-task framework to efficiently exploit the supervision of aesthetic and semantic labels. A correlation term between the two tasks is further introduced into the framework by incorporating inter-task relationship learning. This term not only provides useful insight into the correlation but also improves the assessment accuracy of the aesthetic task. In particular, an effective strategy is developed to keep the two tasks balanced, which facilitates optimizing the parameters of the framework. Extensive experiments on the challenging Aesthetic Visual Analysis (AVA) dataset and the Photo.net dataset validate the importance of semantic recognition in aesthetic quality assessment, and demonstrate that multi-task deep models can discover an effective aesthetic representation that achieves state-of-the-art results.
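For a concrete picture of the setup the abstract describes, below is a minimal PyTorch sketch of a two-head multi-task CNN: a shared backbone feeds an aesthetic-classification head (the main task) and a semantic multi-label head (the auxiliary task), trained with a weighted joint loss. This is not the authors' exact model: the backbone choice, the tag-vocabulary size `num_semantic_tags`, and the balance weight `lambda_sem` are illustrative assumptions, and the paper's learned inter-task correlation term is reduced here to a fixed scalar weight.

```python
# Minimal sketch of a multi-task aesthetic/semantic CNN (illustrative, not
# the paper's exact architecture or correlation-learning formulation).
import torch
import torch.nn as nn
import torchvision.models as models


class MultiTaskAestheticNet(nn.Module):
    def __init__(self, num_semantic_tags: int = 29):  # tag count is an assumption
        super().__init__()
        backbone = models.alexnet(weights=None)       # shared feature extractor
        self.features = backbone.features
        self.pool = nn.AdaptiveAvgPool2d((6, 6))
        self.shared_fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
        )
        self.aesthetic_head = nn.Linear(4096, 2)                 # high/low quality
        self.semantic_head = nn.Linear(4096, num_semantic_tags)  # multi-label tags

    def forward(self, x):
        h = self.shared_fc(self.pool(self.features(x)))
        return self.aesthetic_head(h), self.semantic_head(h)


def joint_loss(aes_logits, sem_logits, aes_labels, sem_labels, lambda_sem=0.1):
    """Weighted sum of the two task losses; lambda_sem keeps the auxiliary
    semantic task from dominating the main aesthetic task (the paper's
    learned correlation term is approximated by this fixed scalar)."""
    aes_loss = nn.functional.cross_entropy(aes_logits, aes_labels)
    sem_loss = nn.functional.binary_cross_entropy_with_logits(sem_logits, sem_labels)
    return aes_loss + lambda_sem * sem_loss


# Usage on a dummy batch:
model = MultiTaskAestheticNet()
images = torch.randn(4, 3, 224, 224)
aes_labels = torch.randint(0, 2, (4,))
sem_labels = torch.randint(0, 2, (4, 29)).float()
loss = joint_loss(*model(images), aes_labels, sem_labels)
loss.backward()
```

Under this kind of joint objective, the gradients from the semantic head shape the shared representation, which is the mechanism the abstract credits for the improved aesthetic accuracy.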



Information & Contributors

Information

Published In

IEEE Transactions on Image Processing, Volume 26, Issue 3
March 2017
473 pages

Publisher

IEEE Press

Publication History

Published: 01 March 2017

Qualifiers

  • Research-article

Cited By

  • (2024) "Towards Robust Evaluation of Aesthetic and Photographic Quality Metrics," Complexity, doi: 10.1155/2024/8223586. Online publication date: 1-Jan-2024.
  • (2024) "Text-guided Multi-Task Image Aesthetic Quality Assessment," Proc. 2nd Int. Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice, pp. 11-19, doi: 10.1145/3688867.3690176. Online publication date: 28-Oct-2024.
  • (2024) "Predicting Scores of Various Aesthetic Attribute Sets by Learning from Overall Score Labels," Proc. 2nd Int. Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice, pp. 63-71, doi: 10.1145/3688867.3690174. Online publication date: 28-Oct-2024.
  • (2024) "Improving Image Aesthetic Assessment via Multiple Image Joint Learning," ACM Trans. Multimedia Comput. Commun. Appl., vol. 20, no. 11, pp. 1-24, doi: 10.1145/3687128. Online publication date: 21-Aug-2024.
  • (2024) "Attribute-Driven Multimodal Hierarchical Prompts for Image Aesthetic Quality Assessment," Proc. 32nd ACM Int. Conf. on Multimedia, pp. 2399-2408, doi: 10.1145/3664647.3681175. Online publication date: 28-Oct-2024.
  • (2024) "Coarse-to-Fine Image Aesthetics Assessment With Dynamic Attribute Selection," IEEE Trans. Multimedia, vol. 26, pp. 9316-9329, doi: 10.1109/TMM.2024.3389452. Online publication date: 16-Apr-2024.
  • (2024) "Semi-Supervised Adversarial Learning for Attribute-Aware Photo Aesthetic Assessment," IEEE Trans. Multimedia, vol. 26, pp. 4086-4096, doi: 10.1109/TMM.2021.3117709. Online publication date: 1-Jan-2024.
  • (2024) "Emotion-aware hierarchical interaction network for multimodal image aesthetics assessment," Pattern Recognition, vol. 154, doi: 10.1016/j.patcog.2024.110584. Online publication date: 1-Oct-2024.
  • (2024) "Confidence-based dynamic cross-modal memory network for image aesthetic assessment," Pattern Recognition, vol. 149, doi: 10.1016/j.patcog.2023.110227. Online publication date: 1-May-2024.
  • (2024) "A novel approach using deep convolutional neural network to classify the photographs based on leading-line by fine-tuning the pre-trained VGG16 neural network," Multimedia Tools and Applications, vol. 83, no. 1, pp. 3189-3214, doi: 10.1007/s11042-022-13338-5. Online publication date: 1-Jan-2024.
