Research Progress on the Aesthetic Quality Assessment of Complex Layout Images Based on Deep Learning
Figure 1. Example of document layout analysis [18].
Figure 2. Example of poster layout analysis.
Figure 3. Multi-task learning model architecture [77].
Figure 4. Brain-inspired neural network framework [85].
Abstract
1. Introduction
2. Complex-Image Layout Analysis Methods
2.1. Analysis of Complex Layout
2.2. Traditional Layout Analysis Methods
2.3. Layout Analysis Method Based on Machine Learning
2.3.1. Layout Analysis Method Based on Support Vector Machine (SVM)
2.3.2. Layout Analysis Based on Neural Networks
3. Methods for Assessing the Aesthetic Quality of Images
3.1. Traditional Methods of Assessing the Aesthetic Quality of Images
3.2. Deep-Learning-Based Method for Assessing the Aesthetic Quality of Images
3.2.1. Image Aesthetic-Assessment Method Based on Depth-Feature Extraction
3.2.2. Aesthetic Assessment Method of Images Based on Multi-Task Convolutional Networks
3.2.3. Image Aesthetic-Assessment Method Based on Fine-Tuned Convolutional Neural Network
3.2.4. Aesthetic Assessment Methods for Images in Brain-Inspired Deep Networks
3.2.5. Image Aesthetics-Assessment Method Based on Semi-Supervised Adversarial Learning
3.2.6. Aesthetic Assessment Method of Images Based on Multimodal Attention Networks
4. Datasets
4.1. Layout-Analysis Dataset
4.2. Image Aesthetic-Quality-Assessment Dataset
5. Summary and Future Prospects
- (1) Building an image dataset for visual communication design
- (2) Modular design of the network-model structure
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Deng, Y.; Loy, C.C.; Tang, X. Image aesthetic assessment: An experimental survey. IEEE Signal Process. Mag. 2017, 34, 80–106. [Google Scholar] [CrossRef]
- Luo, P. Social image aesthetic classification and optimization algorithm in machine learning. Neural Comput. Appl. 2023, 35, 4283–4293. [Google Scholar] [CrossRef]
- Lu, X.; Lin, Z.; Jin, H.; Yang, J.; Wang, J.Z. Rating image aesthetics using deep learning. IEEE Trans. Multimed. 2015, 17, 2021–2034. [Google Scholar] [CrossRef]
- Yang, J.; Zhou, Y.; Zhao, Y.; Lu, W.; Gao, X. MetaMP: Metalearning-Based Multipatch Image Aesthetics Assessment. IEEE Trans. Cybern. 2022, 53, 5716–5728. [Google Scholar] [CrossRef] [PubMed]
- Wang, Z.; Liu, D.; Chang, S.; Dolcos, F.; Beck, D.; Huang, T. Image aesthetics assessment using Deep Chatterjee’s machine. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 941–948. [Google Scholar]
- Kao, Y.; He, R.; Huang, K. Deep aesthetic quality assessment with semantic information. IEEE Trans. Image Process. 2017, 26, 1482–1495. [Google Scholar] [CrossRef]
- Zhang, X.; Gao, X.; Lu, W.; He, L. A gated peripheral-foveal convolutional neural network for unified image aesthetic prediction. IEEE Trans. Multimed. 2019, 21, 2815–2826. [Google Scholar] [CrossRef]
- Apostolidis, K.; Mezaris, V. Image aesthetics assessment using fully convolutional neural networks. In Proceedings of the MultiMedia Modeling: 25th International Conference, Thessaloniki, Greece, 8–11 January 2019; pp. 361–373. [Google Scholar]
- Tan, H.; Xu, B.; Liu, A. Research and Extraction on Intelligent Generation Rules of Posters in Graphic Design. In Proceedings of the Cross-Cultural Design. Methods, Tools and User Experience: 11th International Conference, Orlando, FL, USA, 26–31 July 2019; pp. 570–582. [Google Scholar]
- Guo, S.; Jin, Z.; Sun, F.; Li, J.; Li, Z.; Shi, Y.; Cao, N. Vinci: An intelligent graphic design system for generating advertising posters. In Proceedings of the CHI Conference on Human Factors in Computing Systems, Virtual (originally Yokohama, Japan), 8–13 May 2021; pp. 1–17. [Google Scholar]
- Huo, H.; Wang, F. A Study of Artificial Intelligence-Based Poster Layout Design in Visual Communication. Sci. Program. 2022, 2022, 1191073. [Google Scholar] [CrossRef]
- Yang, H.; Shi, P.; He, S.; Pan, D.; Ying, Z.; Lei, L. A comprehensive survey on image aesthetic quality assessment. In Proceedings of the IEEE/ACIS 18th International Conference on Computer and Information Science (ICIS), Beijing, China, 17–19 June 2019; pp. 294–299. [Google Scholar]
- Zhang, Y. Layout analysis and understanding. Appl. Linguist. 1997, 2, 94–100. [Google Scholar]
- Binmakhashen, G.M.; Mahmoud, S.A. Document layout analysis: A comprehensive survey. ACM Comput. Surv. 2019, 52, 1–36. [Google Scholar] [CrossRef]
- Namboodiri, A.M.; Jain, A.K. Document structure and layout analysis. In Digital Document Processing: Major Directions and Recent Advances; Springer: Berlin/Heidelberg, Germany, 2007; pp. 29–48. [Google Scholar]
- O’Gorman, L. The document spectrum for page layout analysis. IEEE Trans. Pattern Anal. 1993, 15, 1162–1173. [Google Scholar] [CrossRef]
- Ittner, D.J.; Baird, H.S. Language-free layout analysis. In Proceedings of the 2nd International Conference on Document Analysis and Recognition (ICDAR’93), Tsukuba Science City, Japan, 20–22 October 1993; pp. 336–340. [Google Scholar]
- Zhong, X.; Tang, J.; Yepes, A.J. Publaynet: Largest dataset ever for document layout analysis. In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), Sydney International Convention Centre, Sydney, Australia, 20–25 September 2019; pp. 1015–1022. [Google Scholar]
- Nagy, G.; Seth, S.C. Hierarchical representation of optically scanned documents. In Proceedings of the 7th International Conference on Pattern Recognition (ICPR), Montréal, QC, Canada, 30 July–2 August 1984. [Google Scholar]
- Mao, S.; Rosenfeld, A.; Kanungo, T. Document structure analysis algorithms: A literature survey. Doc. Recognit. Retr. X 2003, 5010, 197–207. [Google Scholar]
- Ha, J.; Haralick, R.M.; Phillips, I.T. Document page decomposition by the bounding-box project. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada, 14–16 August 1995; Volume 2, pp. 1119–1122. [Google Scholar]
- Zhu, W.; Chen, Q.; Wei, C.; Li, Z. A segmentation algorithm based on image projection for complex text layout. AIP Conf. Proc. 2017, 1890, 030011. [Google Scholar]
- Wei, C.; Chen, Q.; Zhang, M. Research on Document Image Layout Segmentation Algorithm Based on Projection. Mod. Comput. 2016, 10, 33–38. [Google Scholar]
- Zhan, Y.; Wang, W.; Gao, W. A robust split-and-merge text segmentation approach for images. In Proceedings of the 18th International Conference on Pattern Recognition, Hong Kong, China, 20–24 August 2006; Volume 2, pp. 1002–1005. [Google Scholar]
- Strouthopoulos, C.; Papamarkos, N.; Chamzas, C. PLA using RLSA and a neural network. Eng. Appl. Artif. Intell. 1999, 12, 119–138. [Google Scholar] [CrossRef]
- Lu, Y.; Tan, C.L. Constructing area Voronoi diagram in document images. In Proceedings of the Eighth International Conference on Document Analysis and Recognition (ICDAR’05), Seoul, Republic of Korea, 29 August–1 September 2005; pp. 342–346. [Google Scholar]
- Xiao, F.; Xiao, L. A Chinese document layout analysis based on non-text images. In Proceedings of the 2009 International Forum on Computer Science-Technology and Applications, Chongqing, China, 25 December 2009; Volume 1, pp. 326–328. [Google Scholar]
- Guo, L.; Sun, X.; Wang, Z.; Yang, J. A Connectivity-based Page Segmentation Method. Comput. Eng. Appl. 2003, 05, 105–107. [Google Scholar]
- Yu, M.; Guo, Q.; Wang, D.; Yu, Y. Improved connectivity-based layout segmentation method. Comput. Eng. Appl. 2013, 49, 195–198. [Google Scholar]
- Chen, Y.; Wang, W.; Liu, H.; Cai, Z.; Zhao, P. Layout segmentation and description of Tibetan document images based on adaptive run length smoothing algorithm. Laser Optoelectron. Prog. 2021, 58, 172–179. [Google Scholar]
- Fu, L.; Qian, J.; Zhong, Y. Printed image layout segmentation method based on Chinese character connected component. Comput. Eng. Appl. 2015, 51, 178–182. [Google Scholar]
- Zujovic, J.; Pappas, T.N.; Neuhoff, D.L. Structural similarity metrics for texture analysis and retrieval. In Proceedings of the 16th IEEE International Conference on Image Processing (ICIP), Cairo, Egypt, 7–10 November 2009; pp. 2225–2228. [Google Scholar]
- Vapnik, V.N. An overview of statistical learning theory. IEEE Trans. Neural Netw. 1999, 10, 988–999. [Google Scholar] [CrossRef]
- Wang, Y.; Lu, Y.; Li, Y. A new image segmentation method based on support vector machine. In Proceedings of the IEEE 4th International Conference on Image, Vision and Computing (ICIVC), Xiamen, China, 5–7 July 2019; pp. 177–181. [Google Scholar]
- Zhou, K.; Qiao, X.; Li, F. Research on color image segmentation based on support vector machine. Mod. Electron. Tech. 2019, 42, 103–106+111. [Google Scholar]
- Lu, Y.; Fang, J.; Zhang, S.; Liu, C. Research on layout segmentation based on support vector machine. Mod. Electron. Tech. 2020, 43, 149–153. [Google Scholar]
- Wu, Z.; Wang, Q. Leaf image segmentation based on support vector machine. Softw. Eng. 2022, 6, 25. [Google Scholar]
- Yang, A.; Bai, Y.; Liu, H.; Jin, K.; Xue, T.; Ma, W. Application of SVM and its Improved Model in Image Segmentation. Mob. Netw. Appl. 2022, 27, 851–861. [Google Scholar] [CrossRef]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 8–10 June 2015; pp. 3431–3440. [Google Scholar]
- Li, X.H.; Yin, F.; Xue, T.; Liu, L.; Ogier, J.M.; Liu, C.L. Instance aware document image segmentation using label pyramid networks and deep watershed transformation. In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 20–25 September 2019; pp. 514–519. [Google Scholar]
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890. [Google Scholar]
- Zhou, J.; Hao, M.; Zhang, D.; Zou, P.; Zhang, W. Fusion PSPnet image segmentation based method for multi-focus image fusion. IEEE Photonics J. 2019, 11, 6501412. [Google Scholar] [CrossRef]
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv 2014, arXiv:1412.7062. [Google Scholar]
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. 2017, 40, 834–848. [Google Scholar] [CrossRef]
- Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587. [Google Scholar]
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818. [Google Scholar]
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3146–3154. [Google Scholar]
- Fan, M.; Lai, S.; Huang, J.; Wei, X.; Chai, Z.; Luo, J.; Wei, X. Rethinking bisenet for real-time semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 9716–9725. [Google Scholar]
- Wu, Y.; Jiang, J.; Huang, Z.; Tian, Y. FPANet: Feature pyramid aggregation network for real-time semantic segmentation. Appl. Intell. 2022, 52, 3319–3336. [Google Scholar] [CrossRef]
- Tang, L.; Wan, L.; Wang, T.; Li, S. DECANet: Image Semantic Segmentation Method Based on Improved DeepLabv3+. Laser Optoelectron. Prog. 2023, 60, 92–100. [Google Scholar] [CrossRef]
- Datta, R.; Joshi, D.; Li, J.; Wang, J.Z. Studying aesthetics in photographic images using a computational approach. In Proceedings of the Computer Vision–ECCV 2006: 9th European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; pp. 288–301. [Google Scholar]
- Liu, L.; Chen, R.; Wolf, L.; Cohen-Or, D. Optimizing photo composition. In Computer Graphics Forum; Blackwell Publishing Ltd.: Oxford, UK, 2010; Volume 29, pp. 469–478. [Google Scholar]
- Wong, L.K.; Low, K.L. Saliency-enhanced image aesthetics class prediction. In Proceedings of the 16th IEEE International Conference on Image Processing (ICIP), Cairo, Egypt, 7–10 November 2009; pp. 997–1000. [Google Scholar]
- Luo, Y.; Tang, X. Photo and video quality evaluation: Focusing on the subject. In Proceedings of the Computer Vision–ECCV 2008: 10th European Conference on Computer Vision, Marseille, France, 12–18 October 2008; pp. 386–399. [Google Scholar]
- Datta, R.; Li, J.; Wang, J.Z. Algorithmic inferencing of aesthetics and emotion in natural images: An exposition. In Proceedings of the 15th IEEE International Conference on Image Processing, San Diego, CA, USA, 12–15 October 2008; pp. 105–108. [Google Scholar]
- Lv, P.; Fan, J.; Nie, X.; Dong, W.; Jiang, X.; Zhou, B.; Xu, M.; Xu, C. User-guided personalized image aesthetic assessment based on deep reinforcement learning. IEEE Trans. Multimed. 2021, 25, 736–749. [Google Scholar] [CrossRef]
- Bhattacharya, S.; Sukthankar, R.; Shah, M. A framework for photo-quality assessment and enhancement based on visual aesthetics. In Proceedings of the 18th ACM International Conference on Multimedia, Florence, Italy, 25–29 October 2010; pp. 271–280. [Google Scholar]
- Tong, H.; Li, M.; Zhang, H.J.; He, J.; Zhang, C. Classification of digital photos taken by photographers or home users. In Proceedings of the Advances in Multimedia Information Processing-PCM 2004: 5th Pacific Rim Conference on Multimedia, Tokyo, Japan, 30 November–3 December 2004; pp. 198–205. [Google Scholar]
- Aydın, T.O.; Smolic, A.; Gross, M. Automated aesthetic analysis of photographic images. IEEE Trans. Vis. Comput. Graph. 2014, 21, 31–42. [Google Scholar] [CrossRef] [PubMed]
- Wang, L.; Wang, X.; Yamasaki, T.; Aizawa, K. Aspect-ratio-preserving multi-patch image aesthetics score prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
- Wu, Y.; Bauckhage, C.; Thurau, C. The good, the bad, and the ugly: Predicting aesthetic image labels. In Proceedings of the 20th International Conference on Pattern Recognition, Istanbul, Turkey, 23–26 August 2010; pp. 1586–1589. [Google Scholar]
- Bhattacharya, S.; Sukthankar, R.; Shah, M. A holistic approach to aesthetic enhancement of photographs. ACM Trans. Multimed. Comput. (TOMM) 2011, 7, 1–21. [Google Scholar] [CrossRef]
- Dhar, S.; Ordonez, V.; Berg, T.L. High level describable attributes for predicting aesthetics and interestingness. In Proceedings of the CVPR 2011, Colorado Springs, CO, USA, 20–25 June 2011; pp. 1657–1664. [Google Scholar]
- Tang, X.; Luo, W.; Wang, X. Content-based photo quality assessment. IEEE Trans. Multimed. 2013, 15, 1930–1943. [Google Scholar] [CrossRef]
- Lo, K.Y.; Liu, K.H.; Chen, C.S. Assessment of photo aesthetics with efficiency. In Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012), Tsukuba International Congress Center, Tsukuba Science City, Japan, 11–15 November 2012; pp. 2186–2189. [Google Scholar]
- Celona, L.; Leonardi, M.; Napoletano, P.; Rozza, A. Composition and style attributes guided image aesthetic assessment. IEEE Trans. Image Process. 2022, 31, 5009–5024. [Google Scholar] [CrossRef] [PubMed]
- Yeh, M.C.; Cheng, Y.C. Relative features for photo quality assessment. In Proceedings of the 19th IEEE International Conference on Image Processing, Orlando, FL, USA, 30 September–3 October 2012; pp. 2861–2864. [Google Scholar]
- Marchesotti, L.; Perronnin, F.; Larlus, D.; Csurka, G. Assessing the aesthetic quality of photographs using generic image descriptors. In Proceedings of the International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 1784–1791. [Google Scholar]
- Marchesotti, L.; Perronnin, F.; Meylan, F. Learning beautiful (and ugly) attributes. In Proceedings of the BMVC, London, UK, 6 September 2013; Volume 7, pp. 1–11. [Google Scholar]
- Murray, N.; Marchesotti, L.; Perronnin, F. AVA: A large-scale database for aesthetic visual analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 2408–2415. [Google Scholar]
- Csurka, G.; Dance, C.; Fan, L.; Willamowski, J.; Bray, C. Visual categorization with bags of keypoints. In Proceedings of the Workshop on Statistical Learning in Computer Vision, Prague, Czech Republic, 11–14 May 2004; Volume 1, pp. 1–2. [Google Scholar]
- Wang, W.; Yi, J.; Xu, X.; Wang, L. Computational aesthetics of image classification and evaluation. J. Comput. Aided Des. Comput. Graph. 2014, 26, 1075–1083. [Google Scholar]
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
- Zhang, W.; Zhai, G.; Yang, X.; Yan, J. Hierarchical features fusion for image aesthetics assessment. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 3771–3775. [Google Scholar]
- Li, X.; Li, X.; Zhang, G.; Zhang, X. A novel feature fusion method for computing image aesthetic quality. IEEE Access 2020, 8, 63043–63054. [Google Scholar] [CrossRef]
- Jang, H.; Lee, J.S. Analysis of deep features for image aesthetic assessment. IEEE Access 2021, 9, 29850–29861. [Google Scholar] [CrossRef]
- Li, L.; Zhu, H.; Zhao, S.; Ding, G.; Jiang, H.; Tan, A. Personality driven multi-task learning for image aesthetic assessment. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), Shanghai, China, 8–12 July 2019; pp. 430–435. [Google Scholar]
- Liu, J.; Lv, J.; Yuan, M.; Zhang, J.; Su, Y. ABSNet: Aesthetics-Based Saliency Network Using Multi-Task Convolutional Network. IEEE Signal Process. Lett. 2020, 27, 2014–2018. [Google Scholar] [CrossRef]
- Tian, X. Using multi-task residual network to evaluate image aesthetic quality. In Proceedings of the IEEE 5th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), Chongqing, China, 12–14 March 2021; Volume 5, pp. 171–174. [Google Scholar]
- Chen, Y.; Pu, Y.; Zhao, Z.; Xu, D.; Qian, W. Image Aesthetic Assessment Based on Emotion-Assisted Multi-Task Learning Network. In Proceedings of the 6th International Conference on Multimedia Systems and Signal Processing, Shenzhen, China, 22–24 May 2021; pp. 15–21. [Google Scholar]
- Wang, Y.; Li, Y.; Porikli, F. Finetuning convolutional neural networks for visual aesthetics. In Proceedings of the 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4–8 December 2016; pp. 3554–3559. [Google Scholar]
- Wen, K.; Wei, Y.; Dong, X. Survey of application of deep convolution neural network in image aesthetic evaluation. Comput. Eng. Appl. 2019, 55, 13–23+58. [Google Scholar]
- Li, Y.; Pu, Y.; Xu, D.; Qian, W.; Wang, L. Image aesthetic quality evaluation using convolution neural network embedded fine-tune. In Proceedings of the CCF Chinese Conference on Computer Vision, Tianjin, China, 11–14 October 2017; pp. 269–283. [Google Scholar]
- Wang, W.; Zhao, M.; Wang, L.; Huang, J.; Cai, C.; Xu, X. A multi-scene deep learning model for image aesthetic evaluation. Signal Process. Image Commun. 2016, 47, 511–518. [Google Scholar] [CrossRef]
- Wang, Z.; Chang, S.; Dolcos, F.; Beck, D.; Liu, D.; Huang, T.S. Brain-inspired deep networks for image aesthetics assessment. arXiv 2016, arXiv:1601.04155. [Google Scholar]
- Lemarchand, F. Doctor of Engineering, Computational Modelling of Human Aesthetic Preferences in the Visual Domain: A Brain-Inspired Approach. Ph.D. Thesis, University of Plymouth, Plymouth, UK, 2018. [Google Scholar]
- Liu, Z.; Wang, Z.; Yao, Y.; Zhang, L.; Shao, L. Deep active learning with contaminated tags for image aesthetics assessment. IEEE Trans. Image Process. 2018. early access. [Google Scholar] [CrossRef] [PubMed]
- Xiang, X.; Cheng, Y.; Chen, J.; Lin, Q.; Allebach, J. Semi-supervised multi-task network for image aesthetic assessment. Electron. Imaging 2020, 32, 188-1–188-7. [Google Scholar] [CrossRef]
- Shu, Y.; Li, Q.; Liu, L.; Xu, G. Semi-supervised Adversarial Learning for Attribute-Aware Photo Aesthetic Assessment. IEEE Trans. Multimed. 2021. [Google Scholar] [CrossRef]
- Zhang, X.; Gao, X.; He, L.; Lu, W. MSCAN: Multimodal Self-and-Collaborative Attention Network for image aesthetic prediction tasks. Neurocomputing 2021, 430, 14–23. [Google Scholar] [CrossRef]
- Miao, H.; Zhang, Y.; Wang, D.; Feng, S. Multi-Output Learning Based on Multimodal GCN and Co-Attention for Image Aesthetics and Emotion Analysis. Mathematics 2021, 9, 1437. [Google Scholar] [CrossRef]
- Liu, X.; Jiang, Y. Aesthetic assessment of website design based on multimodal fusion. Future Gener. Comput. Syst. 2021, 117, 433–438. [Google Scholar] [CrossRef]
- Li, M.; Xu, Y.; Cui, L.; Huang, S.; Wei, F.; Li, Z.; Zhou, M. DocBank: A benchmark dataset for document layout analysis. arXiv 2020, arXiv:2006.01038. [Google Scholar]
- Mondal, A.; Lipps, P.; Jawahar, C.V. IIIT-AR-13K: A new dataset for graphical object detection in documents. In Proceedings of the Document Analysis Systems: 14th IAPR International Workshop, DAS 2020, Wuhan, China, 26–29 July 2020; pp. 216–230. [Google Scholar]
- Wang, Z.; Xu, Y.; Cui, L.; Shang, J.; Wei, F. Layoutreader: Pre-training of text and layout for reading order detection. arXiv 2021, arXiv:2108.11591. [Google Scholar]
- Abdallah, A.; Berendeyev, A.; Nuradin, I.; Nurseitov, D. Tncr: Table net detection and classification dataset. Neurocomputing 2022, 473, 79–97. [Google Scholar] [CrossRef]
- Zhu, W.; Sokhandan, N.; Yang, G.; Martin, S.; Sathyanarayana, S. DocBed: A multi-stage OCR solution for documents with complex layouts. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 22 February–1 March 2022; Volume 36, pp. 12643–12649. [Google Scholar]
- Zhang, Z.; Yu, B.; Yu, H.; Liu, T.; Fu, C.; Li, J.; Tang, C.; Sun, J.; Li, Y. Layout-aware information extraction for document-grounded dialogue: Dataset, method and demonstration. In Proceedings of the 30th ACM International Conference on Multimedia, Lisbon, Portugal, 10–14 October 2022; pp. 7252–7260. [Google Scholar]
- Joshi, D.; Datta, R.; Fedorovskaya, E.; Luong, Q.T. Aesthetics and emotions in images. IEEE Signal Process. Mag. 2011, 28, 94–115. [Google Scholar] [CrossRef]
- Kong, S.; Shen, X.; Lin, Z.; Mech, R.; Fowlkes, C. Photo aesthetics ranking network with attributes and content adaptation. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; pp. 662–679. [Google Scholar]
- Chang, K.Y.; Lu, K.H.; Chen, C.S. Aesthetic critiques generation for photos. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3514–3523. [Google Scholar]
- Schwarz, K.; Wieschollek, P.; Lensch, H.P.A. Will people like your image? Learning the aesthetic space. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp. 2048–2057. [Google Scholar]
- Wang, W.; Yang, S.; Zhang, W.; Zhang, J. Neural aesthetic image reviewer. IET Comput. Vis. 2019, 13, 749–758. [Google Scholar] [CrossRef]
- Jin, X.; Wu, L.; Zhao, G.; Li, X.; Zhang, X.; Ge, S.; Zou, D.; Zhou, B.; Zhou, X. Aesthetic attributes assessment of images. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; pp. 311–319. [Google Scholar]
- Kang, C.; Valenzise, G.; Dufaux, F. Eva: An explainable visual aesthetics dataset. In Joint Workshop on Aesthetic and Technical Quality Assessment of Multimedia and Media Analytics for Societal Trends; Association for Computing Machinery: New York, NY, USA, 2020; pp. 5–13. [Google Scholar]
- Jin, X.; Wu, L.; Zhao, G.; Zhou, X.; Zhang, X.; Li, X. IDEA: A new dataset for image aesthetic scoring. Multimed. Tools Appl. 2020, 79, 14341–14355. [Google Scholar] [CrossRef]
- He, S.; Zhang, Y.; Xie, R.; Jiang, D.; Ming, A. Rethinking Image Aesthetics Assessment: Models, Datasets and Benchmarks. In Proceedings of the 31st International Joint Conference on Artificial Intelligence, Vienna, Austria, 23–29 July 2022. [Google Scholar]
- Jin, X.; Li, X.; Lou, H.; Fan, C.; Deng, Q.; Xiao, C.; Cui, S.; Singh, A.K. Aesthetic attribute assessment of images numerically on mixed multi-attribute datasets. ACM Trans. Multimed. Comput. 2023, 18, 1–16. [Google Scholar] [CrossRef]
| Strategy | Method | Advantage | Disadvantage | Reference |
|---|---|---|---|---|
| Top-down | Cyclic projection X–Y cut algorithm | Fast processing speed. | Performs poorly on complex and skewed layouts. | [22] |
| Top-down | Recursive dichotomous projection algorithm | Optimizes conventional projection methods. | Runs inefficiently and is time-consuming. | [22] |
| Bottom-up | Run-length smoothing algorithm (RLSA) | Simple and robust to noise. | Strongly threshold-dependent and computationally intensive. | [25] |
| Bottom-up | Connected-region algorithm | Quickly detects connected regions in an image. | Merging rules are hard to determine and require many parameters. | [24] |
| Bottom-up | Voronoi-diagram algorithm | Reliable and accurate for electronic documents. | Does not support image-area splitting; fails on skewed layouts. | [26] |
| Bottom-up | Docstrum algorithm | Copes with different text sizes and fonts. | Clustering relies on a set of threshold parameters; does not support image-region splitting. | [16] |
| Hybrid | Texture-analysis algorithms | Processes the page at both global and local levels, adapting to complex text-image layouts with high efficiency. | Texture block size is hard to determine. | [32] |
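Of the bottom-up methods above, the run-length smoothing algorithm is simple enough to sketch directly. Below is a minimal, illustrative Python implementation of the horizontal RLSA pass; the function names and the example threshold are ours, not from the cited papers. Background gaps shorter than a threshold between two foreground pixels are filled, so neighboring characters merge into word and line blocks.

```python
def rlsa_row(row, c):
    """Horizontal RLSA on one binary row (1 = foreground/ink, 0 = background):
    any run of 0s of length <= c lying between two 1s is filled with 1s."""
    out = row[:]
    last_one = None
    for x, v in enumerate(row):
        if v == 1:
            # fill the background gap since the previous foreground pixel
            if last_one is not None and x - last_one - 1 <= c:
                for i in range(last_one + 1, x):
                    out[i] = 1
            last_one = x
    return out

def rlsa_horizontal(img, c):
    """Apply the horizontal smoothing pass to every row of a binary image."""
    return [rlsa_row(row, c) for row in img]
```

The classic algorithm runs this pass both horizontally and vertically and combines the two results; the smoothing threshold `c` is exactly the parameter the table flags as a strong dependency.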
| Method | Pub. Year | Advantage | Disadvantage |
|---|---|---|---|
| Wang et al. [34] | 2019 | Combines SVM with a mean-clustering algorithm to acquire training samples automatically. | Not applicable to the segmentation of complex layout images. |
| Zhou et al. [35] | 2019 | Selects sample points based on human observation of the color characteristics of target and background regions. | Manual selection of sample points is time-consuming and labor-intensive. |
| Lu et al. [36] | 2020 | Combines image phase congruency and texture features into new feature vectors for layout segmentation. | Low accuracy in locating boundaries between graphics and images, owing to poor graphic regularity and high ambiguity. |
| Wu et al. [37] | 2022 | Classifies image pixels by labeling foreground and background samples in the image. | Manual selection of sample points is time-consuming and labor-intensive. |
| Yang et al. [38] | 2022 | Improves the SVM by adding the hue–saturation–intensity (HSI) color space and using dual RGB and HSI channels as pixel feature vectors. | The chosen kernel functions and parameters suit only a small number of segmentation tasks. |
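Yang et al. [38] augment each pixel's feature vector with HSI channels alongside RGB before SVM classification. As a hedged sketch of that preprocessing step, here is the standard geometric RGB-to-HSI conversion (the formulas are the common textbook ones, not taken from [38], and the guard for achromatic pixels is our choice):

```python
import math

def rgb_to_hsi(r, g, b):
    """Convert normalized RGB (each in [0, 1]) to HSI.
    Returns (h, s, i): hue in radians [0, 2*pi), saturation and intensity in [0, 1]."""
    i = (r + g + b) / 3.0
    if i == 0:                      # pure black: hue and saturation undefined
        return 0.0, 0.0, 0.0
    s = 1.0 - min(r, g, b) / i
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:                    # achromatic pixel (r == g == b): use hue 0
        h = 0.0
    else:
        h = math.acos(max(-1.0, min(1.0, num / den)))
        if b > g:                   # lower half of the color circle
            h = 2 * math.pi - h
    return h, s, i
```

Stacking `(r, g, b, h, s, i)` per pixel then yields the dual-color-space feature vector described in the table.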
| Method | Pub. Year | Backbone | Dataset (MIoU, %) | Major Contributions |
|---|---|---|---|---|
| DeepLabv1 [43] | 2014 | VGG-16 | Pascal VOC 2012 (71.6) | Atrous convolution; fully connected CRFs |
| FCN [39] | 2015 | VGG-16 | Pascal VOC 2011 (62.7) | Pioneer of end-to-end semantic segmentation |
| PSPNet [41] | 2017 | VGG-16/ResNet101 | Pascal VOC 2012 (85.4); Cityscapes (80.2) | Spatial pyramid pooling module |
| DeepLabv2 [44] | 2017 | ResNet50 | Pascal VOC 2012 (79.7); Cityscapes (70.4) | Proposed atrous spatial pyramid pooling (ASPP) |
| DeepLabv3 [45] | 2017 | ResNet101 | Pascal VOC 2012 (86.9); Cityscapes (81.3) | Cascaded or parallel ASPP modules |
| DeepLabv3+ [46] | 2018 | Xception | Pascal VOC 2012 (89.0); Cityscapes (82.1) | Added an upsampling decoder module |
| DANet [47] | 2019 | ResNet101 | Pascal VOC 2012 (82.6) | Dual attention: position and channel attention modules |
| STDC [48] | 2021 | STDC2 | ImageNet (76.4); Cityscapes (77.0); CamVid (73.9) | Detail-aggregation module to guide decoder learning |
| FPANet [49] | 2022 | ResNet18 | Cityscapes (75.9) | ResNet with atrous spatial pyramid pooling (ASPP) to extract higher-level semantic information |
| DECANet [50] | 2023 | ResNet101 | Pascal VOC 2012 (81.0); Cityscapes (76.0) | Efficient channel attention network (ECANet) in the encoder |
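The MIoU figures reported above are computed per class as the intersection over union between predicted and ground-truth labels, then averaged over classes. A minimal reference implementation (ours, for illustration; papers compute this over full pixel maps, while here the labels are flattened into sequences for brevity):

```python
def mean_iou(pred, gt, num_classes):
    """Mean Intersection-over-Union over classes, from flat label sequences.
    Classes absent from both prediction and ground truth are skipped."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious)
```

For example, a prediction that differs from the ground truth in one of four pixels across two classes already drops MIoU well below 100%, which is why the benchmark gains in the table, a point or two at a time, are hard-won.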
| Dataset | Year | Total Number of Images | Layout Categories | Description | Link |
|---|---|---|---|---|---|
| PubLayNet [18] | 2019 | 360,000 | 5 | The dataset comprises 33 detailed categories (e.g., tables, images, paragraphs) and 2 base classes (text and non-text objects); layouts are annotated with bounding boxes and polygonal segments. | https://github.com/ibm-aur-nlp/PubLayNet (accessed on 9 June 2023) |
| DocBank [93] | 2020 | 500,000 | 12 | A document-level benchmark with fine-grained token-level annotations for layout analysis; its 500,000 document pages contain 12 types of semantic units. | https://github.com/doc-analysis/DocBank (accessed on 9 June 2023) |
| IIIT-AR-13K [94] | 2021 | 13,000 | 5 | The largest manually annotated dataset for graphical object detection, covering five categories: tables, figures, natural images, logos, and signatures. | http://cvit.iiit.ac.in/usodi/iiitar13k.php (accessed on 9 June 2023) |
| ReadingBank [95] | 2021 | 500,000 | – | A benchmark for reading-order detection containing 500 K document images of various types with corresponding reading-order information. | https://github.com/microsoft/unilm/tree/master/layoutreader (accessed on 9 June 2023) |
| TNCR [96] | 2021 | 9428 | 5 | Serves as a basis for table detection, structure recognition, and table classification, with five different table classes. | https://github.com/abdoelsayed2016/TNCR_Dataset (accessed on 9 June 2023) |
| NewsNet7 [97] | 2022 | 3000 | 7 | Contains 3000 fully annotated real newspaper images, used mainly for layout analysis of documents with complex layouts. | Not yet public |
| LIE [98] | 2022 | 4061 | – | Built from 400 documents comprising 4061 fully annotated pages; used mainly for analysis of multiple layout formats. | https://github.com/jsvine/pdfplumber (accessed on 9 June 2023) |
| Dataset | Year | Total Number of Images | Score Range | Description |
|---|---|---|---|---|
| Photo.net [99] | 2011 | 20,278 | 0–7 | Each image is rated by at least 10 people on a scale from 0 to 7, where 7 denotes the most aesthetically pleasing images. |
| AVA [70] | 2012 | 255,530 | 1–10 | Each image is rated by 78 to 549 raters on a scale of 1 to 10, with the mean score used as the ground-truth label. The authors also tagged each image with one or two semantic labels, drawn from 66 textual semantic tags across the dataset. |
| AADB [100] | 2016 | 10,000 | 1–5 | Images are scored by five raters on a scale of 1 to 5; each image has an overall score and 11 aesthetic-attribute scores. |
| PCCD [101] | 2017 | 4235 | 1–10 | Comprehensively labeled: provides scores, score distributions, and multi-rater verbal comments for one overall and six aesthetic factors, each rated from 1 to 10 and ultimately normalized to [0, 1]. |
| AROD [102] | 2018 | 380,000 | – | Aesthetic scores are computed from the number of views and comments each image received on Flickr. |
| AVA-Reviews [103] | 2018 | 40,000 | – | Each image is accompanied by six textual comments, labeled without regard to individual aesthetic factors. |
| DPC-Captions [104] | 2019 | 154,384 | – | Contains annotations for up to five aesthetic attributes per image, obtained via knowledge transfer from the fully annotated small-scale PCCD dataset. |
| EVA [105] | 2020 | 4070 | – | Each image received at least 30 votes assessing the difficulty of the aesthetic rating, ratings for 4 complementary aesthetic attributes, and the relative importance of each attribute in forming the aesthetic opinion. |
| IDEA [106] | 2020 | 9191 | 0–9 | Scores are distributed almost uniformly: 1000 images for each score from 0 to 8 and 191 images with a score of 9. |
| TAD66K [107] | 2022 | 66,327 | 1–10 | Covers 47 popular themes; each image is densely annotated by more than 1200 people using theme-specific evaluation criteria. |
| AMD-A [108] | 2023 | 16,924 | 0–1 | Split into two groups: one (11,166 images) for overall aesthetic-score regression and the other (16,924 images) for classification and regression of aesthetic-attribute scores. |
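For datasets such as AVA, the per-image rating histogram is typically collapsed into a mean score, and a binary high/low aesthetic label is derived by thresholding that mean (a threshold of 5 is the split commonly used in the literature). A small illustrative sketch of this ground-truth construction; the function name and the histogram format (vote counts for scores 1 through 10) are our assumptions:

```python
def ava_ground_truth(vote_histogram, threshold=5.0):
    """Collapse an AVA-style vote histogram (counts for scores 1..10)
    into (mean score, binary high-aesthetic label)."""
    total = sum(vote_histogram)
    mean = sum(score * n for score, n in enumerate(vote_histogram, start=1)) / total
    return mean, mean > threshold
```

Models trained for binary aesthetic classification use the boolean label, while score-regression and distribution-prediction models work from the mean or the full histogram.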
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Pu, Y.; Liu, D.; Chen, S.; Zhong, Y. Research Progress on the Aesthetic Quality Assessment of Complex Layout Images Based on Deep Learning. Appl. Sci. 2023, 13, 9763. https://doi.org/10.3390/app13179763