research-article

CAPTAIN: Comprehensive Composition Assistance for Photo Taking

Published: 27 January 2022

Abstract

Many people are interested in taking astonishing photos and sharing them with others. Emerging high-tech hardware and software have increased both the ubiquity and the functionality of digital photography. Because composition matters in photography, researchers have leveraged some common composition techniques, such as the rule of thirds and perspective-related techniques, to provide photo-taking assistance. However, the composition techniques developed by professionals are far more diverse than the well-documented techniques can cover. We present a new approach that leverages underexplored photography ideas, which are virtually unlimited, diverse, and correlated. We propose a comprehensive fork-join framework, named CAPTAIN (Composition Assistance for Photo Taking), to guide photographers with a variety of photography ideas. The framework consists of several components: integrated object detection, photo genre classification, artistic pose clustering, and personalized aesthetics-aware image retrieval. CAPTAIN is backed by a large managed dataset crawled from a website containing ideas from photography enthusiasts and professionals. The work proposes steps to decompose a given amateurish shot into composition ingredients and to compose these ingredients into a list of useful, related ideas for the photographer. The work addresses personal composition preferences by presenting a user-specified preference list of photography ideas. We conducted many experiments on the newly proposed components and report the findings. A user study demonstrates that the work is useful to those taking photos.
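The abstract names the components of the fork-join framework but not how they are wired together. Below is a minimal Python sketch of the general fork-join pattern, under stated assumptions: the `Idea` record, the stub functions `detect_objects`, `classify_genre`, and `assign_pose_cluster`, the Jaccard-plus-pose relevance score, the per-object `preferences` weights, and the weights `w_aes` and `w_pref` are all hypothetical. They are not CAPTAIN's actual models or ranking function; the trained detector, genre classifier, and pose clustering are replaced by stubs that read fields from a dict.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Idea:
    """Hypothetical record for one photography idea in the database."""
    idea_id: str
    genre: str
    objects: frozenset
    pose_cluster: int
    aesthetic_score: float  # assumed to be normalized to [0, 1]


# Hypothetical stand-ins for trained models. Here they only read fields
# from a dict; a real system would run an object detector, a genre
# classifier, and a pose estimator followed by cluster assignment.
def detect_objects(shot):
    return frozenset(shot["objects"])


def classify_genre(shot):
    return shot["genre"]


def assign_pose_cluster(shot):
    return shot["pose_cluster"]


def jaccard(a, b):
    """Overlap between two label sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_ideas(shot, ideas, preferences, top_k=3, w_aes=0.5, w_pref=0.5):
    """Return the ids of the top-k ideas for a shot.

    preferences maps an object label to a user-chosen weight (hypothetical).
    """
    # Fork: run the independent analysis components concurrently.
    with ThreadPoolExecutor(max_workers=3) as pool:
        objects_future = pool.submit(detect_objects, shot)
        genre_future = pool.submit(classify_genre, shot)
        pose_future = pool.submit(assign_pose_cluster, shot)
    # Join: leaving the with-block waits for all three futures to finish.
    objects = objects_future.result()
    genre = genre_future.result()
    pose = pose_future.result()

    def score(idea):
        # Relevance: shared objects plus a bonus for the same pose cluster.
        relevance = jaccard(objects, idea.objects)
        relevance += 1.0 if idea.pose_cluster == pose else 0.0
        # Personalization: sum of the user's weights for the idea's objects.
        personal = sum(preferences.get(label, 0.0) for label in idea.objects)
        return relevance + w_aes * idea.aesthetic_score + w_pref * personal

    # Retrieve ideas of the same genre, then rank them by score.
    candidates = [idea for idea in ideas if idea.genre == genre]
    ranked = sorted(candidates, key=score, reverse=True)
    return [idea.idea_id for idea in ranked[:top_k]]


if __name__ == "__main__":
    sample_ideas = [
        Idea("A", "portrait", frozenset({"person", "bridge"}), 2, 0.6),
        Idea("B", "portrait", frozenset({"person"}), 5, 0.9),
        Idea("C", "landscape", frozenset({"bridge"}), 2, 0.95),
        Idea("D", "portrait", frozenset({"person", "tree"}), 2, 0.4),
    ]
    shot = {"objects": ["person", "bridge"], "genre": "portrait", "pose_cluster": 2}
    print(suggest_ideas(shot, sample_ideas, preferences={}))             # ['A', 'D', 'B']
    print(suggest_ideas(shot, sample_ideas, preferences={"tree": 2.0}))  # ['D', 'A', 'B']
```

Raising the user's weight for `tree` moves idea D above idea A, which illustrates one possible way to fold a user-specified preference into an aesthetics-aware ranking. Threads are used for the fork only because the stub components are independent; a real deployment might instead batch model inference on a GPU.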


Cited By

  • (2023) The Inter-Relationship Between Photographic Aesthetics and Technical Quality. In Modeling Visual Aesthetics, Emotion, and Artistic Style, 231–255. DOI: 10.1007/978-3-031-50269-9_14. Online publication date: 25-Nov-2023.
  • (2022) Augmented Reality Based Video Shooting Guidance for Novice Users. Proceedings of the ACM on Human-Computer Interaction 6, MHCI, 1–20. DOI: 10.1145/3546750. Online publication date: 20-Sep-2022.

Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 18, Issue 1
January 2022
517 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3505205

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 January 2022
Accepted: 01 April 2021
Revised: 01 April 2021
Received: 01 May 2020
Published in TOMM Volume 18, Issue 1

Author Tags

  1. Image aesthetics
  2. deep learning
  3. image retrieval
  4. recommender system

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • National Science Foundation

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 86
  • Downloads (last 6 weeks): 11
Reflects downloads up to 01 Oct 2024
