DOI:10.1145/3689094.3689464

Hand Gesture Recognition in Buddhist Art Images: Evaluation of a Keypoint-based Approach

Published: 28 October 2024

Abstract

In this paper, we explore the application of keypoint-based human pose estimation techniques to recognizing hand gestures, known as "mudras", in Buddhist art images. We propose a pipeline that uses state-of-the-art pose estimation models to detect and classify gestures in statues, sculptures, and paintings of Buddha. Our approach extracts extended and normalized keypoint features from hand gestures and evaluates their effectiveness in classification tasks. We validate our method on a dataset of 543 images, achieving an average classification accuracy above 0.7 across seven gesture categories. Our results suggest that keypoint-based features provide a viable representation for gesture recognition, potentially offering a reusable framework for analyzing other types of cultural and artistic content. We also discuss the challenges and implications of applying pose estimation techniques to historical art forms, contributing to the broader field of AI-assisted cultural heritage analysis.
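The abstract does not spell out how the "extended and normalized" keypoint features are computed, so the following Python snippet is only a minimal sketch of one plausible reading, not the authors' method. It assumes a hand is given as 21 two-dimensional keypoints with the wrist at index 0 (a common convention in hand pose estimators), normalizes them for position and size, and "extends" them with pairwise distances. The function names `normalize_keypoints` and `extended_features` are hypothetical.

```python
import numpy as np


def normalize_keypoints(kpts):
    """Make keypoints invariant to image position and hand size.

    Assumption: row 0 is the wrist. The hand is translated so the wrist
    sits at the origin, then scaled so the farthest keypoint has distance 1.
    """
    kpts = np.asarray(kpts, dtype=float)
    centered = kpts - kpts[0]
    scale = np.linalg.norm(centered, axis=1).max()
    if scale == 0:
        # Degenerate detection: all keypoints coincide.
        return centered
    return centered / scale


def extended_features(kpts):
    """Concatenate normalized coordinates with all pairwise distances.

    Assumption: "extended" is read here as adding pairwise distances;
    the paper may use a different extension.
    """
    norm = normalize_keypoints(kpts)
    i, j = np.triu_indices(len(norm), k=1)
    dists = np.linalg.norm(norm[i] - norm[j], axis=1)
    return np.concatenate([norm.ravel(), dists])


# Usage: a synthetic 21-keypoint hand and a shifted, enlarged copy of it.
rng = np.random.default_rng(0)
hand = rng.random((21, 2))
moved = hand * 3.0 + np.array([10.0, -5.0])
features = extended_features(hand)
same = np.allclose(features, extended_features(moved))
```

With 21 keypoints this yields 42 coordinates plus 210 pairwise distances per hand. A feature vector of this kind could then be passed to any standard classifier over the seven gesture categories; which classifier the paper uses is not stated in the abstract.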

Published In

SUMAC '24: Proceedings of the 6th workshop on the analySis, Understanding and proMotion of heritAge Contents
October 2024
67 pages
ISBN:9798400712050
DOI:10.1145/3689094
Program Chairs: Valerie Gouet-Brunet, Ronak Kosti, Li Weng
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. buddhist art
  2. cultural heritage
  3. hand gesture
  4. image classification
  5. keypoint detection
  6. pose estimation

Qualifiers

  • Research-article

Funding Sources

  • Zhejiang Federation of Humanities and Social Sciences

Conference

MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne VIC, Australia

Acceptance Rates

Overall Acceptance Rate 5 of 6 submissions, 83%
