
Robust facial marker tracking based on a synthetic analysis of optical flows and the YOLO network

Published: 12 June 2023

Abstract

Current marker-based facial motion capture methods can lose target markers under challenging conditions such as heavy occlusion and motion blur. Correcting these tracking failures by hand requires extensive labor. A robust marker tracking method with long-term stability is therefore needed to reduce manual cleanup. In this paper, we present a new facial marker tracking system that focuses on the accuracy and stability of performance capture. The system combines a robust optical flow tracker with the proposed Marker-YOLO detector in a synthetic analysis step. To demonstrate its strength, we captured a real dataset of performances by volunteer actors, with ground-truth labels annotated by artists for the subsequent experiments. The results show that our approach outperforms state-of-the-art trackers such as SiamDW and ECO on specific tasks while running in real time at 38 fps. Root-mean-squared error and area-under-the-curve results verify the improvements in the accuracy and stability of our approach.
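Reading the abstract together with the cited building blocks (pyramidal Lucas-Kanade optical flow [35, 36], the YOLO detector family [9], and linear assignment [11]), the flow-plus-detector fusion it describes can be illustrated with a short sketch. This is a minimal illustration, not the authors' implementation: `detect_markers` is a hypothetical placeholder for the Marker-YOLO network, and the `winSize`, `maxLevel`, and `max_cost` parameters are assumptions.

```python
import numpy as np
import cv2
from scipy.optimize import linear_sum_assignment


def detect_markers(gray):
    """Hypothetical stand-in for the paper's Marker-YOLO detector.

    Expected to return an (M, 2) float32 array of marker centers
    detected in the current frame.
    """
    raise NotImplementedError


def track_step(prev_gray, gray, tracks, max_cost=8.0):
    """Advance every marker track by one frame.

    tracks: (N, 2) array of marker positions in the previous frame.
    max_cost: assumed gating threshold (pixels) for accepting a match.
    """
    # 1) Propagate each track with pyramidal Lucas-Kanade optical flow.
    pts = np.asarray(tracks, dtype=np.float32).reshape(-1, 1, 2)
    flowed, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, gray, pts, None, winSize=(21, 21), maxLevel=3)
    flowed = flowed.reshape(-1, 2)

    # 2) Detect markers independently in the current frame.
    detections = detect_markers(gray)

    # 3) Associate flow predictions with detections by solving a
    #    linear assignment problem on Euclidean distances.
    cost = np.linalg.norm(flowed[:, None, :] - detections[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)

    # 4) Snap a track to its matched detection when the match is close;
    #    otherwise keep the optical-flow prediction, which carries the
    #    track through frames where occlusion or blur defeats detection.
    out = flowed.copy()
    for r, c in zip(rows, cols):
        if status[r, 0] and cost[r, c] < max_cost:
            out[r] = detections[c]
    return out
```

The design intent mirrors the abstract's claim: the detector re-anchors tracks and suppresses the drift that pure optical flow accumulates, while the flow prediction acts as a fallback that keeps markers alive through the occluded and blurred frames where a detector alone would lose them.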

References

[1]
Ekman, P.: Facial expression and emotion. Am. Psychol. 48(4), 384–392 (1993)
[2]
Nusseck, M., Cunningham, D.W., Wallraven, C., Bülthoff, H.H.: The contribution of different facial regions to the recognition of conversational expressions. J. Vis. 8(8), 1 (2008)
[3]
Luo, L., Weng, D., Ding, N., Hao, J., Tu, Z.: The effect of avatar facial expressions on trust building in social virtual reality. Vis. Comput. (2022)
[4]
Zollhöfer, M., Thies, J., Garrido, P., Bradley, D., Beeler, T., Pérez, P., Stamminger, M., Nießner, M., Theobalt, C.: State of the art on monocular 3D face reconstruction, tracking, and applications. Comput. Graph. Forum 37(2), 523–550 (2018)
[5]
Bhat, K.S., Goldenthal, R., Ye, Y., Mallet, R., Koperwas, M.: High fidelity facial animation capture and retargeting with contours. In: Proceedings of the 12th ACM SIGGRAPH/Eurographics Symposium on Computer Animation. SCA ’13, pp. 7–14. Association for Computing Machinery, New York, NY, USA (2013).
[7]
Bregler, C., Bhat, K., Saltzman, J., Allen, B.: ILM's Multitrack: a new visual tracking framework for high-end VFX production. In: SIGGRAPH 2009: Talks. SIGGRAPH ’09. Association for Computing Machinery, New York, NY, USA (2009).
[8]
Vicon Motion Systems Ltd.: CaraPost. https://www.vicon.com/
[9]
Jocher, G., Stoken, A., Borovec, J., et al.: ultralytics/yolov5: v5.0 - YOLOv5-P6 1280 models, AWS, Supervise.ly and YouTube integrations (2021).
[10]
Lindeberg, T.: Detecting salient blob-like image structures and their scales with a scale-space primal sketch: a method for focus-of-attention. Int. J. Comput. Vis. 11(3), 283–318 (1993)
[11]
Jonker, R., Volgenant, A.: A shortest augmenting path algorithm for dense and sparse linear assignment problems. Computing 38(4), 325–340 (1987)
[12]
Williams, L.: Performance-driven facial animation. In: Proceedings of the 17th Annual Conference on Computer Graphics and Interactive Techniques. SIGGRAPH ’90, pp. 235–242. Association for Computing Machinery, New York, NY, USA (1990).
[13]
Guenter, B., Grimm, C., Wood, D., Malvar, H., Pighin, F.: Making faces. In: ACM SIGGRAPH 2006 Courses. SIGGRAPH ’06, p. 18. Association for Computing Machinery, New York, NY, USA (2006).
[14]
Lin, I.-C., Ouhyoung, M.: Mirror mocap: automatic and efficient capture of dense 3D facial motion parameters from video. Vis. Comput. 21(6), 355–372 (2005)
[15]
Bickel, B., Botsch, M., Angst, R., Matusik, W., Otaduy, M., Pfister, H., Gross, M.: Multi-scale capture of facial geometry and motion. ACM Trans. Graph. 26(3), 33 (2007)
[16]
Bickel, B., Lang, M., Botsch, M., Otaduy, M.A., Gross, M.: Pose-space animation and transfer of facial details. In: Proceedings of the 2008 ACM SIGGRAPH/Eurographics Symposium on Computer Animation. SCA ’08, pp. 57–66. Eurographics Association, Goslar, DEU (2008)
[17]
Borshukov, G., Montgomery, J., Werner, W.: Playable universal capture: Compression and real-time sequencing of image-based facial animation. In: ACM SIGGRAPH 2006 Courses. SIGGRAPH ’06, p. 8. Association for Computing Machinery, New York, NY, USA (2006).
[18]
Choe, B., Lee, H., Ko, H.-S.: Performance-driven muscle-based facial animation. J. Vis. Comput. Anim. 12(2), 67–79 (2001)
[19]
Huang, H., Chai, J., Tong, X., Wu, H.-T.: Leveraging motion capture and 3D scanning for high-fidelity facial performance acquisition. In: ACM SIGGRAPH 2011 Papers. SIGGRAPH ’11. Association for Computing Machinery, New York, NY, USA (2011).
[20]
Ravikumar, S., Davidson, C., Kit, D., Campbell, N., Benedetti, L., Cosker, D.: Reading between the dots: combining 3D markers and FACS classification for high-quality blendshape facial animation. In: Proceedings of Graphics Interface 2016. GI 2016, pp. 143–151. Canadian Human-Computer Communications Society/Société canadienne du dialogue humain-machine (2016).
[21]
Fang, X., Wei, X., Zhang, Q., Zhou, D.: Forward non-rigid motion tracking for facial mocap. Vis. Comput. 30(2), 139–157 (2014)
[22]
Moser, L., Hendler, D., Roble, D.: Masquerade: Fine-scale details for head-mounted camera motion capture data. In: ACM SIGGRAPH 2017 Talks. SIGGRAPH ’17. Association for Computing Machinery, New York, NY, USA (2017).
[23]
Moser, L., Williams, M., Hendler, D., Roble, D.: High-quality, cost-effective facial motion capture pipeline with 3D regression. In: ACM SIGGRAPH 2018 Talks. SIGGRAPH ’18. Association for Computing Machinery, New York, NY, USA (2018).
[24]
Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell. 23(6), 681–685 (2001)
[25]
Chuang, E., Bregler, C.: Performance driven facial animation using blendshape interpolation. Computer Science Technical Report, Stanford University 2(2), 3 (2002)
[26]
Chai, J.-x., Xiao, J., Hodgins, J.: Vision-based control of 3d facial animation. In: Proceedings of the 2003 ACM SIGGRAPH/Eurographics Symposium on Computer Animation. SCA ’03, pp. 193–206. Eurographics Association, Goslar, DEU (2003)
[27]
Saragih, J.M., Lucey, S., Cohn, J.F.: Real-time avatar animation from a single image. In: 2011 IEEE International Conference on Automatic Face Gesture Recognition (FG), pp. 117–124 (2011).
[28]
Moiza, G., Tal, A., Shimshoni, I., Barnett, D., Moses, Y.: Image-based animation of facial expressions. Vis. Comput. 18(7), 445–467 (2002)
[29]
Cao, C., Hou, Q., Zhou, K.: Displaced dynamic expression regression for real-time facial tracking and animation. ACM Trans. Graph. (2014)
[30]
Liu, S., Wang, J., Zhang, M., Wang, Z.: Three-dimensional cartoon facial animation based on art rules. Vis. Comput. 29(11), 1135–1149 (2013)
[31]
Wu, C., Bradley, D., Gross, M., Beeler, T.: An anatomically-constrained local deformation model for monocular face capture. ACM Trans. Graph. (2016)
[32]
Barrielle, V., Stoiber, N.: Realtime performance-driven physical simulation for facial animation. Comput. Graph. Forum 38(1), 151–166 (2019)
[33]
IMAGE METRICS: Live Driver™. http://www.image-metrics.com
[34]
DYNAMIXYZ: HMC & Grabber™. http://www.dynamixyz.com
[35]
Lucas, B.D., Kanade, T.: An iterative image registration technique with an application to stereo vision. In: Proceedings of the 7th International Joint Conference on Artificial Intelligence—Volume 2. IJCAI’81, pp. 674–679. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1981)
[36]
Bouguet, J.-Y.: Pyramidal implementation of the affine Lucas Kanade feature tracker: description of the algorithm. Intel Corporation 5(1–10), 4 (2001)
[37]
Blache, L., Loscos, C., Lucas, L.: Robust motion flow for mesh tracking of freely moving actors. Vis. Comput. 32(2), 205–216 (2016)
[38]
Zhao, J., Mao, X., Zhang, J.: Learning deep facial expression features from image and optical flow sequences using 3D CNN. Vis. Comput. 34(10), 1461–1475 (2018)
[39]
Kim, Y.H., Martínez, A.M., Kak, A.C.: A local approach for robust optical flow estimation under varying illumination. In: Proceedings of the British Machine Vision Conference, pp. 91.1–91.10. BMVA Press, UK (2004).
[40]
Senst, T., Eiselein, V., Sikora, T.: Robust local optical flow for feature tracking. IEEE Trans. Circuits Syst. Video Technol. 22(9), 1377–1387 (2012)
[41]
Senst, T., Geistert, J., Sikora, T.: Robust local optical flow: Long-range motions and varying illuminations. In: 2016 IEEE International Conference on Image Processing (ICIP), pp. 4478–4482 (2016).
[42]
Zhu, Z., Wu, W., Zou, W., Yan, J.: End-to-end flow correlation tracking with spatial-temporal attention. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
[43]
Vihlman, M., Visala, A.: Optical flow in deep visual tracking. Proc. AAAI Conf. Artif. Intell. 34(07), 12112–12119 (2020)
[44]
Qu, Z., Shi, H., Tan, S., Song, B., Tao, Y.: A flow-guided self-calibration Siamese network for visual tracking. Vis. Comput. 39(2), 625–637 (2023)
[45]
King, D.E.: Dlib-ml: a machine learning toolkit. J. Mach. Learn. Res. 10, 1755–1758 (2009)
[46]
Wang, C.-Y., Liao, H.-Y.M., Wu, Y.-H., Chen, P.-Y., Hsieh, J.-W., Yeh, I.-H.: CSPNet: a new backbone that can enhance learning capability of CNN. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (2020)
[47]
He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1904–1916 (2015)
[48]
Wang, K., Liew, J.H., Zou, Y., Zhou, D., Feng, J.: PANet: few-shot image semantic segmentation with prototype alignment. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)
[49]
Ekman, P., Friesen, W., Hager, J.: The Facial Action Coding System (2002)
[50]
Arriaga, O., Valdenegro-Toro, M., Plöger, P.: Real-time convolutional neural networks for emotion and gender classification. arXiv preprint arXiv:1710.07557 (2017)
[51]
Ma, B., Huang, L., Shen, J., Shao, L., Yang, M.-H., Porikli, F.: Visual tracking under motion blur. IEEE Trans. Image Process. 25(12), 5867–5876 (2016)
[52]
Guo, Q., Cheng, Z., Juefei-Xu, F., Ma, L., Xie, X., Liu, Y., Zhao, J.: Learning to adversarially blur visual object tracking. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 10839–10848 (2021)
[53]
Danelljan, M., Bhat, G., Khan, F.S., Felsberg, M.: ATOM: accurate tracking by overlap maximization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
[54]
Bhat, G., Danelljan, M., Gool, L.V., Timofte, R.: Learning discriminative model prediction for tracking. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)
[55]
Bhat, G., Danelljan, M., Van Gool, L., Timofte, R.: Know your surroundings: exploiting scene information for object tracking. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) Computer Vision—ECCV 2020, pp. 205–221. Springer, Cham (2020)
[56]
Danelljan, M., Häger, G., Khan, F., Felsberg, M.: Accurate scale estimation for robust visual tracking. In: Proceedings of the British Machine Vision Conference 2014. BMVA Press, UK (2014).
[57]
Bertinetto, L., Valmadre, J., Golodetz, S., Miksik, O., Torr, P.H.S.: Staple: Complementary learners for real-time tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
[58]
Danelljan, M., Bhat, G., Shahbaz Khan, F., Felsberg, M.: ECO: efficient convolution operators for tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
[59]
Zhang, Z., Peng, H.: Deeper and wider Siamese networks for real-time visual tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
[60]
Wu, Y., Lim, J., Yang, M.-H.: Online object tracking: a benchmark. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2013)
[61]
Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the details: delving deep into convolutional nets. In: Proceedings of the British Machine Vision Conference. BMVA Press (2014)
[62]
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
[63]
Li, T., Bolkart, T., Black, M.J., Li, H., Romero, J.: Learning a model of facial shape and expression from 4D scans. ACM Trans. Graph. (Proc. SIGGRAPH Asia) 36(6), 194:1–194:17 (2017)


Published In

The Visual Computer: International Journal of Computer Graphics, Volume 40, Issue 4
Apr 2024
748 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 12 June 2023
Accepted: 26 May 2023

Author Tags

  1. Motion capture
  2. Head-mounted cameras
  3. Long-term tracking

Qualifiers

  • Research-article
