
Robust facial marker tracking based on a synthetic analysis of optical flows and the YOLO network

Published: 12 June 2023

Abstract

Current marker-based facial motion capture methods can lose target markers under challenging conditions such as heavy occlusion and motion blur. Correcting these tracking failures by hand requires extensive labor. A robust marker tracking method with long-term stability is therefore needed to reduce manual cleanup. In this paper, we present a new facial marker tracking system that focuses on the accuracy and stability of performance capture. The system combines a robust optical flow tracker with the proposed Marker-YOLO detector in a synthetic analysis step. To demonstrate its strength, we captured a real dataset of performances by volunteer actors, with ground-truth labels annotated by artists for the subsequent experiments. The results show that our approach outperforms state-of-the-art trackers such as SiamDW and ECO on specific tasks while running in real time at 38 fps. Root-mean-squared error and area-under-the-curve results verify the improvements in the accuracy and stability of our approach.
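Reading the abstract together with the cited building blocks (pyramidal Lucas-Kanade optical flow [35, 36], the YOLO detector family [9], and linear assignment [11]), the flow-plus-detector fusion it describes can be illustrated with a short sketch. This is a minimal illustration, not the authors' implementation: `detect_markers` is a hypothetical placeholder for the Marker-YOLO network, and the `winSize`, `maxLevel`, and `max_cost` parameters are assumptions.

```python
import numpy as np
import cv2
from scipy.optimize import linear_sum_assignment


def detect_markers(gray):
    """Hypothetical stand-in for the paper's Marker-YOLO detector.

    Expected to return an (M, 2) float32 array of marker centers
    detected in the current frame.
    """
    raise NotImplementedError


def track_step(prev_gray, gray, tracks, max_cost=8.0):
    """Advance every marker track by one frame.

    tracks: (N, 2) array of marker positions in the previous frame.
    max_cost: assumed gating threshold (pixels) for accepting a match.
    """
    # 1) Propagate each track with pyramidal Lucas-Kanade optical flow.
    pts = np.asarray(tracks, dtype=np.float32).reshape(-1, 1, 2)
    flowed, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, gray, pts, None, winSize=(21, 21), maxLevel=3)
    flowed = flowed.reshape(-1, 2)

    # 2) Detect markers independently in the current frame.
    detections = detect_markers(gray)

    # 3) Associate flow predictions with detections by solving a
    #    linear assignment problem on Euclidean distances.
    cost = np.linalg.norm(flowed[:, None, :] - detections[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)

    # 4) Snap a track to its matched detection when the match is close;
    #    otherwise keep the optical-flow prediction, which carries the
    #    track through frames where occlusion or blur defeats detection.
    out = flowed.copy()
    for r, c in zip(rows, cols):
        if status[r, 0] and cost[r, c] < max_cost:
            out[r] = detections[c]
    return out
```

The design intent mirrors the abstract's claim: the detector re-anchors tracks and suppresses the drift that pure optical flow accumulates, while the flow prediction acts as a fallback that keeps markers alive through the occluded and blurred frames where a detector alone would lose them.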

References

[1]
Ekman, P.: Facial expression and emotion. Am. Psychol. 48(4), 384–392 (1993)
[2]
Nusseck, M., Cunningham, D.W., Wallraven, C., Bülthoff, H.H.: The contribution of different facial regions to the recognition of conversational expressions. J. Vis. 8(8), 1 (2008)
[3]
Luo, L., Weng, D., Ding, N., Hao, J., Tu, Z.: The effect of avatar facial expressions on trust building in social virtual reality. Vis. Comput. (2022)
[4]
Zollhöfer, M., Thies, J., Garrido, P., Bradley, D., Beeler, T., Pérez, P., Stamminger, M., Nießner, M., Theobalt, C.: State of the art on monocular 3D face reconstruction, tracking, and applications. Comput. Graph. Forum 37(2), 523–550 (2018)
[5]
Bhat, K.S., Goldenthal, R., Ye, Y., Mallet, R., Koperwas, M.: High fidelity facial animation capture and retargeting with contours. In: Proceedings of the 12th ACM SIGGRAPH/Eurographics Symposium on Computer Animation. SCA ’13, pp. 7–14. Association for Computing Machinery, New York, NY, USA (2013).
[7]
Bregler, C., Bhat, K., Saltzman, J., Allen, B.: ILM's Multitrack: a new visual tracking framework for high-end VFX production. In: SIGGRAPH 2009: Talks. SIGGRAPH ’09. Association for Computing Machinery, New York, NY, USA (2009).
[8]
Vicon Motion Systems Ltd.: CaraPost. https://www.vicon.com/
[9]
Jocher, G., Stoken, A., Borovec, J., et al.: ultralytics/yolov5: v5.0 - YOLOv5-P6 1280 models, AWS, Supervise.ly and YouTube integrations (2021).
[10]
Lindeberg, T.: Detecting salient blob-like image structures and their scales with a scale-space primal sketch: a method for focus-of-attention. Int. J. Comput. Vis. 11(3), 283–318 (1993)
[11]
Jonker, R., Volgenant, A.: A shortest augmenting path algorithm for dense and sparse linear assignment problems. Computing 38(4), 325–340 (1987)
[12]
Williams, L.: Performance-driven facial animation. In: Proceedings of the 17th Annual Conference on Computer Graphics and Interactive Techniques. SIGGRAPH ’90, pp. 235–242. Association for Computing Machinery, New York, NY, USA (1990).
[13]
Guenter, B., Grimm, C., Wood, D., Malvar, H., Pighin, F.: Making faces. In: ACM SIGGRAPH 2006 Courses. SIGGRAPH ’06, p. 18. Association for Computing Machinery, New York, NY, USA (2006).
[14]
Lin, I.-C., Ouhyoung, M.: Mirror mocap: automatic and efficient capture of dense 3D facial motion parameters from video. Vis. Comput. 21(6), 355–372 (2005)
[15]
Bickel, B., Botsch, M., Angst, R., Matusik, W., Otaduy, M., Pfister, H., Gross, M.: Multi-scale capture of facial geometry and motion. ACM Trans. Graph. 26(3), 33 (2007)
[16]
Bickel, B., Lang, M., Botsch, M., Otaduy, M.A., Gross, M.: Pose-space animation and transfer of facial details. In: Proceedings of the 2008 ACM SIGGRAPH/Eurographics Symposium on Computer Animation. SCA ’08, pp. 57–66. Eurographics Association, Goslar, DEU (2008)
[17]
Borshukov, G., Montgomery, J., Werner, W.: Playable universal capture: Compression and real-time sequencing of image-based facial animation. In: ACM SIGGRAPH 2006 Courses. SIGGRAPH ’06, p. 8. Association for Computing Machinery, New York, NY, USA (2006).
[18]
Choe, B., Lee, H., Ko, H.-S.: Performance-driven muscle-based facial animation. J. Vis. Comput. Anim. 12(2), 67–79 (2001)
[19]
Huang, H., Chai, J., Tong, X., Wu, H.-T.: Leveraging motion capture and 3D scanning for high-fidelity facial performance acquisition. In: ACM SIGGRAPH 2011 Papers. SIGGRAPH ’11. Association for Computing Machinery, New York, NY, USA (2011).
[20]
Ravikumar, S., Davidson, C., Kit, D., Campbell, N., Benedetti, L., Cosker, D.: Reading between the dots: combining 3D markers and FACS classification for high-quality blendshape facial animation. In: Proceedings of Graphics Interface 2016. GI 2016, pp. 143–151. Canadian Human-Computer Communications Society/Société canadienne du dialogue humain-machine (2016).
[21]
Fang, X., Wei, X., Zhang, Q., Zhou, D.: Forward non-rigid motion tracking for facial mocap. Vis. Comput. 30(2), 139–157 (2014)
[22]
Moser, L., Hendler, D., Roble, D.: Masquerade: Fine-scale details for head-mounted camera motion capture data. In: ACM SIGGRAPH 2017 Talks. SIGGRAPH ’17. Association for Computing Machinery, New York, NY, USA (2017).
[23]
Moser, L., Williams, M., Hendler, D., Roble, D.: High-quality, cost-effective facial motion capture pipeline with 3D regression. In: ACM SIGGRAPH 2018 Talks. SIGGRAPH ’18. Association for Computing Machinery, New York, NY, USA (2018).
[24]
Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell. 23(6), 681–685 (2001)
[25]
Chuang, E., Bregler, C.: Performance driven facial animation using blendshape interpolation. Computer Science Technical Report, Stanford University 2(2), 3 (2002)
[26]
Chai, J.-x., Xiao, J., Hodgins, J.: Vision-based control of 3d facial animation. In: Proceedings of the 2003 ACM SIGGRAPH/Eurographics Symposium on Computer Animation. SCA ’03, pp. 193–206. Eurographics Association, Goslar, DEU (2003)
[27]
Saragih, J.M., Lucey, S., Cohn, J.F.: Real-time avatar animation from a single image. In: 2011 IEEE International Conference on Automatic Face Gesture Recognition (FG), pp. 117–124 (2011).
[28]
Moiza, G., Tal, A., Shimshoni, I., Barnett, D., Moses, Y.: Image-based animation of facial expressions. Vis. Comput. 18(7), 445–467 (2002)
[29]
Cao, C., Hou, Q., Zhou, K.: Displaced dynamic expression regression for real-time facial tracking and animation. ACM Trans. Graph. (2014)
[30]
Liu, S., Wang, J., Zhang, M., Wang, Z.: Three-dimensional cartoon facial animation based on art rules. Vis. Comput. 29(11), 1135–1149 (2013)
[31]
Wu, C., Bradley, D., Gross, M., Beeler, T.: An anatomically-constrained local deformation model for monocular face capture. ACM Trans. Graph. (2016)
[32]
Barrielle, V., Stoiber, N.: Realtime performance-driven physical simulation for facial animation. Comput. Graph. Forum 38(1), 151–166 (2019)
[33]
IMAGE METRICS: Live Driver™. http://www.image-metrics.com
[34]
DYNAMIXYZ: HMC & Grabber™. http://www.dynamixyz.com
[35]
Lucas, B.D., Kanade, T.: An iterative image registration technique with an application to stereo vision. In: Proceedings of the 7th International Joint Conference on Artificial Intelligence—Volume 2. IJCAI’81, pp. 674–679. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1981)
[36]
Bouguet, J.-Y.: Pyramidal implementation of the affine Lucas Kanade feature tracker: description of the algorithm. Intel Corporation 5(1–10), 4 (2001)
[37]
Blache, L., Loscos, C., Lucas, L.: Robust motion flow for mesh tracking of freely moving actors. Vis. Comput. 32(2), 205–216 (2016)
[38]
Zhao, J., Mao, X., Zhang, J.: Learning deep facial expression features from image and optical flow sequences using 3D CNN. Vis. Comput. 34(10), 1461–1475 (2018)
[39]
Kim, Y.H., Martínez, A.M., Kak, A.C.: A local approach for robust optical flow estimation under varying illumination. In: Proceedings of the British Machine Vision Conference, pp. 91.1–91.10. BMVA Press, UK (2004).
[40]
Senst, T., Eiselein, V., Sikora, T.: Robust local optical flow for feature tracking. IEEE Trans. Circuits Syst. Video Technol. 22(9), 1377–1387 (2012)
[41]
Senst, T., Geistert, J., Sikora, T.: Robust local optical flow: Long-range motions and varying illuminations. In: 2016 IEEE International Conference on Image Processing (ICIP), pp. 4478–4482 (2016).
[42]
Zhu, Z., Wu, W., Zou, W., Yan, J.: End-to-end flow correlation tracking with spatial-temporal attention. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
[43]
Vihlman, M., Visala, A.: Optical flow in deep visual tracking. Proc. AAAI Conf. Artif. Intell. 34(07), 12112–12119 (2020)
[44]
Qu, Z., Shi, H., Tan, S., Song, B., Tao, Y.: A flow-guided self-calibration Siamese network for visual tracking. Vis. Comput. 39(2), 625–637 (2023)
[45]
King, D.E.: Dlib-ml: a machine learning toolkit. J. Mach. Learn. Res. 10, 1755–1758 (2009)
[46]
Wang, C.-Y., Liao, H.-Y.M., Wu, Y.-H., Chen, P.-Y., Hsieh, J.-W., Yeh, I.-H.: CSPNet: a new backbone that can enhance learning capability of CNN. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (2020)
[47]
He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1904–1916 (2015)
[48]
Wang, K., Liew, J.H., Zou, Y., Zhou, D., Feng, J.: PANet: few-shot image semantic segmentation with prototype alignment. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)
[49]
Ekman, P., Friesen, W., Hager, J.: The Facial Action Coding System (2002)
[50]
Arriaga, O., Valdenegro-Toro, M., Plöger, P.: Real-time convolutional neural networks for emotion and gender classification. arXiv preprint arXiv:1710.07557 (2017)
[51]
Ma, B., Huang, L., Shen, J., Shao, L., Yang, M.-H., Porikli, F.: Visual tracking under motion blur. IEEE Trans. Image Process. 25(12), 5867–5876 (2016)
[52]
Guo, Q., Cheng, Z., Juefei-Xu, F., Ma, L., Xie, X., Liu, Y., Zhao, J.: Learning to adversarially blur visual object tracking. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 10839–10848 (2021)
[53]
Danelljan, M., Bhat, G., Khan, F.S., Felsberg, M.: ATOM: accurate tracking by overlap maximization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
[54]
Bhat, G., Danelljan, M., Gool, L.V., Timofte, R.: Learning discriminative model prediction for tracking. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)
[55]
Bhat, G., Danelljan, M., Van Gool, L., Timofte, R.: Know your surroundings: exploiting scene information for object tracking. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) Computer Vision—ECCV 2020, pp. 205–221. Springer, Cham (2020)
[56]
Danelljan, M., Häger, G., Khan, F., Felsberg, M.: Accurate scale estimation for robust visual tracking. In: Proceedings of the British Machine Vision Conference 2014. BMVA Press, UK (2014).
[57]
Bertinetto, L., Valmadre, J., Golodetz, S., Miksik, O., Torr, P.H.S.: Staple: Complementary learners for real-time tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
[58]
Danelljan, M., Bhat, G., Shahbaz Khan, F., Felsberg, M.: ECO: efficient convolution operators for tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
[59]
Zhang, Z., Peng, H.: Deeper and wider Siamese networks for real-time visual tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
[60]
Wu, Y., Lim, J., Yang, M.-H.: Online object tracking: a benchmark. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2013)
[61]
Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the details: delving deep into convolutional nets. In: Proceedings of the British Machine Vision Conference. BMVA Press (2014)
[62]
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
[63]
Li, T., Bolkart, T., Black, M.J., Li, H., Romero, J.: Learning a model of facial shape and expression from 4D scans. ACM Trans. Graph. (Proc. SIGGRAPH Asia) 36(6), 194:1–194:17 (2017)


Published In

The Visual Computer: International Journal of Computer Graphics, Volume 40, Issue 4
Apr 2024
748 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 12 June 2023
Accepted: 26 May 2023

Author Tags

  1. Motion capture
  2. Head-mounted cameras
  3. Long-term tracking

Qualifiers

  • Research-article
