Nothing Special   »   [go: up one dir, main page]

Skip to main content

RICA\(^2\): Rubric-Informed, Calibrated Assessment of Actions

  • Conference paper
  • First Online:
Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

The ability to quantify how well an action is carried out, also known as action quality assessment (AQA), has attracted recent interest in the vision community. Unfortunately, prior methods often ignore the score rubric used by human experts and fall short of quantifying the uncertainty of the model prediction. To bridge the gap, we present RICA\(^2\) —a deep probabilistic model that integrates score rubric and accounts for prediction uncertainty for AQA. Central to our method lies in stochastic embeddings of action steps, defined on a graph structure that encodes the score rubric. The embeddings spread probabilistic density in the latent space and allow our method to represent model uncertainty. The graph encodes the scoring criteria, based on which the quality scores can be decoded. We demonstrate that our method establishes new state of the art on public benchmarks, including FineDiving, MTL-AQA, and JIGSAWS, with superior performance in score prediction and uncertainty calibration. Our code is available at https://abrarmajeedi.github.io/rica2_aqa/.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 64.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 79.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    For the sake of brevity, we omit the subscript as long as there is no confusion.

References

  1. Alemi, A.A., Fischer, I., Dillon, J.V., Murphy, K.: Deep variational information bottleneck. In: International Conference on Learning Representations (2016)

    Google Scholar 

  2. Bai, Y., et al.: Action quality assessment with temporal parsing transformer. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, pp. 422–438. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19772-7_25

    Chapter  Google Scholar 

  3. Battaglia, P.W., et al.: Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261 (2018)

  4. Carreira, J., Zisserman, A.: Quo Vadis, action recognition? A new model and the kinetics dataset. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6299–6308 (2017)

    Google Scholar 

  5. Chen, C.H., Hu, Y.H., Yen, T.Y., Radwin, R.G.: Automated video exposure assessment of repetitive hand activity level for a load transfer task. Hum. Factors 55(2), 298–308 (2013)

    Article  Google Scholar 

  6. Chun, S., Oh, S.J., De Rezende, R.S., Kalantidis, Y., Larlus, D.: Probabilistic embeddings for cross-modal retrieval. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8415–8424 (2021)

    Google Scholar 

  7. Chung, H.W., et al.: Scaling instruction-finetuned language models. J. Mach. Learn. Res. 25(70), 1–53 (2024). http://jmlr.org/papers/v25/23-0870.html

  8. Duvenaud, D.K., et al.: Convolutional networks on graphs for learning molecular fingerprints. Adv. Neural Inf. Process. Syst. 28 (2015)

    Google Scholar 

  9. Feichtenhofer, C., Fan, H., Malik, J., He, K.: SlowFast networks for video recognition. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 6202–6211 (2019). https://doi.org/10.1109/ICCV.2019.00630

  10. Gao, Y., et al.: Jhu-isi gesture and skill assessment working set (JIGSAWS): a surgical activity dataset for human motion modeling. In: Modeling and Monitoring of Computer Assisted Interventions (M2CAI) – MICCAI Workshop (2014)

    Google Scholar 

  11. Gordon, A.S.: Automated video assessment of human performance. In: Proceedings of AI-ED, vol. 2 (1995)

    Google Scholar 

  12. Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330. PMLR (2017)

    Google Scholar 

  13. Kendall, M.G.: A new measure of rank correlation. Biometrika 30(1/2), 81–93 (1938)

    Article  Google Scholar 

  14. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: International Conference on Learning Representations (2014)

    Google Scholar 

  15. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: International Conference on Learning Representations (2017). https://openreview.net/forum?id=SJU4ayYgl

  16. Li, W., Huang, X., Lu, J., Feng, J., Zhou, J.: Learning probabilistic ordinal embeddings for uncertainty-aware regression. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13896–13905 (2021)

    Google Scholar 

  17. Likert, R.: A Technique for the Measurement of Attitudes. Archives of Psychology (1932)

    Google Scholar 

  18. Liu, D., et al.: Towards unified surgical skill assessment. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9522–9531 (2021)

    Google Scholar 

  19. Martin, J., Martin, J., et al.: Objective structured assessment of technical skill (OSATS) for surgical residents. Br. J. Surg. 84(2), 273–278 (1997)

    Google Scholar 

  20. Matsuyama, H., Kawaguchi, N., Lim, B.Y.: IRIS: interpretable rubric-informed segmentation for action quality assessment. In: Proceedings of the 28th International Conference on Intelligent User Interfaces, pp. 368–378 (2023)

    Google Scholar 

  21. Neelakantan, A., Shankar, J., Passos, A., McCallum, A.: Efficient non-parametric estimation of multiple embeddings per word in vector space. In: Moschitti, A., Pang, B., Daelemans, W. (eds.) Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, a meeting of SIGDAT, a Special Interest Group of the ACL, pp. 1059–1069. ACL (2014)

    Google Scholar 

  22. Oh, S.J., Gallagher, A.C., Murphy, K.P., Schroff, F., Pan, J., Roth, J.: Modeling uncertainty with hedged instance embeddings. In: International Conference on Learning Representations (2019). https://openreview.net/forum?id=r1xQQhAqKX

  23. Pan, J.H., Gao, J., Zheng, W.S.: Action assessment by joint relation graphs. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6331–6340 (2019)

    Google Scholar 

  24. Parmar, P., Morris, B.T.: What and how well you performed? A multitask learning approach to action quality assessment. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 304–313 (2019)

    Google Scholar 

  25. Parmar, P., Tran Morris, B.: Learning to score Olympic events. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 20–28 (2017)

    Google Scholar 

  26. Pirsiavash, H., Vondrick, C., Torralba, A.: Assessing the quality of actions. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8694, pp. 556–571. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10599-4_36

    Chapter  Google Scholar 

  27. Prassas, S., Kwon, Y.H., Sands, W.A.: Biomechanical research in artistic gymnastics: a review. Sports Biomech. 5(2), 261–291 (2006)

    Article  Google Scholar 

  28. Qiu, Y., Wang, J., Jin, Z., Chen, H., Zhang, M., Guo, L.: Pose-guided matching based on deep learning for assessing quality of action on rehabilitation training. Biomed. Sig. Process. Control 72, 103323 (2022)

    Article  Google Scholar 

  29. Santoro, A., et al.: A simple neural network module for relational reasoning. Adv. Neural Inf. Process. Syst. 30 (2017)

    Google Scholar 

  30. Scarselli, F., Gori, M., Tsoi, A.C., Hagenbuchner, M., Monfardini, G.: The graph neural network model. IEEE Trans. Neural Netw. 20(1), 61–80 (2008)

    Article  Google Scholar 

  31. Schoeffmann, K., Taschwer, M., Sarny, S., Münzer, B., Primus, M.J., Putzgruber, D.: Cataract-101: video dataset of 101 cataract surgeries. In: César, P., Zink, M., Murray, N. (eds.) Proceedings of the 9th ACM Multimedia Systems Conference, MMSys 2018, Amsterdam, The Netherlands, June 12-15, 2018, pp. 421–425. ACM (2018)

    Google Scholar 

  32. Shi, Y., Jain, A.K.: Probabilistic face embeddings. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6902–6911 (2019)

    Google Scholar 

  33. Sohn, K., Lee, H., Yan, X.: Learning structured output representation using deep conditional generative models. Adv. Neural Inf. Process. Syst. 28 (2015)

    Google Scholar 

  34. Sun, J.J., Zhao, J., Chen, L.C., Schroff, F., Adam, H., Liu, T.: View-invariant probabilistic embedding for human pose. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12350, pp. 53–70. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58558-7_4

    Chapter  Google Scholar 

  35. Tang, Y., et al.: Uncertainty-aware score distribution learning for action quality assessment. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9839–9848 (2020)

    Google Scholar 

  36. Tishby, N.: The information bottleneck method. In: Proceedings of the 37th Allerton Conference on Communication and Computation, 1999 (1999)

    Google Scholar 

  37. Vaswani, A., et al.: Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017)

    Google Scholar 

  38. Vilnis, L., McCallum, A.: Word representations via Gaussian embedding. In: International Conference on Learning Representations (2015)

    Google Scholar 

  39. Wang, L., et al.: Temporal segment networks: towards good practices for deep action recognition. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision – ECCV 2016. ECCV 2016. LNCS, vol. 9912. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_2

  40. Wang, S., Yang, D., Zhai, P., Chen, C., Zhang, L.: TSA-NET: tube self-attention network for action quality assessment. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4902–4910 (2021)

    Google Scholar 

  41. Waters, T.R., Putz-Anderson, V., Garg, A.: Applications Manual for the Revised NIOSH Lifting Equation (1994)

    Google Scholar 

  42. Xiao, F., Sigal, L., Jae Lee, Y.: Weakly-supervised visual grounding of phrases with linguistic structures. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5945–5954 (2017)

    Google Scholar 

  43. Xu, C., Fu, Y., Zhang, B., Chen, Z., Jiang, Y.G., Xue, X.: Learning to score figure skating sport videos. IEEE Trans. Circuits Syst. Video Technol. 30(12), 4578–4590 (2019)

    Google Scholar 

  44. Xu, J., Rao, Y., Yu, X., Chen, G., Zhou, J., Lu, J.: FineDiving: a fine-grained dataset for procedure-aware action quality assessment. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2949–2958 (2022)

    Google Scholar 

  45. Xu, K., Li, J., Zhang, M., Du, S.S., ichi Kawarabayashi, K., Jegelka, S.: What can neural networks reason about? In: International Conference on Learning Representations (2020). https://openreview.net/forum?id=rJxbJeHFPS

  46. Yan, S., Xiong, Y., Lin, D.: Spatial temporal graph convolutional networks for skeleton-based action recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018)

    Google Scholar 

  47. Yu, X., Rao, Y., Zhao, W., Lu, J., Zhou, J.: Group-aware contrastive regression for action quality assessment. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 7899–7908. IEEE Computer Society, Los Alamitos, CA, USA (2021)

    Google Scholar 

  48. Zhang, B., Chen, J., Xu, Y., Zhang, H., Yang, X., Geng, X.: Auto-encoding score distribution regression for action quality assessment. Neural Comput. Appl. 36(2), 929–942 (2023)

    Google Scholar 

  49. Zhang, J., Bargal, S.A., Lin, Z., Brandt, J., Shen, X., Sclaroff, S.: Top-down neural attention by excitation backprop. Int. J. Comput. Vis. 126(10), 1084–1102 (2018)

    Google Scholar 

  50. Zhou, C., Huang, Y.: Uncertainty-driven action quality assessment. arXiv preprint arXiv:2207.14513 (2022)

  51. Zhou, K., Ma, Y., Shum, H.P.H., Liang, X.: Hierarchical graph convolutional networks for action quality assessment. IEEE Trans. Circ. Syst. Vid. Technol. 33(12), 7749–7763 (2023)

    Google Scholar 

  52. Zhu, Y., Zhou, Y., Ye, Q., Qiu, Q., Jiao, J.: Soft proposal networks for weakly supervised object localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1841–1850 (2017)

    Google Scholar 

Download references

Acknowledgement:

This work was supported by the UW Madison Office of the Vice Chancellor for Research with funding from the Wisconsin Alumni Research Foundation, by National Science Foundation under Grant No. CNS 2333491, and by the Army Research Lab under contract number W911NF-2020221.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yin Li .

Editor information

Editors and Affiliations

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1922 KB)

Rights and permissions

Reprints and permissions

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Majeedi, A., Gajjala, V.R., GNVV, S.S.S.N., Li, Y. (2025). RICA\(^2\): Rubric-Informed, Calibrated Assessment of Actions. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15121. Springer, Cham. https://doi.org/10.1007/978-3-031-73036-8_9

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-73036-8_9

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73035-1

  • Online ISBN: 978-3-031-73036-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics