RICA$^2$: Rubric-Informed, Calibrated Assessment of Actions

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15121)

Included in the following conference series:

European Conference on Computer Vision

Abstract

The ability to quantify how well an action is carried out, also known as action quality assessment (AQA), has attracted recent interest in the vision community. Unfortunately, prior methods often ignore the score rubric used by human experts and fall short of quantifying the uncertainty of the model prediction. To bridge the gap, we present RICA$^2$, a deep probabilistic model that integrates the score rubric and accounts for prediction uncertainty for AQA. Central to our method are stochastic embeddings of action steps, defined on a graph structure that encodes the score rubric. The embeddings spread probabilistic density in the latent space and allow our method to represent model uncertainty. The graph encodes the scoring criteria, based on which the quality scores can be decoded. We demonstrate that our method establishes a new state of the art on public benchmarks, including FineDiving, MTL-AQA, and JIGSAWS, with superior performance in score prediction and uncertainty calibration. Our code is available at https://abrarmajeedi.github.io/rica2_aqa/.
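To make the idea of stochastic step embeddings on a rubric graph more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes each action step is a diagonal Gaussian (mean and log-variance), that the rubric is a simple mapping from criteria to step indices, and that the score is decoded by averaging followed by a linear readout. The names `sample_gaussian`, `decode_score`, and `predict_with_uncertainty`, and the averaging-based aggregation, are hypothetical choices for illustration only.

```python
import math
import random
import statistics


def sample_gaussian(mean, log_var, rng):
    """Draw z = mean + std * eps, eps ~ N(0, 1), for a diagonal Gaussian."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]


def mean_vector(vectors):
    """Element-wise average of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def decode_score(step_embeddings, rubric, weights):
    """Pool step embeddings per rubric criterion, then apply a linear readout.

    rubric maps a criterion name to the indices of the steps it covers
    (an assumed, simplified stand-in for a rubric graph).
    """
    criterion_vecs = [mean_vector([step_embeddings[i] for i in idxs])
                      for idxs in rubric.values()]
    pooled = mean_vector(criterion_vecs)
    return sum(w * x for w, x in zip(weights, pooled))


def predict_with_uncertainty(step_params, rubric, weights,
                             num_samples=200, seed=0):
    """Monte Carlo estimate of the score mean and standard deviation."""
    rng = random.Random(seed)
    scores = []
    for _ in range(num_samples):
        sampled = [sample_gaussian(mu, lv, rng) for mu, lv in step_params]
        scores.append(decode_score(sampled, rubric, weights))
    return statistics.fmean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    # Two steps with 2-D embeddings; values are made up for illustration.
    steps = [([1.0, 2.0], [0.0, 0.0]), ([3.0, 4.0], [0.0, 0.0])]
    rubric = {"criterion_a": [0], "criterion_b": [0, 1]}
    score, spread = predict_with_uncertainty(steps, rubric, [1.0, 1.0])
    print(f"score={score:.3f} spread={spread:.3f}")
```

In this sketch, the spread of the sampled scores serves as a simple proxy for prediction uncertainty: wider step distributions yield a wider score distribution. How well such a spread is calibrated against actual errors would need to be checked separately.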

Notes

1. For the sake of brevity, we omit the subscript as long as there is no confusion.

Acknowledgement:

This work was supported by the UW Madison Office of the Vice Chancellor for Research with funding from the Wisconsin Alumni Research Foundation, by the National Science Foundation under Grant No. CNS 2333491, and by the Army Research Lab under contract number W911NF-2020221.

Author information

Authors and Affiliations

University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
Abrar Majeedi, Viswanatha Reddy Gajjala, Satya Sai Srinath Namburi GNVV & Yin Li

Corresponding author

Correspondence to Yin Li.

Editor information

Editors and Affiliations

University of Birmingham, Birmingham, UK
Aleš Leonardis
University of Trento, Trento, Italy
Elisa Ricci
Technical University of Darmstadt, Darmstadt, Germany
Stefan Roth
Princeton University, Princeton, NJ, USA
Olga Russakovsky
Czech Technical University in Prague, Prague, Czech Republic
Torsten Sattler
École des Ponts ParisTech, Marne-la-Vallée, France
Gül Varol

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1922 KB)

About this paper

Cite this paper

Majeedi, A., Gajjala, V.R., GNVV, S.S.S.N., Li, Y. (2025). RICA$^2$: Rubric-Informed, Calibrated Assessment of Actions. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15121. Springer, Cham. https://doi.org/10.1007/978-3-031-73036-8_9

DOI: https://doi.org/10.1007/978-3-031-73036-8_9
Published: 21 November 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73035-1
Online ISBN: 978-3-031-73036-8
eBook Packages: Computer Science, Computer Science (R0)
