Abstract
Experimental evaluation is an essential component of secondary school science education, but it faces several challenges, including the difficulty of assessing students precisely within limited time and the inconsistency of evaluation criteria across assessors. It has therefore become imperative to explore and harness artificial intelligence technology to improve secondary school experimental evaluation. However, existing online action detection (OAD) algorithms applicable to this task rely solely on historical context and are computationally inefficient, which hinders their use in realistic experimental evaluation. To address this, we present the Exemplar-enhanced Transformer (ExpT), a real-time online action detection model that assesses student experiments more accurately and efficiently. By leveraging exemplars through temporal cross attention, ExpT provides complementary guidance for modeling temporal dependencies while reducing redundant attention computation. We evaluate ExpT on two realistic chemistry experiment datasets for online action detection, where it significantly outperforms all existing methods.
Supported by the National Natural Science Foundation of China (Nos. 62377029, 22033002).
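As a rough illustration of the mechanism named in the abstract, the sketch below shows temporal cross attention in which streaming frame features attend to a bank of exemplar features. It is a minimal PyTorch sketch under stated assumptions: the feature dimensions, the learnable exemplar bank, and the residual fusion are illustrative choices, not ExpT's actual design.

```python
# Minimal sketch of exemplar-guided temporal cross attention.
# Assumptions (not from the paper): a learnable exemplar bank,
# dim = 512, 8 attention heads, residual fusion with LayerNorm.
import torch
import torch.nn as nn


class ExemplarCrossAttention(nn.Module):
    """Frame features (queries) attend to a fixed bank of exemplars."""

    def __init__(self, dim: int = 512, num_heads: int = 8, num_exemplars: int = 16):
        super().__init__()
        # Learnable exemplar bank; in practice exemplars might be mined
        # from training data, but free parameters keep the sketch simple.
        self.exemplars = nn.Parameter(torch.randn(num_exemplars, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, dim) streaming frame features.
        exemplars = self.exemplars.unsqueeze(0).expand(frames.size(0), -1, -1)
        # Queries come from the video stream; keys/values from the exemplar
        # bank, so each frame gathers complementary guidance without
        # attending over the entire (long) history.
        guided, _ = self.cross_attn(query=frames, key=exemplars, value=exemplars)
        return self.norm(frames + guided)


if __name__ == "__main__":
    layer = ExemplarCrossAttention()
    clip = torch.randn(2, 64, 512)  # two streams of 64 frames each
    print(layer(clip).shape)  # torch.Size([2, 64, 512])
```

Because each frame attends to a small, fixed number of exemplars rather than to every past frame, the per-frame cost stays constant as the stream grows, which is consistent with the efficiency claim in the abstract.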
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Yuan, H., Zheng, Z., Gu, Y., Zhou, J., Chen, Y. (2024). ExpT: Online Action Detection via Exemplar-Enhanced Transformer for Secondary School Experimental Evaluation. In: Hong, W., Kanaparan, G. (eds) Computer Science and Education. Teaching and Curriculum. ICCSE 2023. Communications in Computer and Information Science, vol 2024. Springer, Singapore. https://doi.org/10.1007/978-981-97-0791-1_30
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0790-4
Online ISBN: 978-981-97-0791-1