DOI: 10.1609/aaai.v38i11.29102
Research article

Get a head start: on-demand pedagogical policy selection in intelligent tutoring

Published: 20 February 2024

Abstract

Reinforcement learning (RL) is broadly employed in human-involved systems to improve human outcomes. Off-policy evaluation (OPE) is pivotal for RL in these settings because online policy learning and evaluation can be high-stakes. Intelligent tutoring has drawn much attention as an especially challenging domain for applying OPE to human-involved systems: subgroups of students can favor different pedagogical policies, and policies must be induced fully offline and then deployed directly in the upcoming semester, a costly procedure. In this work, we formulate on-demand pedagogical policy selection (ODPS) to tackle these challenges for OPE in intelligent tutoring. We propose a pipeline, EDUPLANNER, as a concrete solution to ODPS. Our pipeline yields a theoretically unbiased estimator and enables efficient, customized policy selection by identifying subgroups over both historical data and on-arrival initial logs. We evaluate our approach on the Probability ITS, which has been used in real classrooms for over eight years. Our study shows significant improvements in students' learning outcomes with EDUPLANNER, especially for students in low-performing subgroups.
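
The abstract outlines two coupled steps: identifying student subgroups from historical data and on-arrival initial logs, and selecting a pedagogical policy for each subgroup using an unbiased off-policy estimate of its value. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' EDUPLANNER implementation: all names (is_value, select_policies_per_subgroup, traj_by_student, candidate_policies) are assumptions, and ordinary importance sampling stands in for whichever unbiased estimator the paper actually derives.

# Hypothetical sketch only; not the EDUPLANNER pipeline from the paper.
import numpy as np
from sklearn.cluster import KMeans

def is_value(trajectories, target_probs, behavior_probs, gamma=0.99):
    # Ordinary importance sampling: an unbiased estimate of a target policy's
    # value from trajectories logged under a known behavior policy.
    values = []
    for traj, pi_e, pi_b in zip(trajectories, target_probs, behavior_probs):
        rho = np.prod(np.asarray(pi_e) / np.asarray(pi_b))  # product of per-step likelihood ratios
        ret = sum(gamma ** t * r for t, r in enumerate(traj["rewards"]))
        values.append(rho * ret)
    return float(np.mean(values))

def select_policies_per_subgroup(initial_features, traj_by_student,
                                 candidate_policies, n_subgroups=3):
    # Cluster students on early-interaction features, then pick the candidate
    # policy with the highest importance-sampling value inside each subgroup.
    labels = KMeans(n_clusters=n_subgroups, n_init=10).fit_predict(initial_features)
    selection = {}
    for g in range(n_subgroups):
        members = [i for i, lab in enumerate(labels) if lab == g]
        best = max(
            candidate_policies,
            key=lambda pi: is_value(
                [traj_by_student[i] for i in members],
                [pi["target_probs"][i] for i in members],
                [pi["behavior_probs"][i] for i in members],
            ),
        )
        selection[g] = best["name"]
    return labels, selection

Under these assumptions, a new cohort's first few interactions would be mapped to the learned subgroups so that each incoming student receives the policy selected for their group; performing this selection on arrival, rather than fixing one policy fully offline, is what distinguishes ODPS from conventional offline policy selection.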



Published In

AAAI'24/IAAI'24/EAAI'24: Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence
February 2024
23861 pages
ISBN: 978-1-57735-887-9

Sponsors

  • Association for the Advancement of Artificial Intelligence

Publisher

AAAI Press


