DOI: 10.1609/aaai.v38i11.29102
Research article

Get a head start: on-demand pedagogical policy selection in intelligent tutoring

Published: 20 February 2024

Abstract

Reinforcement learning (RL) is broadly employed in human-involved systems to improve human outcomes. Off-policy evaluation (OPE) is pivotal for RL in these settings because online policy learning and evaluation can be high-stakes. Intelligent tutoring has drawn much attention as an especially challenging domain for applying OPE to human-involved systems: subgroups of students can favor different pedagogical policies, and policies must be induced fully offline and then deployed directly in the upcoming semester, a costly procedure. In this work, we formulate on-demand pedagogical policy selection (ODPS) to tackle these challenges for OPE in intelligent tutoring. We propose a pipeline, EDUPLANNER, as a concrete solution to ODPS. Our pipeline yields a theoretically unbiased estimator and enables efficient, customized policy selection by identifying subgroups over both historical data and on-arrival initial logs. We evaluate our approach on the Probability ITS, which has been used in real classrooms for over eight years. Our study shows significant improvements in students' learning outcomes with EDUPLANNER, especially for students in low-performing subgroups.
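
The abstract outlines two coupled steps: identifying student subgroups from historical data and on-arrival initial logs, and selecting a pedagogical policy for each subgroup using an unbiased off-policy estimate of its value. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' EDUPLANNER implementation: all names (is_value, select_policies_per_subgroup, traj_by_student, candidate_policies) are assumptions, and ordinary importance sampling stands in for whichever unbiased estimator the paper actually derives.

# Hypothetical sketch only; not the EDUPLANNER pipeline from the paper.
import numpy as np
from sklearn.cluster import KMeans

def is_value(trajectories, target_probs, behavior_probs, gamma=0.99):
    # Ordinary importance sampling: an unbiased estimate of a target policy's
    # value from trajectories logged under a known behavior policy.
    values = []
    for traj, pi_e, pi_b in zip(trajectories, target_probs, behavior_probs):
        rho = np.prod(np.asarray(pi_e) / np.asarray(pi_b))  # product of per-step likelihood ratios
        ret = sum(gamma ** t * r for t, r in enumerate(traj["rewards"]))
        values.append(rho * ret)
    return float(np.mean(values))

def select_policies_per_subgroup(initial_features, traj_by_student,
                                 candidate_policies, n_subgroups=3):
    # Cluster students on early-interaction features, then pick the candidate
    # policy with the highest importance-sampling value inside each subgroup.
    labels = KMeans(n_clusters=n_subgroups, n_init=10).fit_predict(initial_features)
    selection = {}
    for g in range(n_subgroups):
        members = [i for i, lab in enumerate(labels) if lab == g]
        best = max(
            candidate_policies,
            key=lambda pi: is_value(
                [traj_by_student[i] for i in members],
                [pi["target_probs"][i] for i in members],
                [pi["behavior_probs"][i] for i in members],
            ),
        )
        selection[g] = best["name"]
    return labels, selection

Under these assumptions, a new cohort's first few interactions would be mapped to the learned subgroups so that each incoming student receives the policy selected for their group; performing this selection on arrival, rather than fixing one policy fully offline, is what distinguishes ODPS from conventional offline policy selection.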



Published In

AAAI'24/IAAI'24/EAAI'24: Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence
February 2024
23861 pages
ISBN: 978-1-57735-887-9

Sponsors

  • Association for the Advancement of Artificial Intelligence

Publisher

AAAI Press


