Do we use the Right Measure? Challenges in Evaluating Reward Learning Algorithms

Nils Wilde, Javier Alonso-Mora
Proceedings of The 6th Conference on Robot Learning, PMLR 205:1553-1562, 2023.

Abstract

Reward learning is a highly active area of research in human-robot interaction (HRI), allowing a broad range of users to specify complex robot behaviour. Experiments with simulated user input play a major role in the development and evaluation of reward learning algorithms due to the availability of a ground truth. In this paper, we review measures for evaluating reward learning algorithms used in HRI, most of which fall into two classes. In a theoretical worst case analysis and several examples, we show that both classes of measures can fail to effectively indicate how good the learned robot behaviour is. Thus, our work contributes to the characterization of sim-to-real gaps of reward learning in HRI.

Cite this Paper


BibTeX
@InProceedings{pmlr-v205-wilde23a,
  title     = {Do we use the Right Measure? Challenges in Evaluating Reward Learning Algorithms},
  author    = {Wilde, Nils and Alonso-Mora, Javier},
  booktitle = {Proceedings of The 6th Conference on Robot Learning},
  pages     = {1553--1562},
  year      = {2023},
  editor    = {Liu, Karen and Kulic, Dana and Ichnowski, Jeff},
  volume    = {205},
  series    = {Proceedings of Machine Learning Research},
  month     = {14--18 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v205/wilde23a/wilde23a.pdf},
  url       = {https://proceedings.mlr.press/v205/wilde23a.html},
  abstract  = {Reward learning is a highly active area of research in human-robot interaction (HRI), allowing a broad range of users to specify complex robot behaviour. Experiments with simulated user input play a major role in the development and evaluation of reward learning algorithms due to the availability of a ground truth. In this paper, we review measures for evaluating reward learning algorithms used in HRI, most of which fall into two classes. In a theoretical worst case analysis and several examples, we show that both classes of measures can fail to effectively indicate how good the learned robot behaviour is. Thus, our work contributes to the characterization of sim-to-real gaps of reward learning in HRI.}
}
Endnote
%0 Conference Paper
%T Do we use the Right Measure? Challenges in Evaluating Reward Learning Algorithms
%A Nils Wilde
%A Javier Alonso-Mora
%B Proceedings of The 6th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Karen Liu
%E Dana Kulic
%E Jeff Ichnowski
%F pmlr-v205-wilde23a
%I PMLR
%P 1553--1562
%U https://proceedings.mlr.press/v205/wilde23a.html
%V 205
%X Reward learning is a highly active area of research in human-robot interaction (HRI), allowing a broad range of users to specify complex robot behaviour. Experiments with simulated user input play a major role in the development and evaluation of reward learning algorithms due to the availability of a ground truth. In this paper, we review measures for evaluating reward learning algorithms used in HRI, most of which fall into two classes. In a theoretical worst case analysis and several examples, we show that both classes of measures can fail to effectively indicate how good the learned robot behaviour is. Thus, our work contributes to the characterization of sim-to-real gaps of reward learning in HRI.
APA
Wilde, N. & Alonso-Mora, J. (2023). Do we use the Right Measure? Challenges in Evaluating Reward Learning Algorithms. Proceedings of The 6th Conference on Robot Learning, in Proceedings of Machine Learning Research 205:1553-1562. Available from https://proceedings.mlr.press/v205/wilde23a.html.