Abstract
In multi-objective reinforcement learning, a weight expresses the priority of each objective when the reward vector is linearly scalarized into a single reward. These weights must be set in advance; however, most real-world problems involve many objectives, so tuning the weights demands considerable trial and error from the designer. A method that automatically estimates the weights is therefore needed to reduce this burden. In this paper, we propose a novel method for estimating the weights from the per-objective reward vector and expert trajectories within the framework of inverse reinforcement learning (IRL). In particular, we combine deep IRL, built on deep reinforcement learning, with multiplicative weights apprenticeship learning to estimate the weights quickly in a continuous state space. Through experiments in a benchmark environment for multi-objective sequential decision-making in a continuous state space, we verified that the proposed weight estimation method outperforms the projection method and Bayesian optimization.
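To make the two core ideas of the abstract concrete, the sketch below shows (i) linear scalarization of a reward vector with a weight vector on the simplex and (ii) one multiplicative-weights update that raises the weight of objectives on which the current policy falls short of the expert, in the spirit of multiplicative weights apprenticeship learning. This is a minimal illustrative sketch, not the authors' implementation; the function names, the learning rate `eta`, and the feature-expectation values are hypothetical.

```python
import numpy as np

def scalarize(reward_vector, weights):
    """Scalar reward R(s, a) = w^T r(s, a), with w on the probability simplex."""
    return np.dot(weights, reward_vector)

def mwal_weight_update(weights, expert_features, policy_features, eta=0.1):
    """One multiplicative-weights step: objectives on which the current policy
    underperforms the expert get their weights increased, then renormalized."""
    shortfall = expert_features - policy_features   # per-objective gap to the expert
    weights = weights * np.exp(eta * shortfall)     # multiplicative update
    return weights / weights.sum()                  # project back onto the simplex

# Toy usage with 3 objectives (all values hypothetical)
w = np.ones(3) / 3
expert_mu = np.array([0.8, 0.2, 0.5])   # expert feature expectations
policy_mu = np.array([0.5, 0.4, 0.5])   # current policy feature expectations
w = mwal_weight_update(w, expert_mu, policy_mu)
print(w)  # weight on objective 1 grows, objective 2 shrinks
```

In an IRL loop, the updated weights would define a new scalarized reward via `scalarize`, a policy would be retrained on it (e.g., with deep reinforcement learning), and the update would repeat until the policy's feature expectations approach the expert's.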
Additional information
This work was presented in part at the joint symposium of the 27th International Symposium on Artificial Life and Robotics, the 7th International Symposium on BioComplexity, and the 5th International Symposium on Swarm Behavior and Bio-Inspired Robotics (Online, January 25–27, 2022).
Cite this article
Takayama, N., Arai, S. Multi-objective deep inverse reinforcement learning for weight estimation of objectives. Artif Life Robotics 27, 594–602 (2022). https://doi.org/10.1007/s10015-022-00773-8