Abstract
The emerging field of value awareness engineering claims that software agents and systems should be value-aware, i.e. they must make decisions in accordance with human values. In this context, such agents must be capable of explicitly reasoning about how well different courses of action align with these values. For this purpose, values are often modelled as preferences over states or actions, which are then aggregated to determine the sequences of actions that are maximally aligned with a certain value. Recently, additional value admissibility constraints at this level have been considered as well.
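As a minimal illustration of this modelling idea (a sketch under our own assumptions, not the paper's implementation): a value can be represented as a preference function over states, and the alignment of a course of action obtained by aggregating the per-state scores along the trajectory. The names `alignment`, `value_semantics` and `equality` below are illustrative only.

```python
# Minimal illustrative sketch (not the paper's implementation): a value is a
# preference function over states, and the alignment of a course of action is
# the aggregate of the per-state scores along the resulting trajectory.
from typing import Callable, Sequence

State = tuple  # placeholder state representation

def alignment(trajectory: Sequence[State],
              value_semantics: Callable[[State], float]) -> float:
    """Average preference score over the states visited by a trajectory.

    `value_semantics` maps each state to a score in [-1, 1]: positive scores
    indicate promotion of the value, negative ones its demotion (cf. note 2).
    """
    scores = [value_semantics(s) for s in trajectory]
    return sum(scores) / len(scores)

# Toy example: an "equality" value that prefers states with similar allocations.
def equality(state: State) -> float:
    hi, lo = max(state), min(state)
    return 1.0 if hi + lo == 0 else 1.0 - 2.0 * (hi - lo) / (hi + lo)

print(alignment([(5, 5), (6, 4), (8, 2)], equality))  # ~0.47
```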
Notes
- 1.
- 2.
The original definition from Montes and Sierra uses \([-1,1]\)-bounded functions, which model both the promotion and the demotion of the value. For this theory, those specific bounds are not mandatory.
- 3.
Resolution 64/292, 07/28/2010, of the United Nations General Assembly, vid. https://www.un.org/spanish/waterforlifedecade/human_right_to_water.shtml.
- 4.
Royal Decree 1/2001, of July 20, approving the Revised Text of the Water Law, Article 60.
- 5.
Royal Decree 3/2023, of January 10, establishing the technical-sanitary criteria for the quality of drinking water, its control, and supply, Article 9.
- 6.
To calculate \(f_{gini}\) we still use the original states.
- 7.
- 8.
With some abuse of notation, \(Q(s, a)\) actually denotes the Q-table value of the leveled representation of state \(s\) under action \(a\).
- 9.
Different initial states and algorithms yield paths of different lengths. For visual purposes, the average historic reward time series shown in Fig. 5 are computed after making all series the same length: the longest series is first cropped to at most 1.2 times the length of the default-initial-state experiment series, and the shorter series are then padded by repeating their final rewards until they match the longest (cropped) series; see the sketch after these notes. All metrics, however, are calculated with respect to the original lengths and then averaged.
- 10.
Unlike the ADQL policy, \(\epsilon \)-ADQL and \(\epsilon \)-CADQL are controlled in the sense that both adhere to \(\epsilon \)-local behaviour during training.
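As a rough illustration of the length-equalisation procedure described in note 9: the 1.2 cropping factor follows the note, but the function name `equalise_lengths` and all other details are assumptions rather than the authors' code.

```python
# Rough sketch of the length-equalisation described in note 9; the function
# name and all details beyond the 1.2 cropping factor are assumptions.
from typing import List

def equalise_lengths(series: List[List[float]],
                     default_len: int,
                     factor: float = 1.2) -> List[List[float]]:
    """Make reward series equally long for plotting average curves.

    Series are first cropped to at most `factor * default_len` steps, and the
    shorter ones are then padded by repeating their final reward until they
    match the longest (cropped) series. Metrics should still be computed on
    the original, unpadded series.
    """
    max_len = int(factor * default_len)
    cropped = [s[:max_len] for s in series]
    target = max(len(s) for s in cropped)
    return [s + [s[-1]] * (target - len(s)) for s in cropped]

# Example: three runs of different lengths averaged step by step for a plot.
runs = [[0.1, 0.3, 0.5], [0.2, 0.4], [0.0, 0.1, 0.2, 0.6, 0.9]]
padded = equalise_lengths(runs, default_len=3)
average_curve = [sum(step) / len(step) for step in zip(*padded)]
print(padded, average_curve)
```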
References
Altman, E.: Constrained Markov Decision Processes, vol. 7. CRC Press, Boca Raton (1999)
Bench-Capon, T., Atkinson, K., McBurney, P.: Using argumentation to model agent decision making in economic experiments. Auton. Agent. Multi-Agent Syst. 25, 183–208 (2012)
Brockman, G., et al.: OpenAI gym. arXiv preprint arXiv:1606.01540 (2016)
Christiano, P., Leike, J., Brown, T.B., Martic, M., Legg, S., Amodei, D.: Deep reinforcement learning from human preferences (2023)
Dalal, G., Dvijotham, K., Vecerik, M., Hester, T., Paduraru, C., Tassa, Y.: Safe exploration in continuous action spaces (2018)
Das, S., Egecioglu, O., El Abbadi, A.: Anónimos: an LP-based approach for anonymizing weighted social network graphs. IEEE Trans. Knowl. Data Eng. 24(4), 590–604 (2012). https://doi.org/10.1109/TKDE.2010.267
The Farama Foundation: Gymnasium (2023). https://gymnasium.farama.org
Fürnkranz, J., Hüllermeier, E., Cheng, W., Park, S.H.: Preference-based reinforcement learning: a formal framework and a policy iteration algorithm. Mach. Learn. 89, 123–156 (2012)
Spanish Government: Strategic project for economic recovery and transformation of digitalization of the water cycle. Report 2022. Technical report, Ministry for the Ecological Transition and Demographic Challenge (2022)
Guo, T., Yuan, Y., Zhao, P.: Admission-based reinforcement-learning algorithm in sequential social dilemmas. Appl. Sci. 13(3) (2023). https://doi.org/10.3390/app13031807, https://www.mdpi.com/2076-3417/13/3/1807
van Hasselt, H.: Double Q-learning. In: Advances in Neural Information Processing Systems, vol. 23 (2010)
Holgado-Sánchez, A., Arias, J., Moreno-Rebato, M., Ossowski, S.: On admissible behaviours for goal-oriented decision-making of value-aware agents. In: Malvone, V., Murano, A. (eds.) EUMAS 2023. LNCS, vol. 14282, pp. 415–424. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-43264-4_27
Kalweit, G., Huegle, M., Werling, M., Boedecker, J.: Deep constrained Q-learning (2020)
Lera-Leri, R., Bistaffa, F., Serramia, M., Lopez-Sanchez, M., Rodriguez-Aguilar, J.: Towards pluralistic value alignment: aggregating value systems through \(\ell _p\)-regression. In: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022, pp. 780–788. International Foundation for Autonomous Agents and Multiagent Systems, Richland (2022)
Montes, N., Osman, N., Sierra, C., Slavkovik, M.: Value engineering for autonomous agents. CoRR abs/2302.08759 (2023). https://doi.org/10.48550/arXiv.2302.08759
Montes, N., Sierra, C.: Synthesis and properties of optimally value-aligned normative systems. J. Artif. Intell. Res. 74, 1739–1774 (2022). https://doi.org/10.1613/jair.1.13487
Moulin, H.: Fair Division and Collective Welfare. MIT Press, Cambridge (2004)
Ng, A.Y., Russell, S.J.: Algorithms for inverse reinforcement learning. In: Proceedings of the Seventeenth International Conference on Machine Learning, pp. 663–670 (2000)
Osman, N., d’Inverno, M.: A computational framework of human values for ethical AI (2023)
Perello-Moragues, A., Poch, M., Sauri, D., Popartan, L.A., Noriega, P.: Modelling domestic water use in metropolitan areas using socio-cognitive agents. Water 13(8) (2021). https://doi.org/10.3390/w13081024, https://www.mdpi.com/2073-4441/13/8/1024
Plata-Pérez, L., Sánchez-Pérez, J., Sánchez-Sánchez, F.: An elementary characterization of the Gini index. Math. Soc. Sci. 74, 79–83 (2015)
Rodriguez-Soto, M., Serramia, M., Lopez-Sanchez, M., Rodriguez-Aguilar, J.A.: Instilling moral value alignment by means of multi-objective reinforcement learning. Ethics Inf. Technol. 24, 9 (2022). https://doi.org/10.1007/s10676-022-09635-0
Schwartz, S.H.: An overview of the Schwartz theory of basic values. Online Read. Psychol. Cult. 2(1), 11 (2012)
Sierra, C., Osman, N., Noriega, P., Sabater-Mir, J., Perelló, A.: Value alignment: a formal approach. CoRR abs/2110.09240 (2021). arXiv:2110.09240
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (2018)
Tessler, C., Mankowitz, D.J., Mannor, S.: Reward constrained policy optimization (2018)
van der Weide, T.L., Dignum, F., Meyer, J.J.C., Prakken, H., Vreeswijk, G.A.W.: Practical reasoning using values. In: McBurney, P., Rahwan, I., Parsons, S., Maudet, N. (eds.) ArgMAS 2009. LNCS, vol. 6057, pp. 79–93. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12805-9_5
Acknowledgements
This work has been supported by grant VAE: TED2021-131295B-C33 funded by MCIN/AEI/10.13039/501100011033 and by the “European Union NextGenerationEU/PRTR”, by grant COSASS: PID2021-123673OB-C32 funded by MCIN/AEI/10.13039/501100011033 and by “ERDF A way of making Europe”, and by the AGROBOTS Project of Universidad Rey Juan Carlos funded by the Community of Madrid, Spain.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Holgado-Sánchez, A., Arias, J., Billhardt, H., Ossowski, S. (2024). Algorithms for Learning Value-Aligned Policies Considering Admissibility Relaxation. In: Osman, N., Steels, L. (eds) Value Engineering in Artificial Intelligence. VALE 2023. Lecture Notes in Computer Science, vol. 14520. Springer, Cham. https://doi.org/10.1007/978-3-031-58202-8_9
DOI: https://doi.org/10.1007/978-3-031-58202-8_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-58204-2
Online ISBN: 978-3-031-58202-8