Algorithms for Learning Value-Aligned Policies Considering Admissibility Relaxation

  • Conference paper
  • Value Engineering in Artificial Intelligence (VALE 2023)

Abstract

The emerging field of value awareness engineering claims that software agents and systems should be value-aware, i.e., they must make decisions in accordance with human values. In this context, such agents must be capable of explicitly reasoning about how far different courses of action are aligned with these values. For this purpose, values are often modelled as preferences over states or actions, which are then aggregated to determine the sequences of actions that are maximally aligned with a certain value. Recently, additional value admissibility constraints at this level have been considered as well.
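
As a rough illustration of the modelling idea sketched above, the following Python snippet (not taken from the paper; the names alignment, value_pref and equality_pref, and the tuple-based state, are illustrative assumptions) aggregates per-state preferences along a trajectory to score how well a course of action is aligned with a value:

    # Illustrative sketch: a value modelled as a preference function over
    # states, aggregated along a trajectory to obtain an alignment score.
    from typing import Callable, Sequence

    State = tuple  # hypothetical state representation

    def alignment(trajectory: Sequence[State],
                  value_pref: Callable[[State], float]) -> float:
        """Average per-state preference, e.g. bounded in [-1, 1] so that
        positive values promote the value and negative ones demote it."""
        if not trajectory:
            return 0.0
        return sum(value_pref(s) for s in trajectory) / len(trajectory)

    # Toy 'equality' preference: rewards a low spread of the per-agent
    # allocations that we assume the state stores as a tuple of floats.
    def equality_pref(state: State) -> float:
        spread = max(state) - min(state)
        return 1.0 - 2.0 * min(spread, 1.0)  # maps spread in [0, 1] to [-1, 1]

    print(alignment([(0.5, 0.5), (0.4, 0.6)], equality_pref))  # 0.8

A policy that is maximally aligned with the value would then be one whose trajectories maximise such an aggregated score, possibly subject to the admissibility constraints discussed in the paper.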

Notes

  1. digital-strategy.ec.europa.eu/en/library/ethics-guidelines-trustworthy-ai.

  2. The original definition from Montes and Sierra uses \([-1,1]\)-bounded functions, which allow modelling both promotion and demotion of the value. For the present theory, these specific bounds are not mandatory.

  3. Resolution 64/292, 07/28/2010, of the United Nations General Assembly, vid. https://www.un.org/spanish/waterforlifedecade/human_right_to_water.shtml.

  4. Royal Decree 1/2001, of July 20, approving the Revised Text of the Water Law, Article 60.

  5. Royal Decree 3/2023, of January 10, establishing the technical-sanitary criteria for the quality of drinking water, its control, and supply, Article 9.

  6. To calculate \(f_{gini}\), we still use the original states.

  7. Double Q-learning reduces the overestimation bias of standard Q-learning (at the cost of some sample efficiency). This choice of RL algorithm is not critical, though: our proposed Algorithms 1 and 2 also work with standard Q-learning (see the sketch after these notes).

  8. With some abuse of notation, \(Q(s, a)\) actually denotes the Q-table value of the leveled representation of state \(s\) under action \(a\).

  9. Different initial states and algorithms yield paths of different lengths. For visualisation purposes, the averaged historic reward time series shown in Fig. 5 are computed after bringing all series to a common length: shorter series are extended by repeating their final reward until they match the longest series, which is itself first cropped to at most 1.2 times the length of the default-initial-state series. All metrics, however, are computed with respect to the original lengths and then averaged.

  10. Unlike the ADQL policy, \(\epsilon \)-ADQL and \(\epsilon \)-CADQL are controlled in the sense that both adhere to \(\epsilon \)-local behaviour during training.
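
For reference, the Double Q-learning mentioned in footnote 7 is the tabular algorithm of Hasselt [11]. The sketch below is a minimal illustration of its two-table update, assuming generic (state, action)-indexed Q-tables and illustrative hyper-parameters; it is not the paper's ADQL/CADQL procedure:

    # Minimal tabular Double Q-learning update (cf. footnote 7 and [11]).
    # Environment interface and hyper-parameters are illustrative assumptions.
    import random
    from collections import defaultdict

    alpha, gamma = 0.1, 0.99
    q_a = defaultdict(float)  # first Q-table, indexed by (state, action)
    q_b = defaultdict(float)  # second Q-table

    def double_q_update(s, a, r, s_next, actions):
        """Randomly pick one table to update; the greedy action is chosen
        with that table but evaluated with the other, which reduces the
        overestimation bias of standard Q-learning."""
        learner, evaluator = (q_a, q_b) if random.random() < 0.5 else (q_b, q_a)
        best = max(actions, key=lambda a2: learner[(s_next, a2)])
        target = r + gamma * evaluator[(s_next, best)]
        learner[(s, a)] += alpha * (target - learner[(s, a)])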

References

  1. Altman, E.: Constrained Markov Decision Processes, vol. 7. CRC Press, Boca Raton (1999)

  2. Bench-Capon, T., Atkinson, K., McBurney, P.: Using argumentation to model agent decision making in economic experiments. Auton. Agent. Multi-Agent Syst. 25, 183–208 (2012)

  3. Brockman, G., et al.: OpenAI gym. arXiv preprint arXiv:1606.01540 (2016)

  4. Christiano, P., Leike, J., Brown, T.B., Martic, M., Legg, S., Amodei, D.: Deep reinforcement learning from human preferences (2023)

  5. Dalal, G., Dvijotham, K., Vecerik, M., Hester, T., Paduraru, C., Tassa, Y.: Safe exploration in continuous action spaces (2018)

  6. Das, S., Egecioglu, O., El Abbadi, A.: Anónimos: an LP-based approach for anonymizing weighted social network graphs. IEEE Trans. Knowl. Data Eng. 24(4), 590–604 (2012). https://doi.org/10.1109/TKDE.2010.267

  7. The Farama Foundation: Gymnasium (2023). https://gymnasium.farama.org

  8. Fürnkranz, J., Hüllermeier, E., Cheng, W., Park, S.H.: Preference-based reinforcement learning: a formal framework and a policy iteration algorithm. Mach. Learn. 89, 123–156 (2012)

  9. Spanish Government: Strategic project for economic recovery and transformation of digitalization of the water cycle. Report 2022. Technical report, Ministry for the Ecological Transition and Demographic Challenge (2022)

  10. Guo, T., Yuan, Y., Zhao, P.: Admission-based reinforcement-learning algorithm in sequential social dilemmas. Appl. Sci. 13(3) (2023). https://doi.org/10.3390/app13031807, https://www.mdpi.com/2076-3417/13/3/1807

  11. Hasselt, H.: Double Q-learning. In: Advances in Neural Information Processing Systems, vol. 23 (2010)

  12. Holgado-Sánchez, A., Arias, J., Moreno-Rebato, M., Ossowski, S.: On admissible behaviours for goal-oriented decision-making of value-aware agents. In: Malvone, V., Murano, A. (eds.) EUMAS 2023. LNCS, vol. 14282, pp. 415–424. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-43264-4_27

  13. Kalweit, G., Huegle, M., Werling, M., Boedecker, J.: Deep constrained Q-learning (2020)

  14. Lera-Leri, R., Bistaffa, F., Serramia, M., Lopez-Sanchez, M., Rodriguez-Aguilar, J.: Towards pluralistic value alignment: aggregating value systems through \(\ell _p\)-regression. In: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022, pp. 780–788. International Foundation for Autonomous Agents and Multiagent Systems, Richland (2022)

  15. Montes, N., Osman, N., Sierra, C., Slavkovik, M.: Value engineering for autonomous agents. CoRR abs/2302.08759 (2023). https://doi.org/10.48550/arXiv.2302.08759

  16. Montes, N., Sierra, C.: Synthesis and properties of optimally value-aligned normative systems. J. Artif. Intell. Res. 74, 1739–1774 (2022). https://doi.org/10.1613/jair.1.13487

  17. Moulin, H.: Fair Division and Collective Welfare. MIT Press, Cambridge (2004)

  18. Ng, A.Y., Russell, S.J.: Algorithms for inverse reinforcement learning. In: Proceedings of the Seventeenth International Conference on Machine Learning, pp. 663–670 (2000)

  19. Osman, N., d’Inverno, M.: A computational framework of human values for ethical AI (2023)

  20. Perello-Moragues, A., Poch, M., Sauri, D., Popartan, L.A., Noriega, P.: Modelling domestic water use in metropolitan areas using socio-cognitive agents. Water 13(8) (2021). https://doi.org/10.3390/w13081024, https://www.mdpi.com/2073-4441/13/8/1024

  21. Plata-Pérez, L., Sánchez-Pérez, J., Sánchez-Sánchez, F.: An elementary characterization of the Gini index. Math. Soc. Sci. 74, 79–83 (2015)

  22. Rodriguez-Soto, M., Serramia, M., Lopez-Sanchez, M., Rodriguez-Aguilar, J.A.: Instilling moral value alignment by means of multi-objective reinforcement learning. Ethics Inf. Technol. 24, 9 (2022). https://doi.org/10.1007/s10676-022-09635-0

  23. Schwartz, S.H.: An overview of the Schwartz theory of basic values. Online Read. Psychol. Cult. 2(1), 11 (2012)

  24. Sierra, C., Osman, N., Noriega, P., Sabater-Mir, J., Perelló, A.: Value alignment: a formal approach. CoRR abs/2110.09240 (2021). arXiv:2110.09240

  25. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (2018)

  26. Tessler, C., Mankowitz, D.J., Mannor, S.: Reward constrained policy optimization (2018)

  27. van der Weide, T.L., Dignum, F., Meyer, J.J.C., Prakken, H., Vreeswijk, G.A.W.: Practical reasoning using values. In: McBurney, P., Rahwan, I., Parsons, S., Maudet, N. (eds.) ArgMAS 2009. LNCS, vol. 6057, pp. 79–93. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12805-9_5

Acknowledgements

This work has been supported by grant VAE: TED2021-131295B-C33 funded by MCIN/AEI/ 10.13039/501100011033 and by the “European Union NextGenerationEU/PRTR”, by grant COSASS: PID2021-123673OB-C32 funded by MCIN/AEI/ 10.13039/501100011033 and by “ERDF A way of making Europe”, and by the AGROBOTS Project of Universidad Rey Juan Carlos funded by the Community of Madrid, Spain.

Author information

Corresponding author

Correspondence to Andrés Holgado-Sánchez.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Holgado-Sánchez, A., Arias, J., Billhardt, H., Ossowski, S. (2024). Algorithms for Learning Value-Aligned Policies Considering Admissibility Relaxation. In: Osman, N., Steels, L. (eds.) Value Engineering in Artificial Intelligence. VALE 2023. Lecture Notes in Computer Science, vol. 14520. Springer, Cham. https://doi.org/10.1007/978-3-031-58202-8_9

  • DOI: https://doi.org/10.1007/978-3-031-58202-8_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-58204-2

  • Online ISBN: 978-3-031-58202-8

  • eBook Packages: Computer Science, Computer Science (R0)
