Algorithms for Learning Value-Aligned Policies Considering Admissibility Relaxation

  • Conference paper
  • Value Engineering in Artificial Intelligence (VALE 2023)

Abstract

The emerging field of value awareness engineering claims that software agents and systems should be value-aware, i.e., they must make decisions in accordance with human values. In this context, such agents must be capable of explicitly reasoning about how far different courses of action are aligned with these values. For this purpose, values are often modelled as preferences over states or actions, which are then aggregated to determine the sequences of actions that are maximally aligned with a certain value. Recently, additional value admissibility constraints at this level have been considered as well.
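
As a rough illustration of the modelling idea sketched above, the following Python snippet (not taken from the paper; the names alignment, value_pref and equality_pref, and the tuple-based state, are illustrative assumptions) aggregates per-state preferences along a trajectory to score how well a course of action is aligned with a value:

    # Illustrative sketch: a value modelled as a preference function over
    # states, aggregated along a trajectory to obtain an alignment score.
    from typing import Callable, Sequence

    State = tuple  # hypothetical state representation

    def alignment(trajectory: Sequence[State],
                  value_pref: Callable[[State], float]) -> float:
        """Average per-state preference, e.g. bounded in [-1, 1] so that
        positive values promote the value and negative ones demote it."""
        if not trajectory:
            return 0.0
        return sum(value_pref(s) for s in trajectory) / len(trajectory)

    # Toy 'equality' preference: rewards a low spread of the per-agent
    # allocations that we assume the state stores as a tuple of floats.
    def equality_pref(state: State) -> float:
        spread = max(state) - min(state)
        return 1.0 - 2.0 * min(spread, 1.0)  # maps spread in [0, 1] to [-1, 1]

    print(alignment([(0.5, 0.5), (0.4, 0.6)], equality_pref))  # 0.8

A policy that is maximally aligned with the value would then be one whose trajectories maximise such an aggregated score, possibly subject to the admissibility constraints discussed in the paper.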

Notes

  1. digital-strategy.ec.europa.eu/en/library/ethics-guidelines-trustworthy-ai.

  2. The original definition from Montes and Sierra uses \([-1,1]\)-bounded functions, which allow modelling both promotion and demotion of the value. For the present theory, these specific bounds are not mandatory.

  3. Resolution 64/292, 07/28/2010, of the United Nations General Assembly, vid. https://www.un.org/spanish/waterforlifedecade/human_right_to_water.shtml.

  4. Royal Decree 1/2001, of July 20, approving the Revised Text of the Water Law, Article 60.

  5. Royal Decree 3/2023, of January 10, establishing the technical-sanitary criteria for the quality of drinking water, its control, and supply, Article 9.

  6. To calculate \(f_{gini}\), we still use the original states.

  7. Double Q-learning reduces the overestimation bias of standard Q-learning (at the cost of some sample efficiency). This choice of RL algorithm is not critical, though: our proposed Algorithms 1 and 2 also work with standard Q-learning (see the sketch after these notes).

  8. With some abuse of notation, \(Q(s, a)\) actually denotes the Q-table value of the leveled representation of state \(s\) under action \(a\).

  9. Different initial states and algorithms yield paths of different lengths. For visualisation purposes, the averaged historic reward time series shown in Fig. 5 are computed after bringing all series to a common length: shorter series are extended by repeating their final reward until they match the longest series, which is itself first cropped to at most 1.2 times the length of the default-initial-state series. All metrics, however, are computed with respect to the original lengths and then averaged.

  10. Unlike the ADQL policy, \(\epsilon \)-ADQL and \(\epsilon \)-CADQL are controlled in the sense that both adhere to \(\epsilon \)-local behaviour during training.
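
For reference, the Double Q-learning mentioned in footnote 7 is the tabular algorithm of Hasselt [11]. The sketch below is a minimal illustration of its two-table update, assuming generic (state, action)-indexed Q-tables and illustrative hyper-parameters; it is not the paper's ADQL/CADQL procedure:

    # Minimal tabular Double Q-learning update (cf. footnote 7 and [11]).
    # Environment interface and hyper-parameters are illustrative assumptions.
    import random
    from collections import defaultdict

    alpha, gamma = 0.1, 0.99
    q_a = defaultdict(float)  # first Q-table, indexed by (state, action)
    q_b = defaultdict(float)  # second Q-table

    def double_q_update(s, a, r, s_next, actions):
        """Randomly pick one table to update; the greedy action is chosen
        with that table but evaluated with the other, which reduces the
        overestimation bias of standard Q-learning."""
        learner, evaluator = (q_a, q_b) if random.random() < 0.5 else (q_b, q_a)
        best = max(actions, key=lambda a2: learner[(s_next, a2)])
        target = r + gamma * evaluator[(s_next, best)]
        learner[(s, a)] += alpha * (target - learner[(s, a)])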

References

  1. Altman, E.: Constrained Markov Decision Processes, vol. 7. CRC Press, Boca Raton (1999)

  2. Bench-Capon, T., Atkinson, K., McBurney, P.: Using argumentation to model agent decision making in economic experiments. Auton. Agent. Multi-Agent Syst. 25, 183–208 (2012)

  3. Brockman, G., et al.: OpenAI gym. arXiv preprint arXiv:1606.01540 (2016)

  4. Christiano, P., Leike, J., Brown, T.B., Martic, M., Legg, S., Amodei, D.: Deep reinforcement learning from human preferences (2023)

  5. Dalal, G., Dvijotham, K., Vecerik, M., Hester, T., Paduraru, C., Tassa, Y.: Safe exploration in continuous action spaces (2018)

  6. Das, S., Egecioglu, O., El Abbadi, A.: Anónimos: an LP-based approach for anonymizing weighted social network graphs. IEEE Trans. Knowl. Data Eng. 24(4), 590–604 (2012). https://doi.org/10.1109/TKDE.2010.267

  7. The Farama Foundation: Gymnasium (2023). https://gymnasium.farama.org

  8. Fürnkranz, J., Hüllermeier, E., Cheng, W., Park, S.H.: Preference-based reinforcement learning: a formal framework and a policy iteration algorithm. Mach. Learn. 89, 123–156 (2012)

  9. Spanish Government: Strategic project for economic recovery and transformation of digitalization of the water cycle. Report 2022. Technical report, Ministry for the Ecological Transition and Demographic Challenge (2022)

  10. Guo, T., Yuan, Y., Zhao, P.: Admission-based reinforcement-learning algorithm in sequential social dilemmas. Appl. Sci. 13(3) (2023). https://doi.org/10.3390/app13031807, https://www.mdpi.com/2076-3417/13/3/1807

  11. Hasselt, H.: Double Q-learning. In: Advances in Neural Information Processing Systems, vol. 23 (2010)

  12. Holgado-Sánchez, A., Arias, J., Moreno-Rebato, M., Ossowski, S.: On admissible behaviours for goal-oriented decision-making of value-aware agents. In: Malvone, V., Murano, A. (eds.) EUMAS 2023. LNCS, vol. 14282, pp. 415–424. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-43264-4_27

  13. Kalweit, G., Huegle, M., Werling, M., Boedecker, J.: Deep constrained Q-learning (2020)

  14. Lera-Leri, R., Bistaffa, F., Serramia, M., Lopez-Sanchez, M., Rodriguez-Aguilar, J.: Towards pluralistic value alignment: aggregating value systems through \(\ell _p\)-regression. In: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022, pp. 780–788. International Foundation for Autonomous Agents and Multiagent Systems, Richland (2022)

  15. Montes, N., Osman, N., Sierra, C., Slavkovik, M.: Value engineering for autonomous agents. CoRR abs/2302.08759 (2023). https://doi.org/10.48550/arXiv.2302.08759

  16. Montes, N., Sierra, C.: Synthesis and properties of optimally value-aligned normative systems. J. Artif. Intell. Res. 74, 1739–1774 (2022). https://doi.org/10.1613/jair.1.13487

  17. Moulin, H.: Fair Division and Collective Welfare. MIT Press, Cambridge (2004)

  18. Ng, A.Y., Russell, S.J.: Algorithms for inverse reinforcement learning. In: Proceedings of the Seventeenth International Conference on Machine Learning, pp. 663–670 (2000)

  19. Osman, N., d’Inverno, M.: A computational framework of human values for ethical AI (2023)

  20. Perello-Moragues, A., Poch, M., Sauri, D., Popartan, L.A., Noriega, P.: Modelling domestic water use in metropolitan areas using socio-cognitive agents. Water 13(8) (2021). https://doi.org/10.3390/w13081024, https://www.mdpi.com/2073-4441/13/8/1024

  21. Plata-Pérez, L., Sánchez-Pérez, J., Sánchez-Sánchez, F.: An elementary characterization of the Gini index. Math. Soc. Sci. 74, 79–83 (2015)

  22. Rodriguez-Soto, M., Serramia, M., Lopez-Sanchez, M., Rodriguez-Aguilar, J.A.: Instilling moral value alignment by means of multi-objective reinforcement learning. Ethics Inf. Technol. 24, 9 (2022). https://doi.org/10.1007/s10676-022-09635-0

  23. Schwartz, S.H.: An overview of the Schwartz theory of basic values. Online Read. Psychol. Cult. 2(1), 11 (2012)

  24. Sierra, C., Osman, N., Noriega, P., Sabater-Mir, J., Perelló, A.: Value alignment: a formal approach. CoRR abs/2110.09240 (2021). arXiv:2110.09240

  25. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (2018)

  26. Tessler, C., Mankowitz, D.J., Mannor, S.: Reward constrained policy optimization (2018)

  27. van der Weide, T.L., Dignum, F., Meyer, J.J.C., Prakken, H., Vreeswijk, G.A.W.: Practical reasoning using values. In: McBurney, P., Rahwan, I., Parsons, S., Maudet, N. (eds.) ArgMAS 2009. LNCS, vol. 6057, pp. 79–93. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12805-9_5

Acknowledgements

This work has been supported by grant VAE: TED2021-131295B-C33 funded by MCIN/AEI/ 10.13039/501100011033 and by the “European Union NextGenerationEU/PRTR”, by grant COSASS: PID2021-123673OB-C32 funded by MCIN/AEI/ 10.13039/501100011033 and by “ERDF A way of making Europe”, and by the AGROBOTS Project of Universidad Rey Juan Carlos funded by the Community of Madrid, Spain.

Author information

Corresponding author

Correspondence to Andrés Holgado-Sánchez.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Holgado-Sánchez, A., Arias, J., Billhardt, H., Ossowski, S. (2024). Algorithms for Learning Value-Aligned Policies Considering Admissibility Relaxation. In: Osman, N., Steels, L. (eds.) Value Engineering in Artificial Intelligence. VALE 2023. Lecture Notes in Computer Science, vol. 14520. Springer, Cham. https://doi.org/10.1007/978-3-031-58202-8_9

  • DOI: https://doi.org/10.1007/978-3-031-58202-8_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-58204-2

  • Online ISBN: 978-3-031-58202-8

  • eBook Packages: Computer Science, Computer Science (R0)
