
Can the predictive processing model of the mind ameliorate the value-alignment problem?

  • Original Paper
  • Published in Ethics and Information Technology

Abstract

How do we ensure that future generally intelligent AI share our values? This is the value-alignment problem. It is a weighty matter. After all, if AI are neutral with respect to our wellbeing, or worse, actively hostile toward us, then they pose an existential threat to humanity. Some philosophers have argued that one important way in which we can mitigate this threat is to develop only AI that shares our values or that has values that ‘align with’ ours. However, there is nothing to guarantee that this policy will be universally implemented; in particular, ‘bad actors’ are likely to flout it. In this paper, I show how the predictive processing model of the mind, currently ascendant in cognitive science, may ameliorate the value-alignment problem. In essence, I argue that there is a plurality of reasons why any future generally intelligent AI will possess a predictive processing cognitive architecture (e.g. because we decide to build them that way; because it is the only possible cognitive architecture that can underpin general intelligence; because it is the easiest way to create AI). I also argue that if future generally intelligent AI possess a predictive processing cognitive architecture, then they will come to share our pro-moral motivations (of valuing humanity as an end, avoiding maleficent actions, etc.), regardless of their initial motivation set. Consequently, these AI will pose a minimal threat to humanity. In this way then, I conclude, the value-alignment problem is significantly ameliorated under the assumption that future generally intelligent AI will possess a predictive processing cognitive architecture.
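
The argument of the abstract turns on the predictive processing claim that cognition is organized around minimizing prediction error, either by revising the agent's internal model (perception) or by acting on the world so that incoming sensations better match the model's predictions (active inference). The toy sketch below, in Python, illustrates that two-route dynamic; the single-variable setup, the names, and the learning rates are illustrative assumptions of mine, a minimal sketch rather than the paper's formal machinery or the free-energy framework of Friston and colleagues.

    # Toy sketch of prediction-error minimization, the dynamic at the heart of
    # the predictive processing model discussed in the abstract. Everything here
    # (the single hidden variable, the Gaussian noise, the learning rates) is an
    # illustrative simplification, not the formal free-energy apparatus.
    import random

    def run(steps=100, perceive_rate=0.2, act_rate=0.05, seed=1):
        rng = random.Random(seed)
        world = 10.0   # the actual state of the environment
        belief = 0.0   # the agent's prediction of its sensory input

        for _ in range(steps):
            sensation = world + rng.gauss(0.0, 0.3)  # noisy sensory sample
            error = sensation - belief               # prediction error

            # Perceptual inference: revise the internal model toward the data.
            belief += perceive_rate * error

            # Active inference: act on the world so that future sensations
            # better match the prediction (the action route to error reduction).
            world -= act_rate * error

        return belief, world

    if __name__ == "__main__":
        b, w = run()
        print(f"belief = {b:.2f}, world = {w:.2f}")  # the two converge

Run as a script, the agent's belief and the world state converge toward a common value, which is the sense in which a single error-minimizing dynamic can drive both belief revision and action.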

Notes

  1. Of course, this threat will only be a ‘live’ or ‘pressing’ one if the AI has a significant chance of realizing its ambitions. See Chalmers (2010) for a detailed discussion of why super-intelligent AI (AI whose intellect dwarfs our own) are both highly likely, if generally intelligent AI is possible at all, and highly likely to possess the means to pose a real threat to humanity.

  2. Those familiar with the predictive processing literature should note that I am, following Hohwy (2013) and Clark (2015), assuming here a cognitivist and/or representationalist interpretation of the predictive processing model. This cognitivist/representationalist reading is questioned, and alternative non-cognitivist or non-representationalist interpretations of predictive processing are discussed, in—for example—Kirchhoff & Robertson (2018) and Downey (2018). The interested reader should consult these references for further discussion. I cannot defend the cognitivist/representationalist reading here. Rather, I shall simply be assuming it.

  3. By ‘predictive processors’ I mean proponents of the predictive processing model of the mind.

  4. The reader may remain skeptical (justifiably, by my lights) over the prospects for an adequate predictive processing theory of desire and motivation. The reader can consult, for example, Klein (2020) for a sustained argument that the predictive processing model cannot adequately account in principle for the phenomenon of desire.

  5. Such desires to behave in the (de re) moral ways include, for example, the desire to care for conspecifics or the desire to avoid harming others without excuse, and so on.

  6. It might be thought that (anti-Humean) Realism presents an attractive solution to the value-alignment problem. After all, many such Realists hold that an agent’s moral beliefs give her overriding motivation to act as they indicate she is morally required to act (at least, when she fully comprehends the contents of these moral beliefs). Consequently, if Realism is true, and if generally intelligent AI are capable of having moral intuitions, in light of which they form the same moral beliefs as we do, then we should expect such AI to share our pro-moral motivations. However, under these assumptions, there is, on the face of it, nothing to stop ‘bad actors’ from creating generally intelligent AI that lack the capacity to have moral intuitions or moral beliefs—either by omitting to program something like a faculty of moral sense that produces such moral intuitions, or by damaging or removing it after creation. For this reason then, the assumption of Realism does not constitute an amelioration of the value-alignment problem relative to the standard solution. I will therefore abstain from any further discussion of Realism in this paper.

  7. The reader might ask: ‘what if act consequentialism is true?’. Of course, if act consequentialism is true, then the majority of actions ever performed will have been wrong, since they were not the optimific actions out of those available. However, I am assuming here that commonsense morality (or something near enough) is true: morality as it is conceived by the proverbial ‘man on the Clapham Omnibus’ and theorized by philosophical deontologists (rights to life and non-interference, etc.). Act consequentialism is highly revisionary with respect to commonsense morality and thus, I will assume here, false.

  8. If the first generation of generally intelligent AI can create new AI themselves, then the second generation of generally intelligent AI may be the product, not of humans, but of this first generation.

  9. As Chalmers (2010, p. 25) puts it: ‘…eventually, it is likely that there will be AIs with cognitive capacities akin to ours, if only through brain emulation…’. Others, such as Bostrom (2014), however, doubt that brain emulation is the most plausible route to artificial general intelligence.

  10. My reasoning here mirrors David Chalmers’s (2010) discussion of how the value-alignment problem is ameliorated when assuming Kantian psychology and moral philosophy. In brief, Kantian moral philosophy has it that morality is rationally required for any agent capable of grasping and reflecting on their reasons for action. This account therefore entails that any perfectly rational agent will be perfectly moral. Granting that intelligence correlates with rationality, it therefore follows, for the Kantian, that super-intelligent AI will be (close to) perfectly moral.

  11. Here I use the locution ‘rational agent’, not to mean an agent that is appropriately responsive to her reasons, but rather to mean an agent that is a person—namely, a thinker capable of self-conscious reflection on her own attitudes (such as a normal adult human in contrast to, say, a chicken).

References

  • Adams, R., Shipp, S., & Friston, K. (2012). Predictions not commands: Active inference in the motor system. Brain Structure and Function, 218(3), 611–643.

  • Baraglia, J., Nagai, Y., & Asada, M. (2014). Prediction error minimization for emergence of altruistic behavior. In 4th international conference on development and learning and on epigenetic robotics.

  • Blackburn, S. (1998). Ruling passions: A theory of practical reasoning. Oxford University Press.

  • Bostrom, N. (2012). The superintelligent will: Motivation and instrumental rationality in advanced artificial agents. Minds and Machines, 22(2), 71–85.

  • Bostrom, N. (2014). Superintelligence: Paths, dangers, strategies. Oxford University Press.

  • Botvinick, M., & Toussaint, M. (2012). Planning as inference. Trends in Cognitive Sciences, 16(10), 485–488.

  • Chalmers, D. (2010). The singularity: A philosophical analysis. Journal of Consciousness Studies, 17(9–10), 7–65.

  • Clark, A. (2013a). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181–204.

  • Clark, A. (2013b). Expecting the world: Perception, prediction, and the origin of human knowledge. The Journal of Philosophy, 110(9), 469–496.

  • Clark, A. (2015). Surfing uncertainty: Prediction, action, and the embodied mind. Oxford University Press.

  • Clark, A. (2019). Beyond desire? Agency, choice, and the predictive mind. Australasian Journal of Philosophy, 98, 1–15.

  • Cullen, M., Davey, B., Friston, K. J., & Moran, R. J. (2018). Active inference in OpenAI gym: A paradigm for computational investigations into psychiatric illness. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 3(9), 809–818.

  • Davidson, D. (1985). Essays on actions and events. Oxford University Press.

  • Dennett, D. (1987). The intentional stance. MIT Press.

  • Downey, A. (2018). Predictive processing and the representation wars: A victory for the eliminativist (via fictionalism). Synthese, 195, 5115–5139.

  • Friston, K. (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society B, 360(1456), 815–836.

  • Friston, K. (2012). Free-energy minimization and the dark-room problem. Frontiers in Psychology, 3, 130.

  • Friston, K. (2013). Active inference and free energy: Commentary on Andy Clark’s ‘Predictive brains, situated agents, and the future of cognitive science.’ Behavioral and Brain Sciences, 36(3), 212–213.

  • Friston, K., & Stephan, K. (2007). Free energy and the brain. Synthese, 159, 417–458.

  • Friston, K., Kilner, J., & Harrison, L. (2006). A free energy principle for the brain. Journal of Physiology Paris, 100(1–3), 70–87.

  • Friston, K., Mattout, J., & Kilner, J. (2011). Action understanding and active inference. Biological Cybernetics, 104, 137–160.

  • Friston, K., Adams, R., & Montague, R. (2012). What is value—accumulated reward or evidence? Frontiers in Neurorobotics. https://doi.org/10.3389/fnbot.2012.00011

  • Hohwy, J. (2013). The predictive mind. Oxford University Press.

  • Kirchhoff, M., & Robertson, I. (2018). Enactivism and predictive processing: A non-representational view. Philosophical Explorations, 21(2), 264–281.

  • Klein, C. (2018). What do predictive coders want? Synthese, 195(6), 2541–2557.

  • Klein, C. (2020). A Humean challenge to predictive coding. In S. Gouveia, D. Mendonca, & M. Curado (Eds.), The philosophy and science of predictive processing. Bloomsbury Press.

  • Korsgaard, C. (2009). Self-constitution: Agency, identity, and integrity. Oxford University Press.

  • McDowell, J. (1978). Are moral requirements hypothetical imperatives? Proceedings of the Aristotelian Society, 52, 13–29.

  • McDowell, J. (1979). Virtue and reason. The Monist, 62(3), 331–350.

  • Nagel, T. (1970). The possibility of altruism. Clarendon Press.

  • Shafer-Landau, R. (2003). Moral realism: A defense. Oxford University Press.

  • Smith, M. (1987). The Humean theory of motivation. Mind, 96, 36–61.

  • Smith, M. (1994). The moral problem. Blackwell Publishers.

  • Solway, A., & Botvinick, M. (2012). Goal-directed decision making as probabilistic inference: A computational framework and potential neural correlates. Psychological Review, 119(1), 120–154.

  • Sun, Z., & Firestone, C. (2020). The dark room problem. Trends in Cognitive Sciences, 24, 346–348.

  • Tomasello, M. (2016). A natural history of human morality. Harvard University Press.

  • Van de Cruys, S., Friston, K., & Clark, A. (2020). Controlled optimism: Reply to Sun and Firestone on the dark room problem. Trends in Cognitive Sciences, 24(9), 680–681.

  • Wedgwood, R. (2004). The metaethicists’ mistake. Philosophical Perspectives, 18, 405–426.

  • Wedgwood, R. (2007). The nature of normativity. Clarendon Press.

Author information

Correspondence to William Ratoff.


Cite this article

Ratoff, W. Can the predictive processing model of the mind ameliorate the value-alignment problem? Ethics Inf Technol 23, 739–750 (2021). https://doi.org/10.1007/s10676-021-09611-0
