
Can the predictive processing model of the mind ameliorate the value-alignment problem?

  • Original Paper
  • Published in Ethics and Information Technology

Abstract

How do we ensure that future generally intelligent AI share our values? This is the value-alignment problem. It is a weighty matter. After all, if AI are neutral with respect to our wellbeing, or worse, actively hostile toward us, then they pose an existential threat to humanity. Some philosophers have argued that one important way in which we can mitigate this threat is to develop only AI that shares our values or that has values that ‘align with’ ours. However, there is nothing to guarantee that this policy will be universally implemented; in particular, ‘bad actors’ are likely to flout it. In this paper, I show how the predictive processing model of the mind, currently ascendant in cognitive science, may ameliorate the value-alignment problem. In essence, I argue that there is a plurality of reasons why any future generally intelligent AI will possess a predictive processing cognitive architecture (e.g. because we decide to build them that way; because it is the only possible cognitive architecture that can underpin general intelligence; because it is the easiest way to create AI). I also argue that if future generally intelligent AI possess a predictive processing cognitive architecture, then they will come to share our pro-moral motivations (of valuing humanity as an end, avoiding maleficent actions, etc.), regardless of their initial motivation set. Consequently, these AI will pose a minimal threat to humanity. In this way then, I conclude, the value-alignment problem is significantly ameliorated under the assumption that future generally intelligent AI will possess a predictive processing cognitive architecture.
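
The argument of the abstract turns on the predictive processing claim that cognition is organized around minimizing prediction error, either by revising the agent's internal model (perception) or by acting on the world so that incoming sensations better match the model's predictions (active inference). The toy sketch below, in Python, illustrates that two-route dynamic; the single-variable setup, the names, and the learning rates are illustrative assumptions of mine, a minimal sketch rather than the paper's formal machinery or the free-energy framework of Friston and colleagues.

    # Toy sketch of prediction-error minimization, the dynamic at the heart of
    # the predictive processing model discussed in the abstract. Everything here
    # (the single hidden variable, the Gaussian noise, the learning rates) is an
    # illustrative simplification, not the formal free-energy apparatus.
    import random

    def run(steps=100, perceive_rate=0.2, act_rate=0.05, seed=1):
        rng = random.Random(seed)
        world = 10.0   # the actual state of the environment
        belief = 0.0   # the agent's prediction of its sensory input

        for _ in range(steps):
            sensation = world + rng.gauss(0.0, 0.3)  # noisy sensory sample
            error = sensation - belief               # prediction error

            # Perceptual inference: revise the internal model toward the data.
            belief += perceive_rate * error

            # Active inference: act on the world so that future sensations
            # better match the prediction (the action route to error reduction).
            world -= act_rate * error

        return belief, world

    if __name__ == "__main__":
        b, w = run()
        print(f"belief = {b:.2f}, world = {w:.2f}")  # the two converge

Run as a script, the agent's belief and the world state converge toward a common value, which is the sense in which a single error-minimizing dynamic can drive both belief revision and action.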

Notes

  1. Of course, this threat will only be a ‘live’ or ‘pressing’ one if the AI has a significant chance of realizing its ambitions. See Chalmers (2010) for a detailed discussion of why super-intelligent AI (AI whose intellect dwarfs our own) are both highly likely, if generally intelligent AI is possible at all, and highly likely to possess the means to pose a real threat to humanity.

  2. Those familiar with the predictive processing literature should note that I am, following Hohwy (2013) and Clark (2015), assuming here a cognitivist and/or representationalist interpretation of the predictive processing model. This cognitivist/representationalist reading is questioned, and alternative non-cognitivist or non-representationalist interpretations of predictive processing are discussed, in—for example—Kirchhoff & Robertson (2018) and Downey (2018). The interested reader should consult these references for further discussion. I cannot defend the cognitivist/representationalist reading here. Rather, I shall simply be assuming it.

  3. By ‘predictive processors’ I mean proponents of the predictive processing model of the mind.

  4. The reader may remain skeptical (justifiably, by my lights) over the prospects for an adequate predictive processing theory of desire and motivation. The reader can consult, for example, Klein (2020) for a sustained argument that the predictive processing model cannot adequately account in principle for the phenomenon of desire.

  5. Such desires to behave in the (de re) moral ways include, for example, the desire to care for conspecifics or the desire to avoid harming others without excuse, and so on.

  6. It might be thought that (anti-Humean) Realism presents an attractive solution to the value-alignment problem. After all, many such Realists hold that an agent’s moral beliefs give her overriding motivation to act as they indicate she is morally required to act (at least, when she fully comprehends the contents of these moral beliefs). Consequently, if Realism is true, and if generally intelligent AI are capable of having moral intuitions, in light of which they form the same moral beliefs as we do, then we should expect such AI to share our pro-moral motivations. However, under these assumptions, there is, on the face of it, nothing to stop ‘bad actors’ from creating generally intelligent AI that lack the capacity to have moral intuitions or moral beliefs—either by omitting to program something like a faculty of moral sense that produces such moral intuitions, or by damaging or removing it after creation. For this reason then, the assumption of Realism does not constitute an amelioration of the value-alignment problem relative to the standard solution. I will therefore abstain from any further discussion of Realism in this paper.

  7. The reader might ask: ‘what if act consequentialism is true?’. Of course, if act consequentialism is true, then the majority of actions ever performed will have been wrong, since they were not the optimific actions out of those available. However, I am assuming here that commonsense morality (or something near enough) is true: morality as it is conceived by the proverbial ‘man on the Clapham Omnibus’ and theorized by philosophical deontologists (rights to life and non-interference, etc.). Act consequentialism is highly revisionary with respect to commonsense morality and thus, I will assume here, false.

  8. If the first generation of generally intelligent AI can create new AI themselves, then the second generation of generally intelligent AI may be the product, not of humans, but of this first generation.

  9. As Chalmers (2010, p. 25) puts it: ‘…eventually, it is likely that there will be AIs with cognitive capacities akin to ours, if only through brain emulation…’. Others, such as Bostrom (2014), however, doubt that brain emulation is the most plausible route to artificial general intelligence.

  10. My reasoning here mirrors David Chalmers’s (2010) discussion of how the value-alignment problem is ameliorated when assuming Kantian psychology and moral philosophy. In brief, Kantian moral philosophy has it that morality is rationally required for any agent capable of grasping and reflecting on their reasons for action. This account therefore entails that any perfectly rational agent will be perfectly moral. Granting that intelligence correlates with rationality, it therefore follows, for the Kantian, that super-intelligent AI will be (close to) perfectly moral.

  11. Here I use the locution ‘rational agent’, not to mean an agent that is appropriately responsive to her reasons, but rather to mean an agent that is a person—namely, a thinker capable of self-conscious reflection on her own attitudes (such as a normal adult human in contrast to, say, a chicken).

References

  • Adams, R., Shipp, S., & Friston, K. (2012). Predictions not commands: Active inference in the motor system. Brain Structure and Function, 218(3), 611–643.

  • Baraglia, J., Nagai, Y., & Asada, M. (2014). Prediction error minimization for emergence of altruistic behavior. In 4th international conference on development and learning and on epigenetic robotics.

  • Blackburn, S. (1998). Ruling passions: A theory of practical reasoning. Oxford University Press.

  • Bostrom, N. (2012). The superintelligent will: Motivation and instrumental rationality in advanced artificial agents. Minds and Machines, 22(2), 71–85.

  • Bostrom, N. (2014). Superintelligence: Paths, dangers, strategies. Oxford University Press.

  • Botvinick, M., & Toussaint, M. (2012). Planning as inference. Trends in Cognitive Sciences, 16(10), 485–488.

  • Chalmers, D. (2010). The singularity: A philosophical analysis. Journal of Consciousness Studies, 17(9–10), 7–65.

  • Clark, A. (2013a). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181–204.

  • Clark, A. (2013b). Expecting the world: Perception, prediction, and the origin of human knowledge. The Journal of Philosophy, 110(9), 469–496.

  • Clark, A. (2015). Surfing uncertainty: Prediction, action, and the embodied mind. Oxford University Press.

  • Clark, A. (2019). Beyond desire? Agency, choice, and the predictive mind. Australasian Journal of Philosophy, 98, 1–15.

  • Cullen, M., Davey, B., Friston, K. J., & Moran, R. J. (2018). Active inference in OpenAI gym: A paradigm for computational investigations into psychiatric illness. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 3(9), 809–818.

  • Davidson, D. (1985). Essays on actions and events. Oxford University Press.

  • Dennett, D. (1987). The intentional stance. MIT Press.

  • Downey, A. (2018). Predictive processing and the representation wars: A victory for the eliminativist (via fictionalism). Synthese, 195, 5115–5139.

  • Friston, K. (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society B, 360(1456), 815–836.

  • Friston, K. (2012). Free-energy minimization and the dark-room problem. Frontiers in Psychology, 3, 130.

  • Friston, K. (2013). Active inference and free energy: Commentary on Andy Clark’s ‘Predictive brains, situated agents, and the future of cognitive science.’ Behavioral and Brain Sciences, 36(3), 212–213.

  • Friston, K., & Stephan, K. (2007). Free energy and the brain. Synthese, 159, 417–458.

  • Friston, K., Kilner, J., & Harrison, L. (2006). A free energy principle for the brain. Journal of Physiology Paris, 100(1–3), 70–87.

  • Friston, K., Mattout, J., & Kilner, J. (2011). Action understanding and active inference. Biological Cybernetics, 104, 137–160.

  • Friston, K., Adams, R., & Montague, R. (2012). What is value—accumulated reward or evidence? Frontiers in Neurorobotics. https://doi.org/10.3389/fnbot.2012.00011

  • Hohwy, J. (2013). The predictive mind. Oxford University Press.

  • Kirchhoff, M., & Robertson, I. (2018). Enactivism and predictive processing: A non-representational view. Philosophical Explorations, 21(2), 264–281.

  • Klein, C. (2018). What do predictive coders want? Synthese, 195(6), 2541–2557.

  • Klein, C. (2020). A Humean challenge to predictive coding. In S. Gouveia, D. Mendonca, & M. Curado (Eds.), The philosophy and science of predictive processing. Bloomsbury Press.

  • Korsgaard, C. (2009). Self-constitution: Agency, identity, and integrity. Oxford University Press.

  • McDowell, J. (1978). Are moral requirements hypothetical imperatives? Proceedings of the Aristotelian Society, 52, 13–29.

  • McDowell, J. (1979). Virtue and reason. The Monist, 62(3), 331–350.

  • Nagel, T. (1970). The possibility of altruism. Clarendon Press.

  • Shafer-Landau, R. (2003). Moral realism: A defense. Oxford University Press.

  • Smith, M. (1987). The Humean theory of motivation. Mind, 96, 36–61.

  • Smith, M. (1994). The moral problem. Blackwell Publishers.

  • Solway, A., & Botvinick, M. (2012). Goal-directed decision making as probabilistic inference: A computational framework and potential neural correlates. Psychological Review, 119(1), 120–154.

  • Sun, Z., & Firestone, C. (2020). The dark room problem. Trends in Cognitive Sciences, 24, 346–348.

  • Tomasello, M. (2016). A natural history of human morality. Harvard University Press.

  • Van de Cruys, S., Friston, K., & Clark, A. (2020). Controlled optimism: Reply to Sun and Firestone on the dark room problem. Trends in Cognitive Sciences, 24(9), 680–681.

  • Wedgwood, R. (2004). The metaethicists’ mistake. Philosophical Perspectives, 18, 405–426.

  • Wedgwood, R. (2007). The nature of normativity. Clarendon Press.

Author information

Correspondence to William Ratoff.


Cite this article

Ratoff, W. Can the predictive processing model of the mind ameliorate the value-alignment problem? Ethics Inf Technol 23, 739–750 (2021). https://doi.org/10.1007/s10676-021-09611-0
