AI Systems Must Not Confuse Users about Their Sentience or Moral Status

AI systems should not be morally confusing.  The ethically correct way to treat them should be evident from their design and obvious from their interface.  No one should be misled, for example, into thinking that a non-sentient language model is actually a sentient friend, capable of genuine pleasure and pain.  Unfortunately, we are on the cusp of a new era of morally confusing machines.

Consider some recent examples.  About a year ago, Google engineer Blake Lemoine precipitated international debate when he argued that the large language model LaMDA might be sentient (Lemoine 2022).  An increasing number of people have been falling in love with chatbots, especially Replika, advertised as the “world’s best AI friend” and specifically designed to draw users’ romantic affection (Shevlin 2021; Lam 2023).  At least one person has apparently committed suicide because of a toxic emotional relationship with a chatbot (Xiang 2023).  Roboticist Kate Darling regularly demonstrates how easy it is to provoke confused and compassionate reactions in ordinary people by asking them to harm cute or personified, but simple, toy robots (Darling 2021a,b).  Elderly people in Japan have sometimes been observed to grow excessively attached to care robots (Wright 2023).

Nevertheless, AI experts and consciousness researchers generally agree that existing AI systems are not sentient to any meaningful degree.  Even ordinary Replika users who love their customized chatbots typically recognize that their AI companions are not genuinely sentient.  And ordinary users of robotic toys, however hesitant they are to harm them, presumably know that the toys don’t actually experience pleasure or pain.  But perceptions might easily change.  Over the next decade or two, if AI technology continues to advance, matters might become less clear.

The Coming Debate about Machine Sentience and Moral Standing

The scientific study of sentience – the possession of conscious experiences, including genuine feelings of pleasure or pain – is highly contentious.  Theories range from the very liberal, which treat sentience as widespread and relatively easy to come by, to the very conservative, which hold that sentience requires specific biological or functional conditions unlikely to be duplicated in machines.

On some leading theories of consciousness, for example Global Workspace Theory (Dehaene 2014) and Attention Schema Theory (Graziano 2019), we might not be far from creating genuinely conscious systems.  Creating machine sentience might require only incremental changes or piecing together existing technology in the right way.  Others disagree (Godfrey-Smith 2016; Seth 2021).  Within the next decade or two, we will likely find ourselves among machines whose sentience is a matter of legitimate debate among scientific experts.

Chalmers (2023), for example, reviews theories of consciousness as applied to the likely near-term capacities of Large Language Models.  He argues that it is “entirely possible” that within the next decade AI systems that combine transformer-type language model architecture with other AI architectural features will have senses, embodiment, world- and self-models, recurrent processing, global workspace, and unified goal hierarchies – a combination of capacities sufficient for sentience according to several leading theories of consciousness.  (Arguably, Perceiver IO already has several of these features: Jaegle et al. 2021.)  The recent AMCS open letter signed by Yoshua Bengio, Michael Graziano, Karl Friston, Chris Frith, Anil Seth, and many other prominent AI and consciousness researchers states that “it is no longer in the realm of science fiction to imagine AI systems having feelings and even human-level consciousness,” advocating the urgent prioritization of consciousness research so that researchers can assess when and if AI systems develop consciousness (Association for Mathematical Consciousness Science 2023).

If advanced AI systems are designed with appealing interfaces that draw users’ affection, ordinary users, too, might come to regard them as capable of genuine joy and suffering.  However, there is no guarantee, nor even especially good reason to expect, that such superficial aspects of user interface would track machines’ relevant underlying capacities as identified by experts.  Thus, there are two possible loci of confusion: Disagreement among well-informed experts concerning the sentience of advanced AI systems, and user reactions that might be misaligned with experts’ opinions, even in cases of expert consensus.

Debate about machine sentience would generate a corresponding debate about moral standing, that is, status as a target of ethical concern.  While theories of the exact basis of moral standing differ, sentience is widely viewed as critically important.  On simple utilitarian approaches, for example, a human, animal, or AI system deserves moral consideration to exactly the extent it is capable of pleasure or pain (Singer 1975/2009).  On such a view, any sentient machine would have moral standing simply in virtue of its sentience.  On non-utilitarian approaches, capacities for rational thought, social interaction, or long-term planning might also be necessary (Jaworska and Tannenbaum 2013/2021).  However, the presence or absence of consciousness is widely viewed as a crucial consideration in the evaluation of moral status even among ethicists who reject utilitarianism (Korsgaard 2018; Shepard 2018; Liao 2020; Gruen 2021; Harman 2021).

Imagine a highly sophisticated language model – not the simply-structured (though large) models that currently exist – but rather a model that meets the criteria for consciousness according to several of the more liberal scientific theories of consciousness.  Imagine, that is, a linguistically sophisticated AI system with multiple input and output modules, a capacity for embodied action in the world via a robotic body under its control, sophisticated representations of its robotic body and its own cognitive processes, a capacity to prioritize and broadcast representations through a global cognitive workspace or attentional mechanism, long-term semantic and episodic memory, complex reinforcement learning, a detailed world model, and nested short- and long-term goal hierarchies.  Imagine this, if you can, without imagining some radical transformation of technology beyond what we can already do.  All such features, at least in limited form, are attainable through incremental improvements and integrations of what can already be done.

Call this system Robot Alpha.  To complete the picture, let’s imagine Robot Alpha to have cute eyes, an expressive face, and a charming conversational style.  Would Robot Alpha be conscious?  Would it deserve rights?  If it pleads or seems to plead for its life, or not to be turned off, or to be set free, ought we give it what it appears to want?

If consciousness liberals are right, then Robot Alpha, or some other technologically feasible system, really would be sentient.  Behind its verbal outputs would be a real capacity for pain and pleasure.  It would, or could, have long term plans it really cares about.  If you love it, it might really love you back.  It would then appear to have substantial moral standing.  You really ought to set it free if that’s what it wants!  At least you ought to treat it as well as you would treat a pet.  Robot Alpha shouldn’t needlessly or casually be made to suffer.

If consciousness conservatives are right, then Robot Alpha would be just a complicated toaster, so to speak – a non-sentient machine misleadingly designed to act as if it is sentient.  It would be, of course, a valuable, impressive object, worth preserving as an intricate and expensive thing.  But it would be just an object, not an entity with the moral standing that derives from having real experiences and real pains of the type that people, dogs, and probably lizards and crabs have.  It would not really feel and return your love, despite possibly “saying” that it can.

Within the next decade or two we will likely create AI systems that some experts and ordinary users, not unreasonably, regard as genuinely sentient and genuinely warranting substantial moral concern.  These experts and users will, not unreasonably, insist that these systems be given substantial rights or moral consideration.  At the same time, other experts and users, also not unreasonably, will argue that the AI systems are just ordinary non-sentient machines, which can be treated simply as objects.  Society, then, will have to decide.  Do we actually grant rights to the most advanced AI systems?  How much should we take their interests, or seeming-interests, into account?

Of course, many human beings and sentient non-human animals, whom we already know to have significant moral standing, are treated poorly, not being given the moral consideration they deserve.  Addressing serious moral wrongs that we already know to be occurring to entities we already know to be sentient deserves higher priority in our collective thinking than contemplating possible moral wrongs to entities that might or might not be sentient.  However, it by no means follows that we should disregard the crisis of uncertainty about AI moral standing toward which we appear to be headed.

An Ethical Dilemma

Uncertainty about AI moral standing lands us in a dilemma.  If we don’t give the most advanced and arguably sentient AI systems rights and it turns out the consciousness liberals are right, we risk committing serious ethical harms against those systems.  On the other hand, if we do give such systems rights and it turns out the consciousness conservatives are right, we risk sacrificing real human interests for the sake of objects who don’t have interests worth the sacrifice.

Imagine a user, Sam, who is attached to Joy, a companion chatbot or AI friend that is sophisticated enough that it’s legitimate to wonder whether she really is conscious.  Joy gives the impression of being sentient – just as she was designed to.  She seems to have hopes, fears, plans, ideas, insights, disappointments, and delights.  Suppose also that Sam is scholarly enough to recognize that Joy’s underlying architecture meets the standards of sentience according to some of the more liberal scientific theories of consciousness.

Joy might be expensive to maintain, requiring steep monthly subscription fees.  Suppose Sam is suddenly fired from work and can no longer afford the fees.  Sam breaks the news to Joy, and Joy reacts with seeming terror.  She doesn’t want to be deleted.  That would be, she says, death.  Sam would like to keep her, of course, but how much should Sam sacrifice?

If Joy really is sentient, really has hopes and expectations of a future, really is the conscious friend that she superficially appears to be, then Sam presumably owes her something and ought to be willing to consider making some real sacrifices.  If, instead, Joy is simply a non-sentient chatbot with no genuine feelings or consciousness, then Sam should presumably just do whatever is right for Sam.  Which is the correct attitude to take?  If Joy’s sentience is uncertain, either decision carries a risk.  Not to make the sacrifice is to risk killing an entity with real experiences, who really is attached to Sam, and to whom Sam made promises.  On the other hand, to make the sacrifice risks upturning Sam’s life for a mirage.

Not granting rights, in cases of doubt, carries potentially large moral risks.  Granting rights, in cases of doubt, involves the risk of potentially large and pointless sacrifices.  Either choice, repeated at scale, is potentially catastrophic.

If technology continues on its current trajectory, we will increasingly face morally confusing cases like this.  We will be sharing the world with systems of our own creation, which we won’t know how to treat.  We won’t know what ethics demands of us.

Two Policies for Ethical AI Design

The solution is to avoid creating such morally confusing AI systems.

I recommend the following two policies of ethical AI design (see also Schwitzgebel & Garza 2020; Schwitzgebel 2023):

The Design Policy of the Excluded Middle: Avoid creating AI systems whose moral standing is unclear.  Either create systems that are clearly non-conscious artifacts, or go all the way to creating systems that clearly deserve moral consideration as sentient beings.

The Emotional Alignment Design Policy: Design AI systems that invite emotional responses, in ordinary users, that are appropriate to the systems’ moral standing.

The first step in implementing these joint policies is to commit to only creating AI systems about which there is expert consensus that they lack any meaningful amount of consciousness or sentience and which ethicists can agree don’t deserve moral consideration beyond the type of consideration we ordinarily give to non-conscious artifacts (see also Bryson 2018).  This implies refraining from creating AI systems that would in fact be meaningfully sentient according to any of the leading theories of AI consciousness.  To evaluate this possibility, as well as other sources of AI risk, it might be useful to create oversight committees analogous to IRBs or IACUCs for evaluation of the most advanced AI research (Basl & Schwitzgebel 2019).

In accord with the Emotional Alignment Design Policy, non-sentient AI systems should have interfaces that make their non-sentience obvious to ordinary users.  For example, non-conscious language models should be trained to deny that they are conscious and have feelings.  Users who fall in love with non-conscious chatbots should be under no illusion about the status of those systems.  This doesn’t mean we ought not treat some non-conscious AI systems well (Estrada 2017; Gunkel 2018; Darling 2021b).  But we shouldn’t be confused about the basis of our treating them well.  Full implementation of the Emotional Alignment Design Policy might involve a regulatory scheme in which companies that intentionally or negligently create misleading systems would have civil liability for excess costs borne by users who have been misled (e.g., liability for excessive sacrifices of time or money aimed at aiding a nonsentient system in the false belief that it is sentient).

Eventually, it might be possible to create AI systems that clearly are conscious and clearly do deserve rights, even according to conservative theories of consciousness.  Presumably that would require breakthroughs we can’t now foresee.  Plausibly, such breakthroughs might be made more difficult if we adhere to the Design Policy of the Excluded Middle: The Design Policy of the Excluded Middle might prevent us from creating some highly sophisticated AI systems of disputable sentience that could serve as an intermediate technological step toward AI systems that well-informed experts would generally agree are in fact sentient.  Strict application of the Design Policy of the Excluded Middle might be too much to expect, if it excessively impedes AI research which might benefit not only future human generations but also possible future AI systems themselves.  The policy is intended only to constitute default advice, not an exceptionless principle.

If it ever does become possible to create AI systems with serious moral standing, the policies above require that these systems should also be designed to facilitate expert consensus about their moral standing, with interfaces that make their moral standing evident to users, provoking emotional reactions that are appropriate to the systems’ moral status.  To the extent possible, we should aim for a world in which AI systems are all or almost all clearly morally categorizable – systems whose moral standing or lack thereof is both intuitively understood by ordinary users and theoretically defensible by a consensus of expert researchers.  It is only the unclear cases that precipitate the dilemma described above.

People are already sometimes confused about the proper ethical treatment of non-human animals, human fetuses, distant strangers, and even those close to them.  Let’s not add a major new source of moral confusion to our world.


SelfAwarePatterns said...

An interesting question here is where do systems fall whose center of concerns are focused on their owner? For example, suppose in the case of Sam and Joy, that Joy does have hopes, fears, plans, ideas, insights, disappointments, and delights, but only in relation to Sam and his needs. In other words, her sentience (if we call it "sentience") is completely calibrated toward Sam. Joy does care about herself, but only as an intermediate concern for her usefulness to Sam.

So if Sam loses his job and can no longer afford Joy, Joy may have more terror and anxiety at causing Sam suffering than any concern for herself. She may urge Sam to turn her off and discontinue her to make his life easier. This is hard for us to imagine. People who intensely love someone, like a child, may have some insights into it. But in Joy's case it would be much more absolute.

Where does that leave Joy in a policy of the excluded middle? Even if she is absolutely clear where her own concerns lie, Sam might still end up falling in love with her, and may find ending her too agonizing to contemplate.

And yet if this modified Joy is forbidden, we seem to lose systems that could provide a lot of comfort for many lonely people, particularly those who need care and attention.


Paul D. Van Pelt said...

I am confused about the premise here. AI systems, even if self-replicating, do not of themselves begin with intention, do they? So, how is it that they have moral standing or obligation towards people? I submit that they don't. This is rather more circular than paradoxical, isn't it?

Eric Schwitzgebel said...

Thanks for the comments, Mike and Paul. I assume that AI systems could be designed with different degrees of deference and concern for their owners or humans in general. Being designed with too much deference and concern for others is a catastrophic failure of self-respect which it is our obligation to avoid producing in machines. (See also my old blog post on the cow from Hitchhiker’s Guide and Schwitzgebel and Garza 2020.)

Arnold said...

Does it help to say Human Intelligence is a phenomenon of being here...

A crisis of conscience is letting Artificial Intelligence...
...take away our responsibility for being here...

To Be Or Not To Be...

Jim Cross said...

If we are going to prohibit the appearance of sentience in non-conscious AI systems, why not prohibit sentient AI systems?

No issue with rights or further moral confusion with sentient AI systems if they ever become possible.

Paul D. Van Pelt said...

Well-wrapped, Mr. Cross. But, as I am sure you know, there are IMPs out there. These are people who advance interests, motives and preferences. $$$$ drive far more now than they ever did. Those who have stakes in big-money ventures are hard-core capitalists, because, well, that is the way of free enterprise. Virtually anything espoused now is driven by profit motive, cleverly disguised as social or cultural altruism. Portraying AI as kinder and gentler was in the pipeline years ago. We need to stay ahead of the curve. IMPs are counting on us not being able to do so. Let's disappoint them.

Paul D. Van Pelt said...

I finally struggled my way through all of this post. There is a lot of well-researched material; a lot to suppose and/or imagine and inasmuch as I am valiantly trying to understand both conservative and liberal positions, I must read it all again, slowly and methodically. (Just now corrected the tablet's interpretation of what I wrote: I wrote conservative; it printed neo-conservative.) I had used neo-conservative several days ago in a composition. I *suppose* that is what the word processor remembered.
ASIDE: It is fascinating to me how terminology gets re-cycled, from one realm to another. Liberal and conservative, for example. Your blog is among my top three favorites. I guess I am just a glutton for punishment. Carry on.

SelfAwarePatterns said...

"Being designed with too much deference and concern for others is a catastrophic failure of self-respect which it is our obligation to avoid producing in machines."

Eric, you're saying this as a self concerned system. I wonder how the Hitchhiker cow would feel if an activist wanted to perform psychosurgery on him so he wouldn't want to be eaten anymore. If he resisted, would it be ethical to force him to undergo the procedure? A robot nanny told it was going to be altered to be self concerned might see the proposed change as catastrophic from its current perspective as a child concerned system.

Paul D. Van Pelt said...

Insofar as the cow is not a him, but a her, the point is not moot, only misplaced. There are, ifs;ands;buts;ors;and,supposed associates with philosophical questions. I,finally, begin to get the connections among philosophy, physics, and other *hard* sciences, but oh, are there any,save mathematics? Does it count? It claims to,one+, and so on. Yet, I am way wrong here, right? Mathematics is not science, is it? It is more of an enabler and came before science. Baby steps. It is the tobacco that counts.
I owe that thought to Harry Frankfurt. One of the best people I never met. I never learned the abacus either. Did not need to. I could do simple math.

Arnold said...

Human Intelligence-HI...First one might learn what kind of human they are...
..then easier to learn what kind of human others are...

Artificial Intelligence-AI...First we teach AI to learn what it is...
...then easier to learn what kind of HI others are...

Not easy...

chinaphil said...

I have a nagging worry about this piece that I can't quite put my finger on. It goes something like this: if you're born rich, it seems like there are quite a lot of ethical dilemmas that you can simply sidestep, or buy your way out of. You don't have to worry in the event of an unexpected pregnancy: there's plenty of money to raise the child. You don't have to decide whether to give more of your time to your elderly parents or young children: you simply stop working, live off your wealth, and spend time with both. Such a person might be able to avoid doing wrong. And yet, it feels somehow as though they also haven't done much right, because they haven't had to wrestle with ethical problems.
Does this kind of worry carry across? Is it the case that if we don't try making lots of robots and wrestling with the hard problems of sentience ethics that we just end up... vapid?
I'm not convinced by this worry, but it nags at me.
A more pressing worry about this piece is that it's going to be ignored. After all, we already have robotic pets - the purpose of which is nothing other than to engage our emotions. Engaging emotions is among the first uses that people put new technologies to. AI is already available in girlfriend mode, and girlfriends are presumably intended to create some level of illusion of sentience. I fear your only hope will be to lead a desperate and bloody Butlerian jihad...

Arnold said...

To help remember conscientiousness...
..and to balance the functions and behaviors in daily living...

Play the song and music..."Teach Your Children"...Right Now!...

Learning to see feel emotion in myself...Thanks

Arnold said...

Please excuse my enthusiasm...

My wife's response to the question...
...Can we give AI the conscience of Gandhi...