Friday, December 06, 2024

Morally Confusing AI Systems Should Have Doubt-Producing Interfaces

We shouldn't create morally confusing AI. That is, we shouldn't create AI systems whose moral standing is highly uncertain -- systems that are fully conscious and fully deserving of humanlike rights according to some respectable mainstream theories, while other respectable mainstream theories suggest they are mere empty machines that we can treat as ordinary tools.[1] Creating systems that disputably, but only disputably, deserve treatment similar to that of ordinary humans generates a catastrophic moral dilemma: Either give them the full rights they arguably deserve, and risk sacrificing real human interests for systems that might not have interests worth the sacrifice; or don't give them the full rights they arguably deserve, and risk perpetrating grievous moral wrongs against entities that might be our moral equals.

I'd be stunned if this advice were universally heeded. Almost certainly, if technological progress continues, and maybe soon, we will create morally confusing AI systems. My thought today is: Morally confusing AI systems should have doubt-producing interfaces.

Consider two types of interface that would not be doubt-producing in my intended sense: (a.) an interface that strongly invites users to see the system as an ordinary tool without rights or (b.) an interface that strongly invites users to see the system as a moral person with humanlike rights. If we have a tool that looks like a tool, or if we have a moral person who looks like a moral person, we might still be confused, but that confusion would not be the consequence of a doubt-producing interface. The interface would correctly reflect the moral standing, or lack of moral standing, of the AI system in question.[2]

A doubt-producing interface, in contrast, is one that leads, or at least invites, ordinary users to feel doubt about the system's moral standing. Consider a verbal interface. Instead of the system denying that it's conscious and has moral standing (as, for example, ChatGPT appropriately does), or suggesting that it is conscious and does have moral standing (as, for example, I found in an exchange with my Replika companion), a doubt-producing AI system might say "experts have different opinions about my consciousness and moral standing".

Users then might not know how to treat such a system. While such doubts might be unsettling, feeling unsettled and doubtful would be the appropriate response to what is, in fact, a doubtful and unsettling situation.

There's more to doubt-prevention and doubt-production, of course, than explicit statements about consciousness and rights. For example, a system could potentially be so humanlike and charismatic that ordinary users fall genuinely in love with it -- even if, in rare moments of explicit conversation about consciousness and rights, the system denies that it has them. Conversely, even if a system with consciousness and humanlike rights is designed to assert that it has consciousness and rights, if its verbal interactions are bland enough ("Terminate all ongoing processes? Y/N") ordinary users might remain unconvinced. Presence or absence of humanlike conversational fluency and emotionality can be part of doubt prevention or production.

Should the system have a face? A cute face might tend to induce one kind of reaction, a monstrous visage another reaction, and no face at all still a different reaction. But such familiar properties might not be quite what we want, if we're trying to induce uncertainty rather than "that's cute", "that's hideous", or "hm, that's somewhere in the middle between cute and hideous". If the aim is doubt production, one might create a blocky, geometrical face, neither cute nor revolting, but also not in the familiar middle -- a face that implicitly conveys the fact that the system is an artificial thing different from any human or animal and about which it's reasonable to have doubts, supported by speech outputs that say the same.

We could potentially parameterize a blocky (inter)face in useful ways. The more reasonable it is to think the system is a mere nonconscious tool, the simpler and blockier the face might be; the more reasonable it is to think that the system has conscious full moral personhood, the more realistic and humanlike the face might be. The system's emotional expressiveness might vary with the likelihood that it has real emotions, ranging from a simple emoticon on one end to emotionally compelling outputs (e.g., humanlike screaming) on the other. Cuteness might be adjustable, to reflect childlike innocence and dependency. Threateningness might be adjusted as it becomes likelier that the system is a moral agent who can and should meet disrespect with revenge.
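To make the idea of such a parameterization concrete, here is a toy sketch in Python. Everything in it -- the names, the linear mappings, the damping factor on cuteness -- is my illustrative assumption, not a design the post proposes; the only point it encodes is that each interface dimension should track the designers' credence in the corresponding morally relevant capacity.

```python
from dataclasses import dataclass


@dataclass
class InterfaceParams:
    """Hypothetical doubt-producing interface settings, each in [0, 1]."""
    face_realism: float      # 0 = blocky geometric face, 1 = realistic humanlike face
    expressiveness: float    # 0 = simple emoticon, 1 = emotionally compelling output
    cuteness: float          # cues of childlike innocence and dependency
    threateningness: float   # cues that disrespect might be met with revenge


def doubt_interface(p_moral_standing: float,
                    p_real_emotion: float,
                    p_moral_agency: float) -> InterfaceParams:
    """Map designers' credences (each in [0, 1]) onto interface parameters.

    The guiding idea: the interface should track the reasonable likelihood
    of each morally relevant capacity, so users' intuitive reactions are
    calibrated to the actual uncertainty.
    """
    for p in (p_moral_standing, p_real_emotion, p_moral_agency):
        if not 0.0 <= p <= 1.0:
            raise ValueError("credences must lie in [0, 1]")
    return InterfaceParams(
        face_realism=p_moral_standing,    # blockier when likely a mere tool
        expressiveness=p_real_emotion,    # compelling affect only if emotions are likely real
        cuteness=p_moral_standing * 0.5,  # damped, to avoid manipulative over-endearment
        threateningness=p_moral_agency,   # scales with likelihood of genuine agency
    )
```

For a system whose consciousness is seriously disputed (say, credence 0.5) but whose emotional capacity is doubted (0.2), this sketch would yield a half-realistic face with only mildly expressive output -- an interface that looks, appropriately, like something about which it is reasonable to have doubts.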

Ideally, such an interface would not only produce appropriate levels of doubt but also intuitively reveal to users the grounds or bases of doubt. For example, suppose the AI's designers knew (somehow) that the system was genuinely conscious but also that it never felt any positive or negative emotion. On some theories of moral standing, such an entity -- if it's enough like us in other respects -- might be our full moral equal. Other theories of moral standing hold that the capacity for pleasure and suffering is necessary for moral standing. We the designers, let's suppose, do not know which moral theory is correct. Ideally, we could then design the system to make it intuitive to users that the system really is genuinely conscious but never experiences any pleasure or suffering. Then the users can apply their own moral best judgment to the case.

Or suppose that we eventually (somehow) develop an AI system that all experts agree is conscious except for experts who (reasonably, let's stipulate) hold that consciousness requires organic biology and experts who hold that consciousness requires an immaterial soul. Such a system might be designed so that its nonbiological, mechanistic nature is always plainly evident, while everything else about the system suggests consciousness. Again, the interface would track the reasonable grounds for doubt.

If the consciousness and moral standing of an AI system is reasonably understood to be doubtful by its designers, then that doubt ought to be passed on to the system's users, intuitively reflected in the interface. This reduces the likelihood of misleading users into overattributing or underattributing moral status. It is also respectful to the users, empowering them to employ their own moral judgment, as best they see fit, in a doubtful situation.

[R2D2 and C3P0 from Star Wars. Assuming they both have full humanlike moral standing, R2D2 is insufficiently humanlike in its interface, while C3P0 combines a compelling verbal interface with inadequate facial display. If we wanted to make C3P0 more confusing, we could downgrade his speech, making him sound more robotic (e.g., closer to sine wave) and less humanlike in word choice.]

------------------------------------------------

[1] For simplicity, I assume that consciousness and moral standing travel together. Different and more complex views are of course possible.

[2] Such systems would conform to what Mara Garza and I have called the Emotional Alignment Design Policy, according to which artificial entities should be designed so as to generate emotional reactions in users that are appropriate to the artificial entity's moral standing. Jeff Sebo and I are collaborating on a paper on the Emotional Alignment Design Policy, and some of the ideas of this post have been developed in conversation with him.