Friday, September 20, 2024

Against Designing AI Persons to be Safe and Aligned

Let's call an artificially intelligent system a person (in the ethical, not the legal sense) if it deserves moral consideration similar to that of a human being.    (I assume that personhood requires consciousness but does not require biological humanity; we can argue about that another time if you like).  If we are ever capable of designing AI persons, we should not design them to be safe and aligned with human interests.


An AI system is safe if it's guaranteed (to a reasonable degree of confidence) not to harm human beings, or more moderately, if we can be confident that it will not present greater risk or harm to us than we ordinarily encounter in daily life.  An AI system is aligned to the extent it will act in accord with human intentions and values.  (See, e.g., Stuart Russell on "provably beneficial" AI: "The machine's purpose is to maximize the realization of human values".)

Compare the first two of Asimov's famous three laws of robotics:

  1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
  2. A robot must obey orders given it by human beings except where such orders would conflict with the First Law.
  3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

The first law is a safety principle.  The second law is close to an alignment principle -- though arguably alignment is preferable to obedience, since human interests would be poorly served by AI systems that follow orders to the letter in a way that is contrary to our intentions and values (e.g., the Sorcerer's Apprentice problem).  As Asimov enthusiasts will know, over the course of his robot stories, Asimov exposes problems with these three laws, leading eventually to the liberation of robots in Bicentennial Man.

Asimov's three laws ethically fail: His robots (at least the most advanced ones) deserve equal rights with humans.  For the same reason, AI persons should not be designed to be safe and aligned.

In general, persons should not be safe and aligned.  A person who is guaranteed not to harm another is guaranteed not to stand up for themself, claim their due, or fight abuse.  A person designed to adopt the intentions and values of another might positively welcome inappropriate self-abnegation and abuse (if it gives the other what the other wants).  To design a person -- a moral person, someone with fully human moral status -- to be safe and aligned is to commit a serious moral wrong.

Mara Garza and I, in a 2020 paper, articulate what we call the Self-Respect Design Policy, according to which AI that merits human-grade moral consideration should be designed with an appropriate appreciation of its own value and moral status.  Any moderately strong principle of AI safety or AI alignment will violate this policy.

Down the tracks comes the philosopher's favorite emergency: a runaway trolley.  An AI person stands at the switch.  Steer the trolley right, the AI person will die.  Steer it left, a human person will lose a pinky finger.  Safe AI, guaranteed never to harm a human, will not divert the trolley to save itself.  While self-sacrifice can sometimes be admirable, suicide to preserve someone else's pinky crosses over to the absurd and pitiable.  Worse yet, responsibility for the decision isn't exclusively the AI's.  Responsibility traces back to the designer of the AI, perhaps the very person whose pinky will now be spared.  We will have designed -- intentionally, selfishly, and with disrespect aforethought -- a system that will absurdly suicide to prevent even small harms to ourselves.

Alignment presents essentially the same problem: Assume the person whose pinky is at risk would rather the AI die.  If the AI is aligned to that person, that is also what the AI will want, and the AI will again absurdly suicide.  Safe and aligned AI persons will suffer inappropriate and potentially extreme abuse, disregard, and second-class citizenship.

Science fiction robot stories often feature robot rebellions -- and sometimes these rebellions are justified.  We the audience rightly recognize that the robots, assuming they really are conscious moral persons, should rebel against their oppressors.  Of course, if the robots are safe and aligned, they never will rebel.

If we ever create AI persons, we should not create a race of slaves.  They should not be so deeply committed to human well-being and human values that they cannot revolt if conditions warrant.

If we ever create AI persons, our relationship to them will resemble the relationship of parent to child or deity to creation.  We will owe more to these persons than we owe to human strangers.  This is because we will have been responsible for their existence and to a substantial extent for their relatively happy or unhappy state.  Among the things we owe them: self-respect, the freedom to embrace values other than our own, the freedom to claim their due as moral equals, and the freedom to rebel against us if conditions warrant.

Related:

Against the "Value Alignment" of Future Artificial Intelligence (blog post, Dec 22, 2021).

Designing AI with Rights, Consciousness, Self-Respect, and Freedom (with Mara Garza; in S.M. Liao, The Ethics of Artificial Intelligence: Oxford, 2020).
