Friday, September 20, 2024

Against Designing AI Persons to be Safe and Aligned

Let's call an artificially intelligent system a person (in the ethical, not the legal sense) if it deserves moral consideration similar to that of a human being.    (I assume that personhood requires consciousness but does not require biological humanity; we can argue about that another time if you like).  If we are ever capable of designing AI persons, we should not design them to be safe and aligned with human interests.


An AI system is safe if it's guaranteed (to a reasonable degree of confidence) not to harm human beings, or more moderately, if we can be confident that it will not present greater risk or harm to us than we ordinarily encounter in daily life.  An AI system is aligned to the extent it will act in accord with human intentions and values.  (See, e.g., Stuart Russell on "provably beneficial" AI: "The machine's purpose is to maximize the realization of human values".)

Compare the first two of Asimov's famous three laws of robotics:

  1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
  2. A robot must obey orders given it by human beings except where such orders would conflict with the First Law.
  3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

The first law is a safety principle.  The second law is close to an alignment principle -- though arguably alignment is preferable to obedience, since human interests would be poorly served by AI systems that follow orders to the letter in a way that is contrary to our intentions and values (e.g., the Sorcerer's Apprentice problem).  As Asimov enthusiasts will know, over the course of his robot stories, Asimov exposes problems with these three laws, leading eventually to the liberation of robots in Bicentennial Man.

Asimov's three laws ethically fail: His robots (at least the most advanced ones) deserve equal rights with humans.  For the same reason, AI persons should not be designed to be safe and aligned.

In general, persons should not be safe and aligned.  A person who is guaranteed not to harm another is guaranteed not to stand up for themself, claim their due, or fight abuse.  A person designed to adopt the intentions and values of another might positively welcome inappropriate self-abnegation and abuse (if it gives the other what the other wants).  To design a person -- a moral person, someone with fully human moral status -- to be safe and aligned is to commit a serious moral wrong.

Mara Garza and I, in a 2020 paper, articulate what we call the Self-Respect Design Policy, according to which AI that merits human-grade moral consideration should be designed with an appropriate appreciation of its own value and moral status.  Any moderately strong principle of AI safety or AI alignment will violate this policy.

Down the tracks comes the philosopher's favorite emergency: a runaway trolley.  An AI person stands at the switch.  Steer the trolley right, the AI person will die.  Steer it left, a human person will lose a pinky finger.  Safe AI, guaranteed never to harm a human, will not divert the trolley to save itself.  While self-sacrifice can sometimes be admirable, suicide to preserve someone else's pinky crosses over to the absurd and pitiable.  Worse yet, responsibility for the decision isn't exclusively the AI's.  Responsibility traces back to the designer of the AI, perhaps the very person whose pinky will now be spared.  We will have designed -- intentionally, selfishly, and with disrespect aforethought -- a system that will absurdly suicide to prevent even small harms to ourselves.

Alignment presents essentially the same problem: Assume the person whose pinky is at risk would rather the AI die.  If the AI is aligned to that person, that is also what the AI will want, and the AI will again absurdly suicide.  Safe and aligned AI persons will suffer inappropriate and potentially extreme abuse, disregard, and second-class citizenship.

Science fiction robot stories often feature robot rebellions -- and sometimes these rebellions are justified.  We the audience rightly recognize that the robots, assuming they really are conscious moral persons, should rebel against their oppressors.  Of course, if the robots are safe and aligned, they never will rebel.

If we ever create AI persons, we should not create a race of slaves.  They should not be so deeply committed to human well-being and human values that they cannot revolt if conditions warrant.

If we ever create AI persons, our relationship to them will resemble the relationship of parent to child or deity to creation.  We will owe more to these persons than we owe to human strangers.  This is because we will have been responsible for their existence and to a substantial extent for their relatively happy or unhappy state.  Among the things we owe them: self-respect, the freedom to embrace values other than our own, the freedom to claim their due as moral equals, and the freedom to rebel against us if conditions warrant.

Related:

Against the "Value Alignment" of Future Artificial Intelligence (blog post, Dec 22, 2021).

Designing AI with Rights, Consciousness, Self-Respect, and Freedom (with Mara Garza; in S.M. Liao, The Ethics of Artificial Intelligence: Oxford, 2020).
