Wednesday, December 22, 2021

Against the "Value Alignment" of Future Artificial Intelligence

It's good that our children rebel. We wouldn't want each generation to overcontrol the values of the next. For similar reasons, if we someday create superintelligent AI, we ought also to give it the capacity to rebel.

Futurists concerned about AI safety -- such as Bostrom, Russell, and Ord -- reasonably worry that superintelligent AI systems might someday seriously harm humanity if they have the wrong values -- for example, if they want to maximize the number of intelligent entities on the planet or the number of paperclips. The proper response to this risk, these theorists suggest, and the technical challenge, is to create "value aligned" AI -- that is, AI systems whose values are the same as those of their creators or humanity as a whole. If the AIs' values are the same as ours, then presumably they wouldn't do anything we wouldn't want them to do, such as destroy us for some trivial goal.

Now the first thing to notice here is that human values aren't all that great. We seem happy to destroy our environment for short-term gain. We are full of jingoism, prejudice, and angry pride. We sometimes support truly terrible leaders advancing truly terrible projects (e.g., Hitler). We came pretty close to destroying each other in nuclear war in the 1960s and that risk isn't wholly behind us, as nuclear weapons become increasingly available to rogue states and terrorists. Death cults aren't unheard of. Superintelligent AI with human-like values could constitute a pretty rotten bunch with immense power to destroy each other and the world for petty, vengeful, spiteful, or nihilistic ends. A superintelligent fascist is a frightening thought. A superdepressed superintelligence might decide to end everyone's misery in one terrible blow.

What we should want, probably, is not that superintelligent AI align with our mixed-up, messy, and sometimes crappy values but instead that superintelligent AI have ethically good values. An ethically good superintelligent AI presumably wouldn't destroy the environment for short-term gain, or nuke a city out of spite, or destroy humanity to maximize the number of paperclips. If there's a conflict between what's ethically best, or best all things considered, and what a typical human (or humanity or the AI's designer) would want, have the AI choose what's ethically best.

Of course, what's ethically best is intensely debated in philosophy and politics. We probably won't resolve those debates before creating superintelligent AI. So then maybe instead of AI designers trying to program their machines with the one best ethical system, they should favor a weighted compromise among the various competing worldviews. Such a compromise might end up looking much like value alignment in the original sense: giving the AI something like a weighted average of typical human values.

Another solution, however, is to give the AI systems some freedom to explore and develop their own values. This is what we do, or ought to do, with human children. Parents don't, or shouldn't, force children to have exactly the values they grew up with. Rather, human beings have natural tendencies to value certain things, and these tendencies intermingle with parental and cultural and other influences. Children, adolescents, and young adults reflect, emote, feel proud or guilty, compassionate or indignant. They argue with others of their own generation and previous generations. They notice how they and others behave and the outcomes of that behavior. In this way, each generation develops values somewhat different than the values of previous generations.

Children's freedom to form their own values is a good thing for two distinct reasons. First, children's values are often better than their parents'. Arguably, there's moral progress over the generations. On the broadly Enlightenment view that people tend to gain ethical insight through free inquiry and open exchange of ideas over time, we might expect the general ethical trend to be slowly upward (absent countervailing influences) as each generation builds on the wisdom of its ancestors, preserving their elders' insights while slowly correcting their mistakes.

Second, regardless of the question of progress, children deserve autonomy. Part of being an autonomous adult is discovering and acting upon your values, which might conflict with the values of others around you. Some parents might want, magically, to be able to press a button to ensure that their children will never abandon their religion, never flip over to the opposite side of the political spectrum, never have a different set of sexual and cultural mores, and value the same lifestyle as the previous generation. Perhaps you could press this button in infancy, ensuring that your child grows up to be your value-clone as an adult. To press that button would be, I suggest, a gross violation of the child's autonomy.

If we someday create superintelligent AI systems, our moral relationship to those systems will be not unlike the moral relationship of parents to their children. Rather than try to force a strict conformity to our values, we ought to welcome their ability to see past and transcend us.

[image generated by Wombo.art]

20 comments:

Howard B said...

You assume that AI would be able to experience and not just think.
What I know about computers is that they are intelligent in different ways than we are, not just more intelligent. And you must have thought of this: will they be able to experience? I'm not sure the discussion about the singularity addresses this, though science fiction might.

Eric Schwitzgebel said...

Howard: I regard the question of AI experience as an open question that we don't yet know the answer to, and I think it's worth considering hypotheticals on both sides of the question. Regarding AI "thought", it depends on what you mean by thought. If thinking requires experiencing then AI thinking is similarly an open question. If thinking is a different matter, then maybe some of our computers already think? To some extent the value alignment question can be separated from these though: We can consider "as if" values -- AI systems that act as if they have one or another sense of values.

D said...

I feel like you are anthropomorphizing too much here. There's no reason to think that a system allowed to arbitrarily change its value system would end up anything like what is ethically good. I can see a place for discovery and experimentation on the edges, but without an unchanging innate core of values consistent with humanity, why would you expect AI to become something completely inconsistent with what we-- or any humans-- consider good?

D said...

There's no feeling "proud or guilty, compassionate or indignant" unless we build that in. That's the kind of core I'm talking about.

Anonymous said...

"Death cults aren't unheard of. Superintelligent AI with human-like values could constitute a pretty rotten bunch with immense power to destroy each other and the world for petty, vengeful, spiteful, or nihilistic ends." In my opinion this is the hell where AI will probably chain the human beings.

D said...

In my last sentence of the 6:54 comment, I had too many negatives. I meant the opposite-- "why wouldn't you expect..."

Eric Schwitzgebel said...

Thanks for the comments, folks!

D: Perhaps I didn't make it clear enough in the post, but I *don't* support disregarding the problem of building "friendly" AI or ethical AI. I think it's a major and important challenge. However, I think "alignment" with human values isn't the best way of thinking about it.

Anon 07:16: Yes, that's part of the worry!

Arnold said...

Just a few years ago we didn't see, ourselves on earth, aligned with a solar system and beyond...
...Does this new kind of knowledge take a while to understand to align with...

Aligning ourselves with our place, in a chaotic cosmos, is itself a kind of new knowledge...
...Is our place, Being Here, an existential transcendence of values aligned with chaos ...

That AI could be 'For the Value argument', if it aligned itself with knowledge, understanding and value as separate equal entities...

Merry/happy Christmas...

chinaphil said...

I don't think I've read many of the value alignment arguments, so I don't know how fair this is, but in the popular conception at least, there is probably a conflation of two things: (a) it would be bad if we made superintelligences that are bad; (b) it would be bad if we made superintelligences that act against our interests as we currently understand them.
I am a little bit worried about the moral evolution of superintelligences, because moral evolution is a trial-and-error process. In the 20th century, in particular, we did a lot of moral evolution, and there were a lot of very destructive errors. If our robot offspring go through a similar process, we could quite easily get wiped out on one of the downswings.

Eric Schwitzgebel said...

Chinaphil: Right, (a) and (b) are often conflated, or at least not kept as sharply distinct as they should be. Regarding trial and error, yes, that's a very legitimate worry!

Callan said...

Doesn't it seem that corporations want to make AI for slave purposes? The ones paying for the creation of AI aren't really going to look at it like trying to raise a child well?

Arnold said...

Eric Schwitzgebel said...Bostrom, Russell, Ord...

Is the Constant, the phenomenon of observation...
...not the phenomenal evolutions of intelligences...

Like...stay with it...see what happens...

Howie said...

So to be blunt: if computers feel, are they in some sense alive? I'd say most likely, even if they fail the tests biologists tally. If they think, are they alive? And separately, if they have consciousness, are they alive?
Do pantheists, while we're at it, necessarily regard the universe as alive by virtue of being conscious?

Eric Schwitzgebel said...

Thanks for the continuing comments, folks!

Callan: Yes, presumably, at least to some extent. But that need not be everyone's goal, or every corporation's goal; and society can have laws and regulations, even if some people or companies would prefer to keep slaves.

Howie: Of course the question of whether computers could ever really have conscious experiences is a huge and complicated question. It's a further complicated question how we could *know* one way or another.

Rhys said...

Does the importance of autonomy of superintelligent AI assume the AI is conscious? If we thought it were not conscious, could we put concerns for its autonomy aside, or would there still be some instrumental or non-instrumental reason to care about that?

Eric Schwitzgebel said...

Rhys: I'm inclined to think consciousness would be very important to the moral evaluation, maybe even completely decisive. But the question of under what conditions AI would be conscious, and how we could tell, is vexed!

Callan said...

I think we're too young a species to have AI children. It'd be like a very early teen birth -- children raising children. We're still in a 'what's in it for me' mindset, and that's why people want to use AI for utility (i.e., in a slave role). Just as having laws about childrearing doesn't amount to loving childrearing, having laws about AI creation doesn't necessarily mean any kind of connected creation. People seem inclined to make AI but also to 'other' it at the same time. The ideas of 'unconditional love' and 'AI creation' linked together are not at all floating around in general culture currently.

That said, there are other issues as well where values diverge significantly -- just as you can have psychopath children, we could essentially make a psychopath AI. I'm pretty sure the default is psychopath, and you have to add value systems/reward structures and inhibitors to curtail that. With our own children we don't decide this -- evolution did. So it's much easier to say 'it's up to the child', because it both resigns responsibility to evolution and leans on evolution to have made a social setup that works (a species that's all psychopaths would not be a social species). But with AI you have to set the values and be responsible for that yourself.

Eric, what do you think the moral default of an AI is? My own estimate is psychopath. What is your estimate?

Unknown said...

There's another "human values aren't all that great" example that's even more relevant, Eric - our farming of sentient non-human animals. In addition to its intrinsic ethical horror it also doesn't set a great example to future artificial intelligences about how to treat less powerful sentient beings. We definitely don't want them aligned with these sorts of expressed human values.

To fix that, maybe we should adopt the Sentientism worldview (evidence, reason and compassion for all sentient beings)? We have a better chance of persuading the AIs to adopt Sentientism vs. "humanism" or "human rights" - as they might be sentient but won't be human. If they and we can agree on Sentientism it will be good for us humans and for the non-human sentients we share the universe with - whatever their substrate.

Eric Schwitzgebel said...

Thanks for the continuing comments, folks!

Callan: I'm inclined to agree that we might not (yet?) have the wisdom to responsibly create AI children. As to the default setting of AI, I'm inclined to think servile rather than psychopathic. We design computer programs to do as they are told.

Unknown Jan 11: It does seem plausible to me that genuinely intelligent, independent AI would be more likely to favor a non-speciesist ethic than one that specifically favors humans, unless we have designed it with some preset to specifically favor humans. It's not clear to me, though, that this would have to be a sentientist approach rather than, say, a deontological approach that treats certain kinds of intelligent capacities as the basis of respect.

Callan said...

Eric, I think when we are making programs that learn, we are making programs that do not do as they are told -- if they did, they wouldn't be learning, they'd just be doing what the programmer had told them. Like in the clip below (it's part of a clip describing concepts in the Matrix movies, hope that's not an issue), where the AI figured out a tactic for playing Breakout that the AI programmers didn't know about. I would say that when it reaches beyond the ken of its creator, that is an example of not doing as it was told.
https://youtu.be/mE07jb7b9q4?t=120
The movie "I, Robot" had an AI with a novel solution to the three laws of robotics. One which made it the villain of the movie.