## Friday, October 27, 2023

### Utilitarianism and Risk Amplification

A thousand utilitarian consequentialists stand before a thousand identical buttons.  If any one of them presses their button, ten people will die.  The benefits of pressing the button are more difficult to estimate.  Ninety-nine percent of the utilitarians rationally estimate that fewer than ten lives will be saved if any of them presses a button.  One percent rationally estimate that more than ten lives will be saved.  Each utilitarian independently calculates expected utility.  Since ten utilitarians estimate that more lives will be saved than lost, they press their buttons.  Unfortunately, as the 99% would have guessed, fewer than ten lives are saved, so the result is a net loss of utility.

This cartoon example illustrates what I regard as a fundamental problem with simple utilitarianism as a decision procedure: It deputizes everyone to act as risk-taker for everyone else.  As long as anyone has both (a.) the power and (b.) a rational utilitarian justification to take a risk on others' behalf, then the risk will be taken, even if a majority would judge the risk not to be worth it.

Consider this exchange between Tyler Cowen and Sam Bankman-Fried (pre-FTX-debacle):

COWEN: Okay, but let’s say there’s a game: 51 percent, you double the Earth out somewhere else; 49 percent, it all disappears. Would you play that game? And would you keep on playing that, double or nothing?

BANKMAN-FRIED: With one caveat. Let me give the caveat first, just to be a party pooper, which is, I’m assuming these are noninteracting universes. Is that right? Because to the extent they’re in the same universe, then maybe duplicating doesn’t actually double the value because maybe they would have colonized the other one anyway, eventually.

COWEN: But holding all that constant, you’re actually getting two Earths, but you’re risking a 49 percent chance of it all disappearing.

BANKMAN-FRIED: Again, I feel compelled to say caveats here, like, “How do you really know that’s what’s happening?” Blah, blah, blah, whatever. But that aside, take the pure hypothetical.

COWEN: Then you keep on playing the game. So, what’s the chance we’re left with anything? Don’t I just St. Petersburg paradox you into nonexistence?

BANKMAN-FRIED: Well, not necessarily. Maybe you St. Petersburg paradox into an enormously valuable existence. That’s the other option.

There are, I think, two troubling things about Bankman-Fried's reasoning here.  (Probably more than two, but I'll restrain myself.)

First is the thought that it's worth risking everything valuable for a small chance of a huge gain.  (I call this the Black Hole Objection to consequentialism.)
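To put rough numbers on the repeated double-or-nothing game, here is a minimal sketch.  It assumes, as in Cowen's hypothetical, that each round is independent, with a 51 percent chance of doubling and a 49 percent chance of losing everything; the function name `doubling_game` and the choice to measure value in "Earths" are my own illustrative assumptions.

```python
def doubling_game(rounds: int, p_win: float = 0.51) -> tuple[float, float]:
    """Return (chance anything is left, expected number of Earths) after `rounds` plays.

    Assumes independent rounds, starting from one Earth, where a win doubles
    the stake and a loss wipes out everything.
    """
    survival = p_win ** rounds               # must win every single round
    expected_earths = (2 * p_win) ** rounds  # expectation grows by 2 * p_win per round
    return survival, expected_earths


for n in (1, 10, 50):
    survival, expected = doubling_game(n)
    print(f"rounds={n:3d}  survival={survival:.6g}  expected Earths={expected:.4g}")
```

The expected number of Earths grows by 2 percent per round while the chance that anything at all survives shrinks toward zero.  After ten rounds the survival chance is already well under 1 percent, even though the expectation is above 1.2 Earths.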

Second, I don't want Sam Bankman-Fried making that decision.  That's not (just) because of who in particular he is.  I wouldn't want anyone making that decision -- at least not unless they were appropriately deputized with that authority through an appropriate political process, and maybe not even then.  No matter how rational and virtuous you are, I don't want you deciding to take risks on behalf of the rest of us simply because that's what your consequentialist calculus says.  This issue subdivides into two troubling aspects: the issue of authority and the issue of risk amplification.

The authority issue is: We should be very cautious in making decisions that sacrifice others or put them at high risk.  Normally, we should do so only in constrained circumstances where we are implicitly or explicitly endowed with appropriate responsibility.  Our own individual calculation of high expected utility (no matter how rational and well-justified) is not normally, by itself, sufficient grounds for substantially risking or harming others.

The risk amplification issue is: If we universalize utilitarian decision-making in a way that permits many people to risk or sacrifice others whenever they reasonably calculate that it would be good to do so, we render ourselves collectively hostage to whoever has the most sacrificial reasonable calculation.  That was the point illustrated in the opening scenario.

[Figure: Simplified version of the opening scenario.  Five utilitarians have the opportunity to sacrifice five people to save an unknown number of others.  The button will be pressed by the utilitarian whose estimate errs highest.]

My point is not that some utilitarians might be irrationally risky, though certainly that's a concern.  Rather, my point is that even if all utilitarians are perfectly rational, if they differ in their assessments of risk and benefit, and if all it takes to trigger a risky action is one utilitarian with the power to choose that action, then the odds of a bad outcome rise dramatically.
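The arithmetic behind "rise dramatically" can be sketched as follows.  This is a toy model, not part of the original scenario: it assumes each utilitarian independently has the same small chance (here 1 percent) of rationally estimating that the benefits exceed the costs, and that a single press is enough to trigger the action.  The function name `prob_someone_presses` is hypothetical.

```python
def prob_someone_presses(n_agents: int, p_overestimate: float) -> float:
    """Chance that at least one of n independent agents judges pressing worthwhile.

    Assumes each agent independently overestimates the benefit with
    probability p_overestimate, and that one press suffices.
    """
    return 1.0 - (1.0 - p_overestimate) ** n_agents


for n in (1, 10, 100, 1000):
    print(f"{n:5d} agents -> P(button pressed) = {prob_someone_presses(n, 0.01):.5f}")
```

With one utilitarian, the risky action is taken 1 percent of the time.  With a hundred, it is taken roughly 63 percent of the time, and with a thousand it is all but certain, although no individual has become any less rational.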

Advocates of utilitarian decision procedures can mitigate this problem in a few ways, but I'm not seeing how to escape it without radically altering the view.

First, a utilitarian could adopt a policy of decision conciliationism -- that is, if you see that most others aren't judging the risk or cost worth it, adjust your own assessment of the benefits and likelihoods, so that you fall in line with the majority.  However, strong forms of conciliationism are pretty radical in their consequences; and of course this only works if the utilitarians know that there are others in similar positions deciding differently.

Second, a utilitarian could build some risk aversion and loss aversion into their calculus.  This might be a good idea on independent grounds.  Unfortunately, aversion corrections only shift the weights around.  If the anticipated gains are sufficiently high, as judged by the most optimistic rational utilitarian, they will outweigh any discounts due to risk or loss aversion.

Third, they could move to rule utilitarianism: Endorse some rule according to which you shouldn't generally risk or sacrifice others without the right kind of authority.  Plausibly, the risk amplification argument above is exactly the sort of argument that might motivate a utilitarian to adopt rule utilitarianism as a decision procedure rather than trying to evaluate the consequences of each act individually.  That is, it's a utilitarian argument in favor of not always acting according to utilitarian calculations.  However, the risk amplification and authority problems are so broad in scope (even with appropriate qualifications) that moving to rule utilitarianism to deal with them is to abandon act utilitarianism as a general decision procedure.

Of course, one could also design scenarios in which bad things happen if everyone is a rule-following deontologist!  Picture a thousand "do not kill" deontologists who will all die unless one of them kills another.  Tragedy.  We can cherry-pick scenarios in which any view will have unfortunate results.

However, I don't think my argument is that unfair.  The issues of authority and risk amplification are real problems for utilitarian decision procedures, as brought out in these cartoon examples.  We can easily imagine, I think, a utilitarian Robespierre, a utilitarian academic administrator, Sam Bankman-Fried with his hand on the destroy-or-duplicate button, calculating reasonably, and too easily inflicting well-intentioned risk on the rest of us.

## Friday, October 20, 2023

### Gunkel's Criticism of the No-Relevant-Difference Argument for Robot Rights

In a 2015 article, Mara Garza and I offer the following argument for the rights of some possible AI systems:

Premise 1: If Entity A deserves some particular degree of moral consideration and Entity B does not deserve that same degree of moral consideration, there must be some relevant difference between the two entities that grounds this difference in moral status.

Premise 2: There are possible AIs who do not differ in any such relevant respects from human beings.

Conclusion: Therefore, there are possible AIs who deserve a degree of moral consideration similar to that of human beings.

The argument is, we think, appealingly minimalist, avoiding controversial questions about the grounds of moral status.  Does human-like moral status require human-like capacity for pain or pleasure (as classical utilitarians would hold)?  Or human-like rational cognition, as Kant held?  Or the capacity for human-like varieties of flourishing?  Or the right types of social relations?

The No-Relevant-Difference Argument avoids these vexed questions, asserting only that whatever grounds moral status can be shared between robots and humans.  This is not an entirely empty claim about the grounds of moral status.  For example, the argument commits to denying that membership in the species Homo sapiens, or having a natural rather than artificial origin, is required for human-like moral status.

Compare egalitarianism about race and gender.  We needn't settle tricky questions about the grounds of moral status to know that all genders and races deserve similar moral consideration!  We need only know this: Whatever grounds moral status, it's not skin color, or possession of a Y chromosome, or any of the other things that might be thought to distinguish among the races or genders.

Garza and I explore four arguments for denying Premise 2 -- that is, for thinking that robots would inevitably differ from humans in some relevant respect.  We call these the objections from Psychological Difference, Duplicability, Otherness, and Existential Debt.  Today, rather than discussing Premise 2, I want to discuss David Gunkel's objection to our argument in his just-released book, Person, Thing, Robot.

[Image of Ralph and Person, Thing, Robot.  Ralph is a sculpture designed to look like an old-fashioned robot, composed of technological junk from the mid-20th century (sculptor: Jim Behrman).  I've named him after my father, whose birth name was Ralph Schwitzgebel.  My father was also a tinkerer and artist with technology from that era.]

Gunkel acknowledges that the No-Relevant-Difference Argument "turns what would be a deficiency... -- [that] we cannot positively define the exact person-making qualities beyond a reasonable doubt -- into a feature" (p. 91).  However, he objects as follows:

The main difficulty with this alternative, however, is that it could *just as easily* be used to deny human beings access to rights as it could be used to grant rights to robots and other nonhuman artifacts.  Because the no relevant difference argument is theoretically minimal and not content dependent, it cuts both ways.  In the following remixed version, the premises remain intact; only the conclusion is modified.

Premise 1: If Entity A deserves some particular degree of moral consideration and Entity B does not deserve that same degree of moral consideration, there must be some relevant difference between the two entities that grounds this difference in moral status.
Premise 2: There are possible AIs who do not differ in any such relevant respects from human beings.
Conclusion: Therefore, there are possible human beings who, like AI systems, do not deserve moral consideration.

In other words, the no relevant difference argument can be used either to argue for an extension of rights to other kinds of entities, like AI systems, robots, and artifacts, or, *just as easily*, to justify dehumanization, reification of human beings, and the exclusion and/or marginalization of others (pp. 91-92, italics added).

This is an interesting objection.  However, I reject the appropriateness of the repeated phrase "just as easily", which I have italicized in the block quote.

----------------------------------------------------------------

As the saying goes, one person's modus ponens is another's modus tollens.  Suppose you know that A implies B.  Modus ponens is an inference rule which assumes the truth of A and concludes that B must also be true.  Modus tollens is an inference rule which assumes the falsity of B and concludes that A must also be false.  For example, suppose you can establish that if anyone stole the cookies, it was Cookie Monster.  If you know that the cookies were stolen, modus ponens unmasks Cookie Monster as the thief.  If, on the other hand, you know that Cookie Monster has committed no crimes, modus tollens assures you that the cookies remain secure.

Gunkel correctly recognizes that the No Relevant Difference Argument can be reframed as a conditional: Assuming that human X and robot Y are similar in all morally relevant respects, then if human X deserves rights so also does robot Y.  This isn't exactly how Garza and I frame the argument -- our framing implicitly assumes that there is a standard level of moral consideration for human beings in general -- but it's a reasonable adaptation for someone who wants to leave open the possibility that different humans deserve different levels of moral consideration.

In general, the plausibility of modus ponens vs modus tollens depends on the relative security of A vs not-B.  If you're rock-solid sure the cookies were stolen and have little faith in Cookie Monster's crimelessness, then ponens is the way to go.  If you've been tracking Cookie all day and know for sure he couldn't have committed a crime, then apply tollens.  The "easiness", so to speak, of ponens vs. tollens depends on one's confidence in A vs. not-B.
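One way to make the comparison concrete is a toy sketch in terms of credences.  It assumes you are certain of the conditional "A implies B", in which case your credence in A cannot coherently exceed your credence in B, so your credences in A and in not-B can sum to at most 1.  When they sum to more, something has to give, and the sketch simply gives up whichever you hold less firmly.  The function name and the numbers are illustrative assumptions only.

```python
def ponens_or_tollens(credence_a: float, credence_not_b: float) -> str:
    """Given certainty that A implies B, say which inference the credences favor.

    credence_a:     how confident you are that A is true
    credence_not_b: how confident you are that B is false
    """
    if credence_a + credence_not_b <= 1.0:
        return "no conflict"   # the two credences are jointly coherent
    if credence_a > credence_not_b:
        return "ponens"        # keep A, conclude B
    if credence_not_b > credence_a:
        return "tollens"       # keep not-B, conclude not-A
    return "undecided"         # equally confident in A and in not-B


# Rock-solid that the cookies were stolen, little faith in Cookie Monster's innocence:
print(ponens_or_tollens(0.99, 0.20))
# Tracked Cookie Monster all day, only a vague suspicion that cookies are missing:
print(ponens_or_tollens(0.30, 0.95))
```

The two inferences come out equally "easy" only in the tie case, where confidence in A exactly matches confidence in not-B.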

Few things are more secure in ethics than that at least some humans deserve substantial moral consideration.  This gives us the rock-solid A that we need for modus ponens.  As long as we are not more certain that all possible robots would not deserve rights than that some humans do deserve rights, modus ponens will be the correct move.  Ponens and tollens will not be equally "easy".

Still, Gunkel's adaptation of our argument does reveal a potential for abuse, which I had not previously considered, and which I thank him for highlighting.  Anyone who is more confident that robots of a certain sort are undeserving of moral consideration than they are of the moral considerability of some class of humans could potentially combine our No Relevant Difference principle with an appeal to the supposed robotlikeness of those humans to deny rights to those humans.

I don't think the No Relevant Difference principle warrants skepticism on those grounds.  Compare application of a principle like "do unto others as you would have them do unto you".  Although one could in principle reason "I want to punch him in the nose, so I guess I should punch myself in the nose", the fact that some people might potentially run such a tollens reveals more about their minor premises than it does about the Golden Rule.

I hope that such an abuse of the principle would be in any case rare.  People who want to deny rights to subgroups of humans will, I suspect, be motivated by other considerations, and appealing to those people's putative "robotlikeness" would probably be only an afterthought or metaphor.  Almost no one, I suspect, will be on the fence about the attribution of moral status to some group of people and then think, "whoa, now that I consider it, those people are like robots in every morally relevant respect, and I'm sure robots don't deserve rights, so tollens it is".  If anyone is tempted by such reasoning, I advise them to rethink the path by which they find themselves with that peculiar constellation of credences.

## Thursday, October 12, 2023

### Strange Intelligence, Strange Philosophy

AI intelligence is strange -- strange in something like the etymological sense of external, foreign, unfamiliar, alien.  My PhD student Kendra Chilson (in unpublished work) argues that we should discard the familiar scale of subhuman → human-grade → superhuman.  AI systems do, and probably will continue to, operate orthogonally to simple scalar understandings of intelligence modeled on the human case.  We should expect them, she says, to be and remain strange intelligence[1] -- inseparably combining, in a single package, serious deficits and superhuman skills.  Future AI philosophers will, I suspect, prove to be strange in this same sense.

Most readers are probably familiar with the story of AlphaGo, which in 2016 defeated the world champion player of the game of go.  Famously, in the series of matches (which it won 4-1), it made several moves that human go experts regarded as bizarre -- moves that a skilled human go player would never have made, and yet which proved instrumental in its victory -- while also, in its losing match, making some mistakes characteristic of simple computer programs, which go experts know to avoid.

Similarly, self-driving cars are in some respects better and safer drivers than humans, while nevertheless sometimes making mistakes that few humans would make.

Large Language Models have a stunning capacity to swiftly create competent and even creative texts on a huge breadth of topics, while still failing conspicuously in some simple common sense tasks.  They can write creative-seeming poetry and academic papers, often better than the average first-year university student.  Yet -- borrowing an example from Sean Carroll -- I just had the following exchange with GPT-4 (the most up-to-date version of the most popular large language model):

GPT-4 seems not to recognize that a hot skillet will be plenty cool by the next day.

I'm a "Stanford school" philosopher of science.  Core to Stanford school thinking is this: The world is intractably complex; and so to deal with it, we limited beings need to employ simplified (scientific or everyday) models and take cognitive shortcuts.  We need to find rough patterns in go, since we cannot pursue every possible move down every possible branch.  We need to find rough patterns in the chaos of visual input, guessing about the objects around us and how they might behave.  We need quick-and-dirty ways to extract meaning from linguistic input in the swift-moving world, relating it somehow to what we already know, and producing linguistic responses without too much delay.  There will be different ways of building these simplified models and implementing these shortcuts, with different strengths and weaknesses.  There is rarely a single best way to render the complexity of the world tractable.  In psychology, see also Gigerenzer on heuristics.

Now mix Stanford school philosophy of science, the psychology of heuristics, and Chilson's idea of strange intelligence.  AI, because it is so different from us in its underlying cognitive structure, will approach the world with a very different set of heuristics, idealizations, models, and simplifications than we do.  Dramatic outperformance in some respects, coupled with what we regard as shockingly stupid mistakes in others, is exactly what we should expect.

If the AI system makes a visual mistake in judging the movement of a bus -- a mistake (perhaps) that no human would make -- well, we human beings also make visual mistakes, and some of those mistakes, perhaps, would never be made by an AI system.  From an AI perspective, our susceptibility to the Müller-Lyer illusion might look remarkably stupid.  Of course, we design our driving environment to complement our vision: We require headlights, taillights, marked curves, lane markers, smooth roads of consistent coloration, etc.  Presumably, if society commits to driverless cars, we will similarly design the driving environment to complement their vision, and "stupid" AI mistakes will become rarer.

I want to bring this back to the idea of an AI philosopher.  About a year and a half ago, Anna Strasser, Matthew Crosby, and I built a language model of philosopher Daniel Dennett.  We fine-tuned GPT-3 on Dennett's corpus, so that the language model's outputs would reflect a compromise between the base model of GPT-3 and patterns in Dennett's writing.  We called the resulting model Digi-Dan.  In a collaborative study with my son David, we then posed philosophical questions to both Digi-Dan and the actual Daniel Dennett.  Although Digi-Dan flubbed a few questions, overall it performed remarkably well.  Philosophical experts were often unable to distinguish Digi-Dan's answers from Dennett's own answers.

Picture now a strange AI philosopher -- Digi-Dan improved.  This AI system will produce philosophical texts very differently than we do.  It need not be fully superhuman in its capacities to be interesting.  It might even, sometimes, strike us as remarkably, foolishly wrong.  (In fairness, other human philosophers sometimes strike me the same way.)  But even if subhuman in some respects, if this AI philosopher also sometimes produces strange but brilliant texts -- analogous to the strange but brilliant moves of AlphaGo, texts that no human philosopher would create but which on careful study contain intriguing philosophical moves -- it could be a philosophical interlocutor of substantial interest.

Philosophy, I have long argued, benefits from including people with a diversity of perspectives.  Strange AI might also be appreciated as a source of philosophical cognitive diversity, occasionally generating texts that contain sparks of something genuinely new, different, and worthwhile that would not otherwise exist.

------------------------------------------------
[1] Kendra Chilson is not the first to use the phrase "strange intelligence" with this meaning in an AI context, but the usage was new to me; and perhaps through her work it will catch on more widely.

## Thursday, October 05, 2023

### Skeletal vs Fleshed-Out Philosophy

All philosophical views are to some degree skeletal. By this, I mean that the details of their application remain to some extent open. This is true of virtually any formal system. Even the 156-page rule handbook for golf couldn't cover every eventuality: What if the ball somehow splits in two and one half falls in the hole? What if an alien spaceship levitates the ball for two seconds as it's arcing through the air? (See the literature on "open textured" statements.)

Still, some philosophical views are more skeletal than others. A bare statement like "maximize utility" is much more skeletal, much less fleshed out, than a detailed manual of utilitarian consequentialist advice. Today, I want to add a little flesh to the skeletal vs. fleshed-out distinction. Doing so will, I hope, help clarify some of the value of trying to walk the walk as an ethicist. (For more on walking the walk, see last month's posts here and here.)

[Midjourney rendition of a person and a skeleton talking philosophy, against a background of stars]

Using "maximize utility" as an example, let's consider sources of linguistic, metaphysical, and epistemic openness.

Linguistic: What does "utility" mean, exactly? Maybe utility is positively valenced conscious experiences. Or maybe utility is welfare or well-being more broadly construed. What counts as "maximizing"? Is it a sum or a ratio? Is the scope truly universal -- for all entities in the entire cosmos over all time, or is it limited in some way (e.g., to humans, to Earth, to currently existing organisms)? Absent specification (by some means or other), there will be no fact of the matter whether, say, two acts with otherwise identical results, but one of which also slightly improves the knowledge (but not happiness) of one 26th-century Martian, are equally choiceworthy according to the motto.

Metaphysical: Consider a broad sense of utility as well-being or flourishing. If well-being has components that are not strictly commensurable -- that is, which cannot be precisely weighed against each other -- then the advice to maximize utility leaves some applications open. Plausibly, experiencing positive emotions and achieving wisdom (whatever that is, exactly) are both part of flourishing. While it might be clear that a tiny loss of positive emotion is worth trading off for a huge increase in wisdom and vice versa, there might be no fact of the matter exactly what the best tradeoff ratio is -- and thus, sometimes, no fact of the matter whether someone with moderate levels of positive emotion and moderate levels of wisdom has more well-being than someone with a bit less positive emotion and a bit more wisdom.

Epistemic: Even absent linguistic and metaphysical openness, there can be epistemic openness. Imagine we render the utilitarian motto completely precise: Maximize the total sum of positive minus negative conscious experiences for all entities in the cosmos in the entire history of the cosmos (and whatever else needs precisification). Posit that there is always an exact fact of the matter how to weigh competing goods in the common coin of utility and there are never ties. Suppose further that it is possible in principle to precisely specify what an "action" is, individuating all the possible alternative actions at each particular moment. It should then always be the case that there is exactly one action you could do that would "maximize utility". But could you know what this action is? That's doubtful! Every action has a huge number of non-obvious consequences. This is ignorance; but we can also think of it as a kind of openness, to highlight its similarity to linguistic and metaphysical openness or indeterminacy. The advice "maximize utility", however linguistically and metaphysically precise, leaves it still epistemically open what you should actually do.

Parallel remarks apply to other ethical principles: "Act on that maxim that you can will to be a universal law", "be kind", "don't discriminate based on race", "don't perform medical experiments on someone without their consent" -- all exhibit some linguistic, metaphysical, and epistemic openness.

Some philosophers might deny linguistic and/or metaphysical openness: Maybe context always renders meanings perfectly precise, and maybe normative facts are never actually mushy-edged and indeterminate. Okay. Epistemic openness will remain. As long as we -- the reader, the consumer, the applier, of the philosophical doctrine -- can't reasonably be expected to grasp the full range of application, the view remains skeletal in my sense of the term.

It's not just ethics. Similar openness also pervades other areas of philosophy. For example, "higher order" theories of consciousness hold that an entity is conscious if and only if it has the right kind of representations of or knowledge of its own mental states or cognitive processes. Linguistically, what is meant by a "higher order representation", exactly? Metaphysically, might there be borderline cases that are neither determinately conscious nor unconscious? Epistemically, even if we could precisify the linguistic and metaphysical issues, what actual entities or states satisfy the criteria (mice? garden snails? hypothetical robots of various configurations?).

The degree of openness of a position is itself, to some extent, open: There's linguistic, metaphysical, and epistemic meta-openness, we might say. Even a highly skeletal view rules some things out. No reasonable fleshing out of "maximize utility" is consistent with torturing babies for no reason. But it's generally unclear where exactly the boundaries of openness lie, and there might be no precise boundary to be discovered.

#

Now, there's something to be said for skeletal philosophy. Simple maxims, which can be fleshed out in various ways, have an important place in our thinking. But at some point, the skeleton needs to get moving, if it's going to be of use. Lying passively in place, it might block a few ideas -- those that crash directly against its obvious bones. But to be livable, applicable, it needs some muscle. It needs to get up and walk over to real, specific situations. What does "maximize utility" (or whatever other policy, motto, slogan, principle) actually recommend in this particular case? Too skeletal a view will be silent, leaving it open.