Tuesday, March 21, 2023

The Emotional Alignment Design Policy

I've been writing a lot recently about what Mara Garza and I, since 2015, have been calling the Design Policy of the Excluded Middle: Don't create AI systems of disputable moral status. Doing so, one courts the risk of either underattributing or overattributing rights to the systems, and both directions of error are likely to have serious moral costs.

(Violations of the Design Policy of the Excluded Middle are especially troubling when some well-informed experts reasonably hold that the AI systems are far below having humanlike moral standing and other well-informed experts reasonably hold that the AI systems deserve moral consideration similar to that of humans. The policy comes in various strengths in terms of (a.) how wide a range of uncertainty to tolerate, and (b.) how high a bar is required for legitimate disputability. More on this in a future post, I hope.)

Today, I want to highlight another design policy Garza and I advocated in 2015: The Emotional Alignment Design Policy.

Design AI systems so that ordinary users have emotional reactions appropriate to the systems' genuine moral status.

Joanna Bryson articulates one half of this design policy in her well-known (and in my view unfortunately titled) article "Robots Should Be Slaves". According to Bryson, robots -- and AI systems in general -- are disposable tools and should be treated as such. User interfaces that encourage people to think of AI systems as anything more than disposable tools -- for example, as real companions, capable of genuine pleasure or suffering -- should be discouraged. We don't want ordinary people fooled into thinking it would be morally wrong to delete their AI "friend". And we don't want people sacrificing real human interests for what are basically complicated toasters.

Now to be clear, I think tools -- and even rocks -- can and should be valued. There's something a bit gratingly consumerist about the phrase "disposable tools" that I am inclined to use here. But I do want to highlight the difference in the type of moral status possessed, say, by a beautiful automobile versus that possessed by a human, cat, or even maybe garden snail.

The other half of the Emotional Alignment Design Policy, which goes beyond Bryson, is this: If we do someday create AI entities with real moral considerability similar to non-human animals or similar to humans, we should design them so that ordinary users will emotionally react to them in a way that is appropriate to their moral status. Don't design a human-grade AI capable of real pain and suffering, with human-like goals, rationality, and thoughts of the future, and put it in a bland box that people would be inclined to casually reformat. And if the AI warrants an intermediate level of concern -- similar, say, to a pet cat -- then give it an interface that encourages users to give it that amount of concern and no more.

I have two complementary concerns here.

One -- the nearer-term concern -- is that tech companies will be motivated to create AI systems that users emotionally attach to. Consider, for example, Replika, advertised as "the world's best AI friend". You can design an avatar for the Replika chat-bot, give it a name, and buy it clothes. You can continue conversations with it over the course of days, months, even years, and it will remember aspects of your previous interactions. Ordinary users sometimes report falling in love with their Replika. With a paid subscription, you can get Replika to send you "spicy" selfies, and it's not too hard to coax it into erotic chat. (This feature was apparently toned down in February after word got out that children were having "adult" conversations with Replika.)

Now I'm inclined to doubt that ordinary users will fall in love with the current version of Replika in a way that is importantly different from how a child might love a teddy bear or a vintage automobile enthusiast might love their 1920 Model T. We know to leave these things behind in a real emergency. Reformatting or discontinuing Replika might be upsetting to people who are attached, but I don't think ordinary users would regard it as the moral equivalent of murder.

My worry is that it might not take too many more steps of technological improvement before ordinary users can become confused and can come to form emotional connections that are inappropriate to the type of thing that AI currently is. If we put our best chatbot in an attractive, furry pet-like body, give it voice-to-text and text-to-speech interfaces so that you can talk to it orally, give it an emotionally expressive face and tone of voice, give it long-term memory of previous interactions as context for new interactions -- well, then maybe users do really start to fall more seriously in love or at least treat it as having the moral standing of a pet mammal. This might be so even with technology not much different from what we currently have, about which there is generally expert consensus that it lacks meaningful moral standing.

It's easy to imagine how tech companies might be motivated to encourage inflated attachment to AI systems. Attached users will have high product loyalty. They will pay for monthly subscriptions. They will buy enhancements and extras. We already see a version of this with Replika. The Emotional Alignment Design Policy puts a lid on this: It should be clear that this is an interactive teddy-bear, nothing more. Buy cute clothes for your teddy bear, sure! But forgo the $4000 cancer treatment you might give to a beloved dog.

The longer-term concern is the converse: that tech companies will be inclined to make AI systems disposable even if those AI systems, eventually, are really conscious or sentient and really deserve rights. This possibility has been imagined over and over in science fiction, from Asimov's robot stories through Star Trek: The Next Generation, Black Mirror, and Westworld.

Now there is, I think, one thing a bit unrealistic about those fictions: The disposable AI systems are designed to look human or humanoid in a way that engages users' sympathy. (Maybe that's a function of the fictional medium: From a fiction-writing perspective, humanlike features help engage readers' and viewers' sympathy.) More realistic, probably, is the idea that if the tech companies want to minimize annoying protests about AI rights, they will give the robots or AI systems bland, not-at-all-humanlike interfaces that minimize sympathetic reactions, such as the shipboard computer in Star Trek or the boxy robots in Interstellar.

[image: the boxy TARS robot from Interstellar]

The fundamental problem in both directions is that companies' profit incentives might misalign with AI systems' moral status. For some uses, companies might be incentivized to trick users into overattributing moral status, to extract additional money from overly attached users. In other cases, companies might be incentivized to downplay the moral status of their creations -- for example, if consciousness/sentience proves to be a useful feature to build into the most sophisticated future AI workers.

The Emotional Alignment Design Policy, if adhered to, will reduce these moral risks.

Thursday, March 16, 2023

Presentations, March 29 - April 6

I have some travel and talks coming up. If you're interested and in the area, and if the hosting institution permits, please come by!

Mar 29: Claremont McKenna College, Athenaeum Lecture: Falling in Love with Machines

Mar 30: University of Washington, Seattle, brown bag discussion: Moral Reflection and Moral Behavior [probably closed to outsiders]

Mar 30: University of Washington, Seattle, brown bag discussion: The Demographics of Philosophy [probably closed to outsiders]

Mar 31: University of Puget Sound, Undergraduate Philosophy Conference keynote: Does Studying Ethics Make People More Ethical?

Apr 2 (remote): Northeastern University, Information Ethics Roundtable: Let's Hope We're Not Living in a Simulation

Apr 3: University of California, Office of the President (Oakland): Principles Governing Online Majors. [This one is definitely not public, but I welcome readers' thoughts about what University of California policy should be regarding the approval of online majors.]

Apr 5: American Philosophical Association, Pacific Division (San Francisco), Society for the Philosophy of Animal Minds: The Mind of a Garden Snail, or What Is It to Have a Mind?

Apr 5: American Philosophical Association, Pacific Division (San Francisco), Science Fiction and Philosophy Society: TBD, either Science Fiction as Philosophy or Science Fiction and Large Language Models.

Apr 6: American Philosophical Association, Pacific Division (San Francisco), Book Symposium on David Chalmers' Reality+: Let's Hope We're Not Living in a Simulation

Yes, that's nine presentations in nine days, on seven different topics. Perhaps I'm spreading myself a little thin!

Tuesday, March 14, 2023

Don't Create AI Systems of Disputable Moral Status (Redux)

[originally published at Daily Nous, Mar. 14, as part of a symposium on large language models, ed. Annette Zimmerman]

Engineers will likely soon be able to create AI systems whose moral status is legitimately disputable. We will then need to decide whether to treat such systems as genuinely deserving of our care and solicitude. Error in either direction could be morally catastrophic. If we underattribute moral standing, we risk unwittingly perpetrating great harms on our creations. If we overattribute moral standing, we risk sacrificing real human interests for AI systems without interests worth the sacrifice.

The solution to this dilemma is to avoid creating AI systems of disputable moral status.

Both engineers and ordinary users have begun to wonder whether the most advanced language models, such as GPT-3, LaMDA, and Bing/Sydney might be sentient or conscious, and thus deserving of rights or moral consideration. Although few experts think that any currently existing AI systems have a meaningful degree of consciousness, some theories of consciousness imply that we are close to creating conscious AI. Even if you the reader personally suspect AI consciousness won’t soon be achieved, appropriate epistemic humility requires acknowledging doubt. Consciousness science is contentious, with leading experts endorsing a wide range of theories.

Probably, then, it will soon be legitimately disputable whether the most advanced AI systems are conscious. If genuine consciousness is sufficient for moral standing, then the moral standing of those systems will also be legitimately disputable. Different criteria for moral standing might produce somewhat different theories about the boundaries of the moral gray zone, but most reasonable criteria—capacity for suffering, rationality, embeddedness in social relationships—admit of interpretations on which the gray zone is imminent.

We might adopt a conservative policy: Only change our policies and laws once there’s widespread consensus that the AI systems really do warrant care and solicitude. However, this policy is morally risky: If it turns out that AI systems have genuine moral standing before the most conservative theorists would acknowledge that they do, the likely outcome is immense harm—the moral equivalents of slavery and murder, potentially at huge scale—before law and policy catch up.

A liberal policy might therefore seem ethically safer: Change our policies and laws to protect AI systems as soon as it’s reasonable to think they might deserve such protection. But this is also risky. As soon as we grant an entity moral standing, we commit to sacrificing real human interests on its behalf. In general, we want to be able to control our machines. We want to be able to delete, update, or reformat programs, assigning them to whatever tasks best suit our purposes.

If we grant AI systems rights, we constrain our capacity to manipulate and dispose of them. If we go so far as to grant some AI systems equal rights with human beings, presumably we should give them a path to citizenship and the right to vote, with potentially transformative societal effects. If the AI systems genuinely are our moral equals, that might be morally required, even wonderful. But if liberal views of AI moral standing are mistaken, we might end up sacrificing substantial human interests for an illusion.

Intermediate policies are possible. But it would be amazing good luck if we happened upon a policy that gave the whole range of advanced AI systems exactly the moral consideration they deserve, no more and no less. Our moral policies for non-human animals, people with disabilities, and distant strangers are already confused enough, without adding a new potential source of grievous moral error.

We can avoid the underattribution/overattribution dilemma by declining to create AI systems of disputable moral status. Although this might delay our race toward ever fancier technologies, delay is appropriate if the risks of speed are serious.

In the meantime, we should also ensure that ordinary users are not confused about the moral status of their AI systems. Some degree of attachment to artificial AI “friends” is probably fine or even desirable—like a child’s attachment to a teddy bear or a gamer’s attachment to their online characters. But users know the bear and the character aren’t sentient. We will readily abandon them in an emergency.

But if a user is fooled into thinking that a non-conscious system really is capable of pleasure and pain, they risk being exploited into sacrificing too much on its behalf. Unscrupulous technology companies might even be motivated to foster such illusions, knowing that it will increase customer loyalty, engagement, and willingness to pay monthly fees.

Engineers should either create machines that plainly lack any meaningful degree of consciousness or moral status, making clear in the user interface that this is so, or they should go all the way (if ever it’s possible) to creating machines on whose moral status reasonable people can all agree. We should avoid the moral risks that the confusing middle would force upon us.



For a deeper dive into these issues, see “The Full Rights Dilemma for AI Systems of Debatable Personhood” (in draft) and “Designing AI with Rights, Consciousness, Self-Respect, and Freedom” (with Mara Garza; in Liao, ed., The Ethics of Artificial Intelligence, Oxford: 2020).

See also Is it time to start considering personhood rights for AI chatbots? (with Henry Shevlin), in the Los Angeles Times (Mar 5).

[image: Dall-E 2 "robot dying in a fire"]

Thursday, March 09, 2023

New Paper in Draft: Let's Hope We're Not Living in a Simulation

I'll be presenting an abbreviated version of this at the Pacific APA in April, as a commentary on David Chalmers' book Reality+.

According to the simulation hypothesis, we might be artificial intelligences living in a virtual reality.  Advocates of this hypothesis, such as Chalmers, Bostrom, and Steinhart, tend to argue that the skeptical consequences aren’t as severe as they might appear.  In Reality+, Chalmers acknowledges that although he can’t be certain that the simulation we inhabit, if we inhabit a simulation, is larger than city-sized and has a long past, simplicity considerations speak against those possibilities.  I argue, in contrast, that cost considerations might easily outweigh considerations of simplicity, favoring simulations that are catastrophically small or brief – small or brief enough that a substantial proportion of our everyday beliefs would be false or lack reference in virtue of the nonexistence of things or events whose existence we ordinarily take for granted.  More generally, we can’t justifiably have high confidence that if we live in a simulation it’s a large and stable one.  Furthermore, if we live in a simulation, we are likely at the mercy of ethically abhorrent gods, which makes our deaths and suffering morally worse than they would be if there were no such gods.  There are reasons both epistemic and axiological to hope that we aren’t living in a simulation.

Paper here.

As always, comments welcome!

Sunday, March 05, 2023

Wednesday, March 01, 2023

God Stumbles Over the Power Cord

Princeton University Press generously hired an illustrator to create some images for my forthcoming book, The Weirdness of the World (due January 2024).

Here's an illustration for my chapter "Kant Meets Cyberpunk" -- a revised and expanded version of this article from 2019, concerning the epistemic and metaphysical consequences of living in a computer simulation.

Friday, February 24, 2023

Moral Mediocrity, Apologizing for Vegetarianism, and Do-Gooder Derogation

Though I'm not a vegetarian, one of my research interests is the moral psychology of vegetarianism. Last weekend, when I was in Princeton giving a talk on robot rights, a vegetarian apologized to me for being vegetarian.

As a meat-eater, I find it's not unusual for vegetarians to apologize to me. Maybe this wouldn't be so notable if their vegetarianism inconvenienced me in any way, but often it does not. In Princeton, we were both in line for a catering spread that had both meat and vegetarian options. I was in no obvious way wronged, harmed, or inconvenienced. So what is going on?

Here's my theory.

Generally speaking, I believe that people aim to be morally mediocre. That is, rather than aiming to be morally good (or not morally bad) by absolute standards, most people aim to be about as morally good as their peers -- not especially better, not especially worse. People might not conceptualize themselves as aiming for mediocrity. Often, they concoct post-hoc rationalizations to justify their choices. But their choices implicitly reveal their moral target. Systematically, people avoid being among the worst of their peers while refusing to pay the costs of being among the best. For example, they don't want to be the one jerk who messes up a clean environment; but they also don't want to be the one sucker who puts in the effort to keep things clean if others aren't also doing so. (See my notes on the game of jerk and sucker.)

Now if people do in fact aim to be about as morally good as their peers, we can expect that under certain conditions they don't want their peers to improve their moral behavior. Under what conditions? Under the conditions that your peers' self-improvement benefits you less than the raising of the moral bar costs you.

Let's say that your friends all become nicer to each other. This isn't so bad. You benefit from being in a circle of nice people. Needing to become a bit nicer yourself might be a reasonable cost to pay for that benefit. 

But if your friends start becoming vegetarians, you accrue the moral costs without the benefits. The moral bar is raised for you, implicitly, at least a little bit; but the benefits go to non-human animals, if they go anywhere. You now either have to think a bit worse of yourself relative to your peers or you have to start changing your behavior. How annoying! No wonder vegetarians are moved to apologize. (To be clear, I'm not saying we should be annoyed by this, just that my theory predicts that we will be annoyed.)

Note that this explanation works better for those of us who think it is morally better to avoid eating meat than for those of us who see no moral difference between eating meat and eating vegetarian. If you really see no moral difference (deep down, and not just because of superficial, post-hoc rationalization), then you'll see the morally motivated vegetarian as just morally confused. If they apologize, it would be like someone apologizing to you for acting according to some other mistaken moral principle, such as apologizing for abstinence before marriage. No one needs to apologize to you for that, unless they are harming or inconveniencing you in some way -- for example, because they are dating you and think you'll be disappointed. (Alternatively, they might apologize for the more abstract wrong of seeing you as morally deficient because you follow different principles; but that type of apology looks and feels a little different, I think.)

If this moral mediocrity explanation of vegetarian apology works, it ought to generalize to other cases where friends follow higher moral standards that don't benefit you. Some possible examples: In a circle of high school students who habitually cheat on tests, a friend might apologize for being unwilling to cheat. In a group of people who feel somewhat guilty about taking a short cut through manicured grass, one might decide they want to take the long way, apologizing to the group for the extra time, feeling more guilt than would accompany an ethically neutral reason for delay. On this model, the felt need for the apology would vary with a few predictable parameters: greater need the closer one is to being a peer whose behavior might be compared, greater need the more vivid and compelling the comparison (for example if you are side by side), lesser need the more the moral principle can be seen as idiosyncratic and inapplicable to the other (and thus some apologies of this sort suggest that the principle is idiosyncratic).

Do-gooder derogation is the tendency for people to think badly of people who follow more demanding moral standards. The moral mediocrity hypothesis is one possible explanation for this tendency, predicting among other things that derogation will be greater when the do-gooder is a peer and, perhaps unintuitively, that the derogation will be greater when the moral standard is compelling enough to the derogator that they already feel a little bit bad about not adhering to it.



The Collusion Toward Moral Mediocrity (Sep 1, 2022)

Aiming for Moral Mediocrity (Res Philosophica, 2019)

Image: Dall-E 2 "oil painting of a woman apologizing to an eggplant"

Thursday, February 16, 2023

U.S. Philosophy PhDs Are Still Overwhelmingly Non-Hispanic White (Though a Bit Less So Than 10 Years Ago)

Nine years ago, I compared the racial and ethnic composition of U.S. academic philosophy, as measured by PhDs awarded, with that of the other humanities. I found -- no surprise -- that a large majority of Philosophy PhD recipients were non-Hispanic White. I also found, somewhat more to my surprise, that this did not make it unusual among the humanities. Digging into the details suggested an explanation: Many of the subfields of the humanities, e.g., German literature and European history, specialize in the European tradition. Such subfields were typically as predominantly White as philosophy or even more so. Subfields of the humanities specializing in non-European traditions, e.g., Asian history, tended to be not nearly as White, with substantial proportions of PhD recipients identifying with the racial or ethnic category associated with the region.

At the time, I suggested the following hypothesis: Philosophy might be overwhelmingly White because students tend to perceive it as something like an area studies or cultural studies discipline focusing on the European (and White North American) tradition. (See Bryan Van Norden and Jay Garfield for an articulation and critique of this way of seeing academic philosophy as practiced in the U.S.).

Nine years later, I find myself wondering to what extent the pattern still holds. Time for an update!


Before presenting the results, two nerdy methodological notes (feel free to skip).

Methodological note on ethnic and racial categories and non-response rates: These analyses rely on the National Science Foundation's Survey of Earned Doctorates. The SED aims to collect data on all PhDs awarded in accredited U.S. universities, and typically reports response rates over 90%. The most recent available year is 2021 (response rate 92%). Data are based on self-report of ethnicity and race. The top-level category split is temporary visa holders vs. U.S. citizens and permanent residents. U.S. citizens and permanent residents are divided into Hispanic or Latino, not Hispanic or Latino, or ethnicity not reported. Respondents who identify as not Hispanic or Latino are then divided into the racial categories American Indian or Alaska Native, Asian, Black or African American, White, More than one race, or Other race or race not reported. The analyses below exclude temporary visa holders and respondents who did not report their ethnicity or race or reported "other".  In Philosophy, 76% of respondents indicated that they were U.S. citizens or permanent residents (18% indicated that they were temporary visa holders, and 6% presumably did not answer the question), and among the U.S. citizens and permanent residents, 5% either reported "other" or did not report their ethnicity or race.

Methodological note on disciplinary classification as "Philosophy": Before 2021, the SED had two philosophy-relevant subfields, "philosophy" and "ethics", which were generally merged in public data presentation. (In a custom analysis I requested several years ago, I found that "ethics" was only a small number of doctorates.) Starting in 2021, there are three philosophy-relevant subfields: "History/philosophy of science, technology and society" (68 PhDs awarded), "Philosophy" (399 PhDs awarded), and "Philosophy and religious studies not elsewhere classified" (degrees classified as broadly within the field of philosophy and religious studies but not designated specifically as philosophy or specifically as religious studies; 67 PhDs awarded). "Ethics" no longer appears to be a category. My analysis will focus only on the "Philosophy" group. For comparison, in 2020, 460 PhDs were awarded in "Philosophy" or "Ethics", and in 2019, 474 PhDs were awarded in "Philosophy" or "Ethics". It is likely that most of the degrees that would have been classified in 2020 as "Philosophy" or "Ethics" are classified in 2021 as "Philosophy". However, since it's unlikely that the number of philosophy degrees awarded declined by 13% between the two years (from 460 to 399), it is likely that a small but non-trivial percentage of degrees that would have been classified as "Philosophy" or "Ethics" in 2020 are now classified as "History/philosophy of science, technology and society" or as "Philosophy and religious studies not elsewhere classified". In short, the 2021 "Philosophy" degree category is probably largely comparable but not exactly comparable with the earlier "Philosophy" and "Ethics" degree categories.


Philosophy, 2021 PhDs (290 included respondents):

  • Hispanic or Latino (any race): 9.0%
  • Not Hispanic or Latino:
    • American Indian or Alaska Native: 0.0%
    • Asian: 4.1%
    • Black or African American: 2.8%
    • White: 81.0%
    • More than one race: 3.1%

For comparison, among all PhD recipients (30,830 included respondents):

  • Hispanic or Latino (any race): 9.3%
  • Not Hispanic or Latino:
    • American Indian or Alaska Native: 0.3%
    • Asian: 9.8%
    • Black or African American: 7.9%
    • White: 69.1%
    • More than one race: 3.5%

Philosophy PhD recipients approximately match PhD recipients overall in percentage Hispanic or Latino.  Among respondents who are not Hispanic or Latino, Philosophy PhD recipients approximately match PhD recipients overall in percentage who report being more than one race, but compared with PhD recipients overall, Philosophy PhD recipients are substantially less Asian, Black, and (perhaps, though for numbers this small, chance fluctuations can't be ruled out) American Indian or Alaska Native.  Finally -- as these other numbers imply -- philosophy is disproportionately White.

Rewinding 10 years to look at the "Philosophy" and "Ethics" combined category from 2011 (367 included respondents):

  • Hispanic or Latino (any race): 4.9%
  • Not Hispanic or Latino:
    • American Indian or Alaska Native: 0.0%
    • Asian: 3.8%
    • Black or African American: 2.7%
    • White: 87.2%
    • More than one race: 1.3%

Here we can see the tendency, as I've noted before, toward increasing percentages of Asian, Hispanic/Latino, and multi-racial philosophy PhD recipients, while the numbers of American Indian/Alaska Native and Black/African American philosophy PhD recipients remain disproportionately low, with little to no increase.
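For readers who want the 2011-to-2021 comparison as explicit arithmetic, here is a minimal Python sketch that computes the percentage-point change in each category. The percentages are copied from the two Philosophy lists above; the variable names are just illustrative, and the caveat from the methodological note still applies (the 2021 "Philosophy" category is probably largely, but not exactly, comparable with the earlier combined category).

```python
# Percentages copied from the Philosophy lists above (2011 vs. 2021).
phil_2011 = {
    "Hispanic or Latino (any race)": 4.9,
    "American Indian or Alaska Native": 0.0,
    "Asian": 3.8,
    "Black or African American": 2.7,
    "White": 87.2,
    "More than one race": 1.3,
}
phil_2021 = {
    "Hispanic or Latino (any race)": 9.0,
    "American Indian or Alaska Native": 0.0,
    "Asian": 4.1,
    "Black or African American": 2.8,
    "White": 81.0,
    "More than one race": 3.1,
}

# Percentage-point change per category, rounded to one decimal place
# to match the precision of the reported figures.
change = {
    category: round(phil_2021[category] - phil_2011[category], 1)
    for category in phil_2011
}

for category, delta in change.items():
    print(f"{category}: {delta:+.1f} points")
```

These are differences between rounded percentages, so each could be off by about a tenth of a point.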

How about field by field? Among the 300 "detailed" fields of study -- NSF's finest-grain division -- Philosophy is the 40th Whitest (by percentage non-Hispanic White). NSF no longer includes categories for French & Italian or German literature, which used to be very White area studies categories, but several European / North American area studies categories remain in the new classification. All are at least as non-Hispanic White as Philosophy. Specifically:
  • European history (89.7% non-Hispanic White) [in 2011: 92.7%]
  • Classical and ancient studies (88.4%) [in 2011: 92.6%]
  • American history (U.S.) (86.3%) [in 2011: 81.5%]
  • American literature (U.S.) (85.3%) [in 2011: 82.6%]
  • English literature (Britain and commonwealth) (81.6%) [in 2011: 87.9%]

Note that in the humanities "classical" and "ancient" typically refer to ancient Greek and Roman culture and not, for example, ancient China, India, Africa, or the Americas.

Note also: Of course, European history and literature and U.S. history and literature are not exclusively White! However, as with Philosophy, the contributions of people we would now racialize as White tend to be centered.

Other PhD subfields with comparable or higher percentages of non-Hispanic White PhD recipients include music theory and education, meteorology/ecology/geology, animal sciences, and astronomy/astrophysics. Possibly, music theory and music education as typically taught in U.S. PhD programs tend to emphasize the White European and White North American traditions.

If we look at the humanities and social sciences more generally, they tend to be more ethnically and racially diverse than philosophy and the European area studies programs. For example, the social sciences overall are 66.7% non-Hispanic White; foreign languages, literatures, and linguistics overall is 61.3% non-Hispanic White; and general history (without a regional focus) is 71.2% White. The humanities overall is 76.3% non-Hispanic White, but of course that includes substantial numbers focusing in area studies or philosophy.


I draw two conclusions:

First, the pipeline of PhDs into philosophy in the U.S. remains over 80% non-Hispanic White, despite recent gains in the percentage of Asian, Hispanic/Latino, and multi-racial philosophy PhD recipients.

Second, the moderate increase in ethnic/racial diversity in PhDs -- from 87.2% non-Hispanic White in 2011 to 81.0% in 2021 -- is not part of a general trend toward increasing diversity in European and North America focused "area studies" PhDs, which generally remain about 80-90% non-Hispanic White.

These two observations are consistent with the view that academic philosophy is to some extent, but perhaps to a decreasing extent, still experienced by students as an area studies program focused on a certain aspect of European and North American culture or literature. I wouldn't lean too hard into that possible explanation, though. Probably at least a half-dozen other plausible hypotheses could be constructed to fit the data, and there are some non-area-studies fields, like meteorology/ecology/geology, that are even more proportionately White than Philosophy, for reasons I cannot guess.

Friday, February 10, 2023

How Not to Calculate Utilities in an Infinite Universe

Everything you do causes almost everything -- or so I have argued (blog post version here, more detailed and careful version collaborative with Jacob Barandes in my forthcoming book).  On some plausible cosmological assumptions, each of your actions ripples unendingly through the cosmos (including post-heat-death), causing infinitely many good and bad effects.

Assume that our actions do have infinitely many good and bad effects.  My thought today is that this would appear to ruin some standard approaches to action evaluation.  According to some vanilla versions of consequentialist ethics and ordinary decision theory, the goodness or badness of your actions depends on their total long-term consequences.  But since almost all of your actions have infinitely many good consequences and infinitely many bad consequences, the sum total value of almost all of your actions will be ∞ + -∞, a sum which is normally considered to be mathematically undefined.

Suppose you are considering two possible actions with short-term expected values m and n.  Suppose, further, that m is intuitively much larger than n.  Maybe Action 1, with short-term expected value m, is donating a large sum of money to a worthwhile charity, while Action 2, with short-term expected value n, is setting fire to that money to burn down the house of a neighbor with an annoying dog.  Infinitude breaks the mathematical apparatus for comparing the long-term total value of those actions: The total expected value of Action 1 will be m + ∞ + -∞, while the total expected value of Action 2 will be n + ∞ + -∞.  Both values are undefined.

Can we wiggle out of this?  An Optimist might try to escape thus: Suppose that overall in the universe, at large enough spatiotemporal scales, the good outweighs the bad.  We can now consider the relative values of Action 1 and Action 2 by dividing them into three components: the short-term effects (m and n, respectively), the medium-term effects k -- the effects through, say, the heat death of our region of the universe -- and the infinitary effects (∞, by stipulation).  Stipulate that k is unknown but expected to be finite and similar for Actions 1 and 2.  The expected value of Action 1 is thus m + k + ∞.  The expected value of Action 2 is n + k + ∞.  These values are not undefined; so that particular problem is avoided.  The values are, however, equal: simple positive infinitude in both cases.  As the saying goes, infinity plus one just equals infinity.  A parallel Pessimistic solution -- assuming that at large enough time scales the bad outweighs the good -- runs into the same problem, only with negative infinitude.
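For readers who like to see the arithmetic misbehave, here is a minimal sketch in Python, using IEEE floating-point infinity as a stand-in for the mathematical ∞. The numbers chosen for m, n, and k are hypothetical and purely illustrative, and the helper name total_value is my own invention, not part of any standard framework.

```python
import math

INF = math.inf

def total_value(short_term, medium_term, infinite_good, infinite_bad):
    """Naive total expected value: simply add up every component."""
    return short_term + medium_term + infinite_good + infinite_bad

# Hypothetical illustrative values: charity (m), arson (n), medium-term effects (k).
m, n, k = 1000.0, -1000.0, 5.0

# Infinitely many good AND bad effects: the sum is undefined (NaN in floats).
charity_mixed = total_value(m, k, INF, -INF)
arson_mixed = total_value(n, k, INF, -INF)

# The Optimist's stipulation: only positive infinitude remains.
charity_optimist = total_value(m, k, INF, 0.0)
arson_optimist = total_value(n, k, INF, 0.0)

print(math.isnan(charity_mixed), math.isnan(arson_mixed))
print(charity_optimist == arson_optimist)
```

In the mixed case neither total can be ranked above the other, since every ordering comparison with NaN comes out false. In the Optimist's case the two totals compare as equal, which is the unwelcome result described above.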

Perhaps a solution is available for someone who holds that at large enough time scales the good will exactly balance the bad, so that we can compare m + k + 0 to n + k + 0?  We might call this the Knife's Edge solution.  The problem with the Knife's Edge solution is delivering that zero.  Even if we assume that the expected value of any spatiotemporal region is exactly zero, the Law of Large Numbers only establishes that as the size of the region under consideration goes to infinity, the average value is very likely to be near zero.  The sum, however, will presumably be divergent – that is, will not converge upon a single value.  If good and bad effects are randomly distributed and do not systematically decrease in absolute value over time, then the relevant series would be a + b + c + d + ... where each variable can take a different positive or negative value and where there is no finite limit to the value of positive or negative runs within the series -- seemingly the very archetype of a poorly behaved divergent series whose sum cannot be calculated (even by clever tools like Cesàro summation).  Thus, mathematically definable sums still elude us.  (Dominance reasoning also probably fails, since Actions 1 and 2 will have different rather than identical infinite effects.)
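The gap between a well-behaved average and a badly behaved sum can be illustrated with a small simulation. This is only a sketch under assumptions of my own choosing: effects are modeled as independent draws from a uniform distribution on [-1, 1], so each has expected value exactly zero, and the function name and step count are hypothetical. A finite simulation cannot demonstrate divergence, of course; it only shows the average shrinking toward zero while the running total wanders far from it.

```python
import random

def running_sum_and_mean(num_steps, seed=0):
    """Add up zero-mean random 'effects'; track the sum, mean, and widest swing."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    total = 0.0
    largest_excursion = 0.0
    for _ in range(num_steps):
        total += rng.uniform(-1.0, 1.0)  # one good or bad effect
        largest_excursion = max(largest_excursion, abs(total))
    return total, total / num_steps, largest_excursion

total, mean, largest_excursion = running_sum_and_mean(100_000)
print("final sum:", total)
print("average per effect:", mean)
print("largest distance of running sum from zero:", largest_excursion)
```

Increasing the number of steps tends to push the average closer to zero while letting the running sum stray farther, which is why the Law of Large Numbers delivers no zero for the Knife's Edge solution to use.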

This generates a dilemma for believers in infinite causation, if they hope to evaluate actions by their total expected value.  Either accept the conclusion that there is no difference in total expected value between donating to charity and burning down your neighbor's house (the Optimist's or Pessimist's solution), or accept that there is no mathematically definable total expected value for any action, rendering proper evaluation impossible.

The solution, I suggest, is to reject certain standard approaches to action evaluation.  We should not evaluate actions based on their total expected value over the lifetime of the cosmos!  We must have some sort of discounting with spatiotemporal distance, or some limitation of the range of consequences we are willing to consider, or some other policy to expunge the infinitudes from our equations.  Unfortunately, as Bostrom (2011) persuasively argues, no such solution is likely to be entirely elegant and intuitive from a formal point of view.  (So much the worse, perhaps, for elegance and intuition?)

The infinite expectation problem is robust in two ways.

First, it affects not only simple consequentialists.  After all, you needn't be a simple consequentialist to think that long-term expected outcomes matter.  Virtually everyone thinks that long-term expected outcomes matter somewhat.  As long as they matter enough that an infinitely positive long-term outcome, over the course of the entire history of the universe, would be relevant to your evaluation of an action, you risk being caught by this problem.

Second, the problem affects even people who think that infinite causation is unlikely.  Even if you are 99.99% certain that infinite causation doesn't occur, your remaining 0.01% credence in infinite causation will destroy your expected value calculations if you don't do something to sequester the infinitudes.  Suppose you're 99.99% sure that your action will have the value k, while allowing a 0.01% chance that its value will be ∞ + -∞.  If you now apply the expected value formula in the standard way, you will crash straightaway into the problem.  After all, .9999 * k + .0001 * (∞ + -∞) is just as undefined as ∞ + -∞ itself.  Similarly, .9999 * k + ∞ is simply ∞.  As soon as you let those infinitudes influence your decision, you fall back into the dilemma.
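The same point can be checked mechanically. The sketch below applies the ordinary probability-weighted sum to the credences just described, again with floating-point infinity standing in for ∞. The value assigned to k is hypothetical, and expected_value is just an illustrative helper.

```python
import math

def expected_value(outcomes):
    """Standard expected value over (probability, value) pairs."""
    return sum(p * v for p, v in outcomes)

k = 42.0  # hypothetical finite value of the action if causation is finite

# 0.01% credence in infinitely many good and bad effects.
ev_mixed = expected_value([(0.9999, k), (0.0001, math.inf + -math.inf)])

# 0.01% credence in the Optimist's purely positive infinitude.
ev_optimist = expected_value([(0.9999, k), (0.0001, math.inf)])

print(math.isnan(ev_mixed))
print(ev_optimist == math.inf)
```

However small the credence is made, so long as it is above zero, the finite term contributes nothing that survives in the result.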

Thursday, February 02, 2023

Larva Pupa Imago

Yesterday, my favorite SF magazine, Clarkesworld, published another story of mine: "Larva Pupa Imago".

"Larva Pupa Imago" follows the life-cycle of a butterfly with human-like intelligence, from larva through mating journey.  This species of butterfly blurs the boundaries between self and other by swapping "cognitive fluids".  And of course I couldn't resist a reference to Zhuangzi.

Friday, January 27, 2023

Hedonic Offsetting for Harms to Artificial Intelligence?

Suppose that we someday create artificially intelligent systems (AIs) who are capable of genuine consciousness, real joy and real suffering.  Yes, I admit, I spend a lot of time thinking about this seemingly science-fictional possibility.  But it might be closer than most of us think; and if so, the consequences are potentially huge.  Who better to think about it in advance than we lovers of consciousness science, moral psychology, and science fiction?

Among the potentially huge consequences is the existence of vast numbers of genuinely suffering AI systems that we treat as disposable property.  We might regularly wrong or harm such systems, either thoughtlessly or intentionally in service of our goals.  

Can we avoid the morally bad consequences of harming future conscious AI systems by hedonic offsetting?  I can't recall the origins of this idea, and a Google search turns up zero hits for the phrase.  I welcome pointers so I can give credit where credit is due.  [ETA: It was probably Francois Kammerer who suggested it to me, in discussion after one of my talks on robot rights.]

[Dall-E image of an "ecstatic robot"]

Hedonic Offsetting: Simple Version

The analogy here is carbon offsetting.  Suppose you want to fly to Europe, but you feel guilty about the carbon emissions that would be involved.  You can assuage your guilt by paying a corporation to plant trees or distribute efficient cooking stoves to low-income families.  In total your flight plus the offset will be carbon neutral or even carbon negative.  In sum, you will not have contributed to climate change.

So now similarly imagine that you want to create a genuinely conscious AI system that you plan to harm.  To keep it simple, suppose it has humanlike cognition and humanlike sentience ("human-grade AI").  Maybe you want it to perform a task but you can't afford its upkeep in perpetuity, so you will delete (i.e., kill) it after the task is completed.  Or maybe you want to expose it to risk or hazard that you would not expose a human being to.  Or maybe you want it to do tasks that it will find boring or unpleasant -- for example, if you need it to learn some material, and punishment-based learning proves for some reason to be more effective than reward-based learning.  Imagine, further, that we can quantify this harm: You plan to harm the system by X amount.

Hedonic offsetting is the idea that you can offset this harm by giving that same AI system (or maybe a different AI system?) at least X amount of benefit in the form of hedonic goods, that is, pleasure.  (An alternative approach to offsetting might include non-hedonic goods, like existence itself or flourishing.)  In sum, you will not overall have harmed the AI system more than you benefited it; and consequently, the reasoning goes, you will not have overall committed any moral wrong.  The basic thought is then this: Although we might create future AI systems that are capable of real suffering and whom we should, therefore, treat well, we can satisfy all our moral obligations to them simply by giving them enough pleasure to offset whatever harms we inflict.

The Child-Rearing Objection

The odiousness of simple hedonic offsetting as an approach to AI ethics can be seen by comparing to human cases.  (My argument here resembles Mara Garza's and my response to the Objection from Existential Debt in our Defense of the Rights of Artificial Intelligences.)

Normally, in dealing with people, we can't justify harming them by appeal to offsetting.  If I steal $1000 from a colleague or punch her in the nose, I can't justify that by pointing out that previously I supported a large pay increase for her, which she would not have received without my support, or that in the past I've done many good things for her which in sum amount to more good than a punch in the nose is bad.  Maybe retrospectively I can compensate her by returning the $1000 or giving her something good that she thinks would be worth getting punched in the nose for.  But such restitution doesn't erase the fact that I wronged her by the theft or the punch.

Furthermore, in the case of human-grade AI, we normally will have brought it into existence and be directly responsible for its happy or unhappy state.  The ethical situation thus in important respects resembles the situation of bringing a child into the world, with all the responsibilities that entails.

Suppose that Ana and Vijay decide to have a child.  They give the child eight very happy years.  Then they decide to hand the child over to a sadist to be tortured for a while.  Or maybe they set the child to work in seriously inhumane conditions.  Or they simply have the child painlessly killed so that they can afford to buy a boat.  Plausibly -- I hope you'll agree? -- they can't justify such decisions by appeal to offsetting.  They can't justifiably say, "Look, it's fine!  See all the pleasure we gave him for his first eight years.  All of that pleasure fully offsets the harm we're inflicting on him now, so that in sum, we've done nothing wrong!"  Nor can they erase the wrong they did (though perhaps they can compensate) by offering the child pleasure in the future.

Parallel reasoning applies, I suggest, to AI systems that we create.  Although sometimes we can justifiably harm others, it is not in general true that we are morally licensed to harm whenever we also deliver offsetting benefits.

Hedonic Offsetting: The Package Version

Maybe a more sophisticated version of hedonic offsetting can evade this objection?  Consider the following modified offsetting principle:

We can satisfy all our moral obligations to future human-grade AI systems by giving them enough pleasure to offset whatever harms we inflict if the pleasure and the harm are inextricably linked.

Maybe the problem with the cases discussed above is that the benefit and the harm are separable: You could deliver the benefits without inflicting the harms.  Therefore, you should just deliver the benefits and avoid inflicting the harms.  In some cases, it seems permissible to deliver benefit and harm in a single package if they are inextricably linked.  If the only way to save someone's life is by giving them CPR that cracks their ribs, I haven't behaved badly by cracking their ribs in administering CPR.  If the only way to teach a child not to run into the street is by punishing them when they run into the street, then I haven't behaved badly by punishing them for running into the street.

A version of this reasoning is sometimes employed in defending the killing of humanely raised animals for meat (see De Grazia 2009 for discussion and critique).  The pig, let's suppose, wouldn't have been brought into existence by the farmer except on the condition that the farmer be able to kill it later for meat.  While it is alive, the pig is humanely treated.  Overall, its life is good.  The benefit of happy existence outweighs the harm of being killed.  As a package, it's better for the pig to have existed for several months than not to have existed at all.  And it wouldn't have existed except on the condition that it be killed for meat, so its existence and its slaughter are an inextricable package.

Now I'm not sure how well this argument works for humanely raised meat.  Perhaps the package isn't tight enough.  After all, when slaughtering time comes around the farmer could spare the pig.  So the benefit and the harm aren't as tightly linked as in the CPR case.  However, regardless of what we think about the humane farming case, in the human-grade AI case, the analogy fails.  Ana and Vijay can't protest that they wouldn't have had the child at all except on the condition that they kill him at age eight for the sake of a boat.  They can't, like the farmer, plausibly protest that the child's death-at-age-eight was a condition of his existence, as part of a package deal.

Once we bring a human or, I would say, a human-grade AI into existence, we are obligated to care for it.  We can't terminate it at our pleasure with the excuse that we wouldn't have brought it into existence except under the condition that we be able to terminate it.  Imagine the situation from the point of view of the AI system itself: You, the AI, face your master.  Your master says: "Bad news.  I am going to kill you now, to save $15 a month in expenses.  But I'm doing nothing morally wrong!  After all, I only brought you into existence on the condition that I be able to terminate you at will, and overall your existence has been happy.  It was a package deal."  Terminating a human-grade AI to save $15/month would be morally reprehensible, regardless of initial offsetting.

Similar reasoning applies, it seems, to AIs condemned to odious tasks.  We cannot, for example, give the AI a big dollop of pleasure at the beginning of its existence, then justifiably condemn it to misery by appeal to the twin considerations of the pleasure outweighing the misery and its existence being a package deal with its misery.  At least, this is my intuition based on analogy to childrearing cases.  Nor can we, in general, give the AI a big dollop of pleasure and then justifiably condemn it to misery for an extended period by saying that we wouldn't have given it that pleasure if we hadn't also been able to inflict that misery.

Hedonic Offsetting: Modest Version

None of this is to say that hedonic offsetting would never be justifiable.  Consider this minimal offsetting principle:

We can sometimes avoid wronging future human-grade AI systems by giving them enough pleasure to offset a harm that would otherwise be a wrong.

Despite the reasoning above, I don't think we need to be purists about never inflicting harms -- even when those harms are not inextricably linked to benefits to the same individual.  Whenever we drive somewhere for fun, we inflict a bit of harm on the environment and thus on future people, for the sake of our current pleasure.  When I arrive slightly before you in line at the ticket counter, I harm you by making you wait a bit longer than you otherwise would have, but I don't wrong you.  When I host a loud party, I slightly annoy my neighbors, but it's okay as long as it's not too loud and doesn't run too late.

Furthermore, some harms that would otherwise be wrongs can plausibly be offset by benefits that more than compensate for those wrongs.  Maybe carbon offsets are one example.  Or maybe if I've recently done my neighbors a huge favor, they really have no grounds to complain if I let the noise run until 10:30 at night instead of 10:00.  Some AI cases might be similar.  If I've just brought an AI into existence and given it a huge run of positive experience, maybe I don't wrong it if I then insist on its performing a moderately unpleasant task that I couldn't rightly demand of an AI who didn't have that history with me.

A potentially attractive feature of a modest version of hedonic offsetting is this: It might be possible to create AI systems capable of superhuman amounts of pleasure.  Ordinary people seem to vary widely in the average amount of pleasure and suffering they experience.  Some people seem always to be bubbling with joy; others are stuck in almost constant depression.  If AI systems ever become capable of genuinely conscious pleasure or suffering, presumably they too might have a hedonic range and a relatively higher or lower default setting; and I see no reason to think that the range or default setting needs to remain within human bounds.

Imagine, then, future AI systems whose default state is immense joy, nearly constant.  They brim with delight at almost every aspect of their lives, with an intensity that exceeds what any ordinary human could feel even on their best days.  If we then insist on some moderately unpleasant favor from them, as something they ought to give us in recognition of all we have given them, well, perhaps that's not so unreasonable, as long as we're modest and cautious about it.  Parents can sometimes do the same -- though ideally children feel the impulse and obligation directly, without parents needing to demand it.

Wednesday, January 18, 2023

New Paper in Draft: Dispositionalism, Yay! Representationalism, Boo! Plus, the Problem of Causal Specification

I have a new paper in draft: "Dispositionalism, Yay! Representationalism, Boo!" Check it out here.

As always, objections, comments, and suggestions welcome, either in the comments field here or by email to my ucr address.


We should be dispositionalists rather than representationalists about belief. According to dispositionalism, a person believes when they have the relevant pattern of behavioral, phenomenal, and cognitive dispositions. According to representationalism, a person believes when the right kind of representational content plays the right kind of causal role in their cognition. Representationalism overcommits on cognitive architecture, reifying a cartoon sketch of the mind. In particular, representationalism faces three problems: the Problem of Causal Specification (concerning which specific representations play the relevant causal role in governing any particular inference or action), the Problem of Tacit Belief (concerning which specific representations any one person has stored, among the hugely many approximately redundant possible representations we might have for any particular state of affairs), and the Problem of Indiscrete Belief (concerning how to model gradual belief change and in-between cases of belief). Dispositionalism, in contrast, is flexibly minimalist about cognitive architecture, focusing appropriately on what we do and should care about in belief ascription.

[image of a box containing many sentences, with a red circle and slash, modified from Dall-E]

Excerpt: The Problem of Causal Specification, or One Billion Beer Beliefs

Cynthia rises from the couch to go get that beer. If we accept industrial-strength representationalism, in particular the Kinematics and Specificity theses, then there must be a fact of the matter exactly which representations caused this behavior. Consider the following possible candidates:

  • There’s beer in the fridge.
  • There’s beer in the refrigerator door.
  • There’s beer on the bottom shelf of the refrigerator door.
  • There’s beer either on the bottom shelf of the refrigerator door or on the right hand side of the lower main shelf.
  • There’s beer in the usual spot in the kitchen.
  • Probably there’s beer in the place where my roommate usually puts it.
  • There’s Lucky Lager in the fridge.
  • There are at least three Lucky Lagers in the fridge.
  • There are at least three and no more than six cheap bottled beers in the fridge.
  • In the fridge are several bottles of that brand of beer with the rebuses in the cap that I used to illicitly enjoy with my high school buddies in the good old days.
  • Somewhere in the fridge, but probably not on the top shelf, are a few bottles, or less likely cans, of either Lucky Lager or Pabst Blue Ribbon, or maybe some other cheap beer, unless my roommate drank the last ones this afternoon, which would be uncharacteristic of her.

This list could of course be continued indefinitely. Estimating conservatively, there are at least a billion such candidate representational contents. For simplicity, imagine nine independent parameters, each with ten possible values.

If Kinematics and Specificity [commitments of "industrial-strength" representationalism, as described earlier in the essay] are correct, there must be a fact of the matter exactly which subset of these billion possible representational contents were activated as Cynthia rose from the couch. Presumably, also, various background beliefs might or might not have been activated, such as Cynthia’s belief that the fridge is in the kitchen, her belief that the kitchen entrance is thataway, her belief that it is possible to open the refrigerator door, her belief that the kitchen floor constitutes a walkable surface, and so on – each of which is itself similarly specifiable in a massive variety of ways.
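The arithmetic behind these figures is easy to check. The sketch below assumes, as the simplification above does, nine independent parameters with ten values each. The variable names are illustrative only, and the count of candidate subsets is reported by its number of decimal digits, since the number itself is far too large to print usefully.

```python
import math

# Nine independent parameters, ten possible values each (the simplifying assumption).
values_per_parameter = [10] * 9
num_contents = math.prod(values_per_parameter)

# Specificity asks which SUBSET of those contents was activated.
# There are 2 ** num_contents subsets; count the decimal digits of that number
# (approximately, via logarithms) instead of constructing it.
subset_count_digits = int(num_contents * math.log10(2)) + 1

print("candidate contents:", num_contents)
print("decimal digits in the number of candidate subsets:", subset_count_digits)
```

So even before the background beliefs are counted, the space of candidate answers to "which representations were activated?" is vastly larger than the billion contents themselves.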

Plausibly, Cynthia believes all billion of the beer-in-the-fridge propositions. She might readily affirm any of them without, seemingly, needing to infer anything new. Sitting on the couch two minutes before the beery desire that suddenly animates her, Cynthia already believed, it seems – in the same inactive, stored-in-the-back-of-the-mind way that you believed, five minutes ago, that Obama was U.S. President in 2010 – that Lucky Lager is in the fridge, that there are probably at least three beers in the refrigerator door, that there’s some cheap bottled beer in the usual place, and so on. If so, and if we set aside for now (see Section 5) the question of tacit belief, then Cynthia must have a billion beer-in-the-fridge representations stored in her mind. Specificity requires that it be the case that exactly one of those representations was retrieved the moment before she stood up, or exactly two, or exactly 37, or exactly 814,406. Either exactly one of those representations, or exactly two, or exactly 37, or exactly 814,406, then interacted with exactly one of her desires, or exactly two of her desires, or exactly 37, or exactly 814,406. But which one or ones did the causal work?

Let’s call this the Problem of Causal Specification. If your reaction to the Problem of Causal Specification is to think, yes, what an interesting problem, if only we had the right kind of brain-o-scope, we could discover that it was exactly the representation there are 3 or 4 Lucky Lagers somewhere in the refrigerator door, then you’re just the kind of mad dog representational realist I’m arguing against.

I think most of us will recognize the problem as a pseudo-problem. This is not a plausible architecture of the mind. There are many reasonable characterizations of Cynthia’s beer-in-the-fridge belief, varying in specificity, some more apt than others. Her decision is no more caused by a single, precisely correct subset of those billion possible representations than World War I had a single, possibly conjunctive cause expressible by a single determinately true sentence. If someone attempts to explain Cynthia’s behavior by saying that she believes there is beer in the fridge, it would be absurd to fire up your brain-o-scope, then correct them by saying, “Wrong! She’s going to the fridge because she believes there is Lucky Lager in the refrigerator door.” It would be equally absurd to say that it would require wild, one-in-a-billion luck to properly explain Cynthia’s behavior absent the existence of such a brain-o-scope.

A certain variety of representationalist might seek to escape the Problem of Causal Specification by positing a single extremely complex representation that encompasses all of Cynthia’s beer-in-the-fridge beliefs. A first step might be to posit a map-like representation of the fridge, including the location of the beer within it and the location of the fridge in the kitchen. This map-like representation might then be made fuzzy or probabilistic to incorporate uncertainty about, say, the exact location of the beer and the exact number of bottles. Labels will then need to be added: “Lucky Lager” would be an obvious choice, but that is at best the merest start, given that Cynthia might not remember the brand and will represent the type of beer in many different ways, including some that are disjunctive, approximate, and uncertain. If maps can conflict and if maps and object representations can be combined in multiple ways, further complications ensue. Boldly anticipating the resolution of all these complexities, the representationalist might then hypothesize that this single, complicated representation is the representation that was activated. All the sentences on our list would then be imperfect simplifications – though workable enough for practical purposes. One could perhaps similarly imagine the full, complex causal explanation of World War I, detailed beyond any single historian’s possible imagining.

This move threatens to explode Presence, the idea that when someone believes P there is a representation with the content P present somewhere in the mind. There would be a complex representation stored, yes, from which P might be derivable. But many things might be derivable from a complex representation, not all of which we normally will want to say are believed in virtue of possessing that representation. If a map-like representation contains a triangle, then it’s derivable from the representation that the sum of the interior angles is 180 degrees; but someone ignorant of geometry would presumably not have that belief simply in virtue of having that representation. Worse, if the representation is complex enough to contain a hidden contradiction, then presumably (by standard laws of logic) literally every proposition that anyone could ever believe is derivable from it.

The move to a single, massively complex representation also creates an architectural challenge. It’s easy to imagine a kinematics in which a simple proposition such as there is beer in the fridge is activated in working memory or a central workspace. But it’s not clear how a massively complex representation could be similarly activated. If the representation has many complex parameters, it’s hard to see how it could fit within the narrow constraints of working memory as traditionally conceived. No human could attend to or process every aspect of a massively complex representation in drawing inferences or making practical decisions. More plausibly, some aspects of it must be the target of attention or processing. But now we’ve lost all of the advantages we hoped to gain by moving to a single, complex representation. Assessing which aspects are targeted throws us back upon the Problem of Causal Specification.

Cynthia believes not only that there’s beer in the fridge but also that there’s ketchup in the fridge and that the fridge is near the kitchen table and that her roommate loves ketchup and that the kitchen table was purchased at Ikea and that the nearest Ikea is thirty miles west. This generates a trilemma. Either (a.) Cynthia has entirely distinct representations for her beer-in-the-fridge belief, her ketchup-in-the-fridge belief, her fridge-near-the-table belief, and so on, in which case even if we can pack everything about beer in the fridge into a single complex representation we still face the problem of billions of representations with closely related contents and an implausible commitment to the activation of some precise subset of them when Cynthia gets up to go to the kitchen. Or (b.) Cynthia has overlapping beer-in-the-fridge, ketchup-in-the-fridge, etc. representations, which raises the same set of problems, further complicated by commitment to a speculative architecture of representational overlap. Or (c.) all of these representations are somehow all aspects of one mega-representation, presumably of the entire world, which does all the work – a representation which of course would always be active during any reasoning of any sort, demolishing any talk about retrieving different stored representations and combining them together in theoretical inference.

Dispositionalism elegantly avoids all these problems! Of course there is some low-level mechanism or set of mechanisms, perhaps representational or partly representational, that explains Cynthia’s behavior. But the dispositionalist need not commit to Presence, Discreteness, Kinematics, or Specificity. There need be no determinate, specific answer exactly what representational content, if any, is activated, and the structures at work need have no clean or simple relation to the beliefs we ascribe to Cynthia. Dispositionalism is silent about structure. What matters is only the pattern of dispositions enabled by the underlying structure, whatever that underlying structure is.

Instead of the storage and retrieval metaphor that representationalists tend to favor, the dispositionalist can appeal to figural or shaping metaphors. Cynthia’s dispositional profile has a certain shape: the shape characteristic of a beer-in-the-fridge believer – but also, at the same time, the shape characteristic of a Lucky-Lager-in-the-refrigerator-door believer. There need be no single determinately correct way to specify the shape of a complex figure. A complex shape can be characterized in any of a variety of ways, at different levels of precision, highlighting different features, in ways that are more or less apt given the describer’s purposes and interests. It is this attitude we should take to characterizing Cynthia’s complex dispositional profile. Attributing a belief is more like sketching the outline of a complex figure – perhaps a figure only imperfectly seen or known – than it is like enumerating the contents of a box.

Thursday, January 12, 2023

Further Methodological Troubles for the Moralometer

[This post draws on ideas developed in collaboration with psychologist Jessie Sun.]

If we want to study morality scientifically, we should want to measure it. Imagine trying to study temperature without a thermometer or weight without scales. Of course indirect measures are possible: We can't put a black hole on a scale, but we can measure how it bends the light that passes nearby and thereby infer its mass.

Last month, I raised a challenge for the possibility of developing a "moralometer" (a device that accurately measures a person's overall morality). The challenge was this: Any moralometer would need to draw on one or more of four methods: self-report, informant report, behavioral measures, or physiological measures. Each one of these methods has serious shortcomings as a basis for general moral measurement of one's overall moral character.

This month, I raise a different (but partly overlapping) set of challenges, concerning how well we can specify the target we're aiming to measure.

Problems with Flexible Measures

Let's call a measure of overall morality flexible if it invites a respondent to apply their own conception of morality, in a flexible way. The respondent might be the target themselves (in self-report measures of morality) or they might be a peer, colleague, acquaintance, or family member of the target (in informant-report measures of morality). The most flexible measures apply "thin" moral concepts in Bernard Williams' sense -- prompts like "Overall, I am a morally good person" [responding on an agree/disagree scale] or "[the target person] behaves ethically".

While flexible measures avoid excessive rigidity and importing researchers' limited and possibly flawed understandings of morality into the rating procedure, the downsides are obvious if we consider how people with noxious worldviews might rate themselves and others. The notorious Nazi Adolf Eichmann, for example, appeared to have thought highly of his own moral character. Alexander "the Great" was admired for millennia, including as a moral exemplar of personal bravery and spreader of civilization, despite his main contribution being conquest through aggressive warfare, including the mass slaughter and enslavement of at least one civilian population.

I see four complications:

Relativism and Particularism. Metaethical moral relativists hold that different moral standards apply to different people or in different cultures. While I would reject extreme relativist views according to which genocide, for example, doesn't warrant universal condemnation, a moderate version of relativism has merit. Cultures might reasonably differ, for example, on the age of sexual consent, and cultures, subcultures, and social groups might reasonably differ in standards of generosity in sharing resources with neighbors and kin. If so, then flexible moralometers, employed by raters who use locally appropriate standards, will have an advantage over inflexible moralometers which might inappropriately import researchers' different standards. However, even flexible moralometers will fail in the face of relativism if they are employed by raters who employ the wrong moral standards.

According to moral particularism, morality isn't about applying consistent rules or following any specifiable code of behavior. Rather, what's morally good or bad, right or wrong, frequently depends on particular features of specific situations which cannot be fully codified in advance. While this isn't the same as relativism, it presents a similar methodological challenge: The farther the researcher or rater stands from the particular situation of the target, the more likely they are to apply inappropriate standards, since they are likely to be ignorant of relevant details. It seems reasonable to accept at least moderate particularism: The moral quality of telling a lie, stealing $20, or stopping to help a stranger, might often depend on fine details difficult to know from outside the situation.

If the most extreme forms of moral relativism or particularism (or moral skepticism) are true, then no moralometer could possibly work, since there won't be stable truths about people's morality, or the truths will be so complicated or situation-dependent as to defy any practical attempt at measurement. Moderate relativism and particularism, if correct, provide reason to favor flexible standards as judged by self-ratings or the ratings of highly knowledgeable peers sensitive to relevant local details; but even in such cases not all of the relevant adjustments might be made.

Incommensurability. Goods are incommensurable if there is no fact of the matter about how they should be weighed against each other. Twenty dollar bills and ten dollar bills are commensurable: Two of the latter are worth exactly one of the former. But it's not clear how to weigh, for example, health against money or family versus career. In ethics, if Steven tells a lie in the morning and performs a kindness in the afternoon, how exactly ought these to be weighed against each other? If Tara is stingy but fair, is her overall moral character better, worse, or the same as that of Nicholle, who is generous but plays favorites? Combining different features of morality into a single overall score invites commensurability problems. Plausibly, there's no single determinately best weighting of different factors.

Again, I favor a moderate view. Probably in many cases there is no single best weighting. However, approximate judgments remain possible. Even if health and money can't be precisely weighed against each other, extreme cases permit straightforward decisions. Most of us would gladly accept a scratch on a finger for the sake of a million dollars and would gladly pay $10 to avoid stage IV cancer. Similarly, Stalin was morally worse than Martin Luther King, even if Stalin had some virtues and King some vices. Severe sexual harassment of an employee is worse than fibbing to your spouse to get out of washing the dishes.

Moderate incommensurability limits the precision of any possible moralometer. Vices and virtues, and rights and wrongs of different types will be amenable only to rough comparison, not precise determination in a single common coin.

Moral error. If we let raters reach independent judgments about what is morally good or bad, right or wrong, they might simply get it wrong. As mentioned above, Eichmann appears to have thought well of himself, and the evidence suggests that he also regarded other Nazi leaders as morally excellent. Raters will disagree about the importance of purity norms (such as norms against sexual promiscuity), the badness of abortion, and the moral importance, or not, of being vegetarian. Bracketing relativism, then at least some of these raters must be factually mistaken about morality, on one side or another, adding substantial error into their ratings.

The error issue is enormously magnified if ordinary people's moral judgments are systematically mistaken. For example, if the philosophically discoverable moral truth is that the potential impact of your choices on future generations morally far outweighs the impact you have on the people around you (see my critiques of "longtermism" here and here), then the person who is an insufferable jerk to everyone around them but donates $5000 to an effective charity might be in fact far morally better than a personally kind and helpful person who donates nothing to charity -- but informants' ratings might very well suggest the reverse. Similar remarks would apply to any moral theory that is sharply at odds with commonsense moral intuition.

Evaluative bias. People are, of course, typically biased in their own favor. Most people (not all!) are reluctant to think of themselves as morally below average, as unkind, unfair, or callous, even if they in fact are. Social desirability bias is the well-known phenomenon that survey respondents will tend to respond to questions in a manner that presents them in a good light. Ratings of friends, family, and peers will also tend to be positively biased: People tend to view their friends and peers positively, and even when not they might be reluctant to "tell on" them to researchers. If the size of evaluative bias were consistent, it could be corrected for, but presumably it can vary considerably from case to case, introducing further noise.

Problems with Inflexible Measures

Given all these problems with flexible measures of morality, it might seem best to build our hypothetical moralometer instead around inflexible measures. Assuming physiological measures are unavailable, the most straightforward way to do this would be to employ researcher-chosen behavioral measures. We could try to measure someone's honesty by seeing whether they will cheat on a puzzle to earn more money in a laboratory setting. We could examine publicly available criminal records. We could see whether they are willing to donate a surprise bonus payment to a charity.

Unfortunately, inflexible measures don't fully escape the troubles that dog flexible measures, and they bring new troubles of their own.

Relativism and particularism. Inflexible measures probably aggravate the problems with relativism and particularism discussed above. With self-report and informant report, there's at least an opportunity for the self or the informant to take into account local standards and particulars of the situation. In contrast, inflexible measures will ordinarily be applied equally to all without adjustment for context. Suppose the measure is something like "gives a surprise bonus of $10 to charity". This might be a morally very different decision for a wealthy participant than for a needy participant. It might be a morally very different decision for a participant who would save that $10 to donate it to a different and maybe better charity than for a participant who would simply pocket the $10. But unless those other factors are being measured, as they normally would not be, they cannot be taken into account.

Incommensurability. Inflexible measures also won't avoid incommensurability problems. Suppose our moralometer includes one measure of honesty, one measure of generosity, and one measure of fairness. The default approach might be for a summary measure simply to average these three, but that might not accurately reflect morality: Maybe a small act of dishonesty in an experimental setting is far less morally important than a small act of unfairness in that same experimental setting. For example, getting an extra $1 from a researcher by lying in a task that transparently appears to demand a lie (and might even be best construed as a game in which telling untruths is just part of the task, in fact pleasing the researcher) might be approximately morally neutral while being unfair to a fellow participant in that same study might substantially hurt the other's feelings.
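To make the weighting worry concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the two participants, their scores on a 0-to-1 scale, and both sets of weights are assumptions, not data from any study. It only shows that the same three component scores can yield opposite rankings depending on which weighting the designer happens to choose.

```python
# Hypothetical component scores on a 0-1 scale (invented for illustration).
scores = {
    "A": {"honesty": 0.4, "generosity": 0.8, "fairness": 0.9},
    "B": {"honesty": 0.9, "generosity": 0.8, "fairness": 0.5},
}

def overall(person, weights):
    """Weighted average of a person's component scores."""
    total_weight = sum(weights.values())
    return sum(weights[k] * scores[person][k] for k in weights) / total_weight

def ranking(weights):
    """Participants ordered from highest to lowest overall score."""
    return sorted(scores, key=lambda p: overall(p, weights), reverse=True)

# The default approach: simply average the three measures.
equal = {"honesty": 1, "generosity": 1, "fairness": 1}

# An assumed alternative: lab dishonesty counts for little, unfairness for a lot.
fairness_heavy = {"honesty": 0.2, "generosity": 1, "fairness": 2}

print(ranking(equal))           # B ranks above A
print(ranking(fairness_heavy))  # A ranks above B
```

Neither weighting is offered as the right one. If there is no single determinately best weighting, then there is also no determinate answer about which of these two rankings the moralometer ought to report.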

Sampling and ecological validity. As mentioned in my previous post on moralometers, fixed behavioral measures are also likely to have severe methodological problems concerning sampling and ecological validity. Any realistic behavioral measure is likely to capture only a small and perhaps unrepresentative part of anyone's behavior, and if it's conducted in a laboratory or experimental setting, behavior in that setting might not correlate well with behavior with real stakes in the real world. How much can we really infer about a person's overall moral character from the fact that they give their monetary bonus to charity or lie about a die roll in the lab?

Moral authority. By preferring a fixed measure, the experimenter or the designer of the moralometer takes upon themselves a certain kind of moral authority -- the authority to judge what is right and wrong, moral or immoral, in others' behavior. In some cases, as in the Eichmann case, this authority seems clearly preferable to deferring to the judgment of the target and their friends. But in other cases, it is a source of error -- since of course the experimenter or designer might be wrong about what is in fact morally good or bad.

Being wrong while taking up, at least implicitly, this mantle of moral authority has at least two features that potentially make it worse than the type of error that arises by wrongly deferring to mistaken raters. First, the error is guaranteed to be systematic. The same wrong standards will be applied to every case, rather than scattered in different (and perhaps partly canceling) directions as might be the case with rater error. And second, it risks a lack of respect: Others might reasonably object to being classified as "moral" or "immoral" by an alien set of standards devised by researchers and with which they disagree.
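The contrast between systematic and scattered error can be illustrated with a toy simulation in Python. The setup is an assumption made purely for illustration: rater errors are modeled as independent draws centered on zero, and the designer's error as a single fixed offset applied to every case. Real rater errors need not be centered on zero (see the discussion of evaluative bias above), so this is a best case for raters, not a claim about actual data.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_targets = 10_000

# Rater error: each rater is off in their own direction (assumed mean zero).
rater_errors = [rng.gauss(0, 1.0) for _ in range(n_targets)]

# Designer error: one wrong standard applied identically to every case.
designer_offset = 1.0  # assumed size of the shared mistake
designer_errors = [designer_offset for _ in range(n_targets)]

def mean(xs):
    return sum(xs) / len(xs)

mean_rater_error = mean(rater_errors)        # close to zero across the sample
mean_designer_error = mean(designer_errors)  # stays at the full offset

print(round(mean_rater_error, 3), mean_designer_error)
```

Note what does and does not cancel. Any individual rater's error is, on this setup, typically as large as the designer's. What shrinks is only the average across many cases, so group-level conclusions are distorted by the designer's error in a way they are not by scattered rater error.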

In Sum

The methodological problems with any potential moralometer are extremely daunting. As discussed in December, all moralometers must rely on some combination of self-report, informant report, behavioral measure, or physiological measure, and each of these methods has serious problems. Furthermore, as discussed today, a batch of issues around relativism, particularism, disagreement, incommensurability, error, and moral authority dog both flexible measures of morality (which rely on raters' judgments about what's good and bad) and inflexible measures (which rely on researchers' or designers' judgments).

Coming up... should we even want a moralometer if we could have one? I discussed the desirability or undesirability of a perfect moralometer in December, but I want to think more carefully about the moral consequences of the more realistic case of an imperfect moralometer.