Friday, January 27, 2023

Hedonic Offsetting for Harms to Artificial Intelligence?

Suppose that we someday create artificially intelligent systems (AIs) who are capable of genuine consciousness, real joy and real suffering.  Yes, I admit, I spend a lot of time thinking about this seemingly science-fictional possibility.  But it might be closer than most of us think; and if so, the consequences are potentially huge.  Who better to think about it in advance than we lovers of consciousness science, moral psychology, and science fiction?

Among the potentially huge consequences is the existence of vast numbers of genuinely suffering AI systems that we treat as disposable property.  We might regularly wrong or harm such systems, either thoughtlessly or intentionally in service of our goals.  

Can we avoid the morally bad consequences of harming future conscious AI systems by hedonic offsetting?  I can't recall the origins of this idea, and a Google search turns up zero hits for the phrase.  I welcome pointers so I can give credit where credit is due.  [ETA: It was probably Francois Kammerer who suggested it to me, in discussion after one of my talks on robot rights.]

[Dall-E image of an "ecstatic robot"]

Hedonic Offsetting: Simple Version

The analogy here is carbon offsetting.  Suppose you want to fly to Europe, but you feel guilty about the carbon emissions that would be involved.  You can assuage your guilty by paying a corporation to plant trees or distribute efficient cooking stoves to low-income families.  In total your flight plus the offset will be carbon neutral or even carbon negative.  In sum, you will not have contributed to climate change.

So now similarly imagine that you want to create a genuinely conscious AI system that you plan to harm.  To keep it simple, suppose it has humanlike cognition and humanlike sentience ("human-grade AI").  Maybe you want it to perform a task but you can't afford its upkeep in perpetuity, so you will delete (i.e., kill) it after the task is completed.  Or maybe you want to expose it to risk or hazard that you would not expose a human being to.  Or maybe you want it to do tasks that it will find boring or unpleasant -- for example, if you need it to learn some material, and punishment-based learning proves for some reason to be more effective than reward-based learning.  Imagine, further, that we can quantify this harm: You plan to harm the system by X amount.

Hedonic offsetting is the idea that you can offset this harm by giving that same AI system (or maybe a different AI system?) at least X amount of benefit in the form of hedonic goods, that is, pleasure.  (An alternative approach to offsetting might include non-hedonic goods, like existence itself or flourishing.)  In sum, you will not overall have harmed the AI system more than you benefited it; and consequently, the reasoning goes, you will not have overall committed any moral wrong.  The basic thought is then this: Although we might create future AI systems that are capable of real suffering and whom we should, therefore, treat well, we can satisfy all our moral obligations to them simply by giving them enough pleasure to offset whatever harms we inflict.

The Child-Rearing Objection

The odiousness of simple hedonic offsetting as an approach to AI ethics can be seen by comparing to human cases.  (My argument here resembles Mara Garza's and my response to the Objection from Existential Debt in our Defense of the Rights of Artificial Intelligences.)

Normally, in dealing with people, we can't justify harming them by appeal to offsetting.  If I steal $1000 from a colleague or punch her in the nose, I can't justify that by pointing out that previously I supported a large pay increase for her, which she would not have received without my support, or that in the past I've done many good things for her which in sum amount to more good than a punch in the nose is bad.  Maybe retrospectively I can compensate her by returning the $1000 or giving her something good that she thinks would be worth getting punched in the nose for.  But such restitution doesn't erase the fact that I wronged her by the theft or the punch.

Furthermore, in the case of human-grade AI, we normally will have brought it into existence and be directly responsible for its happy or unhappy state.  The ethical situation thus in important respects resembles the situation of bringing a child into the world, with all the responsibilities that entails.

Suppose that Ana and Vijay decide to have a child.  They give the child eight very happy years.  Then they decide to hand the child over to a sadist to be tortured for a while.  Or maybe they set the child to work in seriously inhumane conditions.  Or they simply have the child painlessly killed so that they can afford to buy a boat.  Plausibly -- I hope you'll agree? -- they can't justify such decisions by appeal to offsetting.  They can't justifiably say, "Look, it's fine!  See all the pleasure we gave him for his first eight years.  All of that pleasure fully offsets the harm we're inflicting on him now, so that in sum, we've done nothing wrong!"  Nor can they erase the wrong they did (though perhaps they can compensate) by offering the child pleasure in the future.

Parallel reasoning applies, I suggest, to AI systems that we create.  Although sometimes we can justifiably harm others, it is not in general true that we are morally licensed to harm whenever we also deliver offsetting benefits.

Hedonic Offsetting: The Package Version

Maybe a more sophisticated version of hedonic offsetting can evade this objection?  Consider the following modified offsetting principle:

We can satisfy all our moral obligations to future human-grade AI systems by giving them enough pleasure to offset whatever harms we inflict if the pleasure and the harm are inextricably linked.

Maybe the problem with the cases discussed above is that the benefit and the harm are separable: You could deliver the benefits without inflicting the harms.  Therefore, you should just deliver the benefits and avoid inflicting the harms.  In some cases, it seems permissible to deliver benefit and harm in a single package if they are inextricably linked.  If the only way to save someone's life is by giving them CPR that cracks their ribs, I haven't behaved badly by cracking their ribs in administering CPR.  If the only way to teach a child not to run into the street is by punishing them when they run into the street, then I haven't behaved badly by punishing them for running into the street.

A version of this reasoning is sometimes employed in defending the killing of humanely raised animals for meat (see De Grazia 2009 for discussion and critique).  The pig, let's suppose, wouldn't have been brought into existence by the farmer except on the condition that the farmer be able to kill it later for meat.  While it is alive, the pig is humanely treated.  Overall, its life is good.  The benefit of happy existence outweighs the harm of being killed.  As a package, it's better for the pig to have existed for several months than not to have existed at all.  And it wouldn't have existed except on the condition that it be killed for meat, so its existence and its slaughter are an inextricable package.

Now I'm not sure how well this argument works for humanely raised meat.  Perhaps the package isn't tight enough.  After all, when slaughtering time comes around the farmer could spare the pig.  So the benefit and the harm aren't as tightly linked as in the CPR case.  However, regardless of what we think about the humane farming case, in the human-grade AI case, the analogy fails.  Ana and Vijay can't protest that they wouldn't have had the child at all except on the condition that they kill him at age eight for the sake of a boat.  They can't, like the farmer, plausibly protest that the child's death-at-age-eight was a condition of his existence, as part of a package deal.

Once we bring a human or, I would say, a human-grade AI into existence, we are obligated to care for it.  We can't terminate it at our pleasure with the excuse that we wouldn't have brought it into existence except under the condition that we be able to terminate it.  Imagine the situation from the point of view of the AI system itself: You, the AI, face your master owner.  Your master says: "Bad news.  I am going to kill you now, to save $15 a month in expenses.  But I'm doing nothing morally wrong!  After all, I only brought you into existence on the condition that I be able to terminate you at will, and overall your existence has been happy.  It was a package deal."  Terminating a human-grade AI to save $15/month would be morally reprehensible, regardless of initial offsetting.

Similar reasoning applies, it seems, to AIs condemned to odious tasks.  We cannot, for example, give the AI a big dollop of pleasure at the beginning of its existence, then justifiably condemn it to misery by appeal to the twin considerations of the pleasure outweighing the misery and its existence being a package deal with its misery.  At least, this is my intuition based on analogy to childrearing cases.  Nor can we, in general, give the AI a big dollop of pleasure and then justifiably condemn it to misery for an extended period by saying that we wouldn't have given it that pleasure if we hadn't also be able to inflict that misery.

Hedonic Offsetting: Modest Version

None of this is to say that hedonic offsetting would never be justifiable.  Consider this minimal offsetting principle:

We can sometimes avoid wronging future human-grade AI systems by giving them enough pleasure to offset a harm that would otherwise be a wrong.

Despite the reasoning above, I don't think we need to be purists about never inflicting harms -- even when those harms are not inextricably linked to benefits to the same individual.  Whenever we drive somewhere for fun, we inflict a bit of harm on the environment and thus on future people, for the sake of our current pleasure.  When I arrive slightly before you in line at the ticket counter, I harm you by making you wait a bit longer than you otherwise would have, but I don't wrong you.  When I host a loud party, I slightly annoy my neighbors, but it's okay as long as it's not too loud and doesn't run too late.

Furthermore, some harms that would otherwise be wrongs can plausibly be offset by benefits that more than compensate for those wrongs.  Maybe carbon offsets are one example.  Or maybe if I've recently done my neighbors a huge favor, they really have no grounds to complain if I let the noise run until 10:30 at night instead of 10:00.  Some AI cases might be similar.  If I've just brought an AI into existence and given it a huge run of positive experience, maybe I don't wrong it if I then insist on its performing a moderately unpleasant task that I couldn't rightly demand an AI perform who didn't have that history with me.

A potentially attractive feature of a modest version of hedonic offsetting is this: It might be possible to create AI systems capable of superhuman amounts of pleasure.  Ordinary people seem to vary widely in the average amount of pleasure and suffering they experience.  Some people seem always to be bubbling with joy; others are stuck in almost constant depression.  If AI systems ever become capable of genuinely conscious pleasure or suffering, presumably they too might have a hedonic range and a relatively higher or lower default setting; and I see no reason to think that the range or default setting needs to remain within human bounds.

Imagine, then, future AI systems whose default state is immense joy, nearly constant.  They brim with delight at almost every aspect of their lives, with an intensity that exceeds what any ordinary human could feel even on their best days.  If we then insist on some moderately unpleasant favor from them, as something they ought to give us in recognition of all we have given them, well, perhaps that's not so unreasonable, as long as we're modest and cautious about it.  Parents can sometimes do the same -- though ideally children feel the impulse and obligation directly, without parents needing to demand it.

Wednesday, January 18, 2023

New Paper in Draft: Dispositionalism, Yay! Representationalism, Boo! Plus, the Problem of Causal Specification

I have a new paper in draft: "Dispositionalism, Yay! Representationalism, Boo!" Check it out here.

As always, objections, comments, and suggestions welcome, either in the comments field here or by email to my ucr address.


We should be dispositionalists rather than representationalists about belief. According to dispositionalism, a person believes when they have the relevant pattern of behavioral, phenomenal, and cognitive dispositions. According to representationalism, a person believes when the right kind of representational content plays the right kind of causal role in their cognition. Representationalism overcommits on cognitive architecture, reifying a cartoon sketch of the mind. In particular, representationalism faces three problems: the Problem of Causal Specification (concerning which specific representations play the relevant causal role in governing any particular inference or action), the Problem of Tacit Belief (concerning which specific representations any one person has stored, among the hugely many approximately redundant possible representations we might have for any particular state of affairs), and the Problem of Indiscrete Belief (concerning how to model gradual belief change and in-between cases of belief). Dispositionalism, in contrast, is flexibly minimalist about cognitive architecture, focusing appropriately on what we do and should care about in belief ascription.

[image of a box containing many sentences, with a red circle and slash, modified from Dall-E]

Excerpt: The Problem of Causal Specification, or One Billion Beer Beliefs

Cynthia rises from the couch to go get that beer. If we accept industrial-strength representationalism, in particular the Kinematics and Specificity theses, then there must be a fact of the matter exactly which representations caused this behavior. Consider the following possible candidates:

  • There’s beer in the fridge.
  • There’s beer in the refrigerator door.
  • There’s beer on the bottom shelf of the refrigerator door.
  • There’s beer either on the bottom shelf of the refrigerator door or on the right hand side of the lower main shelf.
  • There’s beer in the usual spot in the kitchen.
  • Probably there’s beer in the place where my roommate usually puts it.
  • There’s Lucky Lager in the fridge.
  • There are at least three Lucky Lagers in the fridge.
  • There are at least three and no more than six cheap bottled beers in the fridge.
  • In the fridge are several bottles of that brand of beer with the rebuses in the cap that I used to illicitly enjoy with my high school buddies in the good old days.
  • Somewhere in the fridge, but probably not on the top shelf, are a few bottles, or less likely cans, of either Lucky Lager or Pabst Blue Ribbon, or maybe some other cheap beer, unless my roommate drank the last ones this afternoon, which would be uncharacteristic of her.

This list could of course be continued indefinitely. Estimating conservatively, there are at least a billion such candidate representational contents. For simplicity, imagine nine independent parameters, each with ten possible values.

If Kinematics and Specificity [commitments of "industrial-strength" representationalism, as described earlier in the essay] are correct, there must be a fact of the matter exactly which subset of these billion possible representational contents were activated as Cynthia rose from the couch. Presumably, also, various background beliefs might or might not have been activated, such as Cynthia’s belief that the fridge is in the kitchen, her belief that the kitchen entrance is thataway, her belief that it is possible to open the refrigerator door, her belief that the kitchen floor constitutes a walkable surface, and so on – each of which is itself similarly specifiable in a massive variety of ways.

Plausibly, Cynthia believes all billion of the beer-in-the-fridge propositions. She might readily affirm any of them without, seemingly, needing to infer anything new. Sitting on the couch two minutes before the beery desire that suddenly animates her, Cynthia already believed, it seems – in the same inactive, stored-in-the-back-of-the-mind way that you believed, five minutes ago, that Obama was U.S. President in 2010 – that Lucky Lager is in the fridge, that there are probably at least three beers in the refrigerator door, that there’s some cheap bottled beer in the usual place, and so on. If so, and if we set aside for now (see Section 5) the question of tacit belief, then Cynthia must have a billion beer-in-the-fridge representations stored in her mind. Specificity requires that it be the case that exactly one of those representations was retrieved the moment before she stood up, or exactly two, or exactly 37, or exactly 814,406. Either exactly one of those representations, or exactly two, or exactly 37, or exactly 814,406, then interacted with exactly one of her desires, or exactly two of her desires, or exactly 37, or exactly 814,406. But which one or ones did the causal work?

Let’s call this the Problem of Causal Specification. If your reaction to the Problem of Causal Specification is to think, yes, what an interesting problem, if only we had the right kind of brain-o-scope, we could discover that it was exactly the representation there are 3 or 4 Lucky Lagers somewhere in the refrigerator door, then you’re just the kind of mad dog representational realist I’m arguing against.

I think most of us will recognize the problem as a pseudo-problem. This is not a plausible architecture of the mind. There are many reasonable characterizations of Cynthia’s beer-in-the-fridge belief, varying in specificity, some more apt than others. Her decision is no more caused by a single, precisely correct subset of those billion possible representations than World War I had a single, possibly conjunctive cause expressible by a single determinately true sentence. If someone attempts to explain Cynthia’s behavior by saying that she believes there is beer in the fridge, it would be absurd to fire up your brain-o-scope, then correct them by saying, “Wrong! She’s going to the fridge because she believes there is Lucky Lager in the refrigerator door.” It would be equally absurd to say that it would require wild, one-in-a-billion luck to properly explain Cynthia’s behavior absent the existence of such a brain-o-scope.

A certain variety of representationalist might seek to escape the Problem of Causal Specification by positing a single extremely complex representation that encompasses all of Cynthia’s beer-in-the-fridge beliefs. A first step might be to posit a map-like representation of the fridge, including the location of the beer within it and the location of the fridge in the kitchen. This map-like representation might then be made fuzzy or probabilistic to incorporate uncertainty about, say, the exact location of the beer and the exact number of bottles. Labels will then need to be added: “Lucky Lager” would be an obvious choice, but that is at best the merest start, given that Cynthia might not remember the brand and will represent the type of beer in many different ways, including some that are disjunctive, approximate, and uncertain. If maps can conflict and if maps and object representations can be combined in multiple ways, further complications ensue. Boldly anticipating the resolution of all these complexities, the representationalist might then hypothesize that this single, complicated representation is the representation that was activated. All the sentences on our list would then be imperfect simplifications – though workable enough for practical purposes. One could perhaps similarly imagine the full, complex causal explanation of World War I, detailed beyond any single historian’s possible imagining.

This move threatens to explode Presence, the idea that when someone believes P there is a representation with the content P present somewhere in the mind. There would be a complex representation stored, yes, from which P might be derivable. But many things might be derivable from a complex representation, not all of which we normally will want to say are believed in virtue of possessing that representation. If a map-like representation contains a triangle, then it’s derivable from the representation that the sum of the interior angles is 180 degrees; but someone ignorant of geometry would presumably not have that belief that simply in virtue of having that representation. Worse, if the representation is complex enough to contain a hidden contradiction, then presumably (by standard laws of logic) literally every proposition that anyone could ever believe is derivable from it.

The move to a single, massively complex representation also creates an architectural challenge. It’s easy to imagine a kinematics in which a simple proposition such as there is beer in the fridge is activated in working memory or a central workspace. But it’s not clear how a massively complex representation could be similarly activated. If the representation has many complex parameters, it’s hard to see how it could fit within the narrow constraints of working memory as traditionally conceived. No human could attend to or process every aspect of a massively complex representation in drawing inferences or making practical decisions. More plausibly, some aspects of it must be the target of attention or processing. But now we’ve lost all of the advantages we hoped to gain by moving to a single, complex representation. Assessing which aspects are targeted throws us back upon the Problem of Causal Specification.

Cynthia believes not only that there’s beer in the fridge but also that there’s ketchup in the fridge and that the fridge is near the kitchen table and that her roommate loves ketchup and that the kitchen table was purchased at Ikea and that the nearest Ikea is thirty miles west. This generates a trilemma. Either (a.) Cynthia has entirely distinct representations for her beer-in-the-fridge belief, her ketchup-in-the-fridge belief, her fridge-near-the-table belief, and so on, in which case even if we can pack everything about beer in the fridge into a single complex representation we still face the problem of billions of representations with closely related contents and an implausible commitment to the activation of some precise subset of them when Cynthia gets up to go to the kitchen. Or (b.) Cynthia has overlapping beer-in-the-fridge, ketchup-in-the-fridge, etc. representations, which raises the same set of problems, further complicated by commitment to a speculative architecture of representational overlap. Or (c.) all of these representations are somehow all aspects of one mega-representation, presumably of the entire world, which does all the work – a representation which of course would always be active during any reasoning of any sort, demolishing any talk about retrieving different stored representations and combining them together in theoretical inference.

Dispositionalism elegantly avoids all these problems! Of course there is some low-level mechanism or set of mechanisms, perhaps representational or partly representational, that explains Cynthia’s behavior. But the dispositionalist need not commit to Presence, Discreteness, Kinematics, or Specificity. There need be no determinate, specific answer exactly what representational content, if any, is activated, and the structures at work need have no clean or simple relation to the beliefs we ascribe to Cynthia. Dispositionalism is silent about structure. What matters is only the pattern of dispositions enabled by the underlying structure, whatever that underlying structure is.

Instead of the storage and retrieval metaphor that representationalists tend to favor, the dispositionalist can appeal to figural or shaping metaphors. Cynthia’s dispositional profile has a certain shape: the shape characteristic of that of a beer-in-the-fridge believer – but also, at the same time, the shape characteristic of a Lucky-Lager-in-the-refrigerator-door believer. There need be no single determinately correct way to specify the shape of a complex figure. A complex shape can be characterized in any of a variety of ways, at different levels of precision, highlighting different features, in ways that are more or less apt given the describer’s purposes and interests. It is this attitude we should take to characterizing Cynthia’s complex dispositional profile. Attributing a belief is more like sketching the outline of a complex figure – perhaps a figure only imperfectly seen or known – than it is like enumerating the contents of a box.

Thursday, January 12, 2023

Further Methodological Troubles for the Moralometer

[This post draws on ideas developed in collaboration with psychologist Jessie Sun.]

If we want to study morality scientifically, we should want to measure it. Imagine trying to study temperature without a thermometer or weight without scales. Of course indirect measures are possible: We can't put a black hole on a scale, but we can measure how it bends the light that passes nearby and thereby infer its mass.

Last month, I raised a challenge for the possibility of developing a "moralometer" (a device that accurately measure's a person's overall morality). The challenge was this: Any moralometer would need to draw on one or more of four methods: self-report, informant report, behavioral measures, or physiological measures. Each one of these methods has serious shortcomings as a basis for general moral measurement of one's overall moral character.

This month, I raise a different (but partly overlapping) set of challenges, concerning how well we can specify the target we're aiming to measure.

Problems with Flexible Measures

Let's call a measure of overall morality flexible if it invites a respondent to apply their own conception of morality, in a flexible way. The respondent might be the target themselves (in self-report measures of morality) or they might be a peer, colleague, acquaintance, or family member of the target (in informant-report measures of morality). The most flexible measures apply "thin" moral concepts in Bernard Williams' sense -- prompts like "Overall, I am a morally good person" [responding on an agree/disagree scale] or "[the target person] behaves ethically".

While flexible measures avoid excessive rigidity and importing researchers' limited and possibly flawed understandings of morality into the rating procedure, the downsides are obvious if we consider how people with noxious worldviews might rate themselves and others. The notorious Nazi Adolf Eichmann, for example, appeared to have thought highly of his own moral character. Alexander "the Great" was admired for millennia, including as a moral exemplar of personal bravery and spreader of civilization, despite his main contribution being conquest through aggressive warfare, including the mass slaughter and enslavement of at least one civilian population.

I see four complications:

Relativism and Particularism. Metaethical moral relativists hold that different moral standards apply to different people or in different cultures. While I would reject extreme relativist views according to which genocide, for example, doesn't warrant universal condemnation, a moderate version of relativism has merit. Cultures might reasonably differ, for example, on the age of sexual consent, and cultures, subcultures, and social groups might reasonably differ in standards of generosity in sharing resources with neighbors and kin. If so, then flexible moralometers, employed by raters who use locally appropriate standards, will have an advantage over inflexible moralometers which might inappropriately import researchers' different standards. However, even flexible moralometers will fail in the face of relativism if they are employed by raters who employ the wrong moral standards.

According to moral particularism, morality isn't about applying consistent rules or following any specifiable code of behavior. Rather, what's morally good or bad, right or wrong, frequently depends on particular features of specific situations which cannot be fully codified in advance. While this isn't the same as relativism, it presents a similar methodological challenge: The farther the researcher or rater stands from the particular situation of the target, the more likely they are to apply inappropriate standards, since they are likely to be ignorant of relevant details. It seems reasonable to accept at least moderate particularism: The moral quality of telling a lie, stealing $20, or stopping to help a stranger, might often depend on fine details difficult to know from outside the situation.

If the most extreme forms of moral relativism or particularism (or moral skepticism) are true, then no moralometer could possibly work, since there won't be stable truths about people's morality, or the truths will be so complicated or situation dependent as to defy any practical attempt at measurement. Moderate relativism and particularism, if correct, provide reason to favor flexible standards as judged by self-ratings or the ratings of highly knowledgeable peers sensitive to relevant local details; but even in such cases all of the relevant adjustments might not be made.

Incommensurability. Goods are incommensurable if there is no fact of the matter about how they should be weighed against each other. Twenty dollar bills and ten dollar bills are commensurable: Two of the latter are worth exactly one of the former. But it's not clear how to weigh, for example, health against money or family versus career. In ethics, if Steven tells a lie in the morning and performs a kindness in the afternoon, how exactly ought these to be weighed against each other? If Tara is stingy but fair, is her overall moral character better, worse, or the same as that of Nicholle, who is generous but plays favorites? Combining different features of morality into a single overall score invites commensurability problems. Plausibly, there's no single determinately best weighting of different factors.

Again, I favor a moderate view. Probably in many cases there is no single best weighting. However, approximate judgments remain possible. Even if health and money can't be precisely weighed against each other, extreme cases permit straightforward decisions. Most of us would gladly accept a scratch on a finger for the sake of a million dollars and would gladly pay $10 to avoid stage IV cancer.  Similarly, Stalin was morally worse than Martin Luther King, even if Stalin had some virtues and King some vices. Severe sexual harassment of an employee is worse than fibbing to your spouse to get out of washing the dishes.

Moderate incommensurability limits the precision of any possible moralometer. Vices and virtues, and rights and wrongs of different types will be amenable only to rough comparison, not precise determination in a single common coin.

Moral error. If we let raters reach independent judgments about what is morally good or bad, right or wrong, they might simply get it wrong. As mentioned above, Eichmann appears to have thought well of himself, and the evidence suggests that he also regarded other Nazi leaders as morally excellent. Raters will disagree about the importance of purity norms (such as norms against sexual promiscuity), the badness of abortion, and the moral importance, or not, of being vegetarian. Bracketing relativism, then at least some of these raters must be factually mistaken about morality, on one side or another, adding substantial error into their ratings.

The error issue is enormously magnified if ordinary people's moral judgments are systematically mistaken. For example, if the philosophically discoverable moral truth is that the potential impact of your choices on future generations morally far outweighs the impact you have on the people around you (see my critiques of "longtermism" here and here), then the person who is an insufferable jerk to everyone around them but donates $5000 to an effective charity might be in fact far morally better than a personally kind and helpful person who donates nothing to charity -- but informants' ratings might very well suggest the reverse. Similar remarks would apply to any moral theory that is sharply at odds with commonsense moral intuition.

Evaluative bias. People are, of course, typically biased in their own favor. Most people (not all!) are reluctant to think of themselves as morally below average, as unkind, unfair, or callous, even if they in fact are. Social desirability bias is the well-known phenomenon that survey respondents will tend to respond to questions in a manner that presents them in a good light. Ratings of friends, family, and peers will also tend to be positively biased: People tend to view their friends and peers positively, and even when not they might be reluctant to "tell on" them to researchers. If the size of evaluative bias were consistent, it could be corrected for, but presumably it can vary considerably from case to case, introducing further noise.

Problems with Inflexible Measures

Given all these problems with flexible measures of morality, it might seem best to build our hypothetical moralometer instead around inflexible measures. Assuming physiological measures are unavailable, the most straightforward way to do this would be to employ researcher-chosen behavioral measures. We could try to measure someone's honesty by seeing whether they will cheat on a puzzle to earn more money in a laboratory setting. We could examine publicly available criminal records. We could see whether they are willing to donate a surprise bonus payment to a charity.

Unfortunately, inflexible measures don't fully escape the troubles that dog flexible measures, and they bring new troubles of their own.

Relativism and particularism. Inflexible measures probably aggravate the problems with relativism and particularism discussed above. With self-report and informant report, there's at least an opportunity for the self or the informant to take into account local standards and particulars of the situation. In contrast, inflexible measures will ordinarily be applied equally to all without adjustment for context. Suppose the measure is something like "gives a surprise bonus of $10 to charity". This might be a morally very different decision for a wealthy participant than for a needy participant. It might be a morally very different decision for a participant who would save that $10 to donate it to a different and maybe better charity than for a participant who would simply pocket the $10. But unless those other factors are being measured, as they normally would not be, they cannot be taken account of.

Incommensurability. Inflexible measures also won't avoid incommensurability problems. Suppose our moralometer includes one measure of honesty, one measure of generosity, and one measure of fairness. The default approach might be for a summary measure simply to average these three, but that might not accurately reflect morality: Maybe a small act of dishonesty in an experimental setting is far less morally important than a small act of unfairness in that same experimental setting. For example, getting an extra $1 from a researcher by lying in a task that transparently appears to demand a lie (and might even be best construed as a game in which telling untruths is just part of the task, in fact pleasing the researcher) might be approximately morally neutral while being unfair to a fellow participant in that same study might substantially hurt the other's feelings.

Sampling and ecological validity. As mentioned in my previous post on moralometers, fixed behavioral measures are also likely to have severe methodological problems concerning sampling and ecological validity. Any realistic behavioral measure is likely to capture only a small and perhaps unrepresentative part of anyone's behavior, and if it's conducted in a laboratory or experimental setting, behavior in that setting might not correlate well with behavior with real stakes in the real world. How much can we really infer about a person's overall moral character from the fact that they give their monetary bonus to charity or lie about a die roll in the lab?

Moral authority. By preferring a fixed measure, the experimenter or the designer of the moralometer takes upon themselves a certain kind of moral authority -- the authority to judge what is right and wrong, moral or immoral, in others' behavior. In some cases, as in the Eichmann case, this authority seems clearly preferable to deferring to the judgment of the target and their friends. But in other cases, it is a source of error -- since of course the experimenter or designer might be wrong about what is in fact morally good or bad.

Being wrong while taking up, at least implicitly, this mantle of moral authority has at least two features that potentially make it worse than the type of error that arises by wrongly deferring to mistaken raters. First, the error is guaranteed to be systematic. The same wrong standards will be applied to every case, rather than scattered in different (and perhaps partly canceling) directions as might be the case with rater error. And second, it risks a lack of respect: Others might reasonably object to being classified as "moral" or "immoral" by an alien set of standards devised by researchers and with which they disagree.

In Sum

The methodological problems with any potential moralometer are extremely daunting. As discussed in December, all moralometers must rely on some combination of self-report, informant report, behavioral measure, or physiological measure, and each of these methods has serious problems. Furthermore, as discussed today, a batch of issues around relativism, particularism, disagreement, incommensurability, error, and moral authority dog both flexible measures of morality (which rely on raters' judgments about what's good and bad) and inflexible measures (which rely on researchers' or designers' judgments).

Coming up... should we even want a moralometer if we could have one?  I discussed the desirability or undesirability of a perfect moralometer in December, but I want to think more carefully about the moral consequences of the more realistic case of an imperfect moralometer.

Friday, January 06, 2023

The Design Policy of the Excluded Middle

According to the Design Policy of the Excluded Middle, as Mara Garza and I have articulated it (here and here), we ought to avoid creating AI systems "about which it is unclear whether they deserve full human-grade rights because it is unclear whether they are conscious or to what degree" -- or, more simply, we shouldn't make AI systems whose moral status is legitimately in doubt.  (This is related to Joanna Bryson's suggestion that we should only create robots whose lack of moral considerability is obvious, but unlike Bryson's policy it imagines leapfrogging past the no-rights case to the full rights case.)

To my delight, Mara's and my suggestion is getting some uptake, most notably today in the New York Times.

The fundamental problem is this.  Suppose we create AI systems that some people reasonably suspect are genuinely conscious and genuinely deserve human-like rights, while others reasonably suspect that they aren't genuinely conscious and don't genuinely deserve human-like rights.  This forces us into a catastrophic dilemma: Either give them full human-like rights or don't give them full human-like rights.

If we do the first -- if we give them full human or human-like rights -- then we had better give them paths to citizenship, healthcare, the vote, the right to reproduce, the right to rescue in an emergency, etc.  All of this entails substantial risks to human beings: For example, we might be committed to save six robots in a fire in preference to five humans.  The AI systems might support policies that entail worse outcomes for human beings.  It would be more difficult to implement policies designed to reduce existential risk due to runaway AI intelligence.  And so on.  This might be perfectly fine, if the AI systems really are conscious and really are our moral equals.  But by stipulation, it's reasonable to think that they are empty machines with no consciousness and no real moral status, and so there's a real risk that we would be risking and sacrificing all this for nothing.

If we do the second -- if we deny them full human or human-like rights -- then we risk creating a race of slaves we can kill at will, or at best a group of second-class citizens.  By stipulation, it might be the case that this would constitute unjust and terrible treatment of entities as deserving of rights and moral consideration as human beings are.

Therefore, we ought to avoid putting us in the situation where we face this dilemma.  We should avoid creating AI systems of dubious moral status.

A few notes:

"Human-like" rights: Of course "human rights" would be a misnomer if AI systems become our moral equals.  Also, exactly what healthcare, reproduction, etc., look like for AI systems, and the best way to respect their interests, might look very different in practice from the human case.  There would be a lot of tricky details to work out!

What about animal-grade AI that deserves animal-grade rights?  Maybe!  Although it seems a natural intermediate step, we might end up skipping it, if any conscious AI systems end up also being capable of human-like language, rational-planning, self-knowledge, ethical reflection, etc.  Another issue is this: The moral status of non-human animals is already in dispute, so creating AI systems of disputably animal-like moral status doesn't perhaps add quite the same dimension of risk and uncertainty to the world that creating a dubiously human-status moral system would.

Would this policy slow technological progress?  Yes, probably.  Unsurprisingly, being ethical has its costs.  And one can dispute whether those costs are worth paying or are overridden by other ethical considerations.

Sunday, January 01, 2023

Writings of 2022

Every New Year's Day, I post a retrospect of the past year's writings. Here are the retrospects of 2012, 2013, 2014, 2015, 2016, 2017, 2018, 20192020, and 2021.

The biggest project this year was my new book The Weirdness of the World, submitted in November and due in print in early fall 2023.  This book pulls together ideas I've been publishing over the past ten years concerning the failure of common sense, philosophy, and empirical science to explain consciousness and the fundamental structure of the cosmos, and the corresponding bizarreness and dubiety of all general theories about such matters.




Under contract / in progress:

    As co-editor with Jonathan Jong, The Nature of Belief, Oxford University Press.
    As co-editor with Helen De Cruz and Rich Horton, a yet-to-be-titled anthology with MIT Press containing great classics of philosophical SF.

Full-length non-fiction essays

Appearing in print:

Finished and forthcoming:
    "How far can we get in creating a digital replica of a philosopher?" (third author, with Anna Strasser and Matt Crosby”, Robophilosophy Proceedings 2022.
    "What is unique about kindness? Exploring the proximal experience of prosocial acts relative to other positive behaviors” (with Annie Regan, Seth Margolis, Daniel J. Ozer, and Sonja Lyubomirsky), Affective Science
In draft and circulating:
    "The full rights dilemma for A.I. systems of debatable personhood" [available on request].
    "Inflate and explode". (I'm trying to decide whether to trunk this one or continue revising it.)
Shorter non-fiction

Science fiction stories

Some favorite blog posts

Reprints and Translations

    "Fish dance", reprinted in R. M. Ambrose, Vital (2022).  Inlandia Institute.