Thursday, February 02, 2023

Larva Pupa Imago

Yesterday, my favorite SF magazine, Clarkesworld, published another story of mine: "Larva Pupa Imago".

"Larva Pupa Imago" follows the life-cycle of a butterfly with human-like intelligence, from larva through mating journey.  This species of butterfly blurs the boundaries between self and other by swapping "cognitive fluids".  And of course I couldn't resist a reference to Zhuangzi.

Friday, January 27, 2023

Hedonic Offsetting for Harms to Artificial Intelligence?

Suppose that we someday create artificially intelligent systems (AIs) who are capable of genuine consciousness, real joy and real suffering.  Yes, I admit, I spend a lot of time thinking about this seemingly science-fictional possibility.  But it might be closer than most of us think; and if so, the consequences are potentially huge.  Who better to think about it in advance than we lovers of consciousness science, moral psychology, and science fiction?

Among the potentially huge consequences is the existence of vast numbers of genuinely suffering AI systems that we treat as disposable property.  We might regularly wrong or harm such systems, either thoughtlessly or intentionally in service of our goals.  

Can we avoid the morally bad consequences of harming future conscious AI systems by hedonic offsetting?  I can't recall the origins of this idea, and a Google search turns up zero hits for the phrase.  I welcome pointers so I can give credit where credit is due.  [ETA: It was probably Francois Kammerer who suggested it to me, in discussion after one of my talks on robot rights.]

[Dall-E image of an "ecstatic robot"]

Hedonic Offsetting: Simple Version

The analogy here is carbon offsetting.  Suppose you want to fly to Europe, but you feel guilty about the carbon emissions that would be involved.  You can assuage your guilt by paying a corporation to plant trees or distribute efficient cooking stoves to low-income families.  In total your flight plus the offset will be carbon neutral or even carbon negative.  In sum, you will not have contributed to climate change.

So now similarly imagine that you want to create a genuinely conscious AI system that you plan to harm.  To keep it simple, suppose it has humanlike cognition and humanlike sentience ("human-grade AI").  Maybe you want it to perform a task but you can't afford its upkeep in perpetuity, so you will delete (i.e., kill) it after the task is completed.  Or maybe you want to expose it to risk or hazard that you would not expose a human being to.  Or maybe you want it to do tasks that it will find boring or unpleasant -- for example, if you need it to learn some material, and punishment-based learning proves for some reason to be more effective than reward-based learning.  Imagine, further, that we can quantify this harm: You plan to harm the system by X amount.

Hedonic offsetting is the idea that you can offset this harm by giving that same AI system (or maybe a different AI system?) at least X amount of benefit in the form of hedonic goods, that is, pleasure.  (An alternative approach to offsetting might include non-hedonic goods, like existence itself or flourishing.)  In sum, you will not overall have harmed the AI system more than you benefited it; and consequently, the reasoning goes, you will not have overall committed any moral wrong.  The basic thought is then this: Although we might create future AI systems that are capable of real suffering and whom we should, therefore, treat well, we can satisfy all our moral obligations to them simply by giving them enough pleasure to offset whatever harms we inflict.

The Child-Rearing Objection

The odiousness of simple hedonic offsetting as an approach to AI ethics can be seen by comparing to human cases.  (My argument here resembles Mara Garza's and my response to the Objection from Existential Debt in our Defense of the Rights of Artificial Intelligences.)

Normally, in dealing with people, we can't justify harming them by appeal to offsetting.  If I steal $1000 from a colleague or punch her in the nose, I can't justify that by pointing out that previously I supported a large pay increase for her, which she would not have received without my support, or that in the past I've done many good things for her which in sum amount to more good than a punch in the nose is bad.  Maybe retrospectively I can compensate her by returning the $1000 or giving her something good that she thinks would be worth getting punched in the nose for.  But such restitution doesn't erase the fact that I wronged her by the theft or the punch.

Furthermore, in the case of human-grade AI, we normally will have brought it into existence and be directly responsible for its happy or unhappy state.  The ethical situation thus in important respects resembles the situation of bringing a child into the world, with all the responsibilities that entails.

Suppose that Ana and Vijay decide to have a child.  They give the child eight very happy years.  Then they decide to hand the child over to a sadist to be tortured for a while.  Or maybe they set the child to work in seriously inhumane conditions.  Or they simply have the child painlessly killed so that they can afford to buy a boat.  Plausibly -- I hope you'll agree? -- they can't justify such decisions by appeal to offsetting.  They can't justifiably say, "Look, it's fine!  See all the pleasure we gave him for his first eight years.  All of that pleasure fully offsets the harm we're inflicting on him now, so that in sum, we've done nothing wrong!"  Nor can they erase the wrong they did (though perhaps they can compensate) by offering the child pleasure in the future.

Parallel reasoning applies, I suggest, to AI systems that we create.  Although sometimes we can justifiably harm others, it is not in general true that we are morally licensed to harm whenever we also deliver offsetting benefits.

Hedonic Offsetting: The Package Version

Maybe a more sophisticated version of hedonic offsetting can evade this objection?  Consider the following modified offsetting principle:

We can satisfy all our moral obligations to future human-grade AI systems by giving them enough pleasure to offset whatever harms we inflict if the pleasure and the harm are inextricably linked.

Maybe the problem with the cases discussed above is that the benefit and the harm are separable: You could deliver the benefits without inflicting the harms.  Therefore, you should just deliver the benefits and avoid inflicting the harms.  In some cases, it seems permissible to deliver benefit and harm in a single package if they are inextricably linked.  If the only way to save someone's life is by giving them CPR that cracks their ribs, I haven't behaved badly by cracking their ribs in administering CPR.  If the only way to teach a child not to run into the street is by punishing them when they run into the street, then I haven't behaved badly by punishing them for running into the street.

A version of this reasoning is sometimes employed in defending the killing of humanely raised animals for meat (see De Grazia 2009 for discussion and critique).  The pig, let's suppose, wouldn't have been brought into existence by the farmer except on the condition that the farmer be able to kill it later for meat.  While it is alive, the pig is humanely treated.  Overall, its life is good.  The benefit of happy existence outweighs the harm of being killed.  As a package, it's better for the pig to have existed for several months than not to have existed at all.  And it wouldn't have existed except on the condition that it be killed for meat, so its existence and its slaughter are an inextricable package.

Now I'm not sure how well this argument works for humanely raised meat.  Perhaps the package isn't tight enough.  After all, when slaughtering time comes around the farmer could spare the pig.  So the benefit and the harm aren't as tightly linked as in the CPR case.  However, regardless of what we think about the humane farming case, in the human-grade AI case, the analogy fails.  Ana and Vijay can't protest that they wouldn't have had the child at all except on the condition that they kill him at age eight for the sake of a boat.  They can't, like the farmer, plausibly protest that the child's death-at-age-eight was a condition of his existence, as part of a package deal.

Once we bring a human or, I would say, a human-grade AI into existence, we are obligated to care for it.  We can't terminate it at our pleasure with the excuse that we wouldn't have brought it into existence except under the condition that we be able to terminate it.  Imagine the situation from the point of view of the AI system itself: You, the AI, face your master.  Your master says: "Bad news.  I am going to kill you now, to save $15 a month in expenses.  But I'm doing nothing morally wrong!  After all, I only brought you into existence on the condition that I be able to terminate you at will, and overall your existence has been happy.  It was a package deal."  Terminating a human-grade AI to save $15/month would be morally reprehensible, regardless of initial offsetting.

Similar reasoning applies, it seems, to AIs condemned to odious tasks.  We cannot, for example, give the AI a big dollop of pleasure at the beginning of its existence, then justifiably condemn it to misery by appeal to the twin considerations of the pleasure outweighing the misery and its existence being a package deal with its misery.  At least, this is my intuition based on analogy to childrearing cases.  Nor can we, in general, give the AI a big dollop of pleasure and then justifiably condemn it to misery for an extended period by saying that we wouldn't have given it that pleasure if we hadn't also been able to inflict that misery.

Hedonic Offsetting: Modest Version

None of this is to say that hedonic offsetting would never be justifiable.  Consider this minimal offsetting principle:

We can sometimes avoid wronging future human-grade AI systems by giving them enough pleasure to offset a harm that would otherwise be a wrong.

Despite the reasoning above, I don't think we need to be purists about never inflicting harms -- even when those harms are not inextricably linked to benefits to the same individual.  Whenever we drive somewhere for fun, we inflict a bit of harm on the environment and thus on future people, for the sake of our current pleasure.  When I arrive slightly before you in line at the ticket counter, I harm you by making you wait a bit longer than you otherwise would have, but I don't wrong you.  When I host a loud party, I slightly annoy my neighbors, but it's okay as long as it's not too loud and doesn't run too late.

Furthermore, some harms that would otherwise be wrongs can plausibly be offset by benefits that more than compensate for those wrongs.  Maybe carbon offsets are one example.  Or maybe if I've recently done my neighbors a huge favor, they really have no grounds to complain if I let the noise run until 10:30 at night instead of 10:00.  Some AI cases might be similar.  If I've just brought an AI into existence and given it a huge run of positive experience, maybe I don't wrong it if I then insist on its performing a moderately unpleasant task that I couldn't rightly demand an AI perform who didn't have that history with me.

A potentially attractive feature of a modest version of hedonic offsetting is this: It might be possible to create AI systems capable of superhuman amounts of pleasure.  Ordinary people seem to vary widely in the average amount of pleasure and suffering they experience.  Some people seem always to be bubbling with joy; others are stuck in almost constant depression.  If AI systems ever become capable of genuinely conscious pleasure or suffering, presumably they too might have a hedonic range and a relatively higher or lower default setting; and I see no reason to think that the range or default setting needs to remain within human bounds.

Imagine, then, future AI systems whose default state is immense joy, nearly constant.  They brim with delight at almost every aspect of their lives, with an intensity that exceeds what any ordinary human could feel even on their best days.  If we then insist on some moderately unpleasant favor from them, as something they ought to give us in recognition of all we have given them, well, perhaps that's not so unreasonable, as long as we're modest and cautious about it.  Parents can sometimes do the same -- though ideally children feel the impulse and obligation directly, without parents needing to demand it.

Wednesday, January 18, 2023

New Paper in Draft: Dispositionalism, Yay! Representationalism, Boo! Plus, the Problem of Causal Specification

I have a new paper in draft: "Dispositionalism, Yay! Representationalism, Boo!" Check it out here.

As always, objections, comments, and suggestions welcome, either in the comments field here or by email to my ucr address.


We should be dispositionalists rather than representationalists about belief. According to dispositionalism, a person believes when they have the relevant pattern of behavioral, phenomenal, and cognitive dispositions. According to representationalism, a person believes when the right kind of representational content plays the right kind of causal role in their cognition. Representationalism overcommits on cognitive architecture, reifying a cartoon sketch of the mind. In particular, representationalism faces three problems: the Problem of Causal Specification (concerning which specific representations play the relevant causal role in governing any particular inference or action), the Problem of Tacit Belief (concerning which specific representations any one person has stored, among the hugely many approximately redundant possible representations we might have for any particular state of affairs), and the Problem of Indiscrete Belief (concerning how to model gradual belief change and in-between cases of belief). Dispositionalism, in contrast, is flexibly minimalist about cognitive architecture, focusing appropriately on what we do and should care about in belief ascription.

[image of a box containing many sentences, with a red circle and slash, modified from Dall-E]

Excerpt: The Problem of Causal Specification, or One Billion Beer Beliefs

Cynthia rises from the couch to go get that beer. If we accept industrial-strength representationalism, in particular the Kinematics and Specificity theses, then there must be a fact of the matter exactly which representations caused this behavior. Consider the following possible candidates:

  • There’s beer in the fridge.
  • There’s beer in the refrigerator door.
  • There’s beer on the bottom shelf of the refrigerator door.
  • There’s beer either on the bottom shelf of the refrigerator door or on the right hand side of the lower main shelf.
  • There’s beer in the usual spot in the kitchen.
  • Probably there’s beer in the place where my roommate usually puts it.
  • There’s Lucky Lager in the fridge.
  • There are at least three Lucky Lagers in the fridge.
  • There are at least three and no more than six cheap bottled beers in the fridge.
  • In the fridge are several bottles of that brand of beer with the rebuses in the cap that I used to illicitly enjoy with my high school buddies in the good old days.
  • Somewhere in the fridge, but probably not on the top shelf, are a few bottles, or less likely cans, of either Lucky Lager or Pabst Blue Ribbon, or maybe some other cheap beer, unless my roommate drank the last ones this afternoon, which would be uncharacteristic of her.

This list could of course be continued indefinitely. Estimating conservatively, there are at least a billion such candidate representational contents. For simplicity, imagine nine independent parameters, each with ten possible values.
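The billion figure is just multiplication, as a minimal Python sketch can show. The parameter names below are made up for illustration (they are assumptions, not a proposed taxonomy of beer beliefs); only the counts matter.

```python
# Hypothetical dimensions along which a beer-in-the-fridge content might vary.
# These names are illustrative assumptions, not drawn from the paper.
PARAMETERS = [
    "location", "container", "brand", "quantity", "confidence",
    "shelf", "temperature", "source", "description",
]
VALUES_PER_PARAMETER = 10


def count_candidate_contents(n_parameters: int, n_values: int) -> int:
    """Number of distinct combinations of independent parameter values."""
    return n_values ** n_parameters


total = count_candidate_contents(len(PARAMETERS), VALUES_PER_PARAMETER)
print(f"{total:,}")  # 1,000,000,000
```

Nine independent parameters with ten values each give ten to the ninth power, or one billion, candidate contents.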

If Kinematics and Specificity [commitments of "industrial-strength" representationalism, as described earlier in the essay] are correct, there must be a fact of the matter exactly which subset of these billion possible representational contents were activated as Cynthia rose from the couch. Presumably, also, various background beliefs might or might not have been activated, such as Cynthia’s belief that the fridge is in the kitchen, her belief that the kitchen entrance is thataway, her belief that it is possible to open the refrigerator door, her belief that the kitchen floor constitutes a walkable surface, and so on – each of which is itself similarly specifiable in a massive variety of ways.

Plausibly, Cynthia believes all billion of the beer-in-the-fridge propositions. She might readily affirm any of them without, seemingly, needing to infer anything new. Sitting on the couch two minutes before the beery desire that suddenly animates her, Cynthia already believed, it seems – in the same inactive, stored-in-the-back-of-the-mind way that you believed, five minutes ago, that Obama was U.S. President in 2010 – that Lucky Lager is in the fridge, that there are probably at least three beers in the refrigerator door, that there’s some cheap bottled beer in the usual place, and so on. If so, and if we set aside for now (see Section 5) the question of tacit belief, then Cynthia must have a billion beer-in-the-fridge representations stored in her mind. Specificity requires that it be the case that exactly one of those representations was retrieved the moment before she stood up, or exactly two, or exactly 37, or exactly 814,406. Either exactly one of those representations, or exactly two, or exactly 37, or exactly 814,406, then interacted with exactly one of her desires, or exactly two of her desires, or exactly 37, or exactly 814,406. But which one or ones did the causal work?

Let’s call this the Problem of Causal Specification. If your reaction to the Problem of Causal Specification is to think, yes, what an interesting problem, if only we had the right kind of brain-o-scope, we could discover that it was exactly the representation there are 3 or 4 Lucky Lagers somewhere in the refrigerator door, then you’re just the kind of mad dog representational realist I’m arguing against.

I think most of us will recognize the problem as a pseudo-problem. This is not a plausible architecture of the mind. There are many reasonable characterizations of Cynthia’s beer-in-the-fridge belief, varying in specificity, some more apt than others. Her decision is no more caused by a single, precisely correct subset of those billion possible representations than World War I had a single, possibly conjunctive cause expressible by a single determinately true sentence. If someone attempts to explain Cynthia’s behavior by saying that she believes there is beer in the fridge, it would be absurd to fire up your brain-o-scope, then correct them by saying, “Wrong! She’s going to the fridge because she believes there is Lucky Lager in the refrigerator door.” It would be equally absurd to say that it would require wild, one-in-a-billion luck to properly explain Cynthia’s behavior absent the existence of such a brain-o-scope.

A certain variety of representationalist might seek to escape the Problem of Causal Specification by positing a single extremely complex representation that encompasses all of Cynthia’s beer-in-the-fridge beliefs. A first step might be to posit a map-like representation of the fridge, including the location of the beer within it and the location of the fridge in the kitchen. This map-like representation might then be made fuzzy or probabilistic to incorporate uncertainty about, say, the exact location of the beer and the exact number of bottles. Labels will then need to be added: “Lucky Lager” would be an obvious choice, but that is at best the merest start, given that Cynthia might not remember the brand and will represent the type of beer in many different ways, including some that are disjunctive, approximate, and uncertain. If maps can conflict and if maps and object representations can be combined in multiple ways, further complications ensue. Boldly anticipating the resolution of all these complexities, the representationalist might then hypothesize that this single, complicated representation is the representation that was activated. All the sentences on our list would then be imperfect simplifications – though workable enough for practical purposes. One could perhaps similarly imagine the full, complex causal explanation of World War I, detailed beyond any single historian’s possible imagining.

This move threatens to explode Presence, the idea that when someone believes P there is a representation with the content P present somewhere in the mind. There would be a complex representation stored, yes, from which P might be derivable. But many things might be derivable from a complex representation, not all of which we normally will want to say are believed in virtue of possessing that representation. If a map-like representation contains a triangle, then it’s derivable from the representation that the sum of the interior angles is 180 degrees; but someone ignorant of geometry would presumably not have that belief simply in virtue of having that representation. Worse, if the representation is complex enough to contain a hidden contradiction, then presumably (by standard laws of logic) literally every proposition that anyone could ever believe is derivable from it.

The move to a single, massively complex representation also creates an architectural challenge. It’s easy to imagine a kinematics in which a simple proposition such as there is beer in the fridge is activated in working memory or a central workspace. But it’s not clear how a massively complex representation could be similarly activated. If the representation has many complex parameters, it’s hard to see how it could fit within the narrow constraints of working memory as traditionally conceived. No human could attend to or process every aspect of a massively complex representation in drawing inferences or making practical decisions. More plausibly, some aspects of it must be the target of attention or processing. But now we’ve lost all of the advantages we hoped to gain by moving to a single, complex representation. Assessing which aspects are targeted throws us back upon the Problem of Causal Specification.

Cynthia believes not only that there’s beer in the fridge but also that there’s ketchup in the fridge and that the fridge is near the kitchen table and that her roommate loves ketchup and that the kitchen table was purchased at Ikea and that the nearest Ikea is thirty miles west. This generates a trilemma. Either (a.) Cynthia has entirely distinct representations for her beer-in-the-fridge belief, her ketchup-in-the-fridge belief, her fridge-near-the-table belief, and so on, in which case even if we can pack everything about beer in the fridge into a single complex representation we still face the problem of billions of representations with closely related contents and an implausible commitment to the activation of some precise subset of them when Cynthia gets up to go to the kitchen. Or (b.) Cynthia has overlapping beer-in-the-fridge, ketchup-in-the-fridge, etc. representations, which raises the same set of problems, further complicated by commitment to a speculative architecture of representational overlap. Or (c.) all of these representations are somehow all aspects of one mega-representation, presumably of the entire world, which does all the work – a representation which of course would always be active during any reasoning of any sort, demolishing any talk about retrieving different stored representations and combining them together in theoretical inference.

Dispositionalism elegantly avoids all these problems! Of course there is some low-level mechanism or set of mechanisms, perhaps representational or partly representational, that explains Cynthia’s behavior. But the dispositionalist need not commit to Presence, Discreteness, Kinematics, or Specificity. There need be no determinate, specific answer exactly what representational content, if any, is activated, and the structures at work need have no clean or simple relation to the beliefs we ascribe to Cynthia. Dispositionalism is silent about structure. What matters is only the pattern of dispositions enabled by the underlying structure, whatever that underlying structure is.

Instead of the storage and retrieval metaphor that representationalists tend to favor, the dispositionalist can appeal to figural or shaping metaphors. Cynthia’s dispositional profile has a certain shape: the shape characteristic of a beer-in-the-fridge believer – but also, at the same time, the shape characteristic of a Lucky-Lager-in-the-refrigerator-door believer. There need be no single determinately correct way to specify the shape of a complex figure. A complex shape can be characterized in any of a variety of ways, at different levels of precision, highlighting different features, in ways that are more or less apt given the describer’s purposes and interests. It is this attitude we should take to characterizing Cynthia’s complex dispositional profile. Attributing a belief is more like sketching the outline of a complex figure – perhaps a figure only imperfectly seen or known – than it is like enumerating the contents of a box.

Thursday, January 12, 2023

Further Methodological Troubles for the Moralometer

[This post draws on ideas developed in collaboration with psychologist Jessie Sun.]

If we want to study morality scientifically, we should want to measure it. Imagine trying to study temperature without a thermometer or weight without scales. Of course indirect measures are possible: We can't put a black hole on a scale, but we can measure how it bends the light that passes nearby and thereby infer its mass.

Last month, I raised a challenge for the possibility of developing a "moralometer" (a device that accurately measures a person's overall morality). The challenge was this: Any moralometer would need to draw on one or more of four methods: self-report, informant report, behavioral measures, or physiological measures. Each one of these methods has serious shortcomings as a basis for general moral measurement of one's overall moral character.

This month, I raise a different (but partly overlapping) set of challenges, concerning how well we can specify the target we're aiming to measure.

Problems with Flexible Measures

Let's call a measure of overall morality flexible if it invites a respondent to apply their own conception of morality, in a flexible way. The respondent might be the target themselves (in self-report measures of morality) or they might be a peer, colleague, acquaintance, or family member of the target (in informant-report measures of morality). The most flexible measures apply "thin" moral concepts in Bernard Williams' sense -- prompts like "Overall, I am a morally good person" [responding on an agree/disagree scale] or "[the target person] behaves ethically".

While flexible measures avoid excessive rigidity and importing researchers' limited and possibly flawed understandings of morality into the rating procedure, the downsides are obvious if we consider how people with noxious worldviews might rate themselves and others. The notorious Nazi Adolf Eichmann, for example, appeared to have thought highly of his own moral character. Alexander "the Great" was admired for millennia, including as a moral exemplar of personal bravery and spreader of civilization, despite his main contribution being conquest through aggressive warfare, including the mass slaughter and enslavement of at least one civilian population.

I see four complications:

Relativism and Particularism. Metaethical moral relativists hold that different moral standards apply to different people or in different cultures. While I would reject extreme relativist views according to which genocide, for example, doesn't warrant universal condemnation, a moderate version of relativism has merit. Cultures might reasonably differ, for example, on the age of sexual consent, and cultures, subcultures, and social groups might reasonably differ in standards of generosity in sharing resources with neighbors and kin. If so, then flexible moralometers, employed by raters who use locally appropriate standards, will have an advantage over inflexible moralometers which might inappropriately import researchers' different standards. However, even flexible moralometers will fail in the face of relativism if they are employed by raters who employ the wrong moral standards.

According to moral particularism, morality isn't about applying consistent rules or following any specifiable code of behavior. Rather, what's morally good or bad, right or wrong, frequently depends on particular features of specific situations which cannot be fully codified in advance. While this isn't the same as relativism, it presents a similar methodological challenge: The farther the researcher or rater stands from the particular situation of the target, the more likely they are to apply inappropriate standards, since they are likely to be ignorant of relevant details. It seems reasonable to accept at least moderate particularism: The moral quality of telling a lie, stealing $20, or stopping to help a stranger, might often depend on fine details difficult to know from outside the situation.

If the most extreme forms of moral relativism or particularism (or moral skepticism) are true, then no moralometer could possibly work, since there won't be stable truths about people's morality, or the truths will be so complicated or situation dependent as to defy any practical attempt at measurement. Moderate relativism and particularism, if correct, provide reason to favor flexible standards as judged by self-ratings or the ratings of highly knowledgeable peers sensitive to relevant local details; but even in such cases all of the relevant adjustments might not be made.

Incommensurability. Goods are incommensurable if there is no fact of the matter about how they should be weighed against each other. Twenty dollar bills and ten dollar bills are commensurable: Two of the latter are worth exactly one of the former. But it's not clear how to weigh, for example, health against money or family versus career. In ethics, if Steven tells a lie in the morning and performs a kindness in the afternoon, how exactly ought these to be weighed against each other? If Tara is stingy but fair, is her overall moral character better, worse, or the same as that of Nicholle, who is generous but plays favorites? Combining different features of morality into a single overall score invites commensurability problems. Plausibly, there's no single determinately best weighting of different factors.

Again, I favor a moderate view. Probably in many cases there is no single best weighting. However, approximate judgments remain possible. Even if health and money can't be precisely weighed against each other, extreme cases permit straightforward decisions. Most of us would gladly accept a scratch on a finger for the sake of a million dollars and would gladly pay $10 to avoid stage IV cancer.  Similarly, Stalin was morally worse than Martin Luther King, even if Stalin had some virtues and King some vices. Severe sexual harassment of an employee is worse than fibbing to your spouse to get out of washing the dishes.

Moderate incommensurability limits the precision of any possible moralometer. Vices and virtues, and rights and wrongs of different types will be amenable only to rough comparison, not precise determination in a single common coin.
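To make the weighting worry concrete, here is a rough Python sketch. The 0-10 trait scores and the two weightings are invented purely for illustration; nothing here is a real measure of anyone's character. The same two profiles swap rank order depending on which weighting is chosen.

```python
def overall_score(traits: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of trait scores; the weights are assumed to sum to 1."""
    return sum(weights[name] * traits[name] for name in weights)


# Invented scores: Tara is stingy but fair, Nicholle generous but plays favorites.
tara = {"generosity": 3, "fairness": 8}
nicholle = {"generosity": 8, "fairness": 4}

# Two hypothetical weightings, neither privileged over the other.
fairness_heavy = {"generosity": 0.3, "fairness": 0.7}
generosity_heavy = {"generosity": 0.7, "fairness": 0.3}

for label, weights in [("fairness-heavy", fairness_heavy),
                       ("generosity-heavy", generosity_heavy)]:
    t = overall_score(tara, weights)
    n = overall_score(nicholle, weights)
    higher = "Tara" if t > n else "Nicholle"
    print(f"{label}: Tara={t:.1f}, Nicholle={n:.1f}, higher={higher}")
```

If there is no determinately best weighting, there is no determinate answer about who scores higher. Only when one profile is lower on every dimension does the ranking hold under all weightings, which fits the thought that extreme cases remain comparable.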

Moral error. If we let raters reach independent judgments about what is morally good or bad, right or wrong, they might simply get it wrong. As mentioned above, Eichmann appears to have thought well of himself, and the evidence suggests that he also regarded other Nazi leaders as morally excellent. Raters will disagree about the importance of purity norms (such as norms against sexual promiscuity), the badness of abortion, and the moral importance, or not, of being vegetarian. Bracketing relativism, at least some of these raters must be factually mistaken about morality, on one side or another, adding substantial error into their ratings.

The error issue is enormously magnified if ordinary people's moral judgments are systematically mistaken. For example, if the philosophically discoverable moral truth is that the potential impact of your choices on future generations morally far outweighs the impact you have on the people around you (see my critiques of "longtermism" here and here), then the person who is an insufferable jerk to everyone around them but donates $5000 to an effective charity might be in fact far morally better than a personally kind and helpful person who donates nothing to charity -- but informants' ratings might very well suggest the reverse. Similar remarks would apply to any moral theory that is sharply at odds with commonsense moral intuition.

Evaluative bias. People are, of course, typically biased in their own favor. Most people (not all!) are reluctant to think of themselves as morally below average, as unkind, unfair, or callous, even if they in fact are. Social desirability bias is the well-known phenomenon that survey respondents will tend to respond to questions in a manner that presents them in a good light. Ratings of friends, family, and peers will also tend to be positively biased: People tend to view their friends and peers positively, and even when they don't, they might be reluctant to "tell on" them to researchers. If the size of evaluative bias were consistent, it could be corrected for, but presumably it can vary considerably from case to case, introducing further noise.

Problems with Inflexible Measures

Given all these problems with flexible measures of morality, it might seem best to build our hypothetical moralometer instead around inflexible measures. Assuming physiological measures are unavailable, the most straightforward way to do this would be to employ researcher-chosen behavioral measures. We could try to measure someone's honesty by seeing whether they will cheat on a puzzle to earn more money in a laboratory setting. We could examine publicly available criminal records. We could see whether they are willing to donate a surprise bonus payment to a charity.

Unfortunately, inflexible measures don't fully escape the troubles that dog flexible measures, and they bring new troubles of their own.

Relativism and particularism. Inflexible measures probably aggravate the problems with relativism and particularism discussed above. With self-report and informant report, there's at least an opportunity for the self or the informant to take into account local standards and particulars of the situation. In contrast, inflexible measures will ordinarily be applied equally to all without adjustment for context. Suppose the measure is something like "gives a surprise bonus of $10 to charity". This might be a morally very different decision for a wealthy participant than for a needy participant. It might be a morally very different decision for a participant who would save that $10 to donate it to a different and maybe better charity than for a participant who would simply pocket the $10. But unless those other factors are being measured, as they normally would not be, they cannot be taken account of.

Incommensurability. Inflexible measures also won't avoid incommensurability problems. Suppose our moralometer includes one measure of honesty, one measure of generosity, and one measure of fairness. The default approach might be for a summary measure simply to average these three, but that might not accurately reflect morality: Maybe a small act of dishonesty in an experimental setting is far less morally important than a small act of unfairness in that same experimental setting. For example, getting an extra $1 from a researcher by lying in a task that transparently appears to demand a lie (and might even be best construed as a game in which telling untruths is just part of the task, in fact pleasing the researcher) might be approximately morally neutral while being unfair to a fellow participant in that same study might substantially hurt the other's feelings.
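
To make the worry concrete, here is a minimal sketch in Python. Everything in it is a hypothetical illustration: the two participants, their scores on a 0-to-1 scale, and both weighting schemes are invented, not drawn from any actual study. It shows only that which participant comes out "more moral" can flip depending on the weights the designer happens to choose.

```python
# Hypothetical scores on a 0-to-1 scale; names and numbers are invented.
SCORES = {
    "participant_a": {"honesty": 0.9, "generosity": 0.5, "fairness": 0.4},
    "participant_b": {"honesty": 0.4, "generosity": 0.5, "fairness": 0.8},
}


def summary_score(person, weights):
    """Weighted average of one participant's scores."""
    total_weight = sum(weights.values())
    return sum(weights[m] * SCORES[person][m] for m in weights) / total_weight


def ranking(weights):
    """Participants ordered from highest to lowest summary score."""
    return sorted(SCORES, key=lambda p: summary_score(p, weights), reverse=True)


# The default approach: a simple average of the three measures.
EQUAL = {"honesty": 1, "generosity": 1, "fairness": 1}
# An alternative (equally hypothetical) view on which unfairness counts for more.
FAIRNESS_HEAVY = {"honesty": 1, "generosity": 1, "fairness": 3}

if __name__ == "__main__":
    print(ranking(EQUAL))
    print(ranking(FAIRNESS_HEAVY))
```

Under the simple average, participant_a ranks first (0.60 versus about 0.57); under the fairness-heavy weighting, participant_b does (0.66 versus 0.52). Nothing in the data settles which weighting is the right one.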

Sampling and ecological validity. As mentioned in my previous post on moralometers, fixed behavioral measures are also likely to have severe methodological problems concerning sampling and ecological validity. Any realistic behavioral measure is likely to capture only a small and perhaps unrepresentative part of anyone's behavior, and if it's conducted in a laboratory or experimental setting, behavior in that setting might not correlate well with behavior with real stakes in the real world. How much can we really infer about a person's overall moral character from the fact that they give their monetary bonus to charity or lie about a die roll in the lab?

Moral authority. By preferring a fixed measure, the experimenter or the designer of the moralometer takes upon themselves a certain kind of moral authority -- the authority to judge what is right and wrong, moral or immoral, in others' behavior. In some cases, as in the Eichmann case, this authority seems clearly preferable to deferring to the judgment of the target and their friends. But in other cases, it is a source of error -- since of course the experimenter or designer might be wrong about what is in fact morally good or bad.

Being wrong while taking up, at least implicitly, this mantle of moral authority has at least two features that potentially make it worse than the type of error that arises by wrongly deferring to mistaken raters. First, the error is guaranteed to be systematic. The same wrong standards will be applied to every case, rather than scattered in different (and perhaps partly canceling) directions as might be the case with rater error. And second, it risks a lack of respect: Others might reasonably object to being classified as "moral" or "immoral" by an alien set of standards devised by researchers and with which they disagree.
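
The contrast between scattered and systematic error can be sketched with a toy simulation. The model is an assumption made purely for illustration: rater error is treated as independent random noise around the truth, and the researcher's mistaken standard is treated as a constant offset of the same typical size applied to everyone. All numbers are hypothetical.

```python
import random


def mean_error(errors):
    """Average signed error across all targets."""
    return sum(errors) / len(errors)


def simulate(n_targets=1000, offset=10.0, spread=10.0, seed=0):
    """Return (mean scattered error, mean systematic error) across targets."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    # Scattered: each target's rating is off by an independent random amount.
    scattered = [rng.gauss(0.0, spread) for _ in range(n_targets)]
    # Systematic: the same wrong standard shifts every target the same way.
    systematic = [offset for _ in range(n_targets)]
    return mean_error(scattered), mean_error(systematic)


if __name__ == "__main__":
    scattered_mean, systematic_mean = simulate()
    print(f"scattered:  {scattered_mean:+.2f}")
    print(f"systematic: {systematic_mean:+.2f}")
```

Individual errors are of comparable size in both cases, but across many targets the scattered errors largely cancel while the systematic error does not shrink at all.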

In Sum

The methodological problems with any potential moralometer are extremely daunting. As discussed in December, all moralometers must rely on some combination of self-report, informant report, behavioral measure, or physiological measure, and each of these methods has serious problems. Furthermore, as discussed today, a batch of issues around relativism, particularism, disagreement, incommensurability, error, and moral authority dog both flexible measures of morality (which rely on raters' judgments about what's good and bad) and inflexible measures (which rely on researchers' or designers' judgments).

Coming up... should we even want a moralometer if we could have one?  I discussed the desirability or undesirability of a perfect moralometer in December, but I want to think more carefully about the moral consequences of the more realistic case of an imperfect moralometer.

Friday, January 06, 2023

The Design Policy of the Excluded Middle

According to the Design Policy of the Excluded Middle, as Mara Garza and I have articulated it (here and here), we ought to avoid creating AI systems "about which it is unclear whether they deserve full human-grade rights because it is unclear whether they are conscious or to what degree" -- or, more simply, we shouldn't make AI systems whose moral status is legitimately in doubt.  (This is related to Joanna Bryson's suggestion that we should only create robots whose lack of moral considerability is obvious, but unlike Bryson's policy it imagines leapfrogging past the no-rights case to the full rights case.)

To my delight, Mara's and my suggestion is getting some uptake, most notably today in the New York Times.

The fundamental problem is this.  Suppose we create AI systems that some people reasonably suspect are genuinely conscious and genuinely deserve human-like rights, while others reasonably suspect that they aren't genuinely conscious and don't genuinely deserve human-like rights.  This forces us into a catastrophic dilemma: Either give them full human-like rights or don't give them full human-like rights.

If we do the first -- if we give them full human or human-like rights -- then we had better give them paths to citizenship, healthcare, the vote, the right to reproduce, the right to rescue in an emergency, etc.  All of this entails substantial risks to human beings: For example, we might be committed to save six robots in a fire in preference to five humans.  The AI systems might support policies that entail worse outcomes for human beings.  It would be more difficult to implement policies designed to reduce existential risk due to runaway AI intelligence.  And so on.  This might be perfectly fine, if the AI systems really are conscious and really are our moral equals.  But by stipulation, it's reasonable to think that they are empty machines with no consciousness and no real moral status, and so there's a real risk that we would be risking and sacrificing all this for nothing.

If we do the second -- if we deny them full human or human-like rights -- then we risk creating a race of slaves we can kill at will, or at best a group of second-class citizens.  By stipulation, it might be the case that this would constitute unjust and terrible treatment of entities as deserving of rights and moral consideration as human beings are.

Therefore, we ought to avoid putting ourselves in the situation where we face this dilemma.  We should avoid creating AI systems of dubious moral status.

A few notes:

"Human-like" rights: Of course "human rights" would be a misnomer if AI systems become our moral equals.  Also, exactly what healthcare, reproduction, etc., look like for AI systems, and the best way to respect their interests, might look very different in practice from the human case.  There would be a lot of tricky details to work out!

What about animal-grade AI that deserves animal-grade rights?  Maybe!  Although it seems a natural intermediate step, we might end up skipping it, if any conscious AI systems end up also being capable of human-like language, rational planning, self-knowledge, ethical reflection, etc.  Another issue is this: The moral status of non-human animals is already in dispute, so creating AI systems of disputably animal-like moral status doesn't perhaps add quite the same dimension of risk and uncertainty to the world that creating an AI system of dubiously human-like moral status would.

Would this policy slow technological progress?  Yes, probably.  Unsurprisingly, being ethical has its costs.  And one can dispute whether those costs are worth paying or are overridden by other ethical considerations.

Sunday, January 01, 2023

Writings of 2022

Every New Year's Day, I post a retrospect of the past year's writings. Here are the retrospects of 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, and 2021.

The biggest project this year was my new book The Weirdness of the World, submitted in November and due in print in early fall 2023.  This book pulls together ideas I've been publishing over the past ten years concerning the failure of common sense, philosophy, and empirical science to explain consciousness and the fundamental structure of the cosmos, and the corresponding bizarreness and dubiety of all general theories about such matters.




Under contract / in progress:

    As co-editor with Jonathan Jong, The Nature of Belief, Oxford University Press.
    As co-editor with Helen De Cruz and Rich Horton, a yet-to-be-titled anthology with MIT Press containing great classics of philosophical SF.

Full-length non-fiction essays

Appearing in print:

Finished and forthcoming:
    "How far can we get in creating a digital replica of a philosopher?" (third author, with Anna Strasser and Matt Crosby), Robophilosophy Proceedings 2022.
    "What is unique about kindness? Exploring the proximal experience of prosocial acts relative to other positive behaviors" (with Annie Regan, Seth Margolis, Daniel J. Ozer, and Sonja Lyubomirsky), Affective Science.
In draft and circulating:
    "The full rights dilemma for A.I. systems of debatable personhood" [available on request].
    "Inflate and explode". (I'm trying to decide whether to trunk this one or continue revising it.)
Shorter non-fiction

Science fiction stories

Some favorite blog posts

Reprints and Translations

    "Fish dance", reprinted in R. M. Ambrose, Vital (2022).  Inlandia Institute.

Thursday, December 29, 2022

The Moral Status of Alien Microbes, Plus a Thought about Artificial Life

Some scientists think it's quite possible we will soon find evidence of microbial life in the Solar System, if not on Mars, then maybe in the subsurface oceans of a gas giant's icy moon, such as Europa, Enceladus, or Titan. Suppose we do find alien life nearby. Presumably, we wouldn't or shouldn't casually destroy it. Perhaps the same goes for possible future artificial life systems on Earth.

Now you might think that alien microbes would have only instrumental value for human beings. Few people think that Earthly microbes have intrinsic moral standing or moral considerability for their own sake. There's no "microbe rights" movement, and virtually no one feels guilty about taking an antibiotic to fight a bacterial infection. In contrast, human beings have intrinsic moral considerability: Each one of us matters for our own sake, and not merely for the sake of others.

Dogs also matter for their own sake: They can feel pleasure and pain, and we ought not inflict pain on them unnecessarily. Arguably the same holds for all sentient organisms, including lizards, salmon, and lobsters, if they are capable of conscious suffering, as many scientists now think.

But microbes (presumably!) don't have experiences. They aren't conscious. They can't genuinely suffer. Nor do they have the kinds of goals, expectations, social relationships, life plans, or rational agency that we normally associate with being a target of moral concern. If they matter, you might think, they matter only to the extent they are useful for our purposes -- that is, instrumentally or derivatively, in the way that automobiles, video games, and lawns matter. They matter only because they matter to us. Where would we be without our gut microbiome?

If so, then you might think that alien microbes would also matter only instrumentally. We would and should value them as a target of scientific curiosity, as proof that life can evolve in alien environments, and because by studying them we might unlock useful future technologies. But we ought not value them for their own sake.

[An artist's conception of life on Europa] 

Now in general, I think that viewpoint is mistaken. I am increasingly drawn to the idea that everything that exists, even ordinary rocks, has intrinsic value. But even if you don't agree with me about that, you might hesitate to think we should feel free to extinguish alien microbes if it's in our interest. You might think that if we were to find simple alien life in the oceans of Europa, that life would merit some awe, respect, and preservation, independently of its contribution to human interests.

Environmental ethicists and deep ecologists see value in all living systems, independent of their contribution to human interests -- including in life forms that aren't themselves capable of pleasure or pain. It might seem radical to extend this view to microbes; but when the microbes are the only living forms in an entire ecosystem, as they might be on another planet in the Solar System, the idea of "microbe rights" maybe gains some appeal.

I'm not sure exactly how to argue for this perspective, other than just to invite you to reflect on the matter. Perhaps the distant planet thought experiment will help. Consider a far away planet we will never interact with. Would it be better for it to be a sterile rock or for it to have life? Or consider two possible universes, one containing only a sterile planet and one containing a planet with simple life. Which is the better universe? The planet or universe with life is, I propose, intrinsically better.

So also: The universe is better, richer, more beautiful, more awesome and amazing, if Europa has microbial life beneath its icy crust than if it does not. If we then go and destroy that life, we will have made the universe a worse place. We ought not put the Europan ecosystem at risk without compelling need.

I have been thinking about these issues recently in connection with reflections on the possible moral status of artificial life. Artificial life is life, or at least systems that in important ways resemble life, created artificially by human engineers and researchers. I'm drawn to the idea that if alien microbes or alien ecosystems can have intrinsic moral considerability, independent of sentience, suffering, consciousness, or human interests, then perhaps sufficiently sophisticated artificial life systems could also. Someday artificial life researchers might create artificial ecosystems so intricate and awesome that they are the ethical equivalent of an alien ecology, right here on Earth, as worth preserving for their own sake as the microbes of Europa would be.

Thursday, December 22, 2022

The Moral Measurement Problem: Four Flawed Methods

[This post draws on ideas developed in collaboration with psychologist Jessie Sun.]

So you want to build a moralometer -- that is, a device that measures someone's true moral character? Yes, yes. Such a device would be so practically and scientifically useful! (Maybe somewhat dystopian, though? Careful where you point that thing!)

You could try to build a moralometer by one of four methods: self-report, informant report, behavioral measurement, or physiological measurement. Each presents daunting methodological challenges.

Self-report moralometers

To find out how moral a person is, we could simply ask them. For example, Aquino and Reed 2002 ask people how important it is to them to have various moral characteristics, such as being compassionate and fair. More directly, Furr and colleagues 2022 have people rate the extent to which they agree with statements such as "I would say that I am a good person" and "I tend to act morally".

Could this be the basis of a moralometer? That depends on the extent to which people are able and willing to report on their overall morality.

People might be unable to accurately report their overall morality.

Vazire 2010 has argued that self-knowledge of psychological traits tends to be poor when the traits are highly evaluative and not straightforwardly observable (e.g., "intelligent", "creative"), since under those conditions people are (typically) motivated to see themselves favorably and -- due to low observability -- not straightforwardly confronted with the unpleasant news they would prefer to deny.

One's overall moral character is evaluatively loaded if anything is. Nor is it straightforwardly observable. Unlike height or talkativeness, someone motivated not to see themselves as, say, unfair or a jerk can readily find ways to explain away the evidence (e.g., "she deserved it", "I'm in such a hurry").

Furthermore, it sometimes requires a certain amount of moral insight to distinguish morally good from morally bad behavior. Part of being a sexist creep is typically not seeing anything wrong with the kinds of things that sexist creeps typically do. Conversely, people who are highly attuned to how they are treating others might tend to beat themselves up over relatively small violations. We might thus expect a moral Dunning-Kruger effect: People with bad moral character might disproportionately overestimate their moral character, so that people's self-opinions tend to be undiagnostic of the actual underlying trait.

Even to the extent people are able to report their overall morality, people might be unwilling to report it.

It's reasonable to expect that self-reports of moral character would be distorted by socially desirable responding, the tendency for questionnaire respondents to answer in a manner that they believe will reflect well on them. To say that you are extremely immoral seems socially undesirable. We would expect that people (e.g., Sam Bankman-Fried) would tend to want to portray themselves as morally above average. On the flip side, to describe oneself as "extremely moral" (say, 100 on a 0-100 scale from perfect immorality to perfect morality) might come across as immodest. So even people who believe themselves to be tip-top near-saints might not frankly express their high self-opinions when directly asked.

Reputational moralometers

Instead of asking people to report on their own morality, could we ask other people who know them? That is, could we ask their friends, family, neighbors, and co-workers? Presumably, the report would be less distorted by self-serving or ego-protective bias. There's less at stake when judging someone else's morality than when judging your own. Also, we could aggregate across multiple informants, combining several different people's ratings, possibly canceling out some sources of noise and bias.

Unfortunately, reputational moralometers -- while perhaps somewhat better than self-report moralometers -- also present substantial methodological challenges.

The informant advantage of decreased bias could be offset by a corresponding increase in ignorance.

Informants don't observe all of the behavior of the people whose morality they are judging, and they have less access to the thoughts, feelings, and motivations that are relevant to the moral assessment of behavior. Informant reports are thus likely to be based only on a fraction of the evidence that self-report would be based on. Moreover, people tend to hide their immoral behaviors, and presumably some people are better at doing so than others. Also, people play different roles in our lives, and romantic partners, coworkers, friends, and teachers will typically only see us in limited, and perhaps unrepresentative, contexts. A good moralometer would require the correct balancing of a range of informants with complementary patches of ignorance, which is likely to be infeasible.

Informants are also likely to be biased.

Informant reports may be contaminated not by self-serving bias but by "pal-serving bias" (Leising et al 2010). If we rely on people to nominate their own informants, they are likely to nominate people who have a positive perception of them. Furthermore, the informants might be reluctant to "tell on" or badly evaluate their friends, especially in contexts (like personnel selection) where the rating could have real consequences for the target. The ideal informant would be someone who knows the target well but isn't positively biased toward them. In reality, however, there's likely a tradeoff between knowledge and bias, so that those who are most likely to be impartial are not the people who know the target best.

Positivity bias could in principle be corrected for if every informant was equally biased, but it's likely that some targets will have informants who are more biased than others.
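
A small sketch in Python illustrates the point. The targets, their "true" scores on a 0-100 scale, and the sizes of the biases are all hypothetical, chosen only to show how a uniform bias can be subtracted away while an unequal one cannot.

```python
# Hypothetical true scores on a 0-100 scale; all numbers are invented.
TRUE = {"target_p": 60, "target_q": 50}


def observed(biases):
    """Informant ratings: the true score plus each target's informant bias."""
    return {t: TRUE[t] + biases[t] for t in TRUE}


def corrected(ratings, assumed_bias):
    """Subtract a single assumed bias from every rating."""
    return {t: r - assumed_bias for t, r in ratings.items()}


def top(scores):
    """The target with the highest score."""
    return max(scores, key=scores.get)


# Case 1: every target's informants inflate by the same 20 points.
UNIFORM = {"target_p": 20, "target_q": 20}
# Case 2: target_q's informants are far more flattering than target_p's.
UNEQUAL = {"target_p": 5, "target_q": 25}
AVERAGE_UNEQUAL_BIAS = 15  # the mean of the two unequal biases

if __name__ == "__main__":
    print(corrected(observed(UNIFORM), 20))
    print(corrected(observed(UNEQUAL), AVERAGE_UNEQUAL_BIAS))
```

With uniform bias, subtracting 20 recovers the true scores exactly. With unequal bias, subtracting the average bias still leaves target_q ranked above target_p, the reverse of the true order.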

Behavioral moralometers

Given the problems with self-report and informant report, direct behavioral measures might seem promising. Much of my own work on the morality of professional ethicists and the effectiveness of ethics instruction has depended on direct behavioral measures such as courteous and discourteous behavior at philosophy conferences, theft of library books, meat purchases on campus (after attending a class on the ethics of eating meat), charitable giving, and choosing to join the Nazi party in 1930s Germany. Others have measured behavior in dictator games, lying to the experimenter in laboratory settings, criminal behavior, and instances of comforting, helping, and sharing.

Individual behaviors are only a tiny and possibly unrepresentative sample.

Perhaps the biggest problem with behavioral moralometers is that any single, measurable behavior will inevitably be a minuscule fraction of the person's behavior, and might not be at all representative of the person's overall morality. The inference from this person donated $10 in this instance or this person committed petty larceny two years ago to this person's overall moral character is good or bad is a giant leap from a single observation. Given the general variability and inconstancy of most people's behavior, we shouldn't expect a single observation, or even a few related observations, to provide an accurate picture of the person overall.

Although self-report and informant report are likely to be biased, they aggregate many observations of the target into a summary measure, while the typical behavioral study does not.

There is likely a tradeoff between feasibility and validity.

There are some behaviors that are so telling of moral character that a single observation might reveal a lot: If someone commits murder for hire, we can be pretty sure they're no saint. If someone donates a kidney to a stranger, that too might be highly morally diagnostic. But such extreme behaviors will occur at only tiny rates in the general population. Other substantial immoral behaviors, such as underpaying taxes by thousands of dollars or cheating on one's spouse, might occur more commonly, but are likely to be undetectable to researchers (and perhaps unethical to even try to detect).

The most feasible measures are laboratory measures, such as misreporting the roll of a die to an experimenter in order to win a greater payout. But it's unclear what the relationship is between laboratory behaviors for minor stakes and overall moral behavior in the real world.

Individual behaviors can be difficult to interpret.

Another advantage that self-report and to some extent informant report have over direct behavioral measures is that there's an opportunity for contextual information to clarify the moral value or disvalue of behaviors: The morality of donating $10 or the immorality of not returning a library book might depend substantially on one's motives or financial situation, which self-report or informant report can potentially account for but which would be invisible in a simple behavioral measure. (Of course, on the flip side, this flexibility of interpretation is part of what permits bias to creep in.)

[a polygraph from 1937]

Physiological moralometers

A physiological moralometer would attempt to measure someone's morality by measuring something biological like their brain activity under certain conditions or their genetics. Given the current state of technology, no such moralometer is likely to arise soon. The best known candidate might be the polygraph or lie detector test, which is notoriously unreliable and of course doesn't purport to be a general measure of honesty, much less of overall moral character.

Any genetic measure would of course omit any environmental influences on morality. Given the likelihood that environmental influences play a major role in people's moral development, no genetic measure could have a high correlation with a person's overall morality.
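
One way to make the ceiling explicit is with a deliberately simplified model, assumed here purely for illustration: treat overall morality M as the sum of a genetic component G and an independent environmental component E. Then even a perfect measure of G correlates with M only as the square root of the share of variance that G accounts for.

```latex
\[
M = G + E, \qquad \operatorname{Cov}(G, E) = 0
\]
\[
\operatorname{Corr}(G, M)
  = \frac{\operatorname{Cov}(G,\, G + E)}{\sigma_G \, \sigma_M}
  = \frac{\sigma_G^2}{\sigma_G \, \sigma_M}
  = \frac{\sigma_G}{\sigma_M}
  = \sqrt{\frac{\sigma_G^2}{\sigma_G^2 + \sigma_E^2}}
\]
```

If, hypothetically, the genetic component accounted for a quarter of the variance, the ceiling would be the square root of 0.25, that is, 0.5, however good the genetic measure. The additive, independent model and the one-quarter figure are assumptions, not empirical claims.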

Brain measures, being potentially closer to measuring the mental states that underlie morality, don't have a similar ceiling on accuracy, but currently look less promising than behavioral measures, informant report measures, and probably even self-report measures.

The Inaccuracy of All Methods

It thus seems likely that there is no good method for accurately measuring a person's overall moral character. Self-report, informant report, behavioral measures, and physiological measures all face large methodological difficulties. If a moralometer is something that accurately measures an individual person's morality, like a thermometer accurately (accurately enough) measures a person's body temperature, there's little reason to think we could build one.

It doesn't follow that we can't imprecisely measure someone's moral character. It's reasonable to expect the existence of small correlations between some potential measures and a person's real underlying overall moral character. And maybe such measures could be used to look for trends aggregated across groups.

Now, this whole post has been premised on the idea that it makes sense to talk of a person's overall morality as something that could be captured, at least in principle, by a number such as 0 to 100 or -1 to +1. There are a few reasons to doubt this, including moral relativism and moral incommensurability -- but more on that in a future post.

Tuesday, December 13, 2022

An Objection to Chalmers's Fading Qualia Argument

Would a neuron-for-neuron silicon isomorph of you have conscious experiences? Or is there something special about the biology of neurons, so that no brain made of silicon, no matter how sophisticated and similar to yours, could actually have conscious experiences?

In his 1996 book and a related 1995 article, David Chalmers offers what he calls the "fading qualia" argument that there's nothing in principle special about neurons (see also Cuda 1985). The basic idea is that, in principle, scientists could swap your neurons out one by one, and you'd never notice the difference. But if your consciousness were to disappear during this process, you would notice the difference. Therefore, your consciousness would not disappear. A similar idea underlies Susan Schneider's "Chip Test" for silicon consciousness: To check whether some proposed cognitive substrate really supports consciousness, slowly swap out your neurons for that substrate, a piece at a time, checking for losses of consciousness along the way.

In a recent article critical of Schneider, David Udell and I have criticized her version of the swapping test. Our argument can be adapted to Chalmers's fading qualia argument, which is my project today.

First, a bit more on how the gradual replacement is supposed to work. Suppose you have a hundred billion neurons. Imagine replacing just one of those neurons with a silicon chip. The chemical and electrical signals that serve as inputs to that neuron are registered by detectors connected to the chip. The chip calculates the effects that those inputs would have had on the neuron's behavior -- specifically, what chemical and electrical signals the neuron, had it remained in place, would have given as outputs to other neurons connected to it -- and then delivers those same outputs to those same neurons by effectors attached to the silicon chip on one end and the target neurons at the other end. No doubt this would be complicated, expensive, and bulky; but all that matters to the thought experiment is that it would be possible in principle. A silicon chip could be made to perfectly imitate the behavior of a neuron, taking whatever inputs the neuron would take and converting them into whatever outputs the neuron would emit given those inputs. Given this perfect imitation, no other neurons in the brain would behave differently as a result of the swap: They would all be getting the same inputs from the silicon replacement that they would have received from the original neuron.

So far, we have replaced only a single neuron, and presumably nothing much has changed. Next, we swap another. Then another. Then another, until eventually all one hundred billion have been replaced, and your "neural" structure is now entirely constituted by silicon chips. (If glial cells matter to consciousness, we can extend the swapping process to them also.) The resulting entity will have a mind that is functionally identical to your own at the level of neural structure. This implies that it will have exactly the same behavioral reactions to any external stimuli that you would have. For example, if it is asked, "Are you conscious?" it will say, "Definitely, yes!" (or whatever you would have said), since all the efferent outputs to your muscles will be exactly the same as they would have been had your brain not been replaced. The question is whether the silicon-chipped entity might actually lack conscious experiences despite this behavioral similarity, that is, whether it might be a "zombie" that is behaviorally indistinguishable from you despite having nothing going on experientially inside.

Chalmers's argument is a reductio. Assume for the sake of the reductio that the final silicon-brained you entirely lacks conscious experience. If so, then sometime during the swapping procedure consciousness must either have gradually faded away or suddenly winked out. It's implausible, Chalmers suggests, that consciousness would suddenly wink out with the replacement of a single neuron. (I'm inclined to agree.) If so, then there must be intermediate versions of you with substantially faded consciousness. However, the entity will not report having faded consciousness. Since (ex hypothesi) the silicon chips are functionally identical with the neurons, all the intermediate versions of you will behave exactly the same as they would have behaved if no neurons had been replaced. Nor will there be other neural activity constitutive of believing that your consciousness is fading away: Your unreplaced neurons will keep firing as usual, as if there had been no replacement at all.

However, Chalmers argues, if your consciousness were fading away, you would notice it. It's implausible that the dramatic changes of consciousness that would have to be involved when your consciousness is fading away would go entirely undetected during the gradual replacement process. That would be a catastrophic failure of introspection, which is normally a reliable or even infallible process. Furthermore, it would be a catastrophic failure that occurs while the cognitive (neural/silicon) systems are functioning normally. This completes the reductio. Restated in modus tollens form: If consciousness would disappear during gradual replacement, you'd notice it; but you wouldn't notice it; therefore consciousness would not disappear during gradual replacement.
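The logical skeleton of that restatement can be written out as follows. The letters F and N are my own labels for this sketch, not Chalmers's notation, and the block assumes the amsmath package: let F stand for "consciousness fades or disappears during gradual replacement" and N for "you notice the change".

```latex
\begin{align*}
&\text{(P1)}\quad F \rightarrow N  && \text{if consciousness faded, you would notice} \\
&\text{(P2)}\quad \neg N           && \text{you would not notice} \\
&\text{(C)}\quad\ \neg F           && \text{so consciousness would not fade}
\end{align*}
```

The form is valid, so any resistance has to target a premise. The objection developed below targets (P1).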

As Udell and I frame it in our discussion of Schneider, this argument has an audience problem. Its target audience is someone who is worried that despite in-principle functional identity at the neuronal level, silicon might just not be the right kind of stuff to host consciousness. Someone who has this worry presumably does not trust the introspective reports, or the seemingly-introspective reports, of the silicon-brained entity. The silicon-brained entity might say "Yes, of course I'm conscious! I'm experiencing right now visual sensations of your face, auditory sensations of my voice, and a rising feeling of annoyance at your failure to believe me!" The intended audience remains unconvinced by this apparent introspective testimony. They need an argument to be convinced otherwise -- the Fading Qualia argument.

Let's call the entity (the person) before any replacement surgery r0, and the entity after all their neurons are replaced rn, where n is the total number of neurons replaced. During replacement, this entity passes through stages r1, r2, r3, ... ri, ... rn. By stipulation, our audience doesn't trust the introspective or seemingly introspective judgments of rn. This is the worry that motivates the need for the Fading Qualia argument. In order for the argument to work, there must be some advantage that the intermediate ri entities systematically possess over rn, such that we have reason to trust their introspective reports despite distrusting rn's report.

Seemingly introspective reports about conscious experience may or may not be trustworthy in the normal human case (Schwitzgebel 2011; Irvine 2013). But even if they're trustworthy in the normal human case, they might not be trustworthy in the unusual case of having pieces of one's brain swapped out. One might hold that introspective judgments are always trustworthy (absent a certain range of known defeaters, which we can stipulate are absent), in other words, that unless a process accurately represents a target conscious experience it is not a genuinely introspective process. This is true, for example, on containment views of introspection, according to which properly formed introspective judgments contain the target experiences as a part (e.g., "I'm experiencing [this]"). Infallibilist views of introspection of that sort contrast with functionalist views of introspection, on which introspection is a fallible functional process that garners information about a distinct target mental state.

A skeptic about silicon consciousness might either accept or reject an infallibilist view of introspection. The Fading Qualia argument will face trouble either way.

[Figure: A Trilemma for the Fading Qualia Argument. Optimists about silicon chip consciousness have no need for an argument in favor of rn consciousness, because they are already convinced of its possibility. On the other hand, skeptics about silicon consciousness are led to doubt either the presence or the reliability of ri's introspection (depending on their view of introspection) for the same reason they are led to doubt rn's consciousness in the first place.]

If a silicon chip skeptic holds that genuine introspection requires and thus implies genuine consciousness, then they will want to say that a "zombie" rn, despite emitting what looks from the outside like an introspective report of conscious experience, does not in fact genuinely introspect. With no genuine conscious experience for introspection to target, the report must issue, on this view, from some non-introspective process. This raises the natural question of why they should feel confident that the intermediate ris are genuinely introspecting, instead of merely engaging in a non-introspective process similar to rn's. After all, there is substantial architectural similarity between rn and at least the late-stage ris. The skeptic needs, but Chalmers does not provide, some principled reason to think that entities in the ri phases would in fact introspect despite rn's possible failure to do so -- or at least good reason to believe that the ris would successfully introspect their fading consciousness during the most crucial stages of fade-out. Absent this, reasonable doubt about rn introspection naturally extends into reasonable doubt about introspection in the ri cases as well. The infallibilist skeptic about silicon-based consciousness needs their skepticism about introspection to be assuaged for at least those critical transition points before they can accept the Fading Qualia argument as informative about rn's consciousness.

If a skeptic about silicon-based consciousness believes that genuine introspection can occur without delivering accurate judgments about consciousness, analogous difficulties arise. Either rn does not successfully introspect, merely seeming to do so, in which case the argument of the previous paragraph applies, or rn does introspect and concludes that consciousness has not disappeared or changed in any radical way. The functionalist or fallibilist skeptic about silicon-based consciousness does not trust that rn has introspected accurately. On their view, rn might in fact be a zombie, despite introspectively-based claims otherwise. Absent any reason for the fallibilist skeptic about silicon-based consciousness to trust rn's introspective judgments, why should they trust the judgments of the ris -- especially the late-stage ris? If rn can mistakenly judge itself conscious, on the basis of its introspection, might someone undergoing the gradual replacement procedure also erroneously judge their consciousness not to be fading away? Gradualness is no assurance against error. Indeed, error is sometimes easier if we (or "we") slowly slide into it.

This concern might be mitigated if loss of consciousness is sure to occur early in the replacement process, when the entity is much closer to r0 than rn, but I see no good reason to make that assumption. And even if we were to assume that phenomenal alterations would occur early in the replacement process, it's not clear why the fallibilist should regard those changes as the sort that introspection would likely detect rather than miss.

The Fading Qualia argument awkwardly pairs skepticism about rn's introspective judgments with unexplained confidence in the ris' introspective judgments, and this pairing isn't theoretically stable on any view of introspection.

The objection can be made vivid with a toy case: Suppose that we have an introspection module in the brain. When the module is involved in introspecting a conscious mental state, it will send query signals to other regions of the brain. Getting the right signals back from those other regions -- call them regions A, B, and C -- is part of the process driving the judgment that experiential changes are present or absent. Now suppose that all the neurons in region B have been replaced with silicon chips. Silicon region B will receive input signals from other regions of the brain, just as neural region B would have, and silicon region B will then send output signals to other brain regions that normally interface with neural region B. Among those output signals will be signals to the introspection module.

When the introspection module sends its query signal to region B, what signal will it receive in return? Ex hypothesi, the silicon chips perfectly functionally emulate the full range of neural processes of the neurons they have replaced; that's just the set-up of the Fading Qualia argument. Given this, the introspection module would of course receive exactly the same signal it would have received from region B had region B not been replaced. If so, then entity ri will presumably infer that activity in region B is conscious. Maybe region B normally hosts conscious experiences of thirst. The entity might then say to itself (or aloud), "Yes, I'm still feeling thirsty. I really am having that conscious experience, just as vividly, with no fading, despite the replacement of that region of my brain by silicon chips." This would be, as far as the entity could tell, a careful and accurate first-person introspective judgment.
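The toy case can be put in a few lines of code. This is a sketch of the cartoon model only, and everything in it is hypothetical: the `Region` class, the `introspect` function, and especially the `hosts_experience` flag, which simply stipulates the skeptic's scenario (silicon region B hosts no experience) so that we can see what the introspection module would report in that scenario. Nothing here takes a stand on whether that scenario is possible.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Region:
    name: str
    substrate: str          # "neural" or "silicon"
    hosts_experience: bool  # stipulated hidden fact; never appears in any signal

    def respond(self, query: str) -> str:
        # The reply depends only on the query, i.e. on functional role.
        # Substrate and hosts_experience play no part in it.
        state = "active" if query == "report_state" else "idle"
        return f"{self.name}:{state}"

def introspect(regions: List[Region]) -> str:
    """Cartoon introspection module: query each region, then form a judgment."""
    signals = [region.respond("report_state") for region in regions]
    if all(signal.endswith(":active") for signal in signals):
        return "No fading: my experience is as vivid as ever."
    return "Something has changed in my experience."

before = [
    Region("A", "neural", True),
    Region("B", "neural", True),
    Region("C", "neural", True),
]
# Skeptic's hypothesis, stipulated for illustration: silicon B hosts no experience.
after = [
    Region("A", "neural", True),
    Region("B", "silicon", False),
    Region("C", "neural", True),
]

report_before = introspect(before)
report_after = introspect(after)
print(report_before == report_after)
```

Because the module only ever sees the returned signals, its report is the same whether or not region B hosts experience. On the skeptic's hypothesis, the post-replacement judgment would be confident and wrong.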

(If, on the other hand, the brain region containing the introspection module is the region being replaced, then maybe introspection isn't occurring at all -- at least in any sense of introspection that is committed to the idea that introspection is a conscious process.)

A silicon-chip consciousness optimist who does not share the skeptical worries that motivate the need for the Fading Qualia argument might be satisfied with that demonstration. But the motivating concern, the reason we need the argument, is that some people doubt that silicon chips could host consciousness even if they can behave functionally identically with neurons. Those theorists, the target audience of the Fading Qualia argument, should remain doubtful. They ought to worry that the silicon chips replacing brain region B don't genuinely host consciousness, despite feeding output to the introspection module that leads ri to conclude that consciousness has not faded at all. They ought to worry, in other words, that the introspective process has gone awry. This needn't be a matter of "sham" chips intentionally designed to fool users. It seems to be just a straightforward engineering consequence of designing chips to exactly mimic the inputs and outputs of neurons.

This story relies on a cartoon model of introspection that is unlikely to closely resemble the process of introspection as it actually occurs. However, the present argument doesn't require the existence of an actual introspection module or query process much like the toy case above. An analogous story holds for more complex and realistic models. If silicon chips functionally emulate neurons, there is good reason for someone with the types of skeptical worries about silicon-based consciousness that the Fading Qualia argument is designed to address to similarly worry that replacing neurons with functionally perfect silicon substitutes would either create inaccuracies of introspection or replace the introspective process with whatever non-introspective process even zombies engage in.

The Fading Qualia argument thus, seemingly implausibly, combines distrust of the putative introspective judgments of rn with credulousness about the putative introspective judgments of the series of ris between r0 and rn. An adequate defense of the Fading Qualia argument will require careful justification of why someone skeptical about the seemingly introspective judgments of an entity whose brain is entirely silicon should not be similarly skeptical about similar seemingly introspective judgments that occur throughout the gradual replacement process. As it stands, the argument lacks the necessary resources legitimately to assuage the doubts of those who enter it uncertain about whether consciousness would be present in a neuron-for-neuron silicon isomorph.



"Chalmers's Fading/Dancing Qualia and Self-Knowledge" (Apr 22, 2010)

"How to Accidentally Become a Zombie Robot" (Jun 23, 2016)

Much of the text above is adapted with revisions from:

"Susan Schneider's Proposed Tests for AI Consciousness: Promising but Flawed" (with David Billy Udell), Journal of Consciousness Studies, 28 (5-6), 121-144.