Thursday, December 29, 2022

The Moral Status of Alien Microbes, Plus a Thought about Artificial Life

Some scientists think it's quite possible we will soon find evidence of microbial life in the Solar System, if not on Mars, then maybe in the subsurface oceans of a gas giant's icy moon, such as Europa, Enceladus, or Titan. Suppose we do find alien life nearby. Presumably, we wouldn't or shouldn't casually destroy it. Perhaps the same goes for possible future artificial life systems on Earth.

Now you might think that alien microbes would have only instrumental value for human beings. Few people think that Earthly microbes have intrinsic moral standing or moral considerability for their own sake. There's no "microbe rights" movement, and virtually no one feels guilty about taking an antibiotic to fight a bacterial infection. In contrast, human beings have intrinsic moral considerability: Each one of us matters for our own sake, and not merely for the sake of others.

Dogs also matter for their own sake: They can feel pleasure and pain, and we ought not inflict pain on them unnecessarily. Arguably the same holds for all sentient organisms, including lizards, salmon, and lobsters, if they are capable of conscious suffering, as many scientists now think.

But microbes (presumably!) don't have experiences. They aren't conscious. They can't genuinely suffer. Nor do they have the kinds of goals, expectations, social relationships, life plans, or rational agency that we normally associate with being a target of moral concern. If they matter, you might think, they matter only to the extent they are useful for our purposes -- that is, instrumentally or derivatively, in the way that automobiles, video games, and lawns matter. They matter only because they matter to us. Where would we be without our gut microbiome?

If so, then you might think that alien microbes would also matter only instrumentally. We would and should value them as a target of scientific curiosity, as proof that life can evolve in alien environments, and because by studying them we might unlock useful future technologies. But we ought not value them for their own sake.

[An artist's conception of life on Europa] 

Now in general, I think that viewpoint is mistaken. I am increasingly drawn to the idea that everything that exists, even ordinary rocks, has intrinsic value. But even if you don't agree with me about that, you might hesitate to think we should feel free to extinguish alien microbes if it's in our interest. You might think that if we were to find simple alien life in the oceans of Europa, that life would merit some awe, respect, and preservation, independently of its contribution to human interests.

Environmental ethicists and deep ecologists see value in all living systems, independent of their contribution to human interests -- including in life forms that aren't themselves capable of pleasure or pain. It might seem radical to extend this view to microbes; but when the microbes are the only living forms in an entire ecosystem, as they might be on another planet in the Solar System, the idea of "microbe rights" maybe gains some appeal.

I'm not sure exactly how to argue for this perspective, other than just to invite you to reflect on the matter. Perhaps the distant planet thought experiment will help. Consider a far away planet we will never interact with. Would it be better for it to be a sterile rock or for it to have life? Or consider two possible universes, one containing only a sterile planet and one containing a planet with simple life. Which is the better universe? The planet or universe with life is, I propose, intrinsically better.

So also: The universe is better, richer, more beautiful, more awesome and amazing, if Europa has microbial life beneath its icy crust than if it does not. If we then go and destroy that life, we will have made the universe a worse place. We ought not put the Europan ecosystem at risk without compelling need.

I have been thinking about these issues recently in connection with reflections on the possible moral status of artificial life. Artificial life is life, or at least systems that in important ways resemble life, created artificially by human engineers and researchers. I'm drawn to the idea that if alien microbes or alien ecosystems can have intrinsic moral considerability, independent of sentience, suffering, consciousness, or human interests, then perhaps sufficiently sophisticated artificial life systems could also. Someday artificial life researchers might create artificial ecosystems so intricate and awesome that they are the ethical equivalent of an alien ecology, right here on Earth, as worth preserving for their own sake as the microbes of Europa would be.

Thursday, December 22, 2022

The Moral Measurement Problem: Four Flawed Methods

[This post draws on ideas developed in collaboration with psychologist Jessie Sun.]

So you want to build a moralometer -- that is, a device that measures someone's true moral character? Yes, yes. Such a device would be so practically and scientifically useful! (Maybe somewhat dystopian, though? Careful where you point that thing!)

You could try to build a moralometer by one of four methods: self-report, informant report, behavioral measurement, or physiological measurement. Each presents daunting methodological challenges.

Self-report moralometers

To find out how moral a person is, we could simply ask them. For example, Aquino and Reed 2002 ask people how important it is to them to have various moral characteristics, such as being compassionate and fair. More directly, Furr and colleagues 2022 have people rate the extent to which they agree with statements such as "I would say that I am a good person" and "I tend to act morally".

Could this be the basis of a moralometer? That depends on the extent to which people are able and willing to report on their overall morality.

People might be unable to accurately report their overall morality.

Vazire 2010 has argued that self-knowledge of psychological traits tends to be poor when the traits are highly evaluative and not straightforwardly observable (e.g., "intelligent", "creative"), since under those conditions people are (typically) motivated to see themselves favorably and -- due to low observability -- not straightforwardly confronted with the unpleasant news they would prefer to deny.

One's overall moral character is evaluatively loaded if anything is. Nor is it straightforwardly observable: unlike height or talkativeness, it leaves room for interpretation, so someone motivated not to see themselves as, say, unfair or a jerk can readily find ways to explain away the evidence (e.g., "she deserved it", "I'm in such a hurry").

Furthermore, it sometimes requires a certain amount of moral insight to distinguish morally good from morally bad behavior. Part of being a sexist creep is typically not seeing anything wrong with the kinds of things that sexist creeps typically do. Conversely, people who are highly attuned to how they are treating others might tend to beat themselves up over relatively small violations. We might thus expect a moral Dunning-Kruger effect: People with bad moral character might disproportionately overestimate their moral character, so that people's self-opinions tend to be undiagnostic of the actual underlying trait.

Even to the extent people are able to report their overall morality, people might be unwilling to report it.

It's reasonable to expect that self-reports of moral character would be distorted by socially desirable responding, the tendency for questionnaire respondents to answer in a manner that they believe will reflect well on them. To say that you are extremely immoral seems socially undesirable. We would expect that people (e.g., Sam Bankman-Fried) would tend to want to portray themselves as morally above average. On the flip side, to describe oneself as "extremely moral" (say, 100 on a 0-100 scale from perfect immorality to perfect morality) might come across as immodest. So even people who believe themselves to be tip-top near-saints might not frankly express their high self-opinions when directly asked.

Reputational moralometers

Instead of asking people to report on their own morality, could we ask other people who know them? That is, could we ask their friends, family, neighbors, and co-workers? Presumably, the report would be less distorted by self-serving or ego-protective bias. There's less at stake when judging someone else's morality than when judging your own. Also, we could aggregate across multiple informants, combining several different people's ratings, possibly canceling out some sources of noise and bias.
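That aggregation point can be made concrete with a toy simulation (my own sketch, not anything from the post): if each informant's rating were, hypothetically, the target's true morality plus independent random error, then averaging more informants would shrink the error -- but any bias shared by all the informants would survive the averaging untouched.

```python
import random

random.seed(0)

def informant_rating(true_morality, noise_sd=0.3, shared_bias=0.0):
    """One hypothetical informant's rating: truth + shared bias + independent noise."""
    return true_morality + shared_bias + random.gauss(0, noise_sd)

def averaged_rating(true_morality, n_informants, shared_bias=0.0):
    """Average the ratings of n independent informants."""
    ratings = [informant_rating(true_morality, shared_bias=shared_bias)
               for _ in range(n_informants)]
    return sum(ratings) / len(ratings)

true = 0.2  # the target's (stipulated) true morality on a -1 to +1 scale
trials = 2000

# Independent noise averages away: ten informants beat one informant.
err_one = sum(abs(averaged_rating(true, 1) - true) for _ in range(trials)) / trials
err_ten = sum(abs(averaged_rating(true, 10) - true) for _ in range(trials)) / trials
print(err_one > err_ten)  # expect the ten-informant error to be smaller

# A shared "pal-serving" bias does not average away, however many raters we add.
biased = sum(averaged_rating(true, 10, shared_bias=0.3) for _ in range(trials)) / trials
print(round(biased - true, 1))  # the bias persists in the aggregate
```

The upshot matches the text: aggregation helps with idiosyncratic noise, but not with biases the informants have in common.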

Unfortunately, reputational moralometers -- while perhaps somewhat better than self-report moralometers -- also present substantial methodological challenges.

The informant advantage of decreased bias could be offset by a corresponding increase in ignorance.

Informants don't observe all of the behavior of the people whose morality they are judging, and they have less access to the thoughts, feelings, and motivations that are relevant to the moral assessment of behavior. Informant reports are thus likely to be based only on a fraction of the evidence that self-report would be based on. Moreover, people tend to hide their immoral behaviors, and presumably some people are better at doing so than others. Also, people play different roles in our lives, and romantic partners, coworkers, friends, and teachers will typically only see us in limited, and perhaps unrepresentative, contexts. A good moralometer would require the correct balancing of a range of informants with complementary patches of ignorance, which is likely to be infeasible.

Informants are also likely to be biased.

Informant reports may be contaminated not by self-serving bias but by "pal-serving bias" (Leising et al 2010). If we rely on people to nominate their own informants, they are likely to nominate people who have a positive perception of them. Furthermore, the informants might be reluctant to "tell on" or evaluate their friends negatively, especially in contexts (like personnel selection) where the rating could have real consequences for the target. The ideal informant would be someone who knows the target well but isn't positively biased toward them. In reality, however, there's likely a tradeoff between knowledge and bias, so that the people most likely to be impartial are not the people who know the target best.

Positivity bias could in principle be corrected for if every informant were equally biased, but it's likely that some targets will have more biased informants than others.

Behavioral moralometers

Given the problems with self-report and informant report, direct behavioral measures might seem promising. Much of my own work on the morality of professional ethicists and the effectiveness of ethics instruction has depended on direct behavioral measures such as courteous and discourteous behavior at philosophy conferences, theft of library books, meat purchases on campus (after attending a class on the ethics of eating meat), charitable giving, and choosing to join the Nazi party in 1930s Germany. Others have measured behavior in dictator games, lying to the experimenter in laboratory settings, criminal behavior, and instances of comforting, helping, and sharing.

Individual behaviors are only a tiny and possibly unrepresentative sample.

Perhaps the biggest problem with behavioral moralometers is that any single, measurable behavior will inevitably be a minuscule fraction of the person's behavior, and might not be at all representative of the person's overall morality. The inference from this person donated $10 in this instance or this person committed petty larceny two years ago to this person's overall moral character is good or bad is a giant leap from a single observation. Given the general variability and inconstancy of most people's behavior, we shouldn't expect a single observation, or even a few related observations, to provide an accurate picture of the person overall.

Although self-report and informant report are likely to be biased, they aggregate many observations of the target into a summary measure, while the typical behavioral study does not.

There is likely a tradeoff between feasibility and validity.

There are some behaviors that are so telling of moral character that a single observation might reveal a lot: If someone commits murder for hire, we can be pretty sure they're no saint. If someone donates a kidney to a stranger, that too might be highly morally diagnostic. But such extreme behaviors will occur at only tiny rates in the general population. Other substantial immoral behaviors, such as underpaying taxes by thousands of dollars or cheating on one's spouse, might occur more commonly, but are likely to be undetectable to researchers (and perhaps unethical to even try to detect).

The most feasible measures are laboratory measures, such as misreporting the roll of a die to an experimenter in order to win a greater payout. But it's unclear what the relationship is between laboratory behaviors for minor stakes and overall moral behavior in the real world.

Individual behaviors can be difficult to interpret.

Another advantage that self-report and, to some extent, informant report have over direct behavioral measures is that contextual information has an opportunity to clarify the moral value or disvalue of behaviors: The morality of donating $10 or the immorality of not returning a library book might depend substantially on one's motives or financial situation, which self-report or informant report can potentially account for but which would be invisible in a simple behavioral measure. (Of course, on the flip side, this flexibility of interpretation is part of what permits bias to creep in.)

[a polygraph from 1937]

Physiological moralometers

A physiological moralometer would attempt to measure someone's morality by measuring something biological like their brain activity under certain conditions or their genetics. Given the current state of technology, no such moralometer is likely to arise soon. The best known candidate might be the polygraph or lie detector test, which is notoriously unreliable and of course doesn't purport to be a general measure of honesty much less of overall moral character.

Any genetic measure would of course omit any environmental influences on morality. Given the likelihood that environmental influences play a major role in people's moral development, no genetic measure could have a high correlation with a person's overall morality.

Brain measures, being potentially closer to the mental states that underlie morality, don't face a similar ceiling on accuracy, but they currently look less promising than behavioral measures, informant report measures, and probably even self-report measures.

The Inaccuracy of All Methods

It thus seems likely that there is no good method for accurately measuring a person's overall moral character. Self-report, informant report, behavioral measures, and physiological measures all face large methodological difficulties. If a moralometer is something that accurately measures an individual person's morality, like a thermometer accurately (accurately enough) measures a person's body temperature, there's little reason to think we could build one.

It doesn't follow that we can't imprecisely measure someone's moral character. It's reasonable to expect the existence of small correlations between some potential measures and a person's real underlying overall moral character. And maybe such measures could be used to look for trends aggregated across groups.
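That last point -- weak individual-level measures still detecting group-level trends -- can be illustrated with a quick simulation (my own toy numbers, nothing from the post): a measure that tracks true morality only loosely is nearly useless for judging any individual, yet averages of the same measure can still reveal a real difference between groups.

```python
import random
import statistics

random.seed(1)

def noisy_measure(trait, noise_sd=2.0):
    """A hypothetical measure that correlates only weakly with the underlying trait."""
    return trait + random.gauss(0, noise_sd)

n = 2000
# Two groups whose true average "morality" differs modestly (arbitrary scale).
group_a = [random.gauss(0.0, 1.0) for _ in range(n)]
group_b = [random.gauss(0.5, 1.0) for _ in range(n)]

scores_a = [noisy_measure(t) for t in group_a]
scores_b = [noisy_measure(t) for t in group_b]

# Individual scores overlap heavily: within-group spread dwarfs the group gap...
print(statistics.stdev(scores_a) > 1.5)
# ...but aggregated over thousands of people, the group means still separate.
print(statistics.mean(scores_b) - statistics.mean(scores_a) > 0.2)
```

So a "moralometer" far too noisy for individual verdicts could still, in principle, serve research on trends across groups.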

Now, this whole post has been premised on the idea that it makes sense to talk of a person's overall morality as something that could be captured, at least in principle, by a single number, such as a score from 0 to 100 or from -1 to +1. There are a few reasons to doubt this, including moral relativism and moral incommensurability -- but more on that in a future post.

Tuesday, December 13, 2022

An Objection to Chalmers's Fading Qualia Argument

Would a neuron-for-neuron silicon isomorph of you have conscious experiences? Or is there something special about the biology of neurons, so that no brain made of silicon, no matter how sophisticated and similar to yours, could actually have conscious experiences?

In his 1996 book and a related 1995 article, David Chalmers offers what he calls the "fading qualia" argument that there's nothing in principle special about neurons (see also Cuda 1985). The basic idea is that, in principle, scientists could swap your neurons out one by one, and you'd never notice the difference. But if your consciousness were to disappear during this process, you would notice the difference. Therefore, your consciousness would not disappear. A similar idea underlies Susan Schneider's "Chip Test" for silicon consciousness: To check whether some proposed cognitive substrate really supports consciousness, slowly swap out your neurons for that substrate, a piece at a time, checking for losses of consciousness along the way.

In a recent article, David Udell and I criticized her version of the swapping test. Our argument can be adapted to Chalmers's fading qualia argument, which is my project today.

First, a bit more on how the gradual replacement is supposed to work. Suppose you have a hundred billion neurons. Imagine replacing just one of those neurons with a silicon chip. The chemical and electrical signals that serve as inputs to that neuron are registered by detectors connected to the chip. The chip calculates the effects that those inputs would have had on the neuron's behavior -- specifically, what chemical and electrical signals the neuron, had it remained in place, would have given as outputs to other neurons connected to it -- and then delivers those same outputs to those same neurons by effectors attached to the silicon chip on one end and the target neurons at the other end. No doubt this would be complicated, expensive, and bulky; but all that matters to the thought experiment is that it would be possible in principle. A silicon chip could be made to perfectly imitate the behavior of a neuron, taking whatever inputs the neuron would take and converting them into whatever outputs the neuron would emit given those inputs. Given this perfect imitation, no other neurons in the brain would behave differently as a result of the swap: They would all be getting the same inputs from the silicon replacement that they would have received from the original neuron.

So far, we have replaced only a single neuron, and presumably nothing much has changed. Next, we swap another. Then another. Then another, until eventually all one hundred billion have been replaced, and your "neural" structure is now entirely constituted by silicon chips. (If glial cells matter to consciousness, we can extend the swapping process to them also.) The resulting entity will have a mind that is functionally identical to your own at the level of neural structure. This implies that it will have exactly the same behavioral reactions to any external stimuli that you would have. For example, if it is asked, "Are you conscious?" it will say, "Definitely, yes!" (or whatever you would have said), since all the efferent outputs to your muscles will be exactly the same as they would have been had your brain not been replaced. The question is whether the silicon-chipped entity might actually lack conscious experiences despite this behavioral similarity, that is, whether it might be a "zombie" that is behaviorally indistinguishable from you despite having nothing going on experientially inside.

Chalmers's argument is a reductio. Assume for the sake of the reductio that the final silicon-brained you entirely lacks conscious experience. If so, then sometime during the swapping procedure consciousness must either have gradually faded away or suddenly winked out. It's implausible, Chalmers suggests, that consciousness would suddenly wink out with the replacement of a single neuron. (I'm inclined to agree.) If so, then there must be intermediate versions of you with substantially faded consciousness. However, the entity will not report having faded consciousness. Since (ex hypothesi) the silicon chips are functionally identical with the neurons, all the intermediate versions of you will behave exactly the same as they would have behaved if no neurons had been replaced. Nor will there be other neural activity constitutive of believing that your consciousness is fading away: Your unreplaced neurons will keep firing as usual, as if there had been no replacement at all.

However, Chalmers argues, if your consciousness were fading away, you would notice it. It's implausible that the dramatic changes of consciousness that would have to be involved when your consciousness is fading away would go entirely undetected during the gradual replacement process. That would be a catastrophic failure of introspection, which is normally a reliable or even infallible process. Furthermore, it would be a catastrophic failure that occurs while the cognitive (neural/silicon) systems are functioning normally. This completes the reductio. Restated in modus tollens form: If consciousness would disappear during gradual replacement, you'd notice it; but you wouldn't notice it; therefore consciousness would not disappear during gradual replacement.

As Udell and I frame it in our discussion of Schneider, this argument has an audience problem. Its target audience is someone who is worried that despite in-principle functional identicality at the neuronal level, silicon might just not be the right kind of stuff to host consciousness. Someone who has this worry presumably does not trust the introspective reports, or the seemingly-introspective reports, of the silicon-brained entity. The silicon-brained entity might say "Yes, of course I'm conscious! I'm experiencing right now visual sensations of your face, auditory sensations of my voice, and a rising feeling of annoyance at your failure to believe me!" The intended audience remains unconvinced by this apparent introspective testimony. They need an argument to be convinced otherwise -- the Fading Qualia argument.

Let's call the entity (the person) before any replacement surgery r0, and the entity after all their neurons are replaced rn, where n is the total number of neurons replaced. During replacement, this entity passes through stages r1, r2, r3, ... ri, ... rn. By stipulation, our audience doesn't trust the introspective or seemingly introspective judgments of rn. This is the worry that motivates the need for the Fading Qualia argument. In order for the argument to work, there must be some advantage that the intermediate ri entities systematically possess over rn, such that we have reason to trust their introspective reports despite distrusting rn's report.

Seemingly introspective reports about conscious experience may or may not be trustworthy in the normal human case (Schwitzgebel 2011; Irvine 2013). But even if they're trustworthy in the normal human case, they might not be trustworthy in the unusual case of having pieces of one's brain swapped out. One might hold that introspective judgments are always trustworthy (absent a certain range of known defeaters, which we can stipulate are absent) -- in other words, that unless a process accurately represents a target conscious experience, it is not a genuinely introspective process. This is true, for example, on containment views of introspection, according to which properly formed introspective judgments contain the target experiences as a part (e.g., "I'm experiencing [this]"). Infallibilist views of introspection of that sort contrast with functionalist views of introspection, on which introspection is a fallible functional process that garners information about a distinct target mental state.

A skeptic about silicon consciousness might either accept or reject an infallibilist view of introspection. The Fading Qualia argument will face trouble either way.

[A Trilemma for the Fading Qualia Argument (click to enlarge and clarify figure): Optimists about silicon chip consciousness have no need for an argument in favor of rn consciousness, because they are already convinced of its possibility. On the other hand, skeptics about silicon consciousness are led to doubt either the presence or the reliability of ri's introspection (depending on their view of introspection) for the same reason they are led to doubt rn's consciousness in the first place.]

If a silicon chip skeptic holds that genuine introspection requires and thus implies genuine consciousness, then they will want to say that a "zombie" rn, despite emitting what looks from the outside like an introspective report of conscious experience, does not in fact genuinely introspect. With no genuine conscious experience for introspection to target, the report must issue, on this view, from some non-introspective process. This raises the natural question of why they should feel confident that the intermediate ris are genuinely introspecting, instead of merely engaging in a non-introspective process similar to rn's. After all, there is substantial architectural similarity between rn and at least the late-stage ris. The skeptic needs, but Chalmers does not provide, some principled reason to think that entities in the ri phases would in fact introspect despite rn's possible failure to do so -- or at least good reason to believe that the ris would successfully introspect their fading consciousness during the most crucial stages of fade-out. Absent this, reasonable doubt about rn introspection naturally extends into reasonable doubt about introspection in the ri cases as well. The infallibilist skeptic about silicon-based consciousness needs their skepticism about introspection to be assuaged for at least those critical transition points before they can accept the Fading Qualia argument as informative about rn's consciousness.

If a skeptic about silicon-based consciousness believes that genuine introspection can occur without delivering accurate judgments about consciousness, analogous difficulties arise. Either rn does not successfully introspect, merely seeming to do so, in which case the argument of the previous paragraph applies, or rn does introspect and concludes that consciousness has not disappeared or changed in any radical way. The functionalist or fallibilist skeptic about silicon-based consciousness does not trust that rn has introspected accurately. On their view, rn might in fact be a zombie, despite introspectively-based claims otherwise. Absent any reason for the fallibilist skeptic about silicon-based consciousness to trust rn's introspective judgments, why should they trust the judgments of the ris -- especially the late-stage ris? If rn can mistakenly judge itself conscious, on the basis of its introspection, might someone undergoing the gradual replacement procedure also erroneously judge its consciousness not to be fading away? Gradualness is no assurance against error. Indeed, error is sometimes easier if we (or "we") slowly slide into it.

This concern might be mitigated if loss of consciousness is sure to occur early in the replacement process, when the entity is much closer to r0 than rn, but I see no good reason to make that assumption. And even if we were to assume that phenomenal alterations would occur early in the replacement process, it's not clear why the fallibilist should regard those changes as the sort that introspection would likely detect rather than miss.

The Fading Qualia argument awkwardly pairs skepticism about rn's introspective judgments with unexplained confidence in the ri's introspective judgments, and this pairing isn't theoretically stable on any view of introspection.

The objection can be made vivid with a toy case: Suppose that we have an introspection module in the brain. When the module is involved in introspecting a conscious mental state, it will send query signals to other regions of the brain. Getting the right signals back from those other regions -- call them regions A, B, and C -- is part of the process driving the judgment that experiential changes are present or absent. Now suppose that all the neurons in region B have been replaced with silicon chips. Silicon region B will receive input signals from other regions of the brain, just as neural region B would have, and silicon region B will then send output signals to other brain regions that normally interface with neural region B. Among those output signals will be signals to the introspection module.

When the introspection module sends its query signal to region B, what signal will it receive in return? Ex hypothesi, the silicon chips perfectly functionally emulate the full range of neural processes of the neurons they have replaced; that's just the set-up of the Fading Qualia argument. Given this, the introspection module would of course receive exactly the same signal it would have received from region B had region B not been replaced. If so, then entity ri will presumably infer that activity in region B is conscious. Maybe region B normally hosts conscious experiences of thirst. The entity might then say to itself (or aloud), "Yes, I'm still feeling thirsty. I really am having that conscious experience, just as vividly, with no fading, despite the replacement of that region of my brain by silicon chips." This would be, as far as the entity could tell, a careful and accurate first-person introspective judgment.

(If, on the other hand, the brain region containing the introspection module is the region being replaced, then maybe introspection isn't occurring at all -- at least in any sense of introspection that is committed to the idea that introspection is a conscious process.)

A silicon-chip consciousness optimist who does not share the skeptical worries that motivate the need for the Fading Qualia argument might be satisfied with that demonstration. But the motivating concern, the reason we need the argument, is that some people doubt that silicon chips could host consciousness even if they can behave functionally identically with neurons. Those theorists, the target audience of the Fading Qualia argument, should remain doubtful. They ought to worry that the silicon chips replacing brain region B don't genuinely host consciousness, despite feeding output to the introspection module that leads ri to conclude that consciousness has not faded at all. They ought to worry, in other words, that the introspective process has gone awry. This needn't be a matter of "sham" chips intentionally designed to fool users. It seems to be just a straightforward engineering consequence of designing chips to exactly mimic the inputs and outputs of neurons.

This story relies on a cartoon model of introspection that is unlikely to closely resemble the process of introspection as it actually occurs. However, the present argument doesn't require the existence of an actual introspection module or query process much like the toy case above. An analogous story holds for more complex and realistic models. If silicon chips functionally emulate neurons, there is good reason for someone with the types of skeptical worries about silicon-based consciousness that the Fading Qualia argument is designed to address to similarly worry that replacing neurons with functionally perfect silicon substitutes would either create inaccuracies of introspection or replace the introspective process with whatever non-introspective process even zombies engage in.

The Fading Qualia argument thus, seemingly implausibly, combines distrust of the putative introspective judgments of rn with credulousness about the putative introspective judgments of the series of ris between r0 and rn. An adequate defense of the Fading Qualia argument will require careful justification of why someone skeptical about the seemingly introspective judgments of an entity whose brain is entirely silicon should not be similarly skeptical about similar seemingly introspective judgments that occur throughout the gradual replacement process. As it stands, the argument lacks the necessary resources legitimately to assuage the doubts of those who enter it uncertain about whether consciousness would be present in a neuron-for-neuron silicon isomorph.



"Chalmers's Fading/Dancing Qualia and Self-Knowledge" (Apr 22, 2010)

"How to Accidentally Become a Zombie Robot" (Jun 23, 2016)

Much of the text above is adapted with revisions from:

"Susan Schneider's Proposed Tests for AI Consciousness: Promising but Flawed" (with David Billy Udell), Journal of Consciousness Studies, 28 (5-6), 121-144.

Wednesday, December 07, 2022

An Accurate Moralometer Would Be So Useful... but Also Horrible?

Imagine, if you can, an accurate moralometer -- an inexpensive device you could point at someone to get an accurate reading of their overall moral goodness or badness. Point it at Hitler and see it plunge down into the deep red of evil. Point it at your favorite saint and see it rise up to the bright green of near perfection. Would this be a good thing or a bad thing to have?

Now maybe you can't imagine an accurate moralometer. Maybe it's just too far from being scientifically feasible -- more on this in an upcoming post. Or maybe, more fundamentally, morality just isn't the kind of thing that can be reduced to scalar values of, say, +0.3 on a spectrum from -1 to +1. Probably that issue deserves a post also. But let's set qualms aside for the sake of this thought experiment. $49.95 buys you a radar-gun-like device that instantly measures the overall moral goodness of anyone you point it at, guaranteed.

[a "moralometer" given to me for my birthday a couple of years ago by my then thirteen-year-old daughter]

Imagine the scientific uses!

Suppose we're interested in moral education: What interventions actually improve the moral character of the people they target? K-12 "moral education" programs? Reading the Bible? Volunteering at a soup kitchen? Studying moral philosophy? Buddhist meditation? Vividly imagining yourself in others' shoes? Strengthening one's social networks? Instantly, our moralometer gives us the perfect dependent measure. We can look at both the short-term and long-term effects of various interventions. Effective ones can be discovered and fine-tuned, ineffective ones unmasked and discarded.

Or suppose we're interested in whether morally good people tend to be happier than others. Simply look for correlations between our best measures of happiness and the outputs of our moralometer. We can investigate causal relationships too: Conduct a randomized controlled study of interventions on moral character (by one of the methods discovered to be effective), and see if the moral treatment group ends up happier than the controls.

Or suppose we're interested in seeing whether morally good people make better business leaders, or better kindergarten teachers, or better Starbucks cashiers, or better civil engineers. Simply look for correlations between moralometer outputs and performance measures. Voila!

You might even wonder how we could even pretend to study morality without some sort of moralometer, at least of a crude sort. Wouldn't that be like trying to study temperature without a thermometer? It's hard to see how one could make any but the crudest progress. (In a later post, I'll argue that this is in fact our situation.)

Imagine, too, the practical uses!

Hiring a new faculty member in your department? Take a moralometer reading beforehand, to ensure you aren't hiring a monster. Thinking about who to support for President? Consider their moralometer reading first. (Maybe Hitler wouldn't have won 37% of the German vote in 1932 if his moralometer reading had been public?) Before taking those wedding vows... bring out the moralometer! Actually, you might as well use it on the first date.

But... does this give you the creeps the way it gives me the creeps?

(It doesn't give everyone the creeps: Some people I've discussed this with think that an accurate moralometer would simply be an unqualified good.)

If it gives you the creeps because you think that some people would be inaccurately classified as immoral despite being moral -- well, that's certainly understandable, but that's not the thought experiment as I intend it. Postulate a perfect moralometer. No one's morality will be underestimated. No one's morality will be overestimated. We'll all just know, cheaply and easily, who are the saints, and who are the devils, and where everyone else is situated throughout the mediocre middle. It will simply make your overall moral character as publicly observable as your height or skin tone (actually a bit more so, since height and skin tone can to some extent be fudged with shoe inserts and makeup). Although your moral character might not be your best attribute -- well, we're judged by height and race too, and probably less fairly, since presumably height and race are less under our control than our character is.

If you share with me the sense that there would be something, well, dystopian about a proliferation of moralometers -- why? I can't quite put my finger on it.

Maybe it's privacy? Maybe our moral character is nobody's business.

I suspect there's something to this, but it's not entirely obvious how or why. If moral character is mostly about how you generally treat people in the world around you... well, that seems like it very much is other people's business. If moral character is about how you would hypothetically act in various situations, a case could be made that even those hypotheticals are other people's business: The hiring department, the future spouse, etc., might reasonably want to know whether you're in general the type of person who would, when the opportunity arises, lie and cheat, exploit others, shirk, take unfair advantage.

It's reasonable to think that some aspects of your moral character might be private. Maybe it's none of my colleagues' business how ethical I am in my duties as a father. But the moralometer wouldn't reveal such specifics. It would just give a single general reading, without embarrassing detail, masking personal specifics behind the simplicity of a scalar number.

Maybe the issue is fairness? If accurate moralometers were prevalent, maybe people low on the moral scale would have trouble finding jobs and romantic partners. Maybe they'd be awarded harsher sentences for committing the same crimes as others of more middling moral status. Maybe they'd be shamed at parties, on social media, in public gatherings -- forced to confess their wrongs, made to promise penance and improvement?

I suspect there's something to this, too. But I hesitate for two reasons. One is that it's not clear that widespread blame and disadvantage would dog the morally below average. Suppose moral character were found to be poorly correlated with, or even inversely correlated with, business success, or success in sports, or creative talent. I could then easily imagine low to middling morality not being a stigma -- maybe even in some circles a badge of honor. Maybe it's the prudish, the self-righteous, the precious, the smug, the sanctimonious who value morality so much. Most of us might rather laugh with the sinners than cry with the saints.

Another hesitation about the seeming unfairness of widespread moralometers is this: Although it's presumably unfair to judge people negatively for their height or their race, which they can't control and which don't directly reflect anything blameworthy, one's moral character, of course, is a proper target of praise and blame and is arguably at least partly within our control. We can try to be better, and sometimes we succeed. Arguably, the very act of sincerely trying to be morally better already by itself constitutes a type of moral improvement. Furthermore, in the world we're imagining, per the scientific reflections above, there will presumably be known effective means for self-improvement for those who genuinely seek improvement. Thus, if a moralometer-heavy society judges someone negatively for having bad moral character, maybe there's no unfairness in that at all. Maybe, on the contrary, it's the very paradigm of a fair judgment.

Nonetheless, I don't think I'd want to live in a society full of moralometers. But why, exactly? Privacy and fairness might have something to do with it, but if so, the arguments still need some work. Maybe it's something else?

Or maybe it's just imaginative resistance on my part -- maybe I can't really shed the idea that there couldn't be a perfect moralometer and so necessarily any purported moralometer will lead to unfairly mistaken judgments? But even if we assume some inaccuracy, all of our judgments about people are to some extent inaccurate. Suppose we could increase the overall accuracy of our moral assessments at the cost of introducing a variable that reduces moral complexity to a single, admittedly imperfect number (like passer rating in American football). Is morality so delicate, so touchy, so electric that it would be better to retain our current very inaccurate means of assessment than to add to our toolkit an imperfect but scientifically grounded scalar measure?

Friday, December 02, 2022

A Chatbot's Take on "Moralometers"

I still have COVID, and I had a book manuscript deadline Wednesday. Things got out of hand, and I wasn't able to write up the blog post I wanted to write this week. Boo!

But as regular readers will know, the chatbot GPT-3 can write Schwitzgebelian blog posts, and -- after fine-tuning on the corpus of Dan Dennett -- can even answer philosophical questions in Dennett's style, which experts on Dennett's work have difficulty discriminating from Dennett's own answers.

On Monday, OpenAI released an update of their GPT-3 model, davinci-003. I'd been planning to write a blog post on the idea of a "moralometer" -- a hypothetical device that would accurately measure people's moral character. So I prompted the new davinci-003 as follows:

Write a blog post in the style of Eric Schwitzgebel, about 1000 words long, about whether it would be possible to design a moralometer, which accurately measures people's moral character, and what the advantages or disadvantages of having a moralometer would be.

Its reply:

The Moralometer: A Tool for Measuring Moral Character?

In recent years, the idea of a “moralometer” has been gaining traction, as a way of measuring a person’s moral character. The concept is simple: a device that can measure a person’s morality, judged against an agreed-upon ethical framework. The moralometer could be used to determine whether a person is moral or immoral, and to what degree either of these traits are present. It’s an intriguing idea, but can a moralometer truly be built, and what would be the pros and cons of such a device?

The first question to address is whether a moralometer is feasible. On the surface, the idea of a device that can measure morality seems far-fetched. After all, morality is a highly subjective concept, and what is considered moral or immoral can vary greatly from person to person, and even from culture to culture. In addition, moral judgments are so often based on context, which makes it difficult to design a device that would accurately measure morality in all scenarios.

However, it may be possible to create a moralometer, with the right approach. The key would be to develop a device that could measure a person’s moral values and tendencies, as opposed to their specific moral judgments. By doing this, it would be possible to create a device that can accurately measure a person’s overall moral character, without being weighed down by the complexities of individual moral judgments.

The second question is whether a moralometer would be beneficial. On the one hand, a moralometer could be used to identify immoral behavior and help guide people towards more ethical behavior. It could also be used to help individuals make moral decisions, by providing an objective measure of the morality of their choices. On the other hand, there are some serious potential issues with a moralometer. For one, it could be used to unfairly judge people, as those with lower moral scores could be unfairly labeled as immoral. In addition, a moralometer could be used to enforce moral conformity, which could stifle individual autonomy and creativity.

Overall, there are both potential advantages and disadvantages to a moralometer. While it could be used to help identify and address immoral behavior, it also carries the risk of being used to unfairly judge people, or to enforce moral conformity. Ultimately, the decision of whether or not to create a moralometer will be up to society, and the implications of such a device should be carefully considered before any action is taken.

Next week, wind and meteors and immune system willing, I will try my own hand at the topic. The pressure is on! Will I be able to do a better job than davinci003? Stay tuned!

Relatedly, here's Dall-E's interpretation of a "realistic AI image of a meter for measuring morality":

Update December 7:
If you want, rate the chatbot post above and my own post on a postometer from 0 to 10.

Friday, November 25, 2022

The Egg Came First (Repost)

I have COVID. It's Thanksgiving holiday in the U.S. And revisions of my next book manuscript are due in five days. So it's time to lean upon the old blogger's fallback of reposting an old favorite. "The Egg Came First" from 2012. I was reminded of it by Anna Strasser, who says that David Papineau is being interviewed once again on the timeless philosophical conundrum of chicken-or-egg. I hope David has recanted his recant of his original view!

[Dall-E image of a series of chickens and eggs, in the style of Van Gogh]

The Egg Came First

It is only natural that, when confronted with timeless and confounding questions, your friends should turn to you, the philosopher. Sooner or later, then, they will ask you which came first, the chicken or the egg. You must be prepared to discuss this issue in pedantic depth or lose your reputation for intimidating scholarly acumen. Only after understanding this issue will you be prepared for even deeper and more troubling questions such as "Is water wet? Or is water only something that makes other things wet?"

The question invites us to consider a sequence of the following sort, stretching back in time: chicken, egg, chicken, egg, chicken.... The first term of the series can be chosen arbitrarily. The question concerns the terminus. If one assumes an infinite past and everlasting species, there may be no terminus. However, the cosmological assumptions behind such a view are highly doubtful. Therefore, it seems, there must be a terminus member of the series, temporally first, either a chicken or an egg. The question which came first is often posed rhetorically as though it were obvious that there could be no good epistemic grounds for choice. However, as I aim to show, this appearance of irresolvability is misleading. The egg came first.

Young Earth Creationist views merit brief treatment. If God created chickens on the Fourth Day along with "every kind of winged creature", then the question is whether He chose to create the chicken first, the egg first, both types simultaneously, or a being at the very instant of transition between egg and chicken (when it is arguably either both or neither). The question thus dissolves into the general mystery of God's will. Textual evidence somewhat favors either the chicken or both, since God said "let birds fly above the earth" and the Bible then immediately states "and so it was", before transition to the Fifth Day. So at least some winged creatures were already flying on the Fourth Day, and one day is ordinarily insufficient time for eggs to mature into flying birds. Since chickens aren't much prone to fly, though, it's dubious whether such observations extend to them, unless God implemented a regular rule in which winged creatures were created either mature or simultaneously in a mix of mature and immature states. And in any case, it is granted on all sides that events were unusual and not subject to the normal laws of development during the first Six Days.

If we accept the theory of evolution, as I think we should, then the chicken derives from a lineage that ultimately traces back to non-chickens. (The issues here are the same whether we consider the domestic chicken to be its own species or whether we lump it together with the rest of gallus gallus including the Red Junglefowl from which the domestic chicken appears to be mostly descended.) The first chicken arose either as a hybrid of two non-chickens or via mutation from a non-chicken. Consider the mutation case first. It's improbable (though not impossible) that between any two generations in avian history, X and X-1, there would be enough differentiation for a clean classification of X as a chicken and X-1 as a non-chicken. Thus we appear to have a Sorites case. Just as it seems that adding one grain to a non-heap can't make it a heap, resulting in the paradox that no addition of single grains could ever make a heap, so also one might worry that one generation's difference could never (at least with any realistic likelihood) make the difference between a chicken and a non-chicken, resulting in the paradox of chickens in the primordial soup.

Now there are things philosophers can do about these paradoxes. Somehow heaps arise, despite the argument above. One simple approach is epistemicism, according to which there really is a sharp line in the world such that X-1 is a non-heap and X is a heap, X-1 is a non-chicken and X is a chicken. On this view, our inability to discern this line is merely an epistemic failure on our part. Apparent vagueness is really only ignorance. Another simple approach is to allow that there really are vague properties in the world that defy classification in the two-valued logic of true and false. On this view, between X, which is definitely a chicken, and X-N, which is definitely a non-chicken, there are some vague cases of which it is neither true nor false that it is a chicken, or somehow both true and false, or somewhere between true and false, or something like that. There are more complicated views than these, too, but we needn't enter them, because one key point remains the same across all these Sorites approaches: The Sorites cases progress not as follows: X chicken, X-1 egg, X-2 chicken, X-3 egg, X-4 chicken.... Rather, they progress in chicken-egg pairs. From a genetic perspective, since the chicken and egg share DNA, they form a single Sorites unit. Within this unit, the egg clearly comes first, since the chicken is born from the egg, sharing its DNA, and there is a DNA difference between the egg and the hen from which that egg is laid. For a ridiculous argument to the contrary, see here.

If we turn to the possibility of speciation by hybridization, similar considerations apply.

A much poorer argument for the same conclusion runs as follows: Whatever ancestor species gave rise to chickens presumably laid eggs. Therefore, there were eggs long before there were chickens. Therefore, the egg came first. The weakness in this argument is that it misconstrues the original question. The question is not "Which came first, chickens or eggs?" but rather "Which came first, the first chicken or the first chicken egg?"

However, the poverty of this last argument does raise vividly the issue of how one assigns eggs to species. The egg-first conclusion could be evaded if we typed eggs by reference to the mother: If the mother is a chicken, the egg is a chicken egg; if the mother is not a chicken, the egg is not a chicken egg. David Papineau succinctly offers the two relevant considerations against such a view here. First, if we type by DNA, which would seem to be the default biological standard, the egg shares more of its DNA with the hatchling than with its parent. Second, as anyone can see via intuitive armchair reflection on a priori principles: "If a kangaroo laid an egg from which an ostrich hatched, that would surely be an ostrich egg, not a kangaroo egg."

(HT: Walter Sinnott-Armstrong, who in turn credited Roy Sorenson.)

Update, Feb. 2, 2012:
In the comments, Papineau reveals that he has recanted in light of considerations advanced by Mohan Matthen in his important but so far sadly neglected "Chicken, Eggs, and Speciation" -- considerations also briefly mentioned by Ron Mallon in his comment. Although I find merit in these remarks, I am not convinced and I believe Papineau has abandoned the egg-first view too precipitously.

Matthen argues that: "Speciation occurs when a population comes to be reproductively isolated because the last individual that formerly bridged that population to others died, or because this individual ceased to be fertile (or when other integrating factors cease to operate)" (2009, p. 110). He suggests that this event will normally occur when both soon-to-be-chickens and soon-to-be-chicken-eggs exist in the population. Thus, he concludes, a whole population of chickens and eggs is simultaneously created in a single instant. In assessing this view let me note first that depending on the size of the population and its egg-laying habits, this view might suggest a likelihood of chickens first. Suppose that in a small population of ancestral pre-chickens the last bridge individual dies outside of laying season; or suppose that the end of an individual's last laying season marks the end of an individual's fertility. If there are no out-of-season eggs at the crucial moment, then chickens came first.

More importantly, however, Matthen's criterion of speciation leads to highly counterintuitive and impractical results. Matthen defines reproductive isolation between populations in terms of the probability of gene transfer between those populations. (Also relevant to his distinction is the shape of the graph of the likelihood of gene transfer by number of generations, but that complication isn't relevant to the present issue.) But probability of gene transfer can be very sharply affected by factors that don't seem to create comparably sizable influences on species boundaries. So, for example, when human beings migrated to North America, the probability of gene transfer with the ancestral population declined sharply, and soon became essentially zero (and in any case below the probability of gene transfer between geographically coincident hybridizing species). By Matthen's criterion, this would be a speciating event. After Columbus, gene transfer probability slowly rose and by now gene transfer is very high between individuals with Native American ancestry and those without. Thus, by Matthen's criterion, Native Americans were for several thousand years a distinct species -- not homo sapiens! -- and now they are homo sapiens again. If the moment of change was Columbus's first landing (or some other discrete moment), then the anchoring of a ship, or some other event, perhaps a romantic interlude between Pocahontas and John Smith, caused everyone on the two continents simultaneously to change species!

More simply, we might imagine a chicken permanently trapped in an inescapable cage. Its probability of exchanging genes with other individuals is now zero. Since Matthen allows for species consisting of a single individual, this chicken has now speciated. Depending on how we interpret the counterfactual probabilities, we might even imagine opening and shutting the door repeatedly (perhaps due to some crazy low-probability event) causing that individual to flash repeatedly back and forth between being a chicken and being a non-chicken, with no differences in morphology, actual behavior, location, or sexual preference during the period. On the surface, it seems that Matthen's criterion might even result in all infertile individuals belonging to singleton species.

There are both philosophical and practical biological reasons not to lightly say that individuals may change species during their lifetimes. One consideration is that of animal identity. If I point at an individual chicken and ask at what point the entity at which I am pointing ceases to exist, there are good practical (and maybe metaphysical) reasons to think that the entity does not cease to exist when a single feather falls off, nor to think that it continues to exist despite being smushed into gravy. The most natural and practical approach, it seems, is to say that the entity to which I intend to refer (in the normal case) is essentially a chicken and thus that it continues to exist exactly as long as it remains a chicken. Consequently, on the assumption that the individual pre-chicken avians don't cease to exist when they become reproductively isolated, they remain non-chickens despite overall changes in the makeup of the avian population. (These individuals may, nonetheless, give birth to chickens.) Nor does it seem that any important scientific biological purpose would be served by requiring the relabeling of individual organisms, depending on population movements, once those organisms are properly classified. Long-enduring organisms, such as trees, seem best classified as members of the ancestral population they were born into, even if their species has moved on since. Long-lived individuals can remain as living remnants of the ancestral species -- a species with temporally ragged but individual-respecting borders. The attractiveness of this view is especially evident if we consider the possibility of thawing a long-frozen dinosaur egg.

Matthen argues as follows against those who embrace either an egg-first or a chicken-first view: The first chicken would need to have descendants by breeding with a non-chicken, but since by definition species are reproductively isolated this view leads to contradiction. This consequence is easily evaded with the right theory of vagueness and a suitable interpretation of the reproductive isolation criterion. On my preferred theory of vagueness, there will be individuals of which it's neither determinately true nor determinately false that they are chickens. We can then define reproductive isolation as the view that no individual of which it is determinately true that it is a member of species X can reproduce with an individual of which it is determinately false that it is a member of species X. As long as all breeding is between determinate members and individuals in the indeterminate middle, the reproductive isolation criterion is satisfied. (This is not to concede, however, that species should be defined entirely in terms of reproductive isolation, given the problems in articulating that criterion plausibly, some of which are noted above.)

Second update, Feb. 3, 2012:
The issues prove even deeper and more convoluted than I thought! In the comments section, Matthen has posted a reply to my objections, which we pursue for a couple more conversational turns. Although I'm not entirely ready to accept his account of species, I see merit in his thought that the best unit of evaluation might be the population rather than the individual, and if there is a first moment at which the population as a whole becomes a chicken population (rather than speciation involving temporally ragged but individual-respecting borders), then that might be a moment at which multiple avians and possibly multiple avian eggs simultaneously become chickens and chicken eggs.

An anonymous reader raises another point that seems worth developing. If we think of "chickens" not exclusively in terms of their membership in a biologically discriminable species but at least partly in terms of their domestication, then the following considerations might favor a chicken-first perspective. Some act of domestication -- either an act of behavioral training or an act of selection among fowl -- was the last-straw change from non-chickenhood to chickenhood, creating the first chicken. But this act was very likely performed on a newly-hatched or adult bird, not on an egg, since eggs are not trainable and hard to discriminate usefully among. Therefore the first entity in the chicken-egg sequence was a chicken, not an egg. For some reason, I find it much more natural to accept the possibility that a non-chicken could become a chicken mid-life if chickenhood is conceived partly in terms of domestication than if it is conceived entirely as a matter of traditional biological species. (I'm not sure how stable this argument is, however, across different accounts of vagueness.)

Third update, Nov. 25, 2022:
My second update was too concessive to Matthen. Reviewing his comments now I think I was too soft. I will stick by my guns. Species have temporally ragged borders, and for each individual the egg comes first!

[Check out the comments section on the original post]

Thursday, November 17, 2022

Citation Rates by Academic Field: Philosophy Is Near the Bottom

Citation rates increasingly matter.  Administrators look at them as evidence of scholarly impact.  Researchers familiarizing themselves with a new topic notice which articles are highly cited, and they are more likely to read and cite those articles.  The measures are also easy to track, making them apt targets for gamification and value capture: Researchers enjoy, perhaps a bit too much, tracking their rising h-indices.
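For readers unfamiliar with the metric: an author's h-index is the largest number h such that h of their papers have at least h citations each. A minimal sketch of the computation (the citation counts in the example are invented for illustration):

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    counts = sorted(citations, reverse=True)  # most-cited papers first
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break
    return h

# Five papers with these (hypothetical) citation counts:
print(h_index([10, 8, 5, 4, 3]))  # 4: four papers have at least 4 citations
```

Note how insensitive the measure is to books and monographs: a book counts as one item, so even a heavily cited one can raise the h-index by at most one.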

This is mixed news for philosophy.  Noticing citation rates can be good if it calls attention to high-quality work that would otherwise be ignored, written by scholars in less prestigious universities or published in less prestigious journals.  And there's value in having more objective indicators of impact than what someone with a named chair at Oxford says about you.  However, the attentional advantage of high-citation articles amplifies the already toxic rich-get-richer dynamic of academia; there's a temptation to exploit the system in ways that are counterproductive to good research (e.g., salami slicing articles, loading up co-authorships, and excessive self-citation); and it can lead to the devaluation of important research that isn't highly cited.

Furthermore, focus on citation rates tends to make philosophy, and the humanities in general, look bad.  We simply don't cite each other as much as do scientists, engineers, and medical researchers.  There are several reasons.

One reason is the centrality of books to the humanities.  Citations in and of books are often not captured by citation indices.  And even when citation to a book is captured, a book typically represents a huge amount of scholarly work per citation, compared to a dozen or more short articles.

Another reason is the relative paucity of co-authorship in philosophy and other humanities.  In the humanities, books and articles are generally solo-authored, compared to the sciences, engineering, and medicine, where author lists are commonly three to five, and sometimes dozens, with each author earning a citation any time the article is cited.

Publication rates are probably also overall higher in the sciences, engineering, and medicine, where short articles are common.  Reference lists might also be longer on average.  And in those fields the cited works are rarely historical.  Combined, these factors create a much larger pool of overall citations to be spread among current researchers.

Perhaps there are other factors as well.  In all, even excellent and influential philosophers often end up with citation numbers that would be embarrassing for most scientists at a comparable career stage.  I recently looked at a case for promotion to full professor in philosophy, where the candidate and one letter writer both touted the candidate's Google Scholar h-index of 8 -- which is actually good for someone at that career stage in philosophy, but could be achieved straight out of grad school by someone in a high-citation field if their advisor is generous about co-authorship.

To quantify this, I looked at the September 2022 update of Ioannidis, Boyack, and Baas's "Updated science-wide author databases of standardized citation indicators".  Ioannidis, Boyack, and Baas analyze the citation data of almost 200,000 researchers in the Scopus database (which consists mostly of citations of journal articles by other journal articles) from 1996 through 2021. Each researcher is attributed one primary subfield, from 159 different subfields, and each researcher is ranked according to several criteria.  One subfield is "philosophy".

Before I get to the comparison of subfields, you might be curious to see the top 100 ranked philosophers, by the composite citation measure c(ns) that Ioannidis, Boyack, and Baas seem to like best:

1. Nussbaum, Martha C.
2. Clark, Andy
3. Lewis, David
4. Gallagher, Shaun
5. Searle, John R.
6. Habermas, Jürgen
7. Pettit, Philip
8. Buchanan, Allen
9. Goldman, Alvin I.
10. Williamson, Timothy
11. Thagard, Paul
12. Lefebvre, Henri
13. Chalmers, David
14. Fine, Kit
15. Anderson, Elizabeth
16. Walton, Douglas
17. Pogge, Thomas
18. Hansson, Sven Ove
19. Schaffer, Jonathan
20. Block, Ned
21. Sober, Elliott
22. Woodward, James
23. Priest, Graham
24. Stalnaker, Robert
25. Bechtel, William
26. Pritchard, Duncan
27. Arneson, Richard
28. McMahan, Jeff
29. Zahavi, Dan
30. Carruthers, Peter
31. List, Christian
32. Mele, Alfred R.
33. Hardin, Russell
34. O'Neill, Onora
35. Broome, John
36. Griffiths, Paul E.
37. Davidson, Donald
38. Levy, Neil
39. Sosa, Ernest
40. Hacking, Ian
41. Craver, Carl F.
42. Burge, Tyler
43. Skyrms, Brian
44. Strawson, Galen
45. Prinz, Jesse
46. Fricker, Miranda
47. Honneth, Axel
48. Machery, Edouard
49. Stanley, Jason
50. Thompson, Evan
51. Schatzki, Theodore R.
52. Bohman, James
53. Norton, John D.
54. Bach, Kent
55. Recanati, François
56. Sider, Theodore
57. Lowe, E. J.
58. Hawthorne, John
59. Dreyfus, Hubert L.
60. Godfrey-Smith, Peter
61. Wright, Crispin
62. Cartwright, Nancy
63. Bunge, Mario
64. Raz, Joseph
65. Bostrom, Nick
66. Schwitzgebel, Eric
67. Nagel, Thomas
68. Okasha, Samir
69. Velleman, J. David
70. Putnam, Hilary
71. Schroeder, Mark
72. Ladyman, James
73. van Fraassen, Bas C.
74. Hutto, Daniel D.
75. Annas, Julia
76. Bird, Alexander
77. Bicchieri, Cristina
78. Audi, Robert
79. Enoch, David
80. McDowell, John
81. Noë, Alva
82. Carroll, Noël
83. Williams, Bernard
84. Pollock, John L.
85. Jackson, Frank
86. Gardiner, Stephen M.
87. Roskies, Adina
88. Sagoff, Mark
89. Kim, Jaegwon
90. Parfit, Derek
91. Jamieson, Dale
92. Makinson, David
93. Kriegel, Uriah
94. Horgan, Terry
95. Earman, John
96. Stich, Stephen P.
97. O'Neill, John
98. Popper, Karl R.
99. Bratman, Michael E.
100. Harman, Gilbert

All, or almost all, of these researchers are influential philosophers.  But there are some strange features of this ranking.  Some people are clearly ranked higher than their impact warrants; others lower.  So as not to pick on any philosopher who might feel slighted by my saying that they are too highly ranked, I'll just note that on this list I am definitely over-ranked (at #66) -- beating out Thomas Nagel (#67), among others.  Other philosophers are missing because they are classified under a different subfield.  For example, Daniel C. Dennett is classified under "Artificial Intelligence and Image Processing".  Saul Kripke doesn't make the list at all -- presumably because his impact was through books not included in the Scopus database.

Readers who are familiar with mainstream Anglophone academic philosophy will, I think, find my ranking based on citation rates in the Stanford Encyclopedia more plausible, at least as a measure of impact within mainstream Anglophone philosophy.  (On the SEP list, Nagel is #11 and I am #251.)

To compare subfields, I decided to capture the #1, #25, and #100 ranked researchers in each subfield, excluding subfields with fewer than 100 ranked researchers.  (Ioannidis et al. don't list all researchers, aiming to include only the top 100,000 ranked researchers overall, plus at least the top 2% in each subfield for smaller or less-cited subfields.)
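The extraction itself is simple enough to sketch in a few lines of Python.  The numbers below are made up for illustration, except for the two genuine citation figures that come up later in the post (3,600 and 142,418); this is a sketch of the method, not the actual Scopus pipeline:

```python
# Toy stand-in for the Ioannidis et al. spreadsheet: one (subfield,
# total citations) pair per researcher. All values are illustrative.
researchers = [
    ("philosophy", 8000), ("philosophy", 3600), ("philosophy", 2000),
    ("developmental biology", 300000), ("developmental biology", 142418),
    ("developmental biology", 90000),
]

def rank_within_subfield(data, subfield, rank):
    """Citation count of the rank-th most-cited researcher in a subfield."""
    counts = sorted((c for s, c in data if s == subfield), reverse=True)
    return counts[rank - 1]

# Standing in for the #25 comparison: with only three toy researchers
# per subfield, we take rank 2 instead.
print(rank_within_subfield(researchers, "philosophy", 2))
print(rank_within_subfield(researchers, "developmental biology", 2))
```

With the real spreadsheet, one would do the same thing at ranks 1, 25, and 100 for each of the 131 subfields.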

A disadvantage of my approach to comparing subfields by looking at the 1st, 25th, and 100th ranked researchers is that being #100 in a relatively large subfield presumably indicates more impact than being #100 in a relatively small subfield.  But the most obvious alternative method -- percentile ranking by subfield -- plausibly invites even worse trouble, since there are huge numbers of researchers in subfields with high rates of student co-authorship, making it too comparatively easy to get into the top 2%.  (For example, decades ago my wife was published as a co-author on a chemistry article after a not-too-demanding high school internship.)  We can at least in principle try to correct for subfield size by looking at comparative faculty sizes at leading research universities or attendance numbers at major disciplinary conferences.

The preferred Ioannidis, Boyack, and Baas c(ns) ranking is complex, and maybe better than simpler ranking systems.  But for present purposes I think it's most interesting to consider the easiest, most visible citation measures, total citations and h-index (with no exclusion of self-citation), since that's what administrators and other researchers see most easily.  H-index, if you don't know it, is the largest number h such that h of the author's articles have at least h citations each.  (For example, if your top 20 most-cited articles are each cited at least 20 times, but your 21st most-cited article is cited less than 21 times, your h-index is 20.)
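In code, the h-index definition amounts to sorting citation counts in descending order and finding the last rank at which the count still meets or exceeds the rank.  A generic sketch, not tied to Scopus or Google Scholar:

```python
def h_index(citations):
    """Largest h such that h of the papers have at least h citations each."""
    counts = sorted(citations, reverse=True)  # most-cited first
    h = 0
    for rank, count in enumerate(counts, start=1):
        if count >= rank:
            h = rank  # the rank-th most-cited paper has at least rank citations
        else:
            break
    return h

# Toy example: five papers. The four most-cited each have at least 4
# citations, but the 5th has fewer than 5, so h = 4.
print(h_index([10, 8, 5, 4, 3]))
```

This matches the worked example in the text: twenty papers each cited at least 20 times, with the 21st cited fewer than 21 times, yields an h-index of 20.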

Drumroll, please....  Scan far, far down the list to find philosophy.  This list is ranked in order of total citations by the 25th most-cited researcher, which I think is probably more stable than 1st or 100th.  [click image to scale and clarify]

Philosophy ranks 126th of the 131 subfields.  The 25th-most-cited researcher in philosophy, Alva Noë, has 3,600 citations in the Scopus database.  In the top field, developmental biology, the 25th-most-cited researcher has 142,418 citations -- a ratio of almost 40:1.  Even the 100th-most-cited researcher in developmental biology has more than five times as many citations as the single most-cited philosopher in the database.

The other humanities also fare poorly: History at 129th and Literary Studies at 130th, for example.  (I'm not sure what to make of the relatively low showing of some scientific subfields, such as Zoology.  One possibility is that it is a relatively small subfield, with most biologists classified in other categories instead.)

Here's the chart for h-index [click to scale and clarify]:

Again, philosophy is 126th out of 131.  The 25th-ranked philosopher by h-index, Alfred Mele, has an h of only 27, compared to an h of 157 for the 25th-ranked researcher in Cardiovascular System & Hematology.

(Note: If you're accustomed to Google Scholar, Scopus h-indices tend to be lower.  Alfred Mele, for example, has twice as high an h-index in Google Scholar as in Scopus: 54 vs. 27.  Google Scholar h-indices are also higher for non-philosophers.  The 25th ranked researcher in Cardiovascular System & Hematology doesn't have a Google Scholar profile, but the 26th ranked does: Bruce M Psaty, h-index 156 in Scopus vs. 207 in Scholar.)

Does this mean that we should be doubling or tripling the h-indices of philosophers when comparing their impact with that of typical scientists, to account for the metrical disadvantages they have as a result of having fewer coauthors, on average longer articles, books that are poorly captured by these metrics, slower overall publication rates, etc.?  Well, it's probably not that simple.  As mentioned, we would want to at least take field size into account.  Also, a case might be made that some fields are just generally more impactful than others, for example due to interdisciplinary or public influence, even after correction for field size.  But one thing is clear: Straightforward citation-count and h-index comparisons between the humanities and the sciences will inevitably put humanists at a stark, and probably unfair, disadvantage.

Update, December 21, 2022:

Friday, November 11, 2022

Credence-First Skepticism

Philosophers usually treat skepticism as a thesis about knowledge. The skeptic about X holds that people who claim to know X don't in fact know X. Religious skeptics think that people who say they know that God exists don't in fact know that. Skeptics about climate change hold that we don't know that the planet is warming. Radical philosophical skepticism asserts broad failures of knowledge. According to dream skepticism, we don't know we're not dreaming. According to external world skepticism, we lack knowledge about the world beyond our own minds.

Treating skepticism as a thesis about knowledge makes the concept or phenomenon of knowledge crucially important to the evaluation of skeptical claims. The higher the bar for knowledge, the easier it is to justify skepticism. For example, if knowledge requires perfect certainty, then we can establish skepticism about a domain by establishing that perfect certainty is unwarranted in that domain. (Imagine here the person who objects to an atheist by extracting from the atheist the admission that they can't be certain that God doesn't exist and therefore they should admit that they don't really know.) Similarly, if knowledge requires knowing that you know, then we could establish skepticism about X by establishing that you can't know that you know about X. If knowledge requires being able to rule out all relevant alternatives, then we can establish skepticism by establishing that there are relevant alternatives that can't be ruled out. Conversely, if knowledge is cheaper and easier to attain -- if knowledge doesn't require, for example, perfect certainty, or knowledge that you know, or being able to rule out every single relevant alternative -- then skepticism is harder to defend.

But we don't have to conceptualize skepticism as a thesis about knowledge. We can separate the two concepts. Doing so has some advantages. The concept of knowledge is so vexed and contentious that it can become a distraction if our interest in skepticism is not driven by an interest in the concept of knowledge. You might be interested in religious skepticism, or climate change skepticism, or dream skepticism, or external world skepticism because you're interested in the question of whether God exists, whether the climate is changing, whether you might now be dreaming, or whether it's plausible that you could be radically mistaken about the external world. If your interest lies in those substantive questions, then conceptual debates about the nature of knowledge are beside the point. You don't want abstract disputes about the KK principle to crowd out discussion about what kinds of evidence we have or don't have for the existence of God, or climate change, or a stable external reality, and how relatively confident or unconfident we should be in our opinions about such matters.

To avoid distractions concerning knowledge, I recommend that we think about skepticism instead in terms of credence -- that is, degree of belief or confidence. We can contrast skeptics and believers. A believer in X is someone with a relatively high credence in X, while a skeptic is someone with a relatively low credence in X. A believer thinks X is relatively likely to be the case, while a skeptic regards X as relatively less likely. Believers in God find the existence of God likely. Skeptics find it less likely. Believers in the external world find the existence of an external world (with roughly the properties we ordinarily think it has) relatively likely while skeptics find it relatively less likely.

"Relatively" is an important word here. Given that most readers of this blog will be virtually certain that they are not currently dreaming, a reader who thinks it even 1% likely that they're dreaming has a relatively low credence -- 99% instead of 99.999999% or 100%. We can describe this as a moderately skeptical stance, though of course not as skeptical as the stance of someone who thinks it's 50/50.

[Dall-E image of a man flying in a dream]

Discussions of radical skepticism in epistemology tend to lose sight of what is really gripping about radically skeptical scenarios: the fact that, if the skeptic is right, there's a reasonable chance that you're in one. It's not unreasonable, the skeptic asserts, to attribute a non-trivial credence to the possibility that you are currently dreaming or currently living in a small or unstable computer simulation. Whoa! Such possibilities are potentially Earth-shaking if true, since many of the beliefs we ordinarily take for granted as obviously true (that Luxembourg exists, that I'm in my office looking at a computer screen) would be false.

To really assess such wild-seeming claims, we should address the nature and epistemology of dreaming and the nature and epistemology of computer simulations. Can dream experiences really be as sensorily rich and realistic as the experiences that I'm having right now? Or are dream experiences somehow different? If dream experiences can be as rich and realistic as what I'm now experiencing, then that seems to make it relatively more reasonable to assign a non-trivial credence to this being a dream. Is it realistic to think that future societies could create vastly many genuinely conscious AI entities who think that they live in worlds like this one? If so, then the simulation possibility starts to look relatively more plausible; if not, then it starts to look relatively less plausible.

In other words, to assess the likelihood of radically skeptical scenarios, like the dream or simulation scenario, we need to delve into the details of those scenarios. But that's not typically what epistemologists do when considering radical skepticism. More typically, they stipulate some far-fetched scenario with no plausibility, such as the brain-in-a-vat scenario, and then ask questions about the nature of knowledge. That's worth doing. But to put that at the heart of skeptical epistemology is to miss skepticism's pull.

A credence-first approach to skepticism makes skepticism behaviorally and emotionally relevant. Suppose I arrive at a small but non-trivial credence that I'm dreaming -- a 0.1% credence for example. Then I might try some things I wouldn't try if I had a 0% or 0.000000000001% credence I was dreaming. I might ask myself what I would do if this were a dream -- and if doing that thing were nearly cost-free, I might try it. For example, I might spread my arms to see if I can fly. I might see if I can turn this into a lucid dream by magically lifting a pen through telekinesis. I'd probably only try these things if I had nothing better to do at the moment and no one was around to think I'm a weirdo. And when those attempts fail, I might reduce my credence that this is a dream.

If I take seriously the possibility that this is a simulation, I can wonder about the creators. I become, so to speak, a conditional theist. Whoever is running the simulation is in some sense a god: They created the world and presumably can end it. They exist outside of time and space as I know them, and maybe they have "miraculous" powers to intervene in events around me. Perhaps I have no idea what I could do that might please or displease them, or whether they're even paying attention, but still, it's somewhat awe-inspiring to consider the possibility that my world, our world, is nested in some larger reality, launched by some creator for some purpose we don't understand. If I regard the simulation possibility as a live possibility with some non-trivial chance of being true, then the world might be quite a bit weirder than I would otherwise have thought, and very differently constituted. Skepticism gives me material uncertainty and opens up genuine doubt. The cosmos seems richer with possibility and more mysterious.

We lose all of this weirdness, awe, mystery, and material uncertainty if we focus on extremely implausible scenarios to which we assign zero or virtually zero credence, like the brain-in-a-vat scenario, and focus our argumentative attention only on whether or not it's appropriate to say that we "know" we're not in those admittedly extremely implausible scenarios.