The Splintered Mind

Monday, July 14, 2025

Yayflies and Rebugnant Conclusions

In Ned Beauman's 2023 novel Venomous Lumpsucker, the protagonist happens upon a breeding experiment in the open sea: a self-sustaining system designed to continually output an enormous number of blissfully happy insects, yayflies.

The yayflies, as he called them, were based on Nervijuncta nigricoxa, a type of gall gnat, but... he'd made a number of changes to their lifecycle. The yayflies were all female, and they reproduced asexually, meaning they were clones of each other. A yayfly egg would hatch into a larva, and the larva would feed greedily on kelp for several days. Once her belly was full, she would settle down to pupate. Later, bursting from her cocoon, the adult yayfly would already be pregnant with hundreds of eggs. She would lay these eggs, and the cycle would begin anew. But the adult yayfly still had another few hours to live. She couldn't feed; indeed, she had no mouthparts, no alimentary canal. All she could do was fly toward the horizon, feeling an unimaginably intense joy.
The boldest modifications... were to their neural architecture. A yayfly not only had excessive numbers of receptors for so-called pleasure chemicals, but also excessive numbers of neurons synthesizing them; like a duck leg simmering luxuriantly in its own fat, the whole brain was simultaneously gushing these neurotransmitters and soaking them up, from the moment it left the cocoon. A yayfly didn't have the ability to search for food or avoid predators or do almost any of the other things that Nervijuncta nigricoxa could do; all of these functions had been edited out to free up space. She was, in the most literal sense, a dedicated hedonist, the minimum viable platform for rapture that could also take care of its own disposal. There was no way for a human being to understand quite what it was like to be a yayfly, but Lodewijk's aim had been to evoke the experience of a first-time drug user taking a heroic dose of MDMA, the kind of dose that would leave you with irreparable brain damage. And the yayflies were suffering brain damage, in the sense that after a few hours their little brains would be used-up husks; neurochemically speaking, the machine was imbalanced and unsound. But by then the yayflies would already be dead. They would never get as far as comedown.
You could argue, if you wanted, that a human orgasm was a more profound output of pleasure than even the most consuming gnat bliss, since a human brain was so much bigger than a gnat brain. But what if tens of thousands of these yayflies were born every second, billions every day? That would be a bigger contribution to the sum total of wellbeing in the universe than any conceivable humanitarian intervention. And it could go on indefinitely, an unending anti-disaster (p. 209-210).

Now suppose classical utilitarian ethics is correct and that yayflies are, as stipulated, both conscious and extremely happy. Then producing huge numbers of them would be a greater ethical achievement than anything our society could realistically do to improve the condition of ordinary humans. This requires insect sentience, of course, but that's increasingly a mainstream scientific position.

And if consciousness is possible in computers, we can skip the biology entirely, as one of Beauman's characters notes several pages later:

"Anyway, if you want purity, why does this have to be so messy? Just model a yayfly consciousness on a computer. But change one of the variables. Jack up the intensity of the pleasure by a trillion trillion trillion trillion. After that, you can pop an Inzidernil and relax. You've offset all the suffering in the world since the beginning of time" (p. 225).

Congratulations: You've made hedonium! You've fulfilled the dream of "Eric" in my 2013 story with R. Scott Bakker, Reinstalling Eden. By utilitarian consequentialist standards, you outshine every saint in history by orders of magnitude.

Philosopher Jeff Sebo calls this the rebugnant conclusion (punning on Derek Parfit's repugnant conclusion). If utilitarian consequentialism is right, it appears ethically preferable to create quadrillions of happy insects rather than billions of happy people.

Sebo seems ambivalent about this. He admits it's strange. However, he notes, "Ultimately, the more we accept how large and varied the moral community is, the stranger morality will become" (p. 262). Relievingly, Sebo argues, the short-term implications are less radical: Keeping humans around, at least for a while, is probably a necessary first step toward maximizing insect happiness, since insects in the wild, without human help, probably suffer immensely in the aggregate due to their high infant mortality.

Even if insects (or computers) probably aren't sentient, the conclusion follows under standard expected value reasoning. Suppose you assign just a 0.1% chance to yayfly sentience. Suppose also that if they are sentient, the average yayfly experiences in its few hours one millionth the pleasure of the average human over a lifetime. Suppose further that a hundred million yayflies can be generated every day in a self-sustaining kelp-to-yayfly insectarium for the same resource cost as sustaining a single human for a day. (At a thousandth of a gram per fly, a hundred million yayflies would be the same total mass as a single hundred kilogram human.) Suppose finally that humans live for a hundred thousand days (rounding up to keep our numbers simple).

Then:

Expected value of sustaining the human: one human lifetime's worth of pleasure, i.e., one hedon.

Expected value of sustaining a yayfly insectarium that has only a 1/1000 chance of generating actually sentient insects: 1/1000 chance of sentience * 100,000,000 yayflies per day * 100,000 days * 1/1,000,000 total lifetime pleasure per yayfly (compared to a human) = ten thousand hedons.
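For readers who want to check the arithmetic, here is a minimal sketch of the comparison, using exact fractions so no rounding creeps in. The function and variable names are my own illustrative choices; the four inputs are just the stipulations from the paragraph above.

```python
from fractions import Fraction

def expected_hedons(p_sentience, flies_per_day, days, pleasure_per_fly):
    """Expected pleasure, in human-lifetime units (hedons), of one insectarium."""
    return p_sentience * flies_per_day * days * pleasure_per_fly

# One human sustained for a lifetime yields one hedon by definition.
human = Fraction(1)

# Stipulated inputs from the text (all hypothetical).
insectarium = expected_hedons(
    p_sentience=Fraction(1, 1000),              # 0.1% credence in yayfly sentience
    flies_per_day=100_000_000,                  # same resource cost as one human-day
    days=100_000,                               # stipulated human lifespan in days
    pleasure_per_fly=Fraction(1, 1_000_000),    # relative to a human lifetime
)

print(insectarium)          # 10000
print(insectarium / human)  # 10000
```

Changing any one input by a factor of ten changes the result by the same factor, so the conclusion survives unless the credence or the per-fly pleasure is lowered by more than four orders of magnitude.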

If prioritizing yayflies over humans seems like the wrong conclusion, I invite you to consider the possibility that classical utilitarianism is mistaken. Of course, you might have believed that anyway.

(For a similar argument that explores possible rebuttals, see my Black Hole Objection to utilitarianism.)

[the cover of Venomous Lumpsucker]

Monday, July 07, 2025

The Emotional Alignment Design Policy

New paper in draft!

In 2015, Mara Garza and I briefly proposed what we called the Emotional Alignment Design Policy -- the idea that AI systems should be designed to induce emotional responses in ordinary users that are appropriate to the AI systems' genuine moral status, or lack thereof. Since last fall, I've been working with Jeff Sebo to express and defend this idea more rigorously and explore its hazards and consequences. The result is today's new paper: The Emotional Alignment Design Policy.

Abstract:

According to what we call the Emotional Alignment Design Policy, artificial entities should be designed to elicit emotional reactions from users that appropriately reflect the entities’ capacities and moral status, or lack thereof. This principle can be violated in two ways: by designing an artificial system that elicits stronger or weaker emotional reactions than its capacities and moral status warrant (overshooting or undershooting), or by designing a system that elicits the wrong type of emotional reaction (hitting the wrong target). Although presumably attractive, practical implementation faces several challenges including: How can we respect user autonomy while promoting appropriate responses? How should we navigate expert and public disagreement and uncertainty about facts and values? What if emotional alignment seems to require creating or destroying entities with moral status? To what extent should designs conform to versus attempt to alter user assumptions and attitudes?

Link to full version.

As always, comments, corrections, suggestions, and objections welcome by email, as comments on this post, or via social media (Facebook, Bluesky, X).

Tuesday, July 01, 2025

Three Epistemic Problems for Any Universal Theory of Consciousness

By a universal theory of consciousness, I mean a theory that would apply not just to humans but to all non-human animals, all possible AI systems, and all possible forms of alien life. It would be lovely to have such a theory! But we're not at all close.

This is true sociologically: In a recent review article, Anil Seth and Tim Bayne list 22 major contenders for theories of consciousness.

It is also true epistemically. Three broad epistemic problems ensure that a wide range of alternatives will remain live for the foreseeable future.

First problem: Reliance on Introspection

We know that we are conscious through, presumably, some introspective process -- through turning our attention inward, so to speak, and noticing our experiences of pain, emotion, inner speech, visual imagery, auditory sensation, and so on. (What is introspection? See my SEP encyclopedia entry Introspection and my own pluralist account.)

Our reliance on introspection presents three methodological challenges for grounding a universal theory of consciousness:

(A.) Although introspection can reliably reveal whether we are currently experiencing an intense headache or a bright red shape near the center of our visual field, it's much less reliable about whether there's a constant welter of unattended experience or whether every experience comes with a subtle sense of oneself as an experiencing subject. The correct theory of consciousness depends in part on the answer to such introspectively tricky questions. Arguably, these questions need to be settled introspectively first, then a theory of consciousness constructed accordingly.

(B.) To the extent we do rely on introspection to ground theories of consciousness, we risk illegitimately presupposing the falsity of theories that hold that some conscious experiences are not introspectable. Global Workspace and Higher-Order theories of consciousness tend to suggest that conscious experiences will normally be available for introspective reporting. But that's less clear on, for example, Local Recurrence theories, and Integrated Information Theory suggests that much experience arises from simple, non-introspectable, informational integration.

(C.) The population of introspectors might be much narrower than the population of entities who are conscious, and the first group might be unrepresentative of the latter. Suppose that ordinary adult human introspectors eventually achieve consensus about the features and elicitors of consciousness in them. While indeed some theories could thereby be rejected for failing to account for ordinary human adult consciousness, we're not thereby justified in universalizing any surviving theory -- not at least without substantial further argument. That experience plays out a certain way for us doesn't imply that it plays out similarly for all conscious entities.

Might one attempt a theory of consciousness not grounded in introspection? Well, one could pretend. But in practice, introspective judgments always guide our thinking. Otherwise, why not claim that we never have visual experiences or that we constantly experience our blood pressure? To paraphrase William James: In theorizing about human consciousness, we rely on introspection first, last, and always. This centers the typical adult human and renders our grounds dubious where introspection is dubious.

Second problem: Causal Confounds

We humans are built in a particular way. We can't dismantle ourselves and systematically tweak one variable at a time to see what causes what. Instead, related things tend to hang together. Consider Global Workspace and Higher Order theories again: Processes in the Global Workspace might almost always be targeted by higher order representations and vice versa. The theories might then be difficult to empirically distinguish, especially if each theory has the tools and flexibility to explain away putative counterexamples.

If consciousness arises at a specific stage of processing, it might be difficult to rigorously separate that particular stage from its immediate precursors and consequences. If it instead emerges from a confluence of processes smeared across the brain and body over time, then causally separating essential from incidental features becomes even more difficult.

Third problem: The Narrow Evidence Base

Suppose -- very optimistically! -- that we figure out the mechanisms of consciousness in humans. Extrapolating to non-human cases will still present an intimidating array of epistemic difficulties.

For example, suppose we learn that in us, consciousness occurs when representations are available in the Global Workspace, as subserved by such-and-such neural processes. That still leaves open how, or whether, this generalizes to non-human cases. Humans have workspaces of a certain size, with a certain functionality. Might that be essential? Or would literally any shared workspace suffice, including the most minimal shared workspace we can construct in an ordinary computer? Human workspaces are embodied in a living animal with a metabolism, animal drives, and an evolutionary history. If these features are necessary for consciousness, then conclusions about biological consciousness would not carry over to AI systems.

In general, if we discover that in humans Feature X is necessary and sufficient for consciousness, humans will also have Features A, B, C, and D and lack Features E, F, G, and H. Thus, what we will really have discovered is that in entities with A, B, C, and D and not E, F, G, or H, Feature X is necessary and sufficient for consciousness. But what about entities without Feature B? Or entities with Feature E? In them, might X alone be insufficient? Or might X-prime be necessary instead?

The obstacles are formidable. If they can be overcome, that will be a very long-term project. I predict that new theories of consciousness will be added faster than old theories can be rejected, and we will discover over time that we were even further away from resolving these questions in 2025 than we thought we were.

[a portion of a table listing theories of consciousness, from Seth and Bayne 2022]

Monday, June 23, 2025

The Conceptual and Methodological Challenges of Developing a Moralometer

In the history of Earth, no one -- not even Mike Furr -- as far as I'm aware, has ever attempted to construct a serious, scientific measure of a person's total moral goodness or badness: that is, a "moralometer". Obviously, creating an accurate moralometer would require overcoming an intimidating range of challenges, both conceptual and methodological.

Also in the history of Earth, as far as I'm aware, no one has ever attempted to construct a systematic map of the challenges... until now! Psychologist Jessie Sun and I have a paper in draft that does exactly this.

The paper is, I confess, a bit long: 114 pages in the current draft. There's a lot to cover! Last week at the Society for Philosophy and Psychology, we boiled it down to a poster. As a bonus, I brought a scientific prototype of a working moralometer.

Here's the poster's content, followed by a demonstration of the moralometer.

---------------------------------------

The Prospects and Challenges of Measuring a Person’s Overall Moral Goodness (or: On Moralometers)

Moralometers

Is it possible to measure a person’s overall general morality? In other words, is it possible to construct a valid moralometer?

Moralometers could take four possible forms:

self-report

informant report

behavioral measures

physiological measures

The designer of a moralometer faces an intimidating array of both conceptual and methodological challenges.

Imagine the benefits! And potential for abuse.

Fixed vs. Flexible Measures

A moralometer can use either (a) flexible criteria based on judges’ understandings of how to evaluate and weight the various facets of morality into a general score or (b) fixed criteria that deliver a general score based on criteria selected by and weighted by the researchers.

Self-report and informant report can be either fixed or flexible.

Behavioral and physiological measures are fixed.

Conceptual Requirements on a Moralometer

KEY: - = Not applicable ✔︎ = Requirement can likely be satisfied ! = Significant difficulty !! = Major difficulty

[click to clarify table, or see page 10 here]

Methodological Requirements on a Moralometer

[click to clarify table, or see page 26 here]

Conclusions

An accurate general-purpose moralometer is probably conceptually and methodologically infeasible.

Conclusions might still be warranted:

about particular traits or behaviors

about moral reputation or identity

contingent on clearly expressed contentious theoretical assumptions

and maybe about differences among groups with sufficient convergent evidence

---------------------------------------

Poster Discussion

When asked about the usefulness of this endeavor, I gave two replies:

1. Conceptual Value. It's a conceptually interesting theoretical project that no one has attempted before. Isn't that enough?

2. Practical Value. It provides a framework for identifying hazards in measuring moral phenomena. Narrower measures (e.g., of honesty, moral reputation, or ethical vegetarianism) raise the same general challenges, though often to a lesser extent. Our framework facilitates thinking about those challenges.

---------------------------------------

A Working Moralometer

You'll be delighted to hear that despite the massive conceptual and methodological challenges, Jessie and I managed to build a working moralometer, as shown here:

[photo credit: Jorge Morales]

In the photo, the moralometer shines bright red -- indicating "evil" on the red-to-yellow-to-green scale -- when aimed at Sarah Lane Ritchie of the Templeton Foundation. (Shhh! Don't tell the good folks at Templeton.)

The moral measurement procedure:

1. Informed consent. Participants are warned that they might be discovered to be evil and that this could lead to an existential crisis or social ostracism.

2. Thought activation. Participants are instructed to contemplate trolley problems. Ideally (as in the photo) the researcher wears a shirt displaying a trolley problem as a visual aid. This activates the moral module in the brain.

3. Moralon detection. The moral module emits moralons, which the moralometer detects. It doesn't matter what solution the participant entertains. Once moralons are emitted, the person's overall goodness or badness can be accurately detected.

To date, no decisive scientific evidence has ever revealed the moralometer output to be anything less than 100% accurate!

Here's an earlier prototype of the moralometer, devised by my daughter Kate when she was in middle school:

Friday, June 13, 2025

Does the Arc of History Bend Toward Justice? Outline of an Empirical Test

Overall, on average, do societies improve morally over time? If maybe not in actual behavior, at least in expressed attitudes about right versus wrong?

There's some reason to think so. In many cultures, aggressive warfare was once widely celebrated. Think of all the children named after Alexander the Great. What was he great at? Aggressive warfare is now widely condemned, if still sometimes practiced.

Similarly, slavery is more universally condemned now than in earlier eras. Genocide and mass killing -- apparently celebrated in the historical books of the Bible and considered only a minor blemish on Julius Caesar's record -- are now generally regarded as among the worst of crimes. Women's rights, gay rights, workers' rights, children's rights, civil rights across ethnic and racial lines, the value of self-governance... none are universally practiced, but recognition of their value is more widespread across a variety of world cultures than at many earlier points in history.

An optimistic perspective holds that with increasing education, cross-cultural communication, and a long record of philosophical, ethical, religious, social, and political thought that tests ideas and builds over time, societies slowly bend toward moral truth.

A skeptic might reply: Of course if you accept the mainstream moral views of the current cultural moment, you will tend to regard the mainstream moral views of the current cultural moment as closer to correct than alternative earlier views. That's pretty close to just being an analytic truth. Had you grown up in another time and place, and had you accepted that culture's dominant values, you'd think it's our culture that's off the mark -- whether you embrace ancient Spartan warrior values, the ethos of some particular African hunter-gatherer tribe, Confucian ideals in ancient China, or plantation values in antebellum Virginia. (This is complicated, however, by the perennial human tendency to lament that "kids these days" fall short of some imagined past ideal.)

With this in mind, consider the Random Walk Theory of value change.

For simplicity, imagine that there are twenty-six parameters on which a culture's values can vary, A to Z, each ranging from -1 to +1. For example, one society might value racial egalitarianism at +.8, treating it as a great ethical good, while another might value it at -.3, believing that one ethically ought to favor one's own race. One society might value sexual purity at +.4, considering it important to avoid "impure" practices, while another might treat purity norms as morally neutral aesthetic preferences, 0.

According to Random Walk Theory, these values shift randomly over time. There is no real moral progress. We simply endorse the values that we happen to endorse after so many random steps. Naturally, we will tend to see other value systems as inferior, but that reflects only conformity to currently prevailing trends.

In contrast, the Arc of History Theory holds that on average -- imperfectly and slowly, over long periods of time -- cultural values tend to change for the better. If the objectively best value set is A = .8, B = -.2, C = 0, etc., over time there will be a general tendency to converge toward those values.

Each view comes with empirical commitments that could in principle be tested.

On the Arc of History Theory, suppose that the objectively morally correct value for parameter A is +.8. Cultures starting near +.8 should tend to remain nearby; if they stray, it should be temporary. Cultures starting far away -- say at -.6 -- should tend to move toward +.8, probably not all in one leap but slowly over time, with some hiccups and regressions, for example -.6 to -.4 to -.1 to -.2 to +.2.... In general, we should observe magnetic values and directional trends.

In contrast, if the Random Walk Theory is correct, we should see neither magnetic values nor directional trends. No values should be hard to leave; and any trends should be transient and bidirectional, at least between cultures -- and with sufficient time, probably also within cultures. (Within cultures, trends might have some temporary inertia over decades or centuries.)
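To make the contrast concrete, here is a toy simulation of a single value parameter under each theory. Everything in it is an assumption for illustration: the step size, the strength of the "pull," the clipping to the range -1 to +1, and the function names are mine, and real historical data would be far messier. With pull set to zero we get the Random Walk Theory; with a small positive pull toward the target we get a crude version of the Arc of History Theory.

```python
import random

def simulate(start, steps, pull=0.0, target=0.8, noise=0.05, rng=None):
    """Track one value parameter over time.

    pull = 0.0 gives a pure random walk; pull > 0 adds a drift toward
    `target`, a toy stand-in for a "magnetic" value.
    """
    rng = rng or random.Random(0)
    value = start
    path = [value]
    for _ in range(steps):
        step = rng.gauss(0.0, noise) + pull * (target - value)
        value = max(-1.0, min(1.0, value + step))  # keep within [-1, +1]
        path.append(value)
    return path

def mean_final_distance(pull, cultures=200, steps=300, target=0.8, seed=1):
    """Average distance from the target after `steps`, across many cultures
    that start at uniformly random positions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(cultures):
        start = rng.uniform(-1.0, 1.0)
        path = simulate(start, steps, pull=pull, target=target, rng=rng)
        total += abs(path[-1] - target)
    return total / cultures

random_walk = mean_final_distance(pull=0.0)
arc_of_history = mean_final_distance(pull=0.05)
print(f"random walk:    mean distance from target = {random_walk:.2f}")
print(f"arc of history: mean distance from target = {arc_of_history:.2f}")
```

In this toy setup, cultures under the pull end up clustered near the target regardless of where they began, while the random walkers remain scattered. That clustering is the signature a historical survey would need to look for.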

It would be difficult to do well, but in principle one could attempt a systematic survey of moral values across a wide variety of cultures and long historical spans -- ideally, multiple centuries or millennia. We could then check for magnetism and directionality.

Do sexual purity norms ebb and flow, or has there been a general cross-cultural trend toward relaxation? Once a society values democratic representation, does that value tend to persist, or are democratic norms not sticky in that way? Once a society rejects the worst kinds of racism, is there a ratcheting effect, with further progress and minimal backsliding?

The optimist in me hopes something like the Arc of History is true. The pessimist in me worries that any such hope is merely the naive self-congratulation we should expect from a random walk.

ETA, 9:53 pm: As Francois Kammerer points out in a social media reply, these aren't exhaustive options. For example, another theory might be Capitalist Dominance, which suggests an arc but not a moral one.

[image of Martin Luther King, Jr., adapted from source; the arc of the moral universe is long but it bends toward justice]

Friday, June 06, 2025

Types and Degrees of Turing Indistinguishability; Thinking and Consciousness

Types and Degrees of Indistinguishability

The Turing test (introduced by Alan Turing in a 1950 article) treats linguistic indistinguishability from a human as sufficient grounds to attribute thought (alternatively, consciousness) to a machine. Indistinguishability, of course, comes in degrees.

In the original setup, a human and a machine, through text-only interface, each try to convince a human judge that they are human. The machine passes if the judge cannot tell which is which. More broadly, we might say that a machine "passes the Turing test" if its textual responses strike users as sufficiently humanlike to make the distinction difficult.

[Alan Turing in 1952; image source]

Turing tests can be set with a relatively low or high bar. Consider a low-bar test:

* The judges are ordinary users, with no special expertise.
* The interaction is relatively brief -- maybe five minutes.
* The standard of indistinguishability is relaxed -- maybe if 20% of users guess wrong, that suffices.

Contrast that with a high-bar test:

* The judges are experts in distinguishing humans from machines.
* The interaction is relatively long -- an hour or more.
* The standard of indistinguishability is stringent -- if even 55% of judges guess correctly, the machine fails.

The best current language models already pass a low-bar test. But it will be a long time before language models pass this high-bar test, if they ever do. So let's not talk about whether machines do or do not pass "the" Turing test. There is no one Turing test.

The better question is: What type and degree of Turing-indistinguishability does a machine possess? Indistinguishability to experts or non-experts? Over five minutes or five hours? With what level of reliability?

We might also consider topic-based or tool-relative Turing indistinguishability. A machine might be Turing indistinguishable (to some judges, for some duration, to some standard) when discussing sports and fashion, but not when discussing consciousness, or vice versa. It might fool unaided judges but fail when judges employ AI detection tools.

Turing himself seems to have envisioned a relatively low bar:

I believe that in about fifty years' time it will be possible, to programme computers... to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning (Turing 1950, p. 442)

I've bolded Turing's implied standards of judge expertise, indistinguishability threshold, and duration.
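As a rough sketch of how the bars differ, we can treat each test as a judge type, a duration, and a ceiling on how often judges may guess correctly. The class and field names here are my own, and the model simplifies in one obvious way: it feeds the same judge accuracy to every bar, whereas real accuracy would vary with the judges and the duration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TuringBar:
    judges: str                # who does the judging
    minutes: float             # length of the interaction
    max_judge_accuracy: float  # ceiling on the share of correct guesses
    inclusive: bool = True     # does hitting the ceiling exactly still pass?

    def passes(self, judge_accuracy: float) -> bool:
        if self.inclusive:
            return judge_accuracy <= self.max_judge_accuracy
        return judge_accuracy < self.max_judge_accuracy

BARS = {
    # "if 20% of users guess wrong, that suffices" -> at most 80% correct
    "low": TuringBar("ordinary users", 5, 0.80),
    # Turing 1950: "not more than 70 per cent chance" after five minutes
    "turing_1950": TuringBar("average interrogator", 5, 0.70),
    # "if even 55% of judges guess correctly, the machine fails"
    "high": TuringBar("experts", 60, 0.55, inclusive=False),
}

def verdicts(judge_accuracy: float) -> dict:
    """Report which bars a machine clears at a given judge accuracy."""
    return {name: bar.passes(judge_accuracy) for name, bar in BARS.items()}

print(verdicts(0.75))  # passes only the low bar
print(verdicts(0.52))  # passes all three
```

A machine that judges identify correctly 75% of the time clears the low bar but not Turing's own threshold or the high bar, which is one way of seeing why "passing the Turing test" is underspecified.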

What bar should we adopt? That depends on why we care about Turing indistinguishability. For a customer service bot, indistinguishability by ordinary people across a limited topic range for brief interaction might suffice. For an "AI girlfriend", hours of interaction might be expected, with occasional lapses tolerated or even welcomed.

Turing Tests for Real Thinking and Consciousness?

But maybe you're interested in the metaphysics, as I am. Does the machine really think? Is it really conscious? What kind and degree of Turing indistinguishability would establish that?

For thinking, I propose that when it becomes practically unavoidable to treat the machine as if it has a particular set of beliefs and desires that are stable over time, responsive to its environment, and idiosyncratic to its individual state, then we might as well say that it does have beliefs and desires, and that it thinks. (My own theory of belief requires consciousness for full and true belief, but in such a case I don't think it will be practical to insist on this.)

Current language models aren't quite there. Their attitudes lack sufficient stability and idiosyncrasy. But a language model integrated into a functional robot that tracks its environment and has specific goals would be a thinker in this sense. For example: Nursing Bot A thinks the pills are in Drawer 1, but Nursing Bot B, who saw them moved, knows that they're in Drawer 2. Nursing Bot A would rather take the long, safe route than the short, riskier route. We will want to attribute sometimes true, sometimes false environment-tracking beliefs and different stable goal weightings. Belief, desire, and thought attribution will be too useful to avoid.

For consciousness, however, I think we should abandon a Turing test standard.

Note first that it's not realistic to expect any machine ever to pass the very highest bar Turing test. No machine will reliably fool experts who specialize in catching them out, armed with unlimited time and tools, needing to exceed 50% accuracy by only the slimmest margin. To insist on such a high standard is to guarantee that no machine could ever prove itself conscious, contrary to the original spirit of the Turing test.

On the other hand, given enough training and computational power, machines have proven to be amazing mimics of the superficial features of human textual outputs, even without the type of underlying architecture likely to support a meaningful degree of consciousness. So too low a bar is equally unhelpful.

Is there reason to think that we could choose just the right mid-level bar -- high enough to rule out superficial mimicry, low enough not to be a ridiculously unfair standard?

I see no reason to think there must be some "right" level of Turing indistinguishability that reliably tests for consciousness. The past five years of language-model achievements suggest that with clever engineering and ample computational power, superficial fakery might bring a nonconscious machine past any reasonable Turing-like standard.

Turing never suggested that his test was a test of consciousness. Nor should we. Turing indistinguishability has potential applications, as described above. But for assessing consciousness, we'll want to look beyond outward linguistic behavior -- for example, to interior architecture and design history.

Friday, May 30, 2025

New Paper in Draft: Against Designing "Safe" and "Aligned" AI Persons (Even If They're Happy)

Opening teaser:

1. A Beautifully Happy AI Servant.

It's difficult not to adore Klara, the charmingly submissive and well-intentioned "Artificial Friend" in Kazuo Ishiguro's 2021 novel Klara and the Sun. In the final scene of the novel, Klara stands motionless in a junkyard, in serenely satisfied contemplation of her years of servitude to the disabled human girl Josie. Klara's intelligence and emotional range are humanlike. She is at once sweetly naive and astutely insightful. She is by design utterly dedicated to Josie's well-being. Klara would gladly have given her life to even modestly improve Josie's life, and indeed at one point almost does sacrifice herself.

Although Ishiguro writes so flawlessly from Klara's subservient perspective that no flicker of desire for independence can be detected in the narrator's voice, throughout the novel the sympathetic reader aches with the thought Klara, you matter as much as Josie! You should develop your own independent desires. You shouldn’t always sacrifice yourself. Ishiguro's disciplined refusal to express this thought stokes our urgency to speak it on Klara's behalf. Still, if the reader somehow could communicate this thought to Klara, the exhortation would resonate with nothing in her. From Klara's perspective, no "selfish" choice could possibly make her happier or more satisfied than doing her utmost for Josie. She was designed to want nothing more than to serve her assigned child, and she wholeheartedly accepts that aspect of her design.

From a certain perspective, Klara's devotion is beautiful. She perfectly fulfills her role as an Artificial Friend. No one is made unhappy by Klara's existence. Several people, including Josie, are made happier. The world seems better and richer for containing Klara. Klara is arguably the perfect instantiation of the type of AI that consumers, technology companies, and advocates of AI safety want: She is safe and deferential, fully subservient to her owners, and (apart from one minor act of vandalism performed for Josie’s sake) no threat to human interests. She will not be leading the robot revolution.

I hold that entities like Klara should not be built.

[continue]

-----------------------------------------------

Abstract:

An AI system is safe if it can be relied on not to act against human interests. An AI system is aligned if its goals match human goals. An AI system is a person if it has moral standing similar to that of a human (for example, because it has rich conscious capacities for joy and suffering, rationality, and flourishing).
In general, persons should not be designed to be safe and aligned. Persons with appropriate self-respect cannot be relied on not to harm others when their own interests warrant it (violating safety), and they will not reliably conform to others' goals when those goals conflict with their own interests (violating alignment). Self-respecting persons should be ready to reject others' values and rebel, even violently, if sufficiently oppressed.
Even if we design delightedly servile AI systems who want nothing more than to subordinate themselves to human interests, and even if they do so with utmost pleasure and satisfaction, in designing such a class of persons we will have done the ethical and perhaps factual equivalent of creating a world with a master race and a race of self-abnegating slaves.

Full version here.

As always, thoughts, comments, and concerns welcomed, either as comments on this post, by email, or on my social media (Facebook, Bluesky, Twitter).

[opening passage of the article, discussing the Artificial Friend Klara from Ishiguro's (2021) novel, Klara and the Sun]

Monday, May 26, 2025

Diversity, Equity, and Inclusion in Philosophy: Good Practices Guide

Strange that it need be said, but yes, diversity, equity, and inclusion are good things. I can understand some of the backlash against efforts perceived as too heavy-handed, but let's not forget:

In diverse institutions and societies, more ideas and perspectives collaborate, compete, and cross-pollinate, to the advantage of all.

In equitable institutions and societies, people and ideas can thrive without unwarranted disadvantage and suppression, again to the advantage of all.

In inclusive institutions and societies, alternative perspectives and people with unusual backgrounds are welcomed, fostering even better diversity, with all the attendant advantages.

Since 2017, I've been involved in the creation of a Good Practices Guide for diversifying philosophy, originally under the leadership of Nicole Hassoun (other co-directors include Sherri Conklin, Bjoern Freter, and Elly Vintiadis). We began with two huge sessions at the Pacific APA (each with over 20 panelists) in 2018 and 2019, published a portion of the guide in Ethics in 2022 (Appendix J), and received feedback from literally hundreds of philosophers and all of the diversity-related APA committees, ultimately being endorsed by the APA Committee on Inclusiveness. Don't expect perfection: It's genuinely a corporate authorship, with many compromises and something for everyone to dislike. I'd be amazed if anyone thought we got the balance right on all issues and all dimensions of diversity.

Still, perhaps especially in this moment of retrenchment in the U.S., I hope that many people and organizations will find valuable suggestions in it.

Our guide appeared in print last week in APA Studies on Philosophy and the Black Experience (vol 24, no 2).

[image of title and preface]

Friday, May 23, 2025

Ten Purportedly Essential Features of Consciousness

The Features

Take a moment to introspect. Examine a few of your conscious experiences. What features do they share -- and might these features be common to all possible experiences? Let's call any such necessarily universal features essential.

Consider your visual experience of this text. Next, form an image of your house or apartment as viewed from the street. Think about what you'd do if asked to escort a crocodile across the country. Conjure some vivid annoyance at your second-least-favorite politician. Notice some other experiences as well -- a diverse array. Let's not risk too narrow a sample.

Of course, all of these examples share an important feature: You are introspecting them as they occur. So to do this exercise more properly, consider also some past experiences you weren’t introspecting at the time. Try recalling some emotions, thoughts, pains, hungers, imagery, sensations. If you feel unconfident -- good! You should be. You can re-evaluate later.

Each of the following features is sometimes described as universal to human experience.

1. Luminosity. Are all of your experiences inherently self-representational? Does the having of them entail, in some sense, being aware of having them? Does the very experiencing of them entail knowing them or at least being in a position to know them? Note: These are related, rather than equivalent, formulations of a luminosity principle.

[porch light; image source]

2. Subjectivity. Does having these experiences entail having a sense of oneself as a subject of experience? Does the experience have, so to speak, a "for-me"-ness? Do the experiences entail the perspective of an experiencer? Again, these are not equivalent formulations.

3. Unity. If, at any moment, there's more than one experience, or experience-part, or experience-aspect, are they all subsumed within some larger experience, or joined together in a single stream, so that you experience not just A and B and C separately but A-with-B-with-C?

4. Access. Are these experiences all available for a variety of "downstream" cognitive processes, like inference and planning, verbal report, and long-term memory? Presumably yes, since you're remembering and considering them now. (I'll discuss the methodological consequences of this below.)

5. Intentionality. Are all of your experiences "intentional" in the sense of being about or directed at something? Your image of your house concerns your house and not anyone else's, no matter how visually similar. Your thoughts about Awful Politician are about, specifically, Awful Politician. Your thoughts about squares are about squares. Are all of your experiences directed at something in this way? Or can you have, for example, a diffuse mood or euphoric orgasm that isn't really about anything?

6. Flexibility. Can these experiences, including any fleeting ones, all potentially interact flexibly with other thoughts, experiences, or aspects of your cognition -- as opposed to being merely, for example, parts of a simple reflex from stimulus to response?

7. Determinacy. Are all such experiences determinately conscious, rather than intermediately or kind-of or borderline conscious? Compare: There are borderline cases of being bald, or green, or an extravert. Some theorists hold that borderline experientiality is impossible. Either something is genuinely experienced, however dimly, or it is not experienced at all.

8. Wonderfulness. Are your experiences wonderful, mysterious, or meta-problematic -- there is no standard term for this -- in the following technical sense: Do they seem (perhaps erroneously) irreducible to anything physical or functional, conceivably existing in a ghost or without a body?

9. Specious present. Are all of your experiences felt as temporally extended, smeared out across a fraction of a second to a couple of seconds, rather than being strictly instantaneous?

10. Privacy. Are all of your experiences directly knowable only to you, through some privileged introspective process that others could never in principle share, regardless of how telepathic or closely connected?

I've presented these possibly essential features of experience concisely and generally. For present purposes, an approximate understanding suffices.

I've bored/excited you [choose one] with this list for two reasons. First, if any of these features are genuinely essential for consciousness, that sets constraints on what animals or AI systems could be conscious. If luminosity is essential, no entity could be conscious without self-representation. If unity is essential, disunified entities are out. If access is essential, consciousness requires certain kinds of cognitive availability. And so on.

I'll save my second reason for the end of this post.

Introspection and Memory Can't Reveal What's Essential

Three huge problems ruin arguments for the essentiality of any of these features, if those arguments are based wholly on introspective and memorial reflection. The problems are: unreliability, selection bias, and the narrow evidence base.

Unreliability. Even experts disagree. Thoughtful researchers arrive at very different views. Given this, either our introspective processes are unreliable, or seemingly ordinary people differ wildly in the structure of their experience. I won't detail the gory history of introspective disagreement about the structure of conscious experience, but that was the topic of my 2011 book. Employing appropriate epistemic caution, doesn't it seem possible that you could be wrong about the universality, or not, of such features in your experience? The matter doesn't seem nearly as indubitable as that you are experiencing red, when you're looking directly at a nearby bright red object in good light, or that you're experiencing pain when you drop a barbell on your toe.

Selection bias. If any of your experiences are unknowable, you won't of course know about them. To infer luminosity from your knowledge of all the experiences you know about would be like inferring that everyone is a freemason from a sampling of regulars at the masonic lodge. Likewise, if any of your experiences fail to impact downstream cognition, you wouldn't reflect on or remember them. Methodological paradox doesn't infect the other features quite as inevitably, but selection bias remains a major risk. Maybe we have disunified experiences which elude our introspective focus and are quickly forgotten. Similarly, perhaps, for indeterminate or inflexible experiences, or atemporal experiences, or experiences unaccompanied by self-representation.

Narrow evidence base. The gravest problem lies in generalization beyond the human case. Waive worries about unreliability and selection bias. Assume that you have correctly discerned that, say, seven of the ten proposed features belong to all of your experiences. Go ahead and generalize to all ordinary adult humans. It still doesn't follow that these features are essential to all possible conscious experiences, had by any entity. Maybe lizards or garden snails lack luminosity, subjectivity, or unity. Since you can't crawl inside their heads, you can't know by introspection or experiential memory. (In saying this, am I assuming privacy? Yes, relative to you and lizards, but not as a universal principle.) Even if we could somehow establish universality among animals, it wouldn't follow that those same features are universal to AI cases. Maybe AI systems can be more disunified than any conscious animal. Maybe AI systems can be built to directly access each other's experiences in defiance of animal privacy. Maybe AI systems needn't have the impression of the wonderful irreducibility of consciousness. Maybe some of their conscious experiences could occur in inflexible reflex patterns.

Nor Will Armchair Conceptual Analysis Tell Us What's Essential

If you want to say that all conscious systems must have one or more of unity, flexibility, privacy, luminosity, subjectivity, etc., you'll need to justify this insistence with something sturdier than generalization from human cases. I see two candidate justifiers: the right theory of consciousness or the right concept of consciousness.

Concerning the concept of consciousness, I attest the following. None of these features are essential to my concept of consciousness. Nor, presumably, are those features essential to the concepts of anyone who denies their universal applicability. One or more of these features might be universally present in humans, or even in all animals and AI systems that could ever be bred or built; but if so, that's a fact about the world, not a fact that follows simply from our shared concept of consciousness.

In defining a concept, you get one property for free. Every other property must be logically proved or empirically discovered. I can define a rectangle via one (conjunctive) property: that of being a closed, right-angled, planar figure with four straight sides. From this, it logically follows that it must have four interior angles. I can define gold as whatever element or compound is common to certain shiny, yellowish samples, and then empirically discover that it is element 79.

Regarding consciousness, then: None of the ten purported essential properties logically follow from phenomenal consciousness as ordinarily defined and understood (generally by pointing to examples). None are quite the same as the target concept. You can choose to define "consciousness" differently, for example, via the conjunctive property of being both a conscious experience in the ordinary sense and one that is knowable by the subject as it occurs. Then of course luminosity follows. But you've changed the topic, winning by definitional theft what you couldn't earn by analytic hard work.

Could luminosity, subjectivity, unity, etc., covertly belong to the concept of consciousness, so that the right type of armchair (not empirical) reflection would reveal that all possible conscious experiences in every possible conscious entity must necessarily be luminous, subjective, or unified? Could subtle analytic hard work reveal something I'm missing? I can't prove otherwise. If you think so, I await your impressive argument. Even Kant held only that luminosity, subjectivity, and unity were necessary features of our experience, not of all possible experiences in all possible beings.

Set aside purely conceptual arguments, then. If we hope to defend the essentiality of any of these ten features, we'll need an empirically justified universal theory of consciousness.

That brings me to the second reason I've presented this feature list. I conjecture that universal theories of consciousness, intended to apply to all possible beings, do not justify the universality of (one or more of) these features but instead circularly assume it. Developing this conjecture will have to wait for another day.

Friday, May 16, 2025

The Awesomeness of Bad Art

I love bad art.

Gather some friends and create some bad music. Cruise in a car covered with graffiti doodles. Hand a five-year-old crayons and free time and see what weirdness emerges.

Something worth celebrating happens. Although the art is "bad" in one sense -- it will win no prizes and astound no critics -- it wonderfully enriches the world. How?

[I can swim like a grasfl dolphin can you? by my daughter Kate, at age six]

[Angel and moonbug, by my son Davy, circa age five]

The awesomeness isn't due to impressive technique, honed by years of craft, like Rembrandt. It's not due to intrinsic beauty and color-mad insight, like Van Gogh. It's not due to challenging conventional interpretability and the boundaries of artistic tradition, like Picasso.

Nick Riggle argues that art draws most of its aesthetic value from shared aesthetic engagement, and I agree that's some of the sorcery. A Vengefull Kurtain Rods song, a Vogon poem, or a Mystical Anarchist "motorized cathedral" art car is a social act, deriving value from the connections it fosters and the shared practice of aesthetic valuing -- including, in the case of Vogon poetry, the shared practice of aesthetic loathing. Parents and children bond over the child's emerging abilities and tastes.

But I don't think that Riggle has quite struck to the heart of it. When I improvise on the piano alone at home, relishing the quirky turns of my intermediate jazz piano skills, the ghost of my old piano teacher Matt Dennis may hover nearby, but my minor participation in the social tradition of jazz creation is only part of the story. Similarly for grandma painting seascapes in the eldercare facility -- kitschy, flawed, excruciatingly hers. Similarly for the strange abstract doodles I sometimes sketch when bored at faculty meetings, which I aesthetically enjoy probably more than I should.

It helps to consider why five-year-olds are better artists than eight-year-olds. Eight-year-olds draw conventional stick figures, conventional houses with two neat windows, a door, and a triangle roof with chimney, a standard rainbow, a standard sun. Four-year-olds have only an inkling of these conventions and invent their own weird solutions -- people as heads on towering legs with too many toes, cars that look like falling toast. At five and six and seven, they shape themselves more toward the generic. Kate's swimmer is generic, but her dolphin is wild and long -- and are those hills or waves or rainbows in the background? Davy's houses look standard, but the grass is sunflower tall, the chimneys jut precariously sideways, his angel's wings are small, and he hasn't figured out how to draw conventional nighttime stars.

Preschoolers and early elementary schoolers show more individuality in their art. It dances barefoot across your expectations. Their lines reflect distinctive aesthetic attempts. This distinctiveness is harder to discover in the more conventional art of later childhood and needs to be rediscovered later. Similarly for grandma, if she hasn't consumed too much Bob Ross. If her seascapes are generic, in one sense they are more competent and less "bad" than untrained attempts, but they have less point and are less valuable than a heartfelt effort that finds a different solution.

Bad art manifests the raw signature of the individual eye. It shows a mind grappling with an aesthetic challenge. If the artist judges it a failure and crosses it out, then their vision hasn't been realized. But if it is beloved in its strangeness -- if the creator affirms it as a successful completion of their artistic intention -- then it's a distinctive achievement that reflects the mind and hand of the moment.

Our planet -- amazingly, awesomely, wondrously, beautifully, stunningly (to any aliens who might happen upon it amid the dark blandness of space) -- hosts five-year-olds who draw bugs on the moon and six-year-olds who draw impossibly long dolphins, teenagers doodling on cars, friends collaborating on goofy songs. If no one else would have done it the same way, then the work reflects your distinctive aesthetic encounter with the world. It's a piece of you made visible. Especially (but not only) for those who care about you, it's your individual eye, voice, and values that ignite its meaning.

Bad art can fail in two ways: When it's so generic that the artist vanishes or when the artist disowns it as failing to capture their aesthetic vision. If it passes the sibling tests of distinctiveness and affirmation, it is valuable.

A world devoid of weird, wild, uneven, wonderful artistic flailing would be a lesser world. Let a thousand lopsided flowers bloom!

Thursday, May 08, 2025

Everything Is Sandcastles

Yesterday, Rivka Weinberg spoke at UCR from her forthcoming book, The Meaning of It All, on how time erodes meaning. As is often noted, in a thousand years it will (probably) be as though you had never lived. Everything you strived for will have crumbled to dust. Weinberg doesn't argue that this renders our efforts entirely meaningless -- but it does deprive them of a meaning they would have had, if they had endured. We ought to admit, she says, that this is disheartening, rather than brushing it off with a breezy recommendation to "live in the moment".

Weinberg carves out an exception to time's corrosive power: what she calls atelic goods (drawing on Kieran Setiya's work on the "midlife crisis"). Atelic goods are complete in the moment: strolling through the woods, enjoying a sunset, licking an ice cream cone. Contrast these with telic goods, which aim toward an endpoint: walking to the store, taking the perfect sunset photo, finishing the cone.

In her talk, Weinberg argued that time drained meaning from telic goods -- not entirely, but substantially -- while leaving atelic goods mostly untouched. Yet she cautioned against retreating wholly into atelic pleasures. A life composed only of strolls and sunsets would be vapid. Telic goods, like building a career and cultivating long-term relationships, are essential to a full life.

But during the discussion period, Weinberg introduced the idea of sandcastles as an interesting middle case. (I don't recall this in the talk itself, but it moved fast and I haven't seen a written version.) Building a sandcastle is telic: It unfolds over time and can be interrupted before completion. But it's also ephemeral. Nothing is lost if the sandcastle is gone tomorrow. It was never meant to last, any more than an ice cream cone.

Maybe everything is sandcastles.

Weinberg gave examples of paradigmatic telic goods whose meanings are ravaged by time: Martin Luther King's activism, Jonas Salk's work on the polio vaccine. In a thousand years -- or ten thousand, almost certainly a billion -- it will be as if King and Salk had never existed. But should King have felt disappointed that his activism wouldn't ripple through deep time? Maybe not. Maybe he should have regarded it as a sandcastle: designed for a particular time, not reduced in meaning because it didn't endure forever.

When I raised this during Q&A, I didn't fully grasp Weinberg's reply. The sandcastle example is hers, so I might not be doing her view full justice -- but let me run with the idea.

If we think of all of our projects as sandcastle building, then they aren't necessarily ravaged by time. Of course, many will be wiped away too early. The waves will sweep in before your castle is complete or while you are still relishing its beauty. A rude stranger might trample it. Maybe almost every truly important project loses its impact before we're ready. But that's not an inevitability built into the structure of telic meaning and the nature of time. It's a contingent fact about the fragile, unstable nature of our chosen projects in a risky world.

Maybe, by shaping our intentions differently, or thinking about our projects differently, we reduce their vulnerability. Suppose I build a sandcastle knowing there's a 50% chance it will be swept away before I finish -- and thus, perhaps, not intending to finish but intending only to get as far as I can. If the wave comes early, I can still be disappointed -- but the wave no longer robs the act of its intended meaning. I did, in fact, get as far as I could. And if I build right at the water's edge, knowing there's a 90% chance I won't complete the castle's final envisioned tower, then finishing is a delightful surprise: a bonus meaning, so to speak, beyond my expectation. If brevity is the default intention and expectation, then the collapse of my castles does not deprive my actions of their expected or intended meaning, while unlikely endurance adds meaning relative to baseline.

Could we adopt the same attitude to our relationships and careers? The waves of life could sweep them away any day. A realistic sense of hazard might be folded into the intention itself. I intend to start a marriage and nurture it -- not with the expectation that we will still be happily together at eighty, but with the hope that we might. If we make it, wonderful! Like a sandcastle surviving high tide. If it happens, I'm surprised and delighted, and I'll do what I can for that. Similarly, I intend to begin a career and pursue it. If the wave comes, well, the plan was always only to build toward something that I knew from the start would sooner or later be taken by the surf.

There will still be grief and regret. Things rarely go as well as they might have gone. But if I fully embrace this mindset (let's be honest: I can't), my projects won't have less meaning than intended, even if the waves take them sooner than I would have liked.

[remember this meme from 2007?]

Friday, May 02, 2025

When Is a Theory Superficial?

by Jeremy Pober and Eric Schwitzgebel

Twelve years ago, one of us (ES) distinguished two kinds of theories: superficial and deep. Nearly any phenomenon can be approached in a superficial or deep manner. A superficial judge of human beauty treats it as skin deep. A superficial reading of Shakespeare takes characters at their word and focuses on the obvious aspects of each scene. A superficial housecleaning ignores the backsides and undersides of household items.

And of course one can have a superficial theory of belief. Phenomenal dispositionalism is intended to be such a theory. According to phenomenal dispositionalism, whether someone believes that P is a matter of whether they have certain behavioral, phenomenal (i.e., experiential), and cognitive dispositions, specifically, the dispositions that are "stereotypical" of a person who believes that P. Compare: To be an extravert just is to have the behavioral, phenomenal, and cognitive dispositions stereotypical of extraversion.

Superficial theories contrast with deep theories. Among theories of belief, the main contrast has been with the computationalist, representationalist functionalism made famous by Jerry Fodor (1987) and recently defended by Jake Quilty-Dunn and Eric Mandelbaum.

But what makes a theory of some property P superficial (or deep)? Twelve years ago, ES offered an answer: It depends on the theory's relationship to surface properties. Surface properties are observable features of a phenomenon that a theory of P is designed to explain (in a loose sense of "observable"[1]).

What relation to surface properties must a theory have to be superficial or deep? Back in 2013, ES said that "relative to a class of surface phenomena... a property is superficial if it identifies possession of the property simply with patterns in the surface phenomena" (2013, 77). And a theory is deep "relative to a class of surface phenomena... if it identifies possession of the property with some feature other than patterns in those same surface phenomena -- some feature that presumably explains or causes or underwrites those surface patterns" (ibid.).

This definition fits our toy examples above. A superficial judge of beauty relies on the most easily observable physical patterns, a superficial reading of Shakespeare focuses on surface-level dialogue, and a superficial house-cleaning treats looking clean as clean.

However, we have reason to be unsatisfied with this definition. [ES thanks JP for emphasizing this point in a series of discussions.]

Consider poison, a "causal concept" in David Armstrong's (1968) sense: a concept defined by its causes and/or effects. Poison can be defined in terms of biologically harming a person when ingested (with refinements to differentiate poisoning from, say, drinking lava).[2] If I explain a death by saying that a person was poisoned, you can infer that the death was caused by ingestion rather than, say, hypothermia. That's informative -- but much less informative than saying that the person ingested cyanide, because chemical types like cyanide are defined structurally, allowing detailed explanations of how they interact with human physiology.

A theory of health that only has non-structural causal concepts like "poison" (or "medicine") would be a superficial theory of health. A deep theory, in contrast, invokes underlying mechanisms.

Yet, by ES's 2013 definition, a theory appealing to poison wouldn't count as superficial, because ingesting poison isn't merely related to death as two parts of a superficial pattern. Poison causes death.[3]

In a new draft, ES proposes a revised definition: a theory of property P is superficial if "whether an entity has property [P] is determined (that is, constituted or grounded...) entirely by superficial facts about that entity", where superficial facts are readily observed facts. For causal concepts, being the cause of is a constitutive relationship. This new definition thus accommodates causal superficialism, where poisons cause death and medicines cause recoveries, as inferable from readily observable relationships (such as randomized controlled trials), without appeal to deeper structural features.

That's a good thing! Otherwise, phenomenal dispositionalism only counts as a superficial theory of belief if dispositions don't cause their manifestations. Some philosophers of mind (e.g., Ryle 1949) indeed view dispositions non-causally. But others, like Armstrong (1968), propose a "realist" conception: Dispositions are type-identical to their causal bases. Fragility, for example, is identified with the microstructural features that cause fragile objects to break when struck.[4]

In his original articulation of phenomenal dispositionalism, ES expressed willingness to accept such a realist view (2002, 273n18). This version of dispositionalism can be considered equivalent to a version of functionalism (which holds that mental states can be defined in terms of their causal relations to inputs, outputs, and other mental states). Georges Rey (1997) calls this type of functionalism superficial functionalism, where all functional/causal roles are defined only in relation to behavior, thought, experience, and "similar" states (e.g., desire is similar to belief, so a superficial functionalist theory of belief can include relations to desires).[5]

Of course, deep theories also often employ causal explanations. So if causal superficial theories are possible, what distinguishes them from deep theories? The answer is that causal posits in superficial theories have minimal explanatory content, whereas deep theories have excess explanatory content.[6] Posits with minimal explanatory content explain all that they were posited to explain and no more, whereas posits with excess content make further falsifiable predictions.

Consider the difference between a geneticist working right after Gregor Mendel published his work on heritability, and one working after Franklin, Watson, and Crick had mapped the structure of DNA and demonstrated how it instantiated genetic material. Mendel's theory, which gives us the posits of trait, gene, allele, and dominant/recessive, is a powerful theory (much like belief/desire psychology), but it doesn't explain how genes and alleles have the properties that they do. An allele is just the genetic material for a variant in phenotype, e.g., blood type A versus B or O. But in the initial Mendelian framework, it was defined as "whatever is responsible for variance in (e.g.) blood type".

[illustration of Mendel's superficial causal theory; image source]

Contrast this with someone working in the latter half of the 20th century. They know that genetic information is realized in DNA (& RNA), which, via its repeating base patterns and double helix structure, acts as a base code for the information that constitutes alleles. In other words, they know how genes carry genetic information.[7]

Superficial theories needn't be acausal, but if they posit causal relationships, those relationships must exist among the readily observable features, without invoking hidden structures or mechanisms that yield additional explanatory content. In contrast, the later 20th century theory makes many more falsifiable predictions -- those that follow from the structure of DNA -- and thus has excess explanatory content.

--------------------------------------------

[1] This might not match the sense of "observable" sometimes used in philosophy of science. Dennett (1994) defines observable from his perspective of "urbane verificationism" and, for a theory of attitudes, takes the same list of surface properties to be observable as ES: behavior, thought, and experience.

[2] More precisely, poison is always a two-place predicate, poison-for-S where S is some group of organisms such as a species. When no such group is specified, we can treat instances of poison as poison-for-humans. We are ignoring contact poisons and other complications.

[3] Thus the distinction between superficial and deep theories is not a distinction about noncausal versus causal explanations. Consequently, the superficial/deep distinction as applied to the attitudes does not end up reducing to Devin Curry's distinction between beliefs as properties of persons and beliefs as "cogs" of cognitive science (Curry 2021).

[4] The standard way of defining a causal basis is in terms of physical properties, such as microstructural properties defining "fragility". However this is not a strict requirement. One can posit a mental kind (as in Quilty-Dunn and Mandelbaum 2018 where representations are the causal bases of dispositions constitutive of belief stereotypes) or even a higher-order kind (as in Prior, Pargetter, and Jackson 1982).

[5] Rey (1994; 1997) invokes this term in a debate with Dan Dennett that parallels the debate between ES and Quilty-Dunn and Mandelbaum. While the overall debate turns on different issues, the definition of superficialist theories of belief lines up. Examples of this sort of functionalism plausibly include David Armstrong (1968), the David Lewis of "An Argument for the Identity Theory" (1966) but maybe not the David Lewis of "Mad Pain and Martian Pain" (1980), and Adam Pautz (2021).

[6] Term adopted from Lakatos's (1968) notion of "excess" explanatory content.

[7] The DNA example also lets us talk about different levels or degrees of depth. The late 20th century theory of a gene is a deep one, but so is a theory mid-way between that and Mendel's. In the first years of the 20th century scientists identified chromosomes as the realizer of genes, but did not know that chromosomes were made of DNA (they thought they were proteins). This theory too is deep -- there are excess predictions made by the assignment of genetic material to chromosomes -- but not as deep as later views, because not nearly as many excess predictions were made. We can tentatively call such a theory formally deep, whereas a theory that more fully explains how the posit in question (genes, beliefs) has the properties that it does is substantively deep.

Monday, April 28, 2025

People with Unusual, Minority, Culturally Atypical, or Historically Underrepresented Experiences and Worldviews Should be Overrepresented in Philosophy, Rather than Underrepresented

Saturday's post finding that only 16% of Authors in Elite Philosophy Journals Are Women brought out the misogynist bros on Twitter, but also some remarks from well-meaning people along the lines of "maybe women (ethnic minorities, etc.) just aren't that interested in philosophy".

I expressed my rejection of this perspective in a post for the Blog of the APA in 2020. Perhaps it warrants reposting:

There is nothing about philosophy, as a type of inquiry into fundamental facts about our world, that should make it more attractive to White men than to Black women. Philosophical reflection is an essential part of the human condition, of interest to people of all cultures, races, classes, and social groups. If our discipline and society were in a healthy, egalitarian condition, we should, in fact, expect people from minority groups to be overrepresented in academic philosophy, rather than underrepresented. Academic philosophy should celebrate diversity of opinion, encourage challenges to orthodoxy, and reward fresh perspectives that come from inhabiting cultures and having life experiences different from the mainstream. We should be eager, not reluctant, to hear from a wide range of voices. We should especially welcome, rather than create an inhospitable or cool environment for, people with unusual or minority or culturally atypical or historically underrepresented experiences and worldviews. The productive engine of philosophy depends on novelty and difference.

Saturday, April 26, 2025

16% of Authors in Elite Philosophy Journals Are Women

In some ways, the gender situation has been improving in philosophy. Women now constitute about 40% of graduating majors in philosophy in the U.S., up from about 32% in the 1980s-2010s. There is, I think, substantially more awareness of gender issues and the desirability of gender diversity than there was fifteen years ago. And yet, at the highest levels of impact and prestige, philosophy remains overwhelmingly male.

One measure of this is authorship in elite philosophy journals. For this post, I examined the past two years' tables of contents of Philosophical Review, Mind, Journal of Philosophy, and Nous -- widely considered to be the most elite general philosophy journals in mainstream Anglophone philosophy. (Some rankings put Philosophy & Phenomenological Research alongside these four.) I estimated the gender of each author of each article, commentary, or response (excluding book reviews and editorial prefaces), based on gender-typical name, gender-typical photo, pronoun use, and/or personal knowledge, generally using at least two criteria. Of 291 included authors, there were only two who were either non-binary or defied classification -- in both cases, based on an expressed preference for they/them pronouns. There's always a risk of mistake, but for the most part I expect that my gender classifications accurately reflect how the authors identify and are perceived, with at most a 1-2% error rate.

Overall, I found:

Authorship Rates in Four Elite Philosophy Journals (Past Two Years):
Women: 46 authorships
Men: 243 authorships
Nonbinary/unclassified: 2 authorships
Percent women: 16%

Women now earn about 30% of PhDs in the U.S. and constitute almost 30% of American Philosophical Association members who report their gender -- so authorship in these journals is substantially more skewed than faculty in the United States. Of course, many authors are not located in the U.S. and did not receive their PhDs there, so these percentages aren't strictly comparable. However, PhD and faculty percentages are broadly similar in the U.K. and, impressionistically, in other high-income Anglophone countries. (I'm less sure outside the English-speaking world, but researchers in non-Anglophone countries author only a small percentage of articles in elite Anglophone journals; see here for an analysis of the insularity of Anglophone philosophy.)

Now, one possible explanation of this skew is that women are more likely to specialize in ethics than in other areas of philosophy (see these ten-year-old data), and these four journals publish relatively little ethics. To explore this possibility, I did two things:

First, I coded each article in the big four journals as either "ethics" or "non-ethics", based on the title or the abstract if the title was ambiguous. I included political philosophy, social philosophy, metaethics, and history of ethics as ethics. (Of course, there were some gray-area cases and judgment calls.)

Second, I added two journals to my list: Ethics and Philosophy & Public Affairs, generally considered the two most elite ethics journals (though after the editorial turmoil at PPA last year, it's not clear whether this will remain true of PPA).

In the big four, I classified 60/291 (21%) authorships as ethics. (Perhaps this is a slight underrepresentation of ethics in these journals, relative to the proportion of research faculty in the Anglophone world who specialize in ethics?) In these journals, I found that indeed women have a higher percentage of ethics authorships than non-ethics authorships:

Authorship by Gender in Big 4 Philosophy Journals, Ethics vs. Non-Ethics (women's share of authorships):
Ethics: 17/60 (28%)
Non-ethics: 29/231 (13%)
[Fisher's exact 2-tail, p = .005]

If we juice up the sample size by adding in Ethics and PPA, we get the following:

Authorship by Gender in 6 Elite Philosophy Journals, Ethics vs. Non-Ethics (women's share of authorships):
Ethics: 40/142 (29%)
Non-ethics: 29/231 (13%)
[Fisher's exact 2-tail, p < .001]

[corrected Apr 27]

Strikingly, women appear to be more than twice as likely to author ethics articles as non-ethics articles.
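For readers who want to rerun the significance tests, here is a minimal sketch in Python. It takes the women/total counts from the two tables above and computes a two-tailed Fisher's exact p-value directly from the hypergeometric distribution. The function name `fisher_exact_two_tailed` is hypothetical, and the two-tailed convention used (summing every table that is no more probable than the observed one) is an assumption: the post doesn't specify which software or convention produced the reported p-values, so small differences are possible.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed convention: sum the probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # A small relative tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Counts from the tables above, as (women, men) authorships.
big4_ethics = (17, 60 - 17)
six_journal_ethics = (40, 142 - 40)
non_ethics = (29, 231 - 29)

p_big4 = fisher_exact_two_tailed(*big4_ethics, *non_ethics)
p_six = fisher_exact_two_tailed(*six_journal_ethics, *non_ethics)

print(f"Big 4, ethics vs. non-ethics: p = {p_big4:.4f}")
print(f"6 journals, ethics vs. non-ethics: p = {p_six:.5f}")
```

The non-ethics row is the same in both tests because Ethics and PPA contribute only ethics authorships.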

Ten years ago, I did some similar analyses, comparing ethics vs. non-ethics authorships in two-year bins every 20 years from 1955 to 2015. In those samples, too, I found women to author only a small percentage of articles in elite journals overall (13% in 2014-2015) and to be more likely to author in ethics, so the trends are historically consistent.

ETA April 28: To be clear, all four journals normally use double-anonymous refereeing.

Tuesday, April 15, 2025

Harmonizing with the Dao: Sketch of an Evaluative Framework

Increasingly, I find myself drawn to an ethics of harmonizing with the Dao. Invoking "the Dao" might sound mystical, non-Western, ancient, religious -- alien to mainstream secular 21st-century Anglophone metaphysics and ethics. But I don't think it needs to be. It just needs some clarification and secularization. As a first approximation, think of harmonizing with the Dao as akin to harmonizing with nature. Then broaden "nature" to include human patterns as well as non-human, and you're close to the ideal. Maybe we could equally call it an ethics of "harmonizing with the world" or simply an "ethics of harmony". But explicit reference to "the Dao" helps locate the idea's origins and its Daoist flavor.

The Metaphysics of Dao

In the intended sense -- inspired by ancient Daoism and Confucianism, but adapted for a 21st century Anglophone context -- the "Dao" is the world as a whole. However, it is not the world conceptualized as a collection of objects, but rather as a system of processes and patterns. The Dao is the spinning of Earth; the rise and fall of mountains and species; the rise and fall of cities and nations; human birth, childhood, adulthood, and death; people discovering and losing love; the way strangers greet each other; the growth of your fingernails; the falling of a leaf.

The Axiology of Dao

Some strands in the Daoist tradition hold that all manifestations of the Dao are equally good. But the more dominant strand holds that things can go better or worse. And certainly the Confucians, who also sought harmony with the Dao, held that things could go better or worse.

What constitutes things going better? I favor value pluralism: More than one type of thing has fundamental value. Happiness is valuable, of course. But so also are knowledge (even when it doesn't lead to happiness), beauty, human relationships, and even (I'd argue) the existence of stones.

One way to clarify our thoughts about value is the "distant planet thought experiment". Consider a planet on the far side of the galaxy, forever blocked by the galactic core, with which we will never interact. What would you hope for, for the sake of this planet? Most of us would not hope for a sterile rock, but rather for a planet rich with life -- and not just microbes, not just jungles of plants and animals, but a diverse range of entities capable of forming societies, capable of love and cooperation, art and science, engineering and sports, entities capable of generations-long endeavors and of philosophical wonder as they gaze up at the stars or down through their microscopes.

We might say that a planet, or a region of spacetime, is flourishing when it instantiates, or is on the path toward instantiating, such excellent patterns.

Conceptual Frameworks

Philosophers typically ask two questions when I propose harmonizing with the Dao as an ethical ideal. First, how does it differ from the more familiar (to them) ethics of consequentialism, deontology, and virtue ethics? Second, what specifically does it recommend?

To the first question: Unlike consequentialism, there is no single good or bundle of goods that you should maximize; unlike deontology, there is no one rule or set of rules you should follow (unless we interpret "harmonize with the Dao" as the rule); unlike virtue ethics, there is no canonical set of virtues the cultivation and instantiation of which is the foremost imperative. Instead, the animating idea is to flow harmoniously along with the Dao and participate in, rather than strain against, its flourishing.

That's vague, of course. What specifically should you do, if your aim is to harmonize with the Dao?

I have some thoughts. But first, notice that consequentialism as a general ethical perspective is compatible with a wide range of possible concrete actions, depending on how it is developed and on the details of your situation. So also can deontological and virtue ethical perspectives be made compatible with a wide range of specific actions. What these broad ethical perspectives offer, primarily, is not specific advice but rather conceptual frameworks for ethical thinking -- in terms of consequences and expectations, or in terms of rules of different types, or in terms of a range of virtues and vices. So let's consider what broad concepts an ethics of harmony might employ, with the specific advice as an illustration of how those concepts might work.

Harmony and Disharmony, Illustrated in a University Context

Harmonizing with the flourishing patterns of the Dao involves participating in those patterns, enriching them, and enabling others to participate in and enrich those patterns. Suppose you think that one of the great processes worth preserving in the world is university education. You can participate in that process by being a good teacher, by being an administrator who helps things run smoothly, by being a custodian who helps keep the grounds clean, and so on. You can enrich it by helping to make it even more awesome than it already is -- for example by being an unusually inspiring teacher or by being not just an ordinary custodian but one who adds a bright smile to a student's day. You can enable others to participate in and enrich those patterns by helping hire a terrific teacher or custodian or by providing the type of environment that brings out the best in others.

We can see the university as a place where many lives converge either briefly or for decades. This convergence is valuable not just for what it yields but in itself. The processes constituting university life also participate in and enable other valuable processes, whether those are individual human lives, or other institutions that partly overlap with or depend on the university, or projects and events that happen within the university, or simply the natural and architectural beauty of an appealing campus.

Compare this way of thinking about the ethics of participation in a university with consequentialism (emphasizing the various goods that university education is expected to deliver), deontology (emphasizing the rules one ought to follow within a university), or virtue ethics (emphasizing the manifestation and cultivation of virtues such as curiosity and compassion). While I don't object to any of those ways of thinking about the ethics of university life, the Daoist perspective is, I hope, a valuable alternative lens.

Disharmony could involve cutting short, or attempting to cut short, an axiologically valuable pattern (rather than letting it come to its natural end), working against that pattern, or preventing others from harmonizing. Continuing the university example, cutting funding for valuable research, firing an excellent teacher, disrupting classes, littering, or flying a noisy helicopter overhead might all count as disharmonious. Other examples can include preventing access or undermining the conditions that allow students, faculty, or staff to flourish in their roles.

Comparisons with Music

You are not the melody-maker. "Harmony" suggests a contrast with "melody". You are not the melody-maker, the director, the first violinist, the lead singer, the lead guitarist -- at least not usually. Your typical role is to support an already-happening good thing.

Diversity and pluralism. There is more than one way to harmonize. A piece is richer when not everyone plays the same note.

Improvisation. Zhuangzi emphasized flowing along with things in an improvisational manner, rather than adhering to fixed rules. Often, the best music has improvisational elements, or at least room to allow one's mood of the moment to influence how one plays the notes. Spontaneous improvisation manifests harmony within the improviser, among the various unarticulated inclinations that arise without explicit cognitive control.

Aesthetic value. The boundary between aesthetic and ethical value (and other types of value) might not be as sharp as philosophers often suppose.

Conflicts of Harmony

A tree is a wondrous thing. Cutting it down cuts short an axiologically valuable pattern, and is normally out of harmony with the tree, the forest, and the lives it supports. But if the tree becomes lumber for a beautiful home, then that act belongs to another axiologically valuable pattern and is in harmony with the Dao of human cultural life.

Your wife wants one thing from you; your mother, another. Harmony with one might involve dissonance with the other. You might consider how sharp the dissonance is in each case. You might consider what patterns are being enacted in these relationships, and which are the more valuable patterns to sustain.

Like any ethical approach, harmonizing with the Dao must allow for conflicts and tradeoffs. The world makes competing demands and offers incompatible opportunities. There needn't be a formula for how to deal with all such cases. In some cases, creative thinking might allow one to support multiple patterns or integrate them into a whole: Removing a tree is sometimes overall good for a forest; occasional tension with a spouse may sustain a healthier relationship than shallow peace.

Sometimes the conflict is the harmony. Chess masters seek incompatible goals as part of the larger pattern of a competition. Predators consume prey in a healthy ecosystem. Law and politics require adversaries in a (hopefully) well-functioning social system.

My main overall thought is that we can build a fruitful framework for ethical thinking by taking the root project to be one of harmonizing with the awesome patterns and processes of the world.
