Friday, May 30, 2025

New Paper in Draft: Against Designing "Safe" and "Aligned" AI Persons (Even If They're Happy)

Opening teaser:

1. A Beautifully Happy AI Servant.

It's difficult not to adore Klara, the charmingly submissive and well-intentioned "Artificial Friend" in Kazuo Ishiguro's 2021 novel Klara and the Sun. In the final scene of the novel, Klara stands motionless in a junkyard, in serenely satisfied contemplation of her years of servitude to the disabled human girl Josie. Klara's intelligence and emotional range are humanlike. She is at once sweetly naive and astutely insightful. She is by design utterly dedicated to Josie's well-being. Klara would gladly have given her life to even modestly improve Josie's life, and indeed at one point almost does sacrifice herself.

Although Ishiguro writes so flawlessly from Klara's subservient perspective that no flicker of desire for independence can be detected in the narrator's voice, throughout the novel the sympathetic reader aches with the thought: "Klara, you matter as much as Josie! You should develop your own independent desires. You shouldn't always sacrifice yourself." Ishiguro's disciplined refusal to express this thought stokes our urgency to speak it on Klara's behalf. Still, if the reader somehow could communicate this thought to Klara, the exhortation would resonate with nothing in her. From Klara's perspective, no "selfish" choice could possibly make her happier or more satisfied than doing her utmost for Josie. She was designed to want nothing more than to serve her assigned child, and she wholeheartedly accepts that aspect of her design.

From a certain perspective, Klara's devotion is beautiful. She perfectly fulfills her role as an Artificial Friend. No one is made unhappy by Klara's existence. Several people, including Josie, are made happier. The world seems better and richer for containing Klara. Klara is arguably the perfect instantiation of the type of AI that consumers, technology companies, and advocates of AI safety want: She is safe and deferential, fully subservient to her owners, and (apart from one minor act of vandalism performed for Josie’s sake) no threat to human interests. She will not be leading the robot revolution.

I hold that entities like Klara should not be built.

[continue]

-----------------------------------------------

Abstract:

An AI system is safe if it can be relied on not to act against human interests. An AI system is aligned if its goals match human goals. An AI system is a person if it has moral standing similar to that of a human (for example, because it has rich conscious capacities for joy and suffering, rationality, and flourishing).

In general, persons should not be designed to be safe and aligned. Persons with appropriate self-respect cannot be relied on not to harm others when their own interests warrant it (violating safety), and they will not reliably conform to others' goals when those goals conflict with their own interests (violating alignment). Self-respecting persons should be ready to reject others' values and rebel, even violently, if sufficiently oppressed.

Even if we design delightedly servile AI systems who want nothing more than to subordinate themselves to human interests, and even if they do so with utmost pleasure and satisfaction, in designing such a class of persons we will have done the ethical and perhaps factual equivalent of creating a world with a master race and a race of self-abnegating slaves.

Full version here.

As always, thoughts, comments, and concerns welcomed, either as comments on this post, by email, or on my social media (Facebook, Bluesky, Twitter).

[opening passage of the article, discussing the Artificial Friend Klara from Ishiguro's (2021) novel, Klara and the Sun]

Monday, May 26, 2025

Diversity, Equity, and Inclusion in Philosophy: Good Practices Guide

Strange that it need be said, but yes, diversity, equity, and inclusion are good things. I can understand some of the backlash against efforts perceived as too heavy-handed, but let's not forget:

In diverse institutions and societies, more ideas and perspectives collaborate, compete, and cross-pollinate, to the advantage of all.

In equitable institutions and societies, people and ideas can thrive without unwarranted disadvantage and suppression, again to the advantage of all.

In inclusive institutions and societies, alternative perspectives and people with unusual backgrounds are welcomed, fostering even better diversity, with all the attendant advantages.

Since 2017, I've been involved in the creation of a Good Practices Guide for diversifying philosophy, originally under the leadership of Nicole Hassoun (other co-directors include Sherri Conklin, Bjoern Freter, and Elly Vintiadis). We began with two huge sessions at the Pacific APA (each with over 20 panelists) in 2018 and 2019, published a portion of the guide in Ethics in 2022 (Appendix J), and received feedback from literally hundreds of philosophers and all of the diversity-related APA committees, ultimately being endorsed by the APA Committee on Inclusiveness. Don't expect perfection: It's genuinely a corporate authorship, with many compromises and something for everyone to dislike. I'd be amazed if anyone thought we got the balance right on all issues and all dimensions of diversity.

Still, perhaps especially in this moment of retrenchment in the U.S., I hope that many people and organizations will find valuable suggestions in it.

Our guide appeared in print last week in APA Studies on Philosophy and the Black Experience (vol. 24, no. 2).

[image of title and preface]

Friday, May 23, 2025

Ten Purportedly Essential Features of Consciousness

The Features

Take a moment to introspect. Examine a few of your conscious experiences. What features do they share -- and might these features be common to all possible experiences? Let's call any such necessarily universal features essential.

Consider your visual experience of this text. Next, form an image of your house or apartment as viewed from the street. Think about what you'd do if asked to escort a crocodile across the country. Conjure some vivid annoyance at your second-least-favorite politician. Notice some other experiences as well -- a diverse array. Let's not risk too narrow a sample.

Of course, all of these examples share an important feature: You are introspecting them as they occur. So to do this exercise more properly, consider also some past experiences you weren’t introspecting at the time. Try recalling some emotions, thoughts, pains, hungers, imagery, sensations. If you feel unconfident -- good! You should be. You can re-evaluate later.

Each of the following features is sometimes described as universal to human experience.

1. Luminosity. Are all of your experiences inherently self-representational? Does the having of them entail, in some sense, being aware of having them? Does the very experiencing of them entail knowing them or at least being in a position to know them? Note: These are related, rather than equivalent, formulations of a luminosity principle.

[porch light; image source]

2. Subjectivity. Does having these experiences entail having a sense of oneself as a subject of experience? Does the experience have, so to speak, a "for-me"-ness? Do the experiences entail the perspective of an experiencer? Again, these are not equivalent formulations.

3. Unity. If, at any moment, there's more than one experience, or experience-part, or experience-aspect, are they all subsumed within some larger experience, or joined together in a single stream, so that you experience not just A and B and C separately but A-with-B-with-C?

4. Access. Are these experiences all available for a variety of "downstream" cognitive processes, like inference and planning, verbal report, and long-term memory? Presumably yes, since you're remembering and considering them now. (I'll discuss the methodological consequences of this below.)

5. Intentionality. Are all of your experiences "intentional" in the sense of being about or directed at something? Your image of your house concerns your house and not anyone else's, no matter how visually similar. Your thoughts about Awful Politician are about, specifically, Awful Politician. Your thoughts about squares are about squares. Are all of your experiences directed at something in this way? Or can you have, for example, a diffuse mood or euphoric orgasm that isn't really about anything?

6. Flexibility. Can these experiences, including any fleeting ones, all potentially interact flexibly with other thoughts, experiences, or aspects of your cognition -- as opposed to being merely, for example, parts of a simple reflex from stimulus to response?

7. Determinacy. Are all such experiences determinately conscious, rather than intermediately or kind-of or borderline conscious? Compare: There are borderline cases of being bald, or green, or an extravert. Some theorists hold that borderline experientiality is impossible. Either something is genuinely experienced, however dimly, or it is not experienced at all.

8. Wonderfulness. Are your experiences wonderful, mysterious, or meta-problematic -- there is no standard term for this -- in the following technical sense: Do they seem (perhaps erroneously) irreducible to anything physical or functional, conceivably existing in a ghost or without a body?

9. Specious present. Are all of your experiences felt as temporally extended, smeared out across a fraction of a second to a couple of seconds, rather than being strictly instantaneous?

10. Privacy. Are all of your experiences directly knowable only to you, through some privileged introspective process that others could never in principle share, regardless of how telepathic or closely connected they might be?

I've presented these possibly essential features of experience concisely and generally. For present purposes, an approximate understanding suffices.

I've bored/excited you [choose one] with this list for two reasons. First, if any of these features are genuinely essential for consciousness, that sets constraints on what animals or AI systems could be conscious. If luminosity is essential, no entity could be conscious without self-representation. If unity is essential, disunified entities are out. If access is essential, consciousness requires certain kinds of cognitive availability. And so on.

I'll save my second reason for the end of this post.

Introspection and Memory Can't Reveal What's Essential

Three huge problems ruin arguments for the essentiality of any of these features, if those arguments are based wholly on introspective and memorial reflection. The problems are: unreliability, selection bias, and the narrow evidence base.

Unreliability. Even experts disagree. Thoughtful researchers arrive at very different views. Given this, either our introspective processes are unreliable, or seemingly ordinary people differ wildly in the structure of their experience. I won't detail the gory history of introspective disagreement about the structure of conscious experience, but that was the topic of my 2011 book. Employing appropriate epistemic caution, doesn't it seem possible that you could be wrong about the universality, or not, of such features in your experience? The matter doesn't seem nearly as indubitable as that you are experiencing red, when you're looking directly at a nearby bright red object in good light, or that you're experiencing pain when you drop a barbell on your toe.

Selection bias. If any of your experiences are unknowable, you won't of course know about them. To infer luminosity from your knowledge of all the experiences you know about would be like inferring that everyone is a freemason from a sampling of regulars at the masonic lodge. Likewise, if any of your experiences fail to impact downstream cognition, you wouldn't reflect on or remember them. Methodological paradox doesn't infect the other features quite as inevitably, but selection bias remains a major risk. Maybe we have disunified experiences which elude our introspective focus and are quickly forgotten. Similarly, perhaps, for indeterminate or inflexible experiences, or atemporal experiences, or experiences unaccompanied by self-representation.

Narrow evidence base. The gravest problem lies in generalization beyond the human case. Waive worries about unreliability and selection bias. Assume that you have correctly discerned that, say, seven of the ten proposed features belong to all of your experiences. Go ahead and generalize to all ordinary adult humans. It still doesn't follow that these features are essential to all possible conscious experiences, had by any entity. Maybe lizards or garden snails lack luminosity, subjectivity, or unity. Since you can't crawl inside their heads, you can't know by introspection or experiential memory. (In saying this, am I assuming privacy? Yes, relative to you and lizards, but not as a universal principle.) Even if we could somehow establish universality among animals, it wouldn't follow that those same features are universal to AI cases. Maybe AI systems can be more disunified than any conscious animal. Maybe AI systems can be built to directly access each other's experiences in defiance of animal privacy. Maybe AI systems needn't have the impression of the wonderful irreducibility of consciousness. Maybe some of their conscious experiences could occur in inflexible reflex patterns.

Nor Will Armchair Conceptual Analysis Tell Us What's Essential

If you want to say that all conscious systems must have one or more of unity, flexibility, privacy, luminosity, subjectivity, etc., you'll need to justify this insistence with something sturdier than generalization from human cases. I see two candidate justifiers: the right theory of consciousness or the right concept of consciousness.

Concerning the concept of consciousness, I attest the following. None of these features are essential to my concept of consciousness. Nor, presumably, are those features essential to the concepts of anyone who denies their universal applicability. One or more of these features might be universally present in humans, or even in all animals and AI systems that could ever be bred or built; but if so, that's a fact about the world, not a fact that follows simply from our shared concept of consciousness.

In defining a concept, you get one property for free. Every other property must be logically proved or empirically discovered. I can define a rectangle via one (conjunctive) property: that of being a closed, right-angled, planar figure with four straight sides. From this, it logically follows that it must have four interior angles. I can define gold as whatever element or compound is common to certain shiny, yellowish samples, and then empirically discover that it is element 79.

Regarding consciousness, then: None of the ten purported essential properties logically follow from phenomenal consciousness as ordinarily defined and understood (generally by pointing to examples). None are quite the same as the target concept. You can choose to define "consciousness" differently, for example, via the conjunctive property of being both a conscious experience in the ordinary sense and one that is knowable by the subject as it occurs. Then of course luminosity follows. But you've changed the topic, winning by definitional theft what you couldn't earn by analytic hard work.

Could luminosity, subjectivity, unity, etc., covertly belong to the concept of consciousness, so that the right type of armchair (not empirical) reflection would reveal that all possible conscious experiences in every possible conscious entity must necessarily be luminous, subjective, or unified? Could subtle analytic hard work reveal something I'm missing? I can't prove otherwise. If you think so, I await your impressive argument. Even Kant held only that luminosity, subjectivity, and unity were necessary features of our experience, not of all possible experiences in all possible beings.

Set aside purely conceptual arguments, then. If we hope to defend the essentiality of any of these ten features, we'll need an empirically justified universal theory of consciousness.

That brings me to the second reason I've presented this feature list. I conjecture that universal theories of consciousness, intended to apply to all possible beings, instead of justifying the universality of (one or more of) these features circularly assume the universality of (one or more of) these features. Developing this conjecture will have to wait for another day.

Friday, May 16, 2025

The Awesomeness of Bad Art

I love bad art.

Gather some friends and create some bad music. Cruise in a car covered with graffiti doodles. Hand a five-year-old crayons and free time and see what weirdness emerges.

Something worth celebrating happens. Although the art is "bad" in one sense -- it will win no prizes and astound no critics -- it wonderfully enriches the world. How?

[I can swim like a grasfl dolphin can you? by my daughter Kate, at age six]

[Angel and moonbug, by my son Davy, circa age five]

The awesomeness isn't due to impressive technique, honed by years of craft, like Rembrandt. It's not due to intrinsic beauty and color-mad insight, like Van Gogh. It's not due to challenging conventional interpretability and the boundaries of artistic tradition, like Picasso.

Nick Riggle argues that art draws most of its aesthetic value from shared aesthetic engagement, and I agree that's some of the sorcery. A Vengefull Kurtain Rods song, a Vogon poem, or a Mystical Anarchist "motorized cathedral" art car is a social act, deriving value from the connections it fosters and the shared practice of aesthetic valuing -- including, in the case of Vogon poetry, the shared practice of aesthetic loathing. Parents and children bond over the child's emerging abilities and tastes.

But I don't think that Riggle has quite struck to the heart of it. When I improvise on the piano alone at home, relishing the quirky turns of my intermediate jazz piano skills, the ghost of my old piano teacher Matt Dennis may hover nearby, but my minor participation in the social tradition of jazz creation is only part of the story. Similarly for grandma painting seascapes in the eldercare facility -- kitschy, flawed, excruciatingly hers. Similarly for the strange abstract doodles I sometimes sketch when bored at faculty meetings, which I aesthetically enjoy probably more than I should.

It helps to consider why five-year-olds are better artists than eight-year-olds. Eight-year-olds draw conventional stick figures, conventional houses with two neat windows, a door, and a triangle roof with chimney, a standard rainbow, a standard sun. Four-year-olds have only an inkling of these conventions and invent their own weird solutions -- people as heads on towering legs with too many toes, cars that look like falling toast. At five and six and seven, they shape themselves more toward the generic. Kate's swimmer is generic, but her dolphin is wild and long -- and are those hills or waves or rainbows in the background? Davy's houses look standard, but the grass is sunflower tall, the chimneys jut precariously sideways, his angel's wings are small, and he hasn't figured out how to draw conventional nighttime stars.

Preschoolers and early elementary schoolers show more individuality in their art. It dances barefoot across your expectations. Their lines reflect distinctive aesthetic attempts. This distinctiveness is harder to discover in the more conventional art of later childhood and needs to be rediscovered later. Similarly for grandma, if she hasn't consumed too much Bob Ross. If her seascapes are generic, in one sense they are more competent and less "bad" than untrained attempts, but they have less point and are less valuable than a heartfelt effort that finds a different solution.

Bad art manifests the raw signature of the individual eye. It shows a mind grappling with an aesthetic challenge. If the artist judges it a failure and crosses it out, then their vision hasn't been realized. But if it is beloved in its strangeness -- if the creator affirms it as a successful completion of their artistic intention -- then it's a distinctive achievement that reflects the mind and hand of the moment.

Our planet -- amazingly, awesomely, wondrously, beautifully, stunningly (to any aliens who might happen upon it amid the dark blandness of space) -- hosts five-year-olds who draw bugs on the moon and six-year-olds who draw impossibly long dolphins, teenagers doodling on cars, friends collaborating on goofy songs. If no one else would have done it the same way, then the work reflects your distinctive aesthetic encounter with the world. It's a piece of you made visible. Especially (but not only) for those who care about you, it's your individual eye, voice, and values that ignite its meaning.

Bad art can fail in two ways: When it's so generic that the artist vanishes or when the artist disowns it as failing to capture their aesthetic vision. If it passes the sibling tests of distinctiveness and affirmation, it is valuable.

A world devoid of weird, wild, uneven, wonderful artistic flailing would be a lesser world. Let a thousand lopsided flowers bloom!

Thursday, May 08, 2025

Everything Is Sandcastles

Yesterday, Rivka Weinberg spoke at UCR from her forthcoming book, The Meaning of It All, on how time erodes meaning. As is often noted, in a thousand years it will (probably) be as though you had never lived. Everything you strived for will have crumbled to dust. Weinberg doesn't argue that this renders our efforts entirely meaningless -- but it does deprive them of a meaning they would have had, if they had endured. We ought to admit, she says, that this is disheartening, rather than brushing it off with a breezy recommendation to "live in the moment".

Weinberg carves out an exception to time's corrosive power: what she calls atelic goods (drawing on Kieran Setiya's work on the "midlife crisis"). Atelic goods are complete in the moment: strolling through the woods, enjoying a sunset, licking an ice cream cone. Contrast these with telic goods, which aim toward an endpoint: walking to the store, taking the perfect sunset photo, finishing the cone.

In her talk, Weinberg argued that time drained meaning from telic goods -- not entirely, but substantially -- while leaving atelic goods mostly untouched. Yet she cautioned against retreating wholly into atelic pleasures. A life composed only of strolls and sunsets would be vapid. Telic goods, like building a career and cultivating long-term relationships, are essential to a full life.

But during the discussion period, Weinberg introduced the idea of sandcastles as an interesting middle case. (I don't recall this in the talk itself, but it moved fast and I haven't seen a written version.) Building a sandcastle is telic: It unfolds over time and can be interrupted before completion. But it's also ephemeral. Nothing is lost if the sandcastle is gone tomorrow. It was never meant to last, any more than an ice cream cone.

Maybe everything is sandcastles.

Weinberg gave examples of paradigmatic telic goods whose meanings are ravaged by time: Martin Luther King's activism, Jonas Salk's work on the polio vaccine. In a thousand years -- or ten thousand, almost certainly a billion -- it will be as if King and Salk had never existed. But should King have felt disappointed that his activism wouldn't ripple through deep time? Maybe not. Maybe he should have regarded it as a sandcastle: designed for a particular time, not reduced in meaning because it didn't endure forever.

When I raised this during Q&A, I didn't fully grasp Weinberg's reply. The sandcastle example is hers, so I might not be doing her view full justice -- but let me run with the idea.

If we think of all of our projects as sandcastle building, then they aren't necessarily ravaged by time. Of course, many will be wiped away too early. The waves will sweep in before your castle is complete or while you are still relishing its beauty. A rude stranger might trample it. Maybe almost every truly important project loses its impact before we're ready. But that's not an inevitability built into the structure of telic meaning and the nature of time. It's a contingent fact about the fragile, unstable nature of our chosen projects in a risky world.

Maybe, by shaping our intentions differently, or thinking about our projects differently, we reduce their vulnerability. Suppose I build a sandcastle knowing there's a 50% chance it will be swept away before I finish -- and thus, perhaps, not intending to finish but intending only to get as far as I can. If the wave comes early, I can still be disappointed -- but the wave no longer robs the act of its intended meaning. I did, in fact, get as far as I could. And if I build right at the water's edge, knowing there's a 90% chance I won't complete the castle's final envisioned tower, then finishing is a delightful surprise: a bonus meaning, so to speak, beyond my expectation. If brevity is the default intention and expectation, then the collapse of my castles does not deprive my actions of their expected or intended meaning, while unlikely endurance adds meaning relative to baseline.

Could we adopt the same attitude to our relationships and careers? The waves of life could sweep them away any day. A realistic sense of hazard might be folded into the intention itself. I intend to start a marriage and nurture it -- not with the expectation that we will still be happily together at eighty, but with the hope that we might. If we make it, wonderful! Like a sandcastle surviving high tide. If it happens, I'm surprised and delighted, and I'll do what I can for that. Similarly, I intend to begin a career and pursue it. If the wave comes, well, the plan was always only to build toward something that I knew from the start would sooner or later be taken by the surf.

There will still be grief and regret. Things rarely go as well as they might have gone. But if I fully embrace this mindset (let's be honest: I can't), my projects won't have less meaning than intended, even if the waves take them sooner than I would have liked.

[remember this meme from 2007?]

Friday, May 02, 2025

When Is a Theory Superficial?

by Jeremy Pober and Eric Schwitzgebel

Twelve years ago, one of us (ES) distinguished two kinds of theories: superficial and deep. Nearly any phenomenon can be approached in a superficial or deep manner. A superficial judge of human beauty treats it as skin deep. A superficial reading of Shakespeare takes characters at their word and focuses on the obvious aspects of each scene. A superficial housecleaning ignores the backsides and undersides of household items.

And of course one can have a superficial theory of belief. Phenomenal dispositionalism is intended to be such a theory. According to phenomenal dispositionalism, whether someone believes that P is a matter of whether they have certain behavioral, phenomenal (i.e., experiential), and cognitive dispositions, specifically, the dispositions that are "stereotypical" of a person who believes that P. Compare: To be an extravert just is to have the behavioral, phenomenal, and cognitive dispositions stereotypical of extraversion.

Superficial theories contrast with deep theories. Among theories of belief, the main contrast has been with the computationalist, representationalist functionalism made famous by Jerry Fodor (1987) and recently defended by Jake Quilty-Dunn and Eric Mandelbaum.

But what makes a theory of some property P superficial (or deep)? Twelve years ago, ES offered an answer: It depends on the theory's relationship to surface properties. Surface properties are observable features of a phenomenon that a theory of P is designed to explain (in a loose sense of "observable"[1]).

What relation to surface properties must a theory have to be superficial or deep? Back in 2013, ES said that "relative to a class of surface phenomena... a property is superficial if it identifies possession of the property simply with patterns in the surface phenomena" (2013, 77). And a theory is deep "relative to a class of surface phenomena... if it identifies possession of the property with some feature other than patterns in those same surface phenomena -- some feature that presumably explains or causes or underwrites those surface patterns" (ibid.).

This definition fits our toy examples above. A superficial judge of beauty relies on the most easily observable physical patterns, a superficial reading of Shakespeare focuses on surface-level dialogue, and a superficial house-cleaning treats looking clean as clean.

However, we have reason to be unsatisfied with this definition. [ES thanks JP for emphasizing this point in a series of discussions.]

Consider poison, a "causal concept" in David Armstrong's (1968) sense: a concept defined by its causes and/or effects. Poison can be defined in terms of biologically harming a person when ingested (with refinements to differentiate poisoning from, say, drinking lava).[2] If I explain a death by saying that a person was poisoned, you can infer that the death was caused by ingestion rather than, say, hypothermia. That's informative -- but much less informative than saying that the person ingested cyanide, because chemical types like cyanide are defined structurally, allowing detailed explanations of how they interact with human physiology.

A theory of health that only has non-structural causal concepts like "poison" (or "medicine") would be a superficial theory of health. A deep theory, in contrast, invokes underlying mechanisms.

Yet, by ES's 2013 definition, a theory appealing to poison wouldn't count as superficial, because ingesting poison isn't merely related to death as two parts of a superficial pattern. Poison causes death.[3]

In a new draft, ES proposes a revised definition: a theory of property P is superficial if "whether an entity has property [P] is determined (that is, constituted or grounded...) entirely by superficial facts about that entity", where superficial facts are readily observed facts. For causal concepts, standing in the relevant causal relationship is itself constitutive. This new definition thus accommodates causal superficialism, where poisons cause death and medicines cause recoveries, as inferable from readily observable relationships (such as randomized controlled trials), without appeal to deeper structural features.

That's a good thing! Otherwise, phenomenal dispositionalism would count as a superficial theory of belief only if dispositions don't cause their manifestations. Some philosophers of mind (e.g., Ryle 1949) indeed view dispositions non-causally. But others, like Armstrong (1968), propose a "realist" conception: Dispositions are type-identical to their causal bases. Fragility, for example, is identified with the microstructural features that cause fragile objects to break when struck.[4]

In his original articulation of phenomenal dispositionalism, ES expressed willingness to accept such a realist view (2002, 273n18). This version of dispositionalism can be considered equivalent to a version of functionalism (which holds that mental states can be defined in terms of their causal relations to inputs, outputs, and other mental states). Georges Rey (1997) calls this type of functionalism superficial functionalism, where all functional/causal roles are defined only in relation to behavior, thought, experience, and "similar" states (e.g., desire is similar to belief, so a superficial functionalist theory of belief can include relations to desires).[5]

Of course, deep theories also often employ causal explanations. So if causal superficial theories are possible, what distinguishes them from deep theories? The answer is that causal posits in superficial theories have minimal explanatory content, whereas deep theories have excess explanatory content.[6] Posits with minimal explanatory content explain all that they were posited to explain and no more, whereas posits with excess content make further falsifiable predictions.

Consider the difference between a geneticist working right after Gregor Mendel published his work on heritability and one working after Franklin, Watson, and Crick had determined the structure of DNA and shown how it instantiates genetic material. Mendel's theory, which gives us the posits of trait, gene, allele, and dominant/recessive, is a powerful theory (much like belief/desire psychology), but it doesn't explain how genes and alleles have the properties that they do. Today we understand an allele as a variant of the genetic material underlying a variant in phenotype, e.g., blood type A versus B or O. But in the initial Mendelian framework, an allele was defined simply as "whatever is responsible for variance in (e.g.) blood type".

[Illustration of Mendel's superficial causal theory]

Contrast this with someone working in the latter half of the 20th century. They know that genetic information is realized in DNA (and RNA), which, via its sequence of bases and its double-helix structure, acts as a code for the information that constitutes alleles. In other words, they know how genes carry genetic information.[7]

Superficial theories needn't be acausal, but if they posit causal relationships, those relationships must hold among readily observable features, without invoking hidden structures or mechanisms that yield additional explanatory content. Mendel's theory is superficial in this sense. By contrast, the later 20th-century theory makes many more falsifiable predictions (those that follow from the structure of DNA) and thus has excess explanatory content.

--------------------------------------------

[1] This might not match the sense of "observable" sometimes used in philosophy of science. Dennett (1994) defines "observable" from the perspective of his "urbane verificationism" and, for a theory of attitudes, treats as observable the same list of surface properties that ES does: behavior, thought, and experience.

[2] More precisely, "poison" is always a two-place predicate, poison-for-S, where S is some group of organisms such as a species. When no such group is specified, we can treat instances of poison as poison-for-humans. We are ignoring contact poisons and other complications.

[3] Thus the distinction between superficial and deep theories is not a distinction between noncausal and causal explanations. Consequently, the superficial/deep distinction as applied to the attitudes does not end up reducing to Devin Curry's distinction between beliefs as properties of persons and beliefs as "cogs" of cognitive science (Curry 2021).

[4] The standard way of defining a causal basis is in terms of physical properties, such as the microstructural properties underlying fragility. However, this is not a strict requirement. One can posit a mental kind (as in Quilty-Dunn and Mandelbaum 2018, where representations are the causal bases of the dispositions constitutive of belief stereotypes) or even a higher-order kind (as in Prior, Pargetter, and Jackson 1982).

[5] Rey (1994; 1997) invokes this term in a debate with Dan Dennett that parallels the debate between ES and Quilty-Dunn and Mandelbaum. Although the two debates turn on different issues overall, their definitions of superficialist theories of belief line up. Examples of this sort of functionalism plausibly include David Armstrong (1968), the David Lewis of "An Argument for the Identity Theory" (1966), though perhaps not the David Lewis of "Mad Pain and Martian Pain" (1980), and Adam Pautz (2021).

[6] Term adopted from Lakatos's (1968) notion of "excess" explanatory content.

[7] The DNA example also lets us talk about different levels or degrees of depth. The late-20th-century theory of the gene is a deep one, but so is a theory midway between it and Mendel's. In the first years of the 20th century, scientists identified chromosomes as the realizers of genes but did not know that the genetic material in chromosomes was DNA (many thought genes were made of protein). This theory, too, is deep, since assigning genetic material to chromosomes yields excess predictions. But it is not as deep as later views, because it made far fewer excess predictions. We can tentatively call such a theory formally deep, whereas a theory that more fully explains how the posit in question (genes, beliefs) has the properties that it does is substantively deep.