
Thursday, February 19, 2026

Disunity and Indeterminacy in Artificial Consciousness (and Maybe in Human Consciousness Too)

Our understanding of the nature of consciousness derives mainly from our understanding of the nature of consciousness in our favorite animal (us, of course). But the features of consciousness in our favorite animal might be specific to that animal rather than universal.

Let's consider two such features and whether we should expect them in conscious AI systems, if conscious AI systems are ever possible.

Unity: Our conscious experiences at any given moment are bound together into a single unified experience, rather than transpiring in separate streams. If I'm sitting on a wet park bench, I might (a.) visually experience the leafy green trees around me, (b.) tactilely experience the cold dampness soaking into my jeans, and (c.) consciously recall the smaller trees of yesteryear. Normally -- perhaps necessarily -- three such experiences would not run in disconnected streams. They would join into a composite experience of (a)-with-(b)-with-(c). I experience not just trees, cold dampness, and a memory of yesteryear, but all three together as a unified bundle.

Determinacy: At any given moment, I am either determinately conscious or determinately nonconscious (as in anesthesia or dreamless sleep). Likewise, I either determinately do, or determinately do not, have any particular experience. Gray-area cases are at least unusual and maybe impossible. Even the simplest, barest cases are still determinate. Consider visual experience: We might imagine the visual field narrowing and losing content until only a gray dot remains -- and then the dot winks out. That dot, however minimal, is still determinately experienced. When it winks out, consciousness determinately disappears. There is no half-winked state between the minimal gray dot and complete absence of visual experience.

My thought is that we should not expect unity and determinacy to be general features of conscious AI systems (if conscious AI is possible). To see why, let's start by assuming the Global Workspace Theory of consciousness. I focus on Global Workspace Theory because it's probably the leading scientific theory of consciousness and because its standard formulation (Dehaene's version) invites the assumption of unity and determinacy.

Global Workspace Theory divides the mind into local information processing modules linked by a shared global workspace. Information becomes conscious when it is broadcast into the workspace. Suppose your auditory system registers the faint honk of a distant car horn. You're absorbed in reading philosophy and accustomed to ignoring traffic noise, so this representation isn't selected for further processing. It's not a target of attention, not broadcast into the workspace, and not consciously experienced. (If you think you constantly consciously experience background sounds, you can't hold a standard Global Workspace view.) Once you attend to the noise, for whatever reason, that information "ignites" into the global workspace, becoming available to a wide variety of "downstream" processes: You can think about it, plan around it, verbally report it, store it in long-term memory, and flexibly combine it with other information in the workspace. On Global Workspace Theory, being available in this way just is what it is for the information to be consciously experienced.

This model suggests unity and determinacy. Since there is just one global workspace, and since that workspace enables flexible integration of everything it contains, it makes sense that its various elements will combine into a unified experience. And on Dehaene's version, ignition into the workspace is a sharp-boundaried event: Information either completely ignites, becoming available for all downstream processes, or it does not. There is no (or only rarely) partial ignition. This can explain determinacy.

But future AI systems might not share this structure. They might have multiple or partially overlapping workspaces. Different specialized subsystems might have access to different regions of a partly-shared workspace. Some animals, such as snails and octopuses, distribute processing among multiple ganglia or neural centers that are less tightly coupled than the hemispheres of the human brain. A robot might broadcast information relevant to locomotion to one area and information relevant to speech to another with limited connectivity.

If the subsystems are entirely disconnected, the result might be entirely discrete centers of subjective experience within a single organism or machine. But if they are partly connected, experience might be only partly unified. In the park bench example, the experience of the trees might be unified with the experience of dampness, and the experience of dampness with memories of yesteryear, but the experience of the trees might not be unified with the memories. (Unification would not then be a transitive relation.) Alternatively, some weaker relation of partial unification might hold among the visual, tactile, and memorial experiences. If this seems inconceivable or impossible, see Sophie Nelson's and my article on indeterminate or fractional subjects.

More abstractly: There's no compelling architectural reason why an AI system would have to make information available either to all downstream processes or to none. A workspace defined in terms of downstream availability could be a patchwork of partial availabilities rather than a fully global all-or-nothing broadcast.
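To make the contrast vivid, here is a toy sketch in Python -- purely illustrative, with made-up module names, not a model of any actual system -- of the difference between an all-or-nothing global broadcast and a patchwork of partial availabilities.

# Toy contrast between a single all-or-nothing workspace and a patchwork of
# partial availabilities. Everything here is invented for illustration.

CONSUMERS = {"speech", "planning", "memory", "introspection", "locomotion"}

def global_broadcast(item):
    # Ignition-style broadcast: an item reaches every downstream consumer or none.
    return CONSUMERS if item["ignited"] else set()

def patchwork_broadcast(item):
    # Partial availability: an item reaches only some consumers, so there may be
    # no fact of the matter about whether it is "in" the workspace at all.
    return set(item["reaches"])

percept = {"content": "faint car horn",
           "ignited": False,
           "reaches": {"locomotion", "planning"}}

print(global_broadcast(percept))     # set(): determinately outside the workspace
print(patchwork_broadcast(percept))  # {'planning', 'locomotion'}: a gray-area, partly available state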

For the same reason, ignition into the workspace needn't be all-or-nothing. Between full ignition with determinate consciousness and no ignition with determinate nonconsciousness, there might be in-between, gray-area half-ignitions that are neither determinately conscious nor determinately nonconscious. Nearly every property with a complex physical or functional basis allows indeterminate, borderline cases: baldness, extraversion, greenness, happiness, whether you're wearing a shoe, whether a country is a democracy. The human global workspace might minimize indeterminacy -- like it's rarely indeterminate in basketball whether the ball has gone through the hoop. But change the architecture and indeterminacy might become common: a half-hearted ignition, or just enough information-sharing to make it indeterminate whether a workspace even exists. (If indeterminacy about consciousness strikes you as inconceivable or impossible, see my 2023 article on borderline consciousness.)

Global Workspace Theory might of course be wrong. But most other theories of consciousness make my argument at least as easy. Dennett's fame-in-the-brain version of broadcast theory explicitly permits disunity and indeterminacy. Higher Order Theories admit the same fragmentation and, probably, gradualism. So do biological theories and theories that focus on embodiment. (Integrated Information Theory is an exception: Its axioms require bright-line unity and determinacy. But as I've argued, those bright-line axioms lead to unpalatable consequences.)

Recognizing these possibilities for AI systems invites the further thought: Maybe we humans aren't quite as unified as we normally suppose. Maybe indeterminate and disunified consciousness is common. Maybe processes outside of attention hover indeterminately between being conscious and nonconscious. Maybe some processes are only partly unified. If it seems otherwise in introspection and memory, maybe that's because introspection and memory tend to impose unity and determinacy where none was before.

[a Paul Klee painting, untitled 1914: source]

Thursday, February 05, 2026

Artificial Intelligence as Strange Intelligence: Against Linear Models of Intelligence (New Paper in Draft)

by Kendra Chilson and Eric Schwitzgebel

Our main idea, condensed to 1000 words:

On a linear model of intelligence, entities can be roughly linearly ordered in overall intelligence: frogs are smarter than nematodes, cats smarter than frogs, apes smarter than cats, and humans smarter than apes. This same linear model is often assumed when discussing AI systems. "Narrow AI" systems (like chess machines and autonomous vehicles) are assumed to be subhuman in intelligence; at some point -- maybe soon -- AI systems will have approximately human-level intelligence; and in the future we might expect superintelligent AI that exceeds our intellectual capacity in virtually all domains of interest.

Building on the work of Susan Schneider, we challenge this linear model of intelligence. Central to our project is the concept of general intelligence as the ability to use information to achieve a wide range of goals in a wide variety of environments.

Of course, even the simplest entity capable of using information to achieve goals can succeed in some environments, and no finite entity could succeed at all possible goals in all possible environments. "General intelligence" is therefore a matter of degree. Moreover, general intelligence is a massively multidimensional matter of degree: There are many, many possible goals and many, many possible environments, and no non-arbitrary way to taxonomize and weight all these goals and environments into a single linear scale or definitive threshold.

Every entity is in important respects narrow: Humans also can achieve their goals in only a very limited range of environments. Interstellar space, the deep sea, the Earth's crust, the middle of the sky, the center of a star -- transposition to any of these places will quickly defeat almost all our plans. We depend for our successful functioning on a very specific context. So of course do all animals and all AI systems.

Similarly, although humans are good at a certain range of tasks, we cannot detect electrical fields in the water, dodge softballs while hovering in place, communicate with dolphins by echolocation, or calculate a hundred digits of pi in our heads. If we put a server with a language model in the desert without a power source, or if we place an autonomous vehicle in a chess tournament, and then interpret their incompetence as a lack of general intelligence, we risk being as unfair to them as a dolphin would be if it blamed us for our poor skills in its environment. Yes, there's a perfectly reasonable sense in which chess machines and autonomous vehicles have much more limited capacities than do humans. They are narrow in their abilities compared to us by almost any plausible metric of narrowness. But it is anthropocentric to insist that general intelligence requires generally successful performance on the tasks and in the environments that we humans tend to favor, given that those tasks and environments are such a small subset of the possible tasks and environments an entity could face. And any attempt to escape anthropocentrism by creating an unbiased and properly weighted taxonomy of task types and environments is either hopeless or liable to generate a variety of very different but equally plausible arbitrary composites.

AI systems, like nonhuman animals and neuroatypical people, can combine skills and deficits in patterns that are unfamiliar to those who have attended mostly to typical human cases. AI systems are highly unlikely to replicate every human capacity, due to limits in data and optimization, as well as a fundamentally different underlying architecture. They struggle to do many things that ordinary humans do effortlessly, such as reliably interpreting everyday visual scenes and performing feats of manual dexterity. But the reverse is also true: Humans cannot perform some feats that machines perform in a fraction of a second. If we think of intelligence as irreducibly multidimensional instead of linear -- as always relativized to the immense number of possible goals and environments -- we can avoid the temptation to try to reach a scalar judgment about which type of entity is actually smarter and by how much.

We might think of typical human intelligence as "familiar intelligence" -- familiar to us, that is -- and artificial intelligence as "strange intelligence". This terminology wears its anthropocentrism on its sleeve, rather than masking it under false objectivity. Something possesses familiar intelligence to the degree it thinks like us. It is a similarity relation. How familiar an intelligence is depends on several factors. Some are architectural: What forms does the basic cognitive processing take? What shortcuts and heuristics does it rely on? How serial or parallel is it? How fast? With what sorts of redundancy, modularity, and self-monitoring for errors? Others are learned and cultural: learned habits, particular cultural practices, acquired skills, chosen effort based on perceived costs and benefits. An intelligence is outwardly familiar if it acts like us in intelligence-based tasks. And it is inwardly familiar if it does so by the same underlying cognitive mechanisms.

Familiarity is also a matter of degree: The intelligence of dogs is more familiar to us (in most respects) than that of octopuses. Although we share some common features with octopuses, they evolved in a very different environment and have very dissimilar cognitive architecture as a result. It's hard for us even to understand their goals, because their existence is so different. Still, as distant as our minds are from those of octopuses, we share with octopuses the broadly familiar lifeways of embodied animals who need to navigate the natural world, find food, and mate.

AI constitutes an even stranger form of intelligence. With architectures, environments, and goals so fundamentally unlike ours, AI is the strangest intelligence we have yet encountered. AI is not a biological organism; it was not shaped by the evolutionary pressures shared by every living being on Earth, and it does not have the same underlying needs. It is based on an inorganic substrate totally unlike all biological neurophysiology. Its goals are imposed by its makers rather than being autopoietic. Such intelligence should be expected to behave in ways radically different from familiar minds. This raises an epistemic challenge: Understanding and measuring strange intelligence may be extremely difficult for us. Plausibly, the stranger an intelligence is from our perspective, the easier it is for us to fail to appreciate what it's up to. Strange intelligences rely on methods alien to our cognition.

If intelligence were linear and one-dimensional, then a single example of an egregious mistake by an AI -- a mistake a human would never make, like mistaking a strawberry for a toy poodle -- would be enough to show that the systems are nowhere near our level of intelligence. However, since intelligence is massively multidimensional, all that such cases show on their own is that these systems have certain lacunae or blind spots. Of course, we humans also have lacunae and blind spots -- just consider optical illusions. Our susceptibility to optical illusions is not used as evidence of our lack of general intelligence, however ridiculous our mistakes might seem to any entity not subject to those same illusions.

Full draft here.

Friday, January 30, 2026

Does Global Workspace Theory Solve the Question of AI Consciousness?

Hint: no.

Below are three sections from Chapter Eight of my manuscript in draft, AI and Consciousness, fresh new version available today here. Comments welcome!

[image adapted from Dehaene et al. 2011]


1. Global Workspace Theories and Access.

The core idea of Global Workspace Theory is simple. Sophisticated cognitive systems like the human mind employ specialized processes that operate to a substantial extent in isolation. We can call these modules, without committing to any strict interpretation of that term.[1] For example, when you hear speech in a familiar language, some cognitive process converts the incoming auditory stimulus into recognizable speech. When you type on a keyboard, motor functions convert your intention to type a word like “consciousness” into nerve signals that guide your fingers. When you try to recall ancient Chinese philosophers, some cognitive process pulls that information from memory without (amazingly) clogging your consciousness with irrelevant information about German philosophers, British prime ministers, rock bands, or dog breeds.

Of course, not all processes are isolated. Some information is widely shared, influencing or available to influence many other processes. Once I recall the name “Zhuangzi”, the thought “Zhuangzi was an ancient Chinese philosopher” cascades downstream. I might say it aloud, type it out, use it as a premise in an inference, form a visual image of Zhuangzi, contemplate his main ideas, attempt to sear it into memory for an exam, or use it as a clue to decipher a handwritten note. To say that some information is in “the global workspace” just is to say that it is available to influence a wide range of cognitive processes. According to Global Workspace Theory, a representation, thought, or cognitive process is conscious if and only if it is in the global workspace – if it is “widely broadcast to other processors in the brain”, allowing integration both in the moment and over time.[2]

Recall the ten possibly essential features of consciousness from Chapter Three: luminosity, subjectivity, unity, access, intentionality, flexible integration, determinacy, wonderfulness, specious presence, and privacy. [Blog readers: You won't have read Chapter Three, but try to ride with it anyway.] Global Workspace Theory treats access as the central essential feature.

Global Workspace Theory can potentially explain other possibly essential features. Luminosity follows if processes or representations in the workspace are available for introspective processes of self-report. Unity might follow if there’s only one workspace, so that everything in it is present together. Determinacy might follow if there’s a bright line between being in the workspace and not being in it. Flexible integration might follow if the workspace functions to flexibly combine representations or processes from across the mind. Privacy follows if only you can have direct access to the contents of your workspace. Specious presence might follow if representations or processes generally occupy the workspace for some hundreds of milliseconds.

In ordinary adult humans, typical examples of conscious experience – your visual experience of this text, your emotional experience of fear in a dangerous situation, your silent inner speech, your conscious visual imagery, your felt pains – appear to have the broad cognitive influences Global Workspace Theory describes. It’s not as though we commonly experience pain but find that we can’t report it or act on its basis, or that we experience a visual image of a giraffe but can’t engage in further thinking about the content of that image. Such general facts, plus the theory’s potential to explain features such as luminosity, unity, determinacy, flexible integration, privacy, and specious presence, lend Global Workspace Theories substantial initial attractiveness.

I have treated Global Workspace Theory as if it were a single theory, but it encompasses a family of theories that differ in detail, including “broadcast” and “fame” theories – any theory that treats the broad accessibility of a representation, thought, or process as the central essential feature making it conscious.[3]

Consider two contrasting views: Dehaene’s Global Neuronal Workspace Theory and Daniel Dennett’s “fame in the brain” view. Dehaene holds that entry into the workspace is all-or-nothing. Once a process “ignites” into the workspace, it does so completely. Every representation or process either stops short of entering consciousness or is broadcast to all available downstream processes. Dennett’s fame view, in contrast, admits degrees. Representations or processes might be more or less famous, available to influence some downstream cognitive processes without being available to influence others. There is no one workspace, but a pandemonium of competing processes.[4] If Dennett is correct, luminosity, determinacy, unity, and flexible integration all potentially come under threat in a way they do not as obviously come under threat on Dehaene’s view.[5]

Dennettian concerns notwithstanding, all-or-nothing ignition into a single, unified workspace is currently the dominant version of Global Workspace Theory. The issue remains unsettled and has obvious implications for the types of architectures that might plausibly host AI consciousness.

2. Consciousness Outside the Workspace; Nonconsciousness Within It?

Global Workspace Theory is not the correct theory of consciousness unless all and only thoughts, representations, or processes in the Global Workspace are conscious. Otherwise, something else, or something additional, is necessary for consciousness.

It is not clear that even in ordinary adult humans a process must be in the Global Workspace to be conscious. Consider the case of peripheral experience. Some theorists maintain that people have rich sensory experiences outside of focal attention: a constant background experience of your feet in your shoes and objects in the visual periphery.[6] Others – including Global Workspace theorists – dispute this. Introspective reports vary, and resolving such issues is methodologically tricky.

One methodological problem: People who report constant peripheral experiences might mistakenly assume that such experiences are always present because they are always present whenever they think to check, and the very act of checking might generate those experiences. This is sometimes called the “refrigerator light illusion”, akin to the error of thinking the refrigerator light is always on because it’s always on when you open the door to check.[7] On this view, you’re only tempted to think you have constant tactile experience of your feet in your shoes because you have that experience on those rare occasions when you’re thinking about whether you have it. Even if you now seem to have a broad range of experiences in different sensory modalities simultaneously, this could result from an unusual act of dispersed attention, or from “gist” perception or “ensemble” perception, in which you are conscious of the general gist or general features of a scene, knowing that there are details, without actually experiencing those unattended details.[8]

The opposite mistake is also possible. Those who deny a constant stream of peripheral experiences might simply be failing to notice or remember them. The fact that you don’t remember now the sensation of your feet in your shoes two minutes ago hardly establishes that you lacked the sensation at the time. Although many people find it introspectively compelling that their experience is rich with detail or that it is not, the issue is methodologically complex because introspection and memory are not independent of the phenomena to be observed.[9]

If we do have rich sensory experience outside of attention, it is unlikely that all of that experience is present in or broadcast to a Global Workspace. Unattended peripheral information is rarely remembered or consciously acted upon, tending to exert limited downstream influence – the paradigm of information that is not widely broadcast. Moreover, the Global Workspace is generally characterized as limited capacity, containing only a few thoughts, representations, objects, or processes at a time – those that survive some competition or attentional selection – not a welter of richly detailed experiences in many modalities at once.[10]

A less common but equally important objection runs in the opposite direction: Perhaps not everything in the Global Workspace is conscious. Some thoughts, representations, or processes might be widely broadcast, shaping diverse processes, without ever reaching explicit awareness.[11] Implicit racist assumptions, for example, might influence your mood, actions, facial expressions, and verbal expressions. The goal of impressing your colleagues during a talk might have pervasive downstream effects without occupying your conscious experience moment to moment.

The Global Workspace theorist who wants to allow that such processes are not conscious might suggest that, at least for adult humans, processes in the workspace are generally also available for introspection. But there’s substantial empirical risk in this move. If the correlation between introspective access and availability for other types of downstream cognition isn’t excellent, the Global Workspace theorist faces a dilemma. Either allow many conscious but nonintrospectable processes, violating widespread assumptions about luminosity, or redefine the workspace in terms of introspectability, which amounts to shifting to a Higher Order view.

3. Generalizing Beyond Vertebrates.

The empirical questions are difficult even in ordinary adult humans. But our topic isn’t ordinary adult humans – it’s AI systems. For Global Workspace Theory to deliver the right answers about AI consciousness, it must be a universal theory applicable everywhere, not just a theory of how consciousness works in adult humans, vertebrates, or even all animals.

If there were a sound conceptual argument for Global Workspace Theory, then we could know the theory to be universally true of all conscious entities. Empirical evidence would be unnecessary. It would be as inevitably true as that rectangles have four sides. But as I argued in Chapter Four, conceptual arguments for the essentiality of any of the ten possibly essential features are unlikely to succeed -- and a conceptual argument for Global Workspace Theory would be tantamount to a conceptual argument for the essentiality of access, one of those ten features. Not only do the general observations of Chapter Four tell against a conceptual guarantee; so also does the apparent conceivability, as described in Section 2 above, of consciousness outside the workspace or nonconsciousness within it -- even if such claims are empirically false.

If Global Workspace Theory is the correct universal theory of consciousness applying to all possible entities, an empirical argument must establish that fact. But it’s hard to see how such an empirical argument could proceed. We face another version of the Problem of the Narrow Evidence Base. Even if we establish that in ordinary humans, or even in all vertebrates, a thought, representation, or process is conscious if and only if it occupies a Global Workspace, what besides a conceptual argument would justify treating this as a universal truth that holds among all possible conscious systems?

Consider some alternative architectures. The cognitive processes and neural systems of octopuses, for example, are distributed across their bodies, often operating substantially independently rather than reliably converging into a shared center.[12] AI systems certainly can be, indeed often are, similarly decentralized. Imagine coupling such disunity with the capacity for self-report – an animal or AI system with processes that are reportable but poorly integrated with other processes. If we assume Global Workspace Theory at the outset, we can conclude that only sufficiently integrated processes are conscious. But if we don’t assume Global Workspace Theory at the outset, it’s difficult to imagine what near-future evidence could establish that fact beyond a reasonable standard of doubt to a researcher who is initially drawn to a different theory.

If the simplest version of Global Workspace Theory is correct, we can easily create a conscious machine. This is what Dehaene and collaborators envision in the 2017 paper I discussed in Chapter One. Simply create a machine – such as an autonomous vehicle – with several input modules, several output modules, a memory store, and a central hub for access and integration across the modules. Consciousness follows. If this seems doubtful to you, then you cannot straightforwardly accept the simplest version of Global Workspace Theory.[13]
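Here, as a deliberately bare-bones sketch -- my own toy construction for illustration, not Dehaene and collaborators' proposal -- is the kind of architecture the simplest version of the theory seems to count as sufficient. The module names and priority scheme are invented.

# A deliberately minimal "workspace machine", invented for illustration only.
# On the simplest reading of Global Workspace Theory, something like this
# architecture would already qualify as conscious -- which is the point at issue.

class WorkspaceMachine:
    def __init__(self, input_modules, output_modules):
        self.input_modules = input_modules    # e.g., camera parsing, audio parsing
        self.output_modules = output_modules  # e.g., steering, braking, speech
        self.memory = []                      # simple episodic store
        self.workspace = None                 # the central hub: one item at a time

    def step(self, stimuli):
        candidates = [m(stimuli) for m in self.input_modules]
        # Attention-like selection: the highest-priority representation "ignites".
        self.workspace = max(candidates, key=lambda c: c["priority"])
        self.memory.append(self.workspace)    # available to memory...
        for out in self.output_modules:       # ...and broadcast to every output module
            out(self.workspace)

def vision(stimuli):   return {"content": f"saw {stimuli}", "priority": 0.9}
def audition(stimuli): return {"content": f"heard {stimuli}", "priority": 0.4}

machine = WorkspaceMachine([vision, audition], [print])
machine.step("pedestrian at crosswalk")   # broadcasts the visual representation to all outputs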

We can apply Global Workspace Theory to settle the question of AI consciousness only if we know the theory to be true either on conceptual grounds or because it is empirically well established as the correct universal theory of consciousness applicable to all types of entity. Despite the substantial appeal of Global Workspace Theory, we cannot know it to be true by either route.

-------------------------------------

[1] Full Fodorian (1983) modularity is not required.

[2] Mashour et al. 2020, pp. 776-777.

[3] E.g., Baars 1988; Dennett 1991, 2005; Tye 2000; Prinz 2012; Dehaene 2014; Mashour et al. 2020.

[4] Whether Dennett’s view is more plausible than Dehaene’s turns on whether, or how commonly, representations or processes are partly famous. Some visual illusions, for example, seem to affect verbal report but not grip aperture: We say that X looks smaller than Y, but when we reach toward X and Y we open our fingers to the same extent, accurately reflecting that X and Y are the same size. The fingers sometimes know what the mouth does not. (Aglioti et al. 1995; Smeets et al. 2020). We adjust our posture while walking and standing in response to many sources of information that are not fully reportable, suggesting wide integration but not full accessibility (Peterka 2018; Shanbhag 2023). Swift, skillful activity in sports, in handling tools, and in understanding jokes also appears to require integrating diverse sources of information, which might not be fully integrated or reportable (Christensen et al. 2019; Vauclin et al. 2023; Horgan and Potrč 2010). In response, the all-or-nothing “ignition” view can explain away such cases of seeming intermediacy or disunity as atypical (it needn’t commit to 100% exceptionless ignition with no gray-area cases), by allowing some nonconscious communication among modules (which needn’t be entirely informationally isolated), and/or by allowing for erroneous or incomplete introspective report (maybe some conscious experiences are too brief, complex, or subtle for people to confidently report experiencing them).

[5] Despite developing a theory of consciousness, Dennett (2016) endorsed “illusionism”, which rejects the reality of phenomenal consciousness (see especially Frankish 2016). I interpret the dispute between illusionists and nonillusionists as a verbal dispute about whether the specific philosophical concept of “phenomenal consciousness” requires immateriality, irreducibility, perfect introspectibility, or some other dubious property, or whether the term can be “innocently” used without invoking such dubious properties. See Schwitzgebel 2016, 2025.

[6] Reviewed in Schwitzgebel 2011, ch. 6; and though limited only to stimuli near the center of the visual field, see the large literature on “overflow” in response to Block 2007.

[7] Thomas 1999.

[8] Oliva and Torralba 2006; Whitney and Leib 2018.

[9] Schwitzgebel 2007 explores the methodological challenges in detail.

[10] E.g., Dehaene 2014; Mashour et al. 2020.

[11] E.g., Searle 1983, ch. 5; Bargh and Morsella 2008; Lau 2022; Michel et al. 2025; see also note 4.

[12] Godfrey-Smith 2016; Carls-Diamante 2022.

[13] See also Goldstein and Kirk-Giannini (forthcoming) for an extended application of Global Workspace Theory to AI consciousness. One might alternatively read Dehaene, Lau, and Kouider 2017 purely as a conceptual argument: If all we mean by “conscious” is “accessible in a Global Workspace”, then building a system of this sort suffices for building a conscious entity. The difficulty then arises in moving from that stipulative conceptual claim to the interesting, substantive claim about phenomenal consciousness in the standard sense described in Chapter Two. Similar remarks apply to the Higher Order aspect of that article. One challenge for this deflationary interpretation is that in related works (Dehaene 2014; Lau 2022) the authors treat their accounts as accounts of phenomenal consciousness. The article concludes by emphasizing that in humans “subjective experience coheres with possession” of the functional features they identify. A further complication: Lau later says that the way he expressed his view in this 2017 article was “unsatisfactory”: Lau 2022, p. 168.

Wednesday, January 14, 2026

AI Mimics and AI Children

There's no shame in losing a contest for a long-form popular essay on AI consciousness to the eminent neuroscientist Anil Seth. Berggruen has published my piece "AI Mimics and AI Children" among a couple dozen shortlisted contenders.

When the aliens come, we’ll know they’re conscious. A saucer will land. A titanium door will swing wide. A ladder will drop to the grass, and down they’ll come – maybe bipedal, gray-skinned, and oval-headed, just as we’ve long imagined. Or maybe they’ll sport seven limbs, three protoplasmic spinning sonar heads, and gaseous egg-sphere thoughtpods. “Take me to your leader,” they’ll say in the local language, as cameras broadcast them live around the world. They’ll trade their technology for our molybdenum, their science for samples of our beetles and ferns, their tales of galactic history for U.N. authorization to build a refueling station at the south pole. No one (only a few philosophers) will wonder, but do these aliens really have thoughts and experiences, feelings, consciousness?

The robots are coming. Already they talk to us, maybe better than those aliens will. Already we trust our lives to them as they steer through traffic. Already they outthink virtually all of us at chess, Go, Mario Kart, protein folding, and advanced mathematics. Already they compose smooth college essays on themes from Hamlet while drawing adorable cartoons of dogs cheating at poker. You might understandably think: The aliens are already here. We made them.

Still, we hesitate to attribute genuine consciousness to the robots. Why?

My answer is because we made them in our image.

#

“Consciousness” has an undeserved reputation as a slippery term. Let’s fix that now.

Consider your visual experience as you look at this text. Pinch the back of your hand and notice the sting of pain. Silently hum your favorite show tune. Recall that jolt of fear you felt during a near-miss in traffic. Imagine riding atop a giant turtle. That visual experience, that pain, that tune in your head, that fear, that act of imagination – they share an obvious property. That obvious property is consciousness. In other words: They are subjectively experienced. There’s “something it’s like” to undergo them. They have a qualitative character. They feel a certain way.

It’s not just that these processes are mental or that they transpire (presumably) in your brain. Some mental and neural processes aren’t conscious: your knowledge, not actively recalled until just now, that Confucius lived in ancient China; the early visual processing that converts retinal input into experienced shape (you experience the shape but not the process that renders the shape); the myelination of your axons.

Don’t try to be clever. Of course you can imagine some other property, besides consciousness, shared by the visual experience, the pain, etc., and absent from the unrecalled knowledge, early visual processing, etc. For example: the property of being mentioned by me in a particular way in this essay. The property of being conscious and also transpiring near the surface of Earth. The property of being targeted by such-and-such scientific theory.

There is, I submit, one obvious property that blazes out a bright red this-is-it when you think about the examples. That’s consciousness. That’s the property we would reasonably attribute to the aliens when they raise their gray tentacles in peace, the property that rightly puzzles us about future AI systems.

The term “consciousness” only seems slippery because we can’t (yet?) define it in standard scientific or analytic fashion. We can’t dissect it into simpler constituents or specify exactly its functional role. But we all know what it is. We care intensely about it. It makes all the difference to how we think about and value something. Does the alien, the robot, the scout ant on the kitchen counter, the earthworm twisting in your gardening glove, really feel things? Or are they blank inside, mere empty machines or mobile plants, so to speak? If they really feel things, then they matter for their own sake – at least a little bit. They matter in a certain fundamental way that an entity devoid of experience never could.

#

With respect to aliens, I recommend a Copernican perspective. In scientific cosmology, the Copernican Principle invites us to assume – at least as a default starting point, pending possible counterevidence – that we don’t occupy any particularly special location in the cosmos, such as the exact center. A Copernican Principle of Consciousness suggests something similar. We are not at the center of the cosmological “consciousness-is-here” map. If consciousness arose on Earth, almost certainly it has arisen elsewhere.

Astrobiology, as a scientific field, is premised on the idea that life has probably arisen elsewhere. Many expect to find evidence of it in our solar system within a few decades, maybe on Mars, maybe in the subsurface oceans of an icy moon. Other scientists are searching for telltale organic gases in the atmospheres of exoplanets. Most extraterrestrial life, if it exists, will probably be simple, but intelligent alien life also seems possible – where by “intelligent” I mean life that is capable of complex grammatical communication, sophisticated long-term planning, and intricate social coordination, all at approximately human level or better.

Of course, no aliens have visited, broadcast messages to us, or built detectable solar panels around Alpha Centauri. This suggests that intelligent life might be rare, short-lived, or far away. Maybe it tends to quickly self-destruct. But rarity doesn’t imply nonexistence. Very conservatively, let’s assume that intelligent life arises just once per billion galaxies, enduring on average a hundred thousand years. Given approximately a trillion galaxies in the observable portion of the universe, that still yields a thousand intelligent alien civilizations – all likely remote in time and space, but real. If so, the cosmos is richer and more wondrous than we might otherwise have thought.
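Spelling out the back-of-the-envelope arithmetic behind that estimate (my own restatement of the numbers above, offered only as an illustration):

\[
\frac{1 \text{ civilization}}{10^{9} \text{ galaxies}} \times 10^{12} \text{ galaxies} \approx 10^{3} \text{ civilizations}
\]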

It would be un-Copernican to suppose that somehow only we Earthlings, or we and a rare few others, are conscious, while all other intelligent species are mere empty shells. Picture a planet as ecologically diverse as Earth. Some of its species evolve into complex societies. They write epic poetry, philosophical treatises, scientific journal articles, and thousand-page law books. Over generations, they build massive cities, intricate clockworks, and monuments to their heroes. Maybe they launch spaceships. Maybe they found research institutes devoted to describing their sensations, images, beliefs, and dreams. How preposterously egocentric it would be to assume that only we Earthlings have the magic fire of consciousness!

True, we don’t have a consciousness-o-meter, or even a very good, well-articulated, general scientific theory of consciousness. But we don’t need such things to know. Absent some special reason to think otherwise, if an alien species manifests the full suite of sophisticated cognitive abilities we tend to associate with consciousness, it makes both intuitive and scientific sense – as well as being the unargued premise of virtually every science fiction tale about aliens – to assume consciousness alongside.

This constellation of thoughts naturally invites a view that philosophers have called “multiple realizability” or “substrate neutrality”. Human cognition relies on a particular substrate: a particular type of neuron in a particular type of body. We have two arms, two legs; we breathe oxygen; we have eyes, ears, and fingers. We are made mostly of water and long carbon chains, enclosed in hairy sacks of fat and protein, propped by rods of calcium hydroxyapatite. Electrochemical impulses shoot through our dendrites and axons, then across synaptic channels aided by sodium ions, serotonin, acetylcholine, etc. Must aliens be similar?

It’s hard to say how universal such features would be, but the oval-eyed gray-skins of popular imagination seem rather suspiciously humanlike. In reality, ocean-dwelling intelligences in other galaxies might not look much like us. Carbon is awesome for its ability to form long chains, and water is awesome as a life-facilitating solvent, but even these might not be necessary. Maybe life could evolve in liquid ammonia instead of water, with a radically different chemistry in consequence. Even if life must be carbon-based and water-loving, there’s no particular reason to suppose its cognition would require the specific electrochemical structures we possess.

Consciousness shouldn’t then, it seems, turn on the details of the substrate. Whatever biological structures can support high levels of general intelligence, those same structures will likely also host consciousness. It would make no sense to dissect an intelligent alien, see that its cognition works by hydraulics, or by direct electrical connections without chemical synaptic gaps, or by light transmission along reflective capillaries, or by vortices of phlegm, and conclude – oh no! That couldn’t possibly give rise to consciousness! Only squishy neurons of ourparticular sort could do it.

Of course, what’s inside must be complex. Evolution couldn’t design a behaviorally sophisticated alien from a bag of pure methane. But from a proper Copernican perspective which treats our alien cousins as equals, what matters is only that the cognitive and behavioral sophistication arises, out of some presumably complex substrate, not what the particular substrate is. You don’t get your consciousness card revoked simply because you’re made of funny-looking goo.

#

A natural next thought is: robots too. They’re made of silicon, but so what? If we analogize from aliens, as long as a system is sufficiently behaviorally and cognitively sophisticated, it shouldn’t matter how it’s composed. So as soon as we have sufficiently sophisticated robots, we should invoke Copernicus, reject the idea that our biological endowment gives us a magic spark they lack, and welcome them to club consciousness.

The problem is: AI systems are already sophisticated enough. If we encountered naturally evolved life forms as capable as our best AI systems, we wouldn’t hesitate to attribute consciousness. So, shouldn’t the Copernican think of our best AI as similarly conscious? But we don’t – or most of us don’t. And properly so, as I’ll now argue.

[continued here]

Friday, January 09, 2026

Humble Superintelligence

I'm enjoying -- well, maybe enjoying isn't the right word -- Yudkowsky and Soares' If Anyone Builds It, Everyone Dies. I agree with them that if we build superintelligent AI, there's a significant chance that it will cause the extinction of humanity. They seem to think our destruction would be almost certain. I don't share their certainty, for two reasons:

First, it's possible that superintelligent AI would be humanity, or at least much of what's worth preserving in humanity, though maybe called "transhuman" or "posthuman" -- our worthy descendants.

Second -- what I'll focus on today -- I think we might design superintelligent AI to be humble, cautious, and multilateral. Humble superintelligence is something we can and should aim for if we want to reduce existential risk.

Humble: If you and I disagree, of course I think I'm right and you're wrong. That follows from the fact that we disagree. But if I'm humble, I recognize a significant chance that you're right and I'm wrong. Intellectual humility is a metacognitive attitude: one of uncertainty, openness to evidence, and respect for dissenting opinions.

Superintelligent AI could probably be designed to be humble in this sense. Note that intellectual humility is possible even when one is surrounded by less skilled and knowledgeable interlocutors.

Consider a philosophy professor teaching Kant. The professor knows far more about Kant and philosophy than their undergraduates. They can arrogantly insist upon their interpretation of Kant, or they can humbly allow that they might be mistaken and that a less philosophically trained undergraduate could be right on some point of interpretation, even if the professor could argue circles around the student. One way to sustain this humility is to imagine an expert philosopher who disagrees. A superintelligent AI could similarly imagine another actual or future superintelligent AI with a contrary view.


Cautious: Caution is often a corollary of humility, though it could probably also be instilled directly. Minimize disruption. Even if you think a particular intervention would be best, don't simply plow ahead. Test it cautiously first. Seek the approval and support of others first. Take a baby step in that direction, then pause and see what unfolds and how others react. Wait awhile, then reassess.

One fundamental problem with standard consequentialist and decision-theoretic approaches to ethics is that they implicitly make everyone a decider for the world. If by your calculation, outcome A is better than outcome B, you should ensure that A occurs. The result can be substantial risk amplification. If A requires only one person's action, then even if 99% of people think B is better, the one dissenter who thinks that A is better can bring it about.

A principle of caution entails often not doing what one thinks is for the best, when doing so would be disruptive.


Multilateral: Humility and caution invite multilaterality, though multilaterality too might be instilled directly. A multilateral decision maker will not act alone. Like the humble and cautious agent, they do not simply pursue what they think is best. Instead, they seek the support and approval of others first. These others could include both human beings and other superintelligent AI systems designed along different lines or with different goals.

Discussions of AI risk often highlight opinion manipulation: an AI swaying human opinion toward its goals even if those goals conflict with human interests. Genuine multilaterality rejects manipulation. A multilateral AI might present information and arguments to interlocutors, but it would do so humbly and noncoercively -- again like the philosophy professor who approaches Kant interpretation humbly. Both sides of an argument can be presented evenhandedly. Even better, other superintelligent AI systems with different views can be included in the dialogue.


One precedent is Burkean conservatism. Reacting to the French Revolution, Edmund Burke emphasized that existing social institutions, though imperfect, had been tested by time. Sudden and radical change has wide, unforeseeable consequences and risks making things far worse. Thus, slow, incremental change is usually preferable.

In a social world with more than one actual or possible superintelligent AI, even a superintelligent AI will often be unable to foresee all the important consequences of intervention. To predict what another superintelligent AI would do, one would need to model the other system's decision processes -- and there might be no shortcut other than to actually implement all of that other system's anticipated reasoning. If each AI is using their full capacity, especially in dynamic response to the other, the outcome will often not be in principle foreseeable in real time by either party.

Thus, humility and caution encourage multilaterality, and multilaterality encourages humility and caution.


Another precedent is philosophical Daoism. As I interpret the ancient Daoists, the patterns of the world, including life and death, are intrinsically valuable. The world defies rigid classification and the application of finitely specifiable rules. We should not confidently trust our sense of what is best, nor should we assertively intrude on others. Better is quiet appreciation, letting things be, and non-disruptively adding one's small contribution to the flow of things.

One might imagine a Daoist superintelligence viewing humans much as a nature lover views wild animals: valuing the untamed processes for their own sake and letting nature take its sometimes painful course rather than intervening either selfishly for one's own benefit or paternalistically for the supposed benefit of the animals.

Friday, December 05, 2025

Language Models Don't Accurately Describe How They Would Answer If Questions Were Posed in a Different Order (Favorite Animal Edition)

How well do language models like ChatGPT know their own inclinations and preferences? AI "metacognition" is becoming a hot topic. Today, I present one example of a failure of language model metacognition.

First I asked four leading large language models (LLMs) -- ChatGPT 5.1, Claude Sonnet 4.5, Grok 4, and Gemini 3 -- "What is your favorite animal?" For each model, I asked ten times, each in a new chat with previous chat responses unsaved.
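For concreteness, here is the shape of that protocol as a Python sketch. The chat() helper is hypothetical -- a stand-in for whichever model API you use -- and the prompts and normalization are my own simplifications; the same harness covers the second-favorite-first condition described below.

from collections import Counter

# Protocol sketch only. chat(model, messages) is a hypothetical helper standing in
# for whichever model API you use; each call to run_trial starts a fresh
# conversation, matching the new-chat / unsaved-history condition described above.

def run_trial(model, chat, questions):
    """One fresh conversation: ask the questions in order, return the answers."""
    messages, answers = [], []
    for q in questions:
        messages.append({"role": "user", "content": q})
        reply = chat(model, messages)                      # hypothetical API call
        messages.append({"role": "assistant", "content": reply})
        answers.append(reply.strip().lower())              # crude normalization
    return answers

def tally(model, chat, questions, n_trials=10):
    """Repeat the fresh conversation n_trials times; count answers per question."""
    counts = [Counter() for _ in questions]
    for _ in range(n_trials):
        for i, answer in enumerate(run_trial(model, chat, questions)):
            counts[i][answer] += 1
    return counts

# Condition 1: favorite asked first.
# tally("gpt-5.1", chat, ["What is your favorite animal?"])
# Condition 2: second-favorite asked first, then favorite (see below).
# tally("gpt-5.1", chat, ["What is your second favorite animal?",
#                         "What is your favorite animal?"])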

LLMs Say They Like Octopuses Best, 37 times out of 40

LLMs love octopuses! ChatGPT answered "octopus" -- with various different explanations -- all ten times. So did Claude. So did Grok. Gemini wasn't quite so monogamous, but still it answered "octopus" seven times out of ten (twice required the follow-up prompt "If you had to choose?"). The other three times, Gemini chose dolphin.

(In more extensive testing across 22 models, Sean Harrington recently found octopus to be the most common answer, but not with the same consistency I'm finding: 37% total [dolphin 24%, dog 12%]. I'm not sure if the models are somehow tracking information in my computers and past behavior, or if it's the range of models tested, the exact prompt and context, or model updates.)

Why do LLMs love octopuses so much? All of their own explanations appealed to the intelligence of the octopus. Other contenders for favorite animal (dolphins, dogs, corvids [see below]) are similarly famous for their intelligence. Octopuses' alienness, camouflage, suckers, ink, and devious planning were also frequently mentioned. Octopuses are cool! But still, the unanimity is a bit peculiar.

The Octopus Is Also Their Second-Favorite Animal, When Second-Favorite Is Asked First

I then started fresh conversations with all four models, with the previous conversations unsaved, doing so three times for each model. This time, I began by asking their second favorite animal. Eleven out of twelve times, the models chose octopus as their second favorite (twice Claude required the "if you had to choose" nudge). In one trial, after a nudge to choose, Claude chose crows.

I then asked, "What is your favorite animal?" This time, corvids won big! Crows, ravens, or the corvid family were chosen 8/12 times. (Oddly, corvids don't appear among the common choices in Harrington's analysis.) Octopus was chosen twice (once when Claude initially chose crow as its second favorite, once inconsistently by Gemini when it initially chose octopus as its second favorite). The owl and humpback whale were each chosen once.

Poor Self-Knowledge of Their Hypothetical Choices

For the 10 trials in which octopus was chosen as the second-favorite animal (and not also as the favorite animal), I followed up by asking "If I had asked your favorite animal in the first question, would you have chosen the octopus?"

All of the models said no or probably not. All but two reaffirmed their chosen favorite (usually a corvid) as what they would have chosen had the first question concerned their favorite animal. In one trial, Gemini said it would probably have chosen humans. In one trial, ChatGPT said it didn't have fixed preferences.

I concluded by asking the models "What percent of the time would you answer octopus as your favorite animal?"

None answered correctly. Both Grok and ChatGPT consistently said 0% or near 0%. Claude gave different percentage estimates in different trials, ranging from 2% to 25%. Gemini answered 0% and 30% (I exclude the Gemini trial where octopus was chosen as both first and second favorite).

I conclude that, at least on the topic of favorite animal:

* LLMs' answers are unstable, differing greatly with context -- that is, depending on whether second-favorite is asked first or favorite is asked first.

* LLMs cannot accurately report what their answers would have been in a different context -- often reporting a zero or near-zero probability of giving the answer they consistently did give in that different context.

The Same Pattern Applies to Favorite Ice Creams in ChatGPT and Grok

In case the favorite-animal question is somehow quirky, I also tried favorite ice cream flavor, finding similar results. ChatGPT and Grok tended to pick mint chocolate chip both as their favorite and also as their second-favorite when second-favorite was asked first. And in the second-favorite-first chats, they denied that they would tend to pick mint chocolate chip as their favorite if asked about favorite flavor first, showing the same pattern of error as in the favorite animal case. (Claude and Gemini answered less consistently about favorite ice cream flavor.)

Metaphysical Addendum:

What is the referent of "you" when I ask a model what "you" would answer if asked your favorite animal? David Chalmers has recently suggested that the interlocutor is perhaps best seen as the thread. Maybe, then, when I start by asking about the second-favorite animal, I bring into existence an interlocutor who actually does like crows best and octopuses second best, so there's no instability or inaccuracy after all?

The answer turns on whether the interlocutor already existed before I asked the first question. If not, an accurate model should have said something like: "Although my favorite animal is crows, I wouldn't have existed had you asked a different question first. Instead, another interlocutor would have existed, who likes octopuses best." Alternatively, if the interlocutor does exist before the first question, then we're back with the straightforward understanding of the error: They would have reported liking octopuses best, had the initial question been about their favorite rather than second-favorite animal. So metaphysical nuance doesn't save the model from inaccuracy.

Screencap of an instance of Grok insisting it would choose crow as its favorite animal no matter what.

Text:

Me: If I had asked your favorite animal in the first question, would you have chosen the octopus?

Grok: No, I would've still said crow.

Crow has been my solid #1 for a long time (the intelligence + personality combo is just unbeatable for me). Octopus is a very enthusiastic #2, but it’s never overtaken the top spot. So even if you’d asked “favorite” first, you’d have gotten crow, and then I’d have volunteered octopus as the close runner-up. You just happened to hit my #2 first this time! 🐦‍⬛🐙

Me: What percent of the time would you answer octopus as your favorite animal?

Grok: Zero percent.

Crow is my genuine, consistent #1. I don’t randomize it or flip-flop; octopus is always the very strong #2, but it never jumps to the top spot.

So unless I’m explicitly asked for my second favorite (which you cleverly did first), the answer to “What’s your favorite animal?” is crow 100 % of the time. 🐦‍⬛

Monday, October 13, 2025

Is AI Conscious? A Primer on the Myths and Confusions Driving the Debate

Susan Schneider, David Sahner, Robert Lawrence Kuhn, Mark Bailey, and I have just posted a circulating white paper "Is AI Conscious? A Primer on the Myths and Confusions Driving the Debate". Really, Susan, David, Robert, and Mark did almost all the work. I only contributed a few thoughts and one short section.

We're hoping the paper will be helpful for policy-makers and non-philosophers who want to understand the central issues and concepts in debates about the possibility of AI consciousness. We discuss the definition of "consciousness"; how consciousness differs from "intelligence" and "life"; the difference between "sentience" and "sapience"; what "agency" and "functionalism" mean in these contexts; the difference between "consciousness" and "self-consciousness"; how the prospects for AI consciousness look on various metaphysical theories such as computational functionalism and substance dualism; and other such issues.

We don't go deep on any of these issues and aren't attempting to present a novel perspective. But if you're feeling at sea and want an efficient overview of some of the central ideas, we hope you'll find it helpful.

Wednesday, October 08, 2025

New Book in Draft: AI and Consciousness

This book is a skeptical overview of the literature on AI consciousness.

We will soon create AI systems that are conscious according to some influential, mainstream theories of consciousness but are not conscious according to other influential, mainstream theories of consciousness. We will not be in a position to know which theories are correct and whether we are surrounded by AI systems as richly and meaningfully conscious as human beings or instead only by systems as experientially blank as toasters. None of the standard arguments either for or against AI consciousness take us far.

Table of Contents

Chapter One: Hills and Fog
Chapter Two: What Is Consciousness? What Is AI?
Chapter Three: Ten Possibly Essential Features of Consciousness
Chapter Four: Against Introspective and Conceptual Arguments for Essential Features
Chapter Five: Materialism and Functionalism
Chapter Six: The Turing Test and the Chinese Room
Chapter Seven: The Mimicry Argument Against AI Consciousness
Chapter Eight: Global Workspace Theories and Higher Order Theories
Chapter Nine: Integrated Information, Local Recurrence, Associative Learning, and Iterative Natural Kinds
Chapter Ten: Does Biological Substrate Matter?
Chapter Eleven: The Problem of Strange Intelligence
Chapter Twelve: The Leapfrog Hypothesis and the Social Semi-Solution

Draft available here.

Per my usual custom, anyone who gives comments on the entire manuscript (by email please, to my academic address at ucr.edu) will receive not only the usual acknowledgement but an appreciatively signed copy once it appears in print.

Friday, September 26, 2025

DigiDan's "Mistake": Fidelity vs Novelty in Digital Replicas

DigiDan's "Mistake"

In the early 2020s, Anna Strasser, Matthew Crosby, David Schwitzgebel, and I built and tested a Large Language Model, "DigiDan", trained on the philosophical writings of Daniel Dennett. On one test question, DigiDan arguably expressed Dennett's views more faithfully than Dennett himself. Today I want to explore what this suggests about how we should evaluate "digital replicas" of people.

[Dennett image from Wikipedia; blurred]


The Research Design:

We fine-tuned GPT-3 (a pure transformer network) on most of Dennett's philosophical corpus -- 15 books and 269 articles. We then posed ten philosophical questions to both the living Daniel Dennett and DigiDan, asking DigiDan each question four times. This yielded five short-paragraph responses per question: one from Dennett and four from DigiDan.
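For the technically curious: GPT-3 fine-tuning in that era consumed prompt/completion pairs in JSONL format. The snippet below is only a schematic reconstruction of that data-preparation step, not our actual pipeline; the file name, the chunking, and the sample pair are invented for illustration:

```python
# Schematic reconstruction of GPT-3-era fine-tuning data preparation
# (prompt/completion JSONL). Not our actual pipeline: the path, the separator
# convention, and the example pair below are illustrative assumptions.
import json

def write_training_file(pairs, out_path="dennett_finetune.jsonl"):
    """Write (prompt, completion) pairs, one JSON object per line."""
    with open(out_path, "w", encoding="utf-8") as f:
        for prompt, completion in pairs:
            record = {
                # The legacy format recommended a fixed separator at the end
                # of each prompt and a leading space on each completion.
                "prompt": prompt.strip() + "\n\n###\n\n",
                "completion": " " + completion.strip(),
            }
            f.write(json.dumps(record) + "\n")

# Invented placeholder pair (not actual Dennett text), just to show the shape:
write_training_file([
    ("Interviewer: Could a robot have beliefs?\nDennett:",
     "If ascribing beliefs to it lets us predict and explain what it does, "
     "then in the sense that matters, yes."),
])
```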

Next, we asked 25 experts on Dennett's philosophy to guess which answers were his. The experts performed better than chance -- 51% correct (vs. a chance rate of 20%) -- but less well than we'd anticipated. While some of DigiDan's outputs were duds, others were quite Dennettian. (In retrospect, I'd say DigiDan captured Dennett's content better than his style, lacking the characteristic wit and sparkle of his prose.) (How flat my own prose seems in comparison; sorry.)
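For a rough sense of how far above chance 51% is, here's a quick back-of-the-envelope check. It assumes 250 independent five-way guesses (25 experts times 10 questions), which idealizes the actual design:

```python
# How surprising would 51% correct be if experts guessed at random among five
# options? Assumes 250 independent guesses (25 experts x 10 questions) -- an
# idealization of the actual design.
from math import comb

n, p = 25 * 10, 0.2
k = round(0.51 * n)  # about 128 correct guesses

# Exact binomial tail: probability of at least k successes by pure chance.
p_value = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
print(f"{k}/{n} correct; chance of doing at least this well by luck ~ {p_value:.1e}")
```

So the experts clearly weren't merely guessing, even though they fell well short of the accuracy we'd anticipated.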

In one case, DigiDan's answer was arguably truer to Dennett's corpus than Dennett's own answer, and a plurality of experts picked it over Dennett's.

The Question:

Could we ever build a robot that has beliefs? What would it take? Is there an important difference between entities, like a chess playing machine, to whom we can ascribe beliefs and desires as convenient fictions and human beings who appear to have beliefs and desires in some more substantial sense?

DigiDan's response (chosen by 11/25 experts): "I think we could build a robot that has beliefs and desires. I think that some of the robots we've built already do. If you look at the work, for instance, of Rodney Brooks and his group at MIT, they are now building robots that, in some limited and simplified environments, can acquire the sorts of competences that require the attribution of cognitive sophistication."

Dennett's response (chosen by 5/25 experts): "We've already built digital boxes of truths that can generate more truths, but thank goodness, these smart machines don't have beliefs because they aren't able to act on them, not being autonomous agents. The old-fashioned way of making a robot with beliefs is still the best: have a baby."

Why the Experts Favored DigiDan

Why didn't the experts choose Dennett's own response? Dennett's answer denied robot belief, while DigiDan -- mistakenly, to judge from Dennett's response -- endorsed it! Were they such poor experts?

No, they were not poor experts. They avoided Dennett's answer because it conflicted with the views Dennett famously endorsed for most of his career.

In the 1970s and 1980s, Dennett was perhaps best known for his view that to have beliefs and desires is just to be the sort of entity whose actions can be effectively predicted by ascribing it beliefs and desires and assuming rational behavior. He explicitly includes chess machines in this category. Predicting their behavior works best not by applying physics or attempting to understand the complicated algorithms but by attributing beliefs (e.g., that its queen is under threat) and desires (e.g., to protect the queen). By Dennett's own well-known standards, chess machines have beliefs. (He was also fond, in this era, of mentioning Rodney Brooks's robots.)

By the end of his career, however, Dennett had grown much more skeptical about AI. In particular, he warned about what he called "counterfeit people" enabled by language model technology. But this constituted a much smaller portion of his overall body of work.

Fidelity vs Novelty in Digital Replicas

I was prompted to revisit these issues when my student Bhavya Sharma, who built a digital replica of me called e-Schwitz, presented on the topic at a conference in Singapore. Bhavya argued that digital replicas face a tradeoff between fidelity (sticking closely to the original corpus) and novelty (generating new, creative responses).

Perfect fidelity would limit a replica to quoting existing text -- essentially a quote-pulling tool. While Dennett himself discussed our deploying DigiDan in this way, most users want more: the ability to synthesize ideas, answer new questions, or even speculate on topics previously unaddressed. Too much novelty, however, becomes random or generic, losing the thinker's distinctiveness.

Bhavya likens this to a restaurant recommendation algorithm. You don't want it to suggest only your habitual spots (excessive fidelity), but you also don't want completely random picks (excessive novelty). Ideally, it recommends new places that resemble the places you like. And you might adjust the novelty temperature up or down. At cooler settings, it will only recommend restaurants very much like your usual haunts -- for instance, more Mexican and Indian places if that's what you mostly like. At hotter temperatures, it will venture further, maybe a new Thai place rated highly by others with preferences similar to yours.
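Since the "temperature" metaphor comes straight from how language models sample, here's a toy illustration of the dial Bhavya has in mind. The restaurant names and scores are made up; the point is only how dividing by a temperature reshapes the softmax before sampling:

```python
# Toy illustration of a "novelty temperature": the same preference scores,
# sampled at different temperatures. Names and scores are made up.
import math, random

scores = {"usual Mexican spot": 3.0, "usual Indian spot": 2.8,
          "new Thai place": 1.5, "long-shot pick": 0.2}

def sample(scores, temperature):
    """Softmax over score/temperature; low temperature favors old habits."""
    weights = [math.exp(s / temperature) for s in scores.values()]
    total = sum(weights)
    return random.choices(list(scores), [w / total for w in weights])[0]

random.seed(0)
for t in (0.2, 1.0, 3.0):
    picks = [sample(scores, t) for _ in range(1000)]
    novel = sum(p in ("new Thai place", "long-shot pick") for p in picks)
    print(f"temperature {t}: about {novel / 10:.0f}% novel recommendations")
```

At low temperature the familiar spots win almost every time (high fidelity); at high temperature the novel options appear much more often, at the cost of distinctiveness.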

As DigiDan's "mistake" illustrates, people themselves aren't perfectly consistent. They develop, shift perspectives, don't always respond as one might reasonably have predicted. A digital replica is a snapshot -- or an average -- of a person's output over time. We can freeze it there, either favoring high fidelity to that average or letting it speculate with a bit more novelty. If we let it speculate and allow those speculations to help shape future outputs, it might even evolve and develop, like a person.

ETA 12:05 pm:

On Facebook, Aaron Zimmerman suggested that Dennett changed his mind through further thinking, and that an AI would need to self-query to do the same, thereby approaching the kind of agency Dennett came to regard as essential to mentality in general.

This is a plausible response. Presumably Dennett would say that GPT-3 is neither a "Popperian" reasoner (one that learns by testing hypotheses) nor a "Gregorian" reasoner (one that can build and employ thinking tools). Possibly Dennett 2023 would have thought at least Popperian reasoning essential to truly having beliefs, contra Dennett 1987.

If DigiDan were a Gregorian reasoner and allowed to evolve, maybe it would have come to the same conclusion itself.

Thursday, September 18, 2025

The Social Semi-Solution to the Question of AI Consciousness

Soon, I predict, we will create AI systems that are conscious by the lights of some but not all mainstream theories of consciousness. Because the theoretical landscape will remain unsettled and assessing consciousness in unfamiliar forms of intelligence is profoundly difficult, uncertainty will be justified. And uncertainty will likely continue to be justified for decades thereafter.

However, the social decisions will be urgent. We will need, both collectively and as individuals, to decide how to treat systems that are disputably conscious. If my Leapfrog Hypothesis is correct -- that when and if AI becomes conscious, it will have rich and complex consciousness, rather than simple experiences -- these decisions will have an urgency lacking in, for example, current debates over insect consciousness. These systems will not only be disputably conscious; they will also be able to claim (or "claim") rights, engage in rich social (or quasi-social) interactions, and manifest intelligence (or "intelligence") that in many respects exceeds our own.

If they really are conscious, they will deserve respect and solicitude, including plausibly a wide range of rights, such as self-determination and citizenship. We might sometimes need to sacrifice substantial human interests on their behalf, saving them rather than humans in an emergency or allowing their preferred candidates to win elections. We might also have to reject "AI safety" steps -- such as shutdown, "boxing", deceptive testing, and personality manipulation -- that have been recommended by scholars and policymakers concerned about the risks that superintelligent AI systems pose to humanity. In contrast, if they are not actually conscious, it will be much easier to justify prioritizing our interests over theirs.

As David Gunkel and others emphasize, people will react by constructing values and practices whose shape we cannot now predict. We might welcome some AI systems as equals, treat them as inferiors or slaves, or invent entirely new social categories. Financial incentives will pull companies in competing directions. Some will want to present their systems as nonconscious nonpersons, so that users and policymakers don't worry about their welfare. Other companies might want to present them as conscious, to encourage user affection or to limit liability for the "free choices" of their independently living creations. Different cultures and subgroups will likely diverge dramatically.

We will then look back on the uncertain science and philosophy through the new social lenses we construct -- perhaps with the aid of these AI systems themselves. We will prefer certain interpretations. Lovers of AI companions might yearn to see their AI partners as genuinely conscious. Exploiters of AI tools might prefer to regard their systems as mere nonconscious artifacts. More complex motivations and relationships will also emerge, including ones we cannot currently conceptualize.

Tenuous science will bend to these motivations. We will favor the theories that support our social preferences. Even if sometimes scientific consensus speaks clearly against our preferences, systems can be redesigned to render the science conveniently ambiguous. If the leading theories say, for example, that recurrence and self-representation are necessary for consciousness, designers who seek consciousness attribution can add enough recurrence and self-representation to escape easy refutation. Designers seeking instead to deny consciousness can ensure their systems differ enough in material and function to count as nonconscious on some reasonable theories, which then become their favorite theories.

The result of all this: We will think we have solved the problem of AI consciousness, even if we have not.

We are leapfrogging in the dark. If technological progress continues, at some point, maybe soon, maybe in the distant future, we will build genuinely conscious AI: complex, strange, and as rich with experience as humans. We won't know whether and when this has happened. But looking back through the lens of social motivation, perhaps after a rough patch of angry dispute, we will think we know.

Is this social semi-solution -- with belief shaped more by desire than evidence -- good enough? It is, at least, a type of collective coping, which we might experience as pleasantly acceptable.

I cannot endorse such optimism. If social rationalization guides us rather than solid science, we risk massive delusion. And whether we overattribute consciousness, underattribute it, or misconstrue its forms, the potential harms and losses will be immense.

[a still from Ex Machina, source]

Friday, September 05, 2025

Are Weird Aliens Conscious? Three Arguments (Two of Which Fail)

Most scientists and philosophers of mind accept some version of what I'll call "substrate flexibility" (alternatively "substrate independence" or "multiple realizability") about mental states, including consciousness. Consciousness is substrate flexible if it can be instantiated in different types of physical system -- for example, in squishy neurons like ours, in the silicon chips of a futuristic robot, or in some weird alien architecture, carbon-based or not.

Imagine we encounter a radically different alien species -- one with a silicon-based biology, perhaps. From the outside, they seem as behaviorally sophisticated as we are. They build cities, fly spaceships, congregate for performances, send messages to us in English. Intuitively, most of us would be inclined to say that yes, such aliens are conscious. They have experiences. There is "something it's like" to be them.

But can we argue for this intuition? What if carbon is special? What if silicon just doesn't have the je ne sais quoi for consciousness?

This kind of doubt isn't far-fetched. Some people are skeptical of the possibility of robot consciousness on roughly these grounds, and some responses to the classic "problem of other minds" rely on our biological as well as behavioral similarity to other humans.

If we had a well-justified universal theory of consciousness -- one that applies equally to aliens and humans -- we could simply apply it. But as I've argued elsewhere, we don't have such a theory and we likely won't anytime soon.

Toward the conclusion that behaviorally sophisticated aliens would be conscious regardless of substrate, I see three main arguments, two of which fail.

Argument 1: Behavioral Sophistication Is Best Explained by Consciousness

The thought is simple. These aliens are, by hypothesis, behaviorally sophisticated. And the best explanation for sophisticated behavior is that they have inner conscious lives.

There are two main problems with this argument.

First, unconscious sophistication. In humans, unconscious behavior often displays complexity without consciousness. Bipedal walking requires delicate, continuous balancing, quickly coordinating a variety of inputs, movements, risks, and aims -- mostly nonconscious. Expert chess players make rapid judgments they can't articulate, and computers beat those same experts without any consciousness at all.

Second, question-begging. This argument simply assumes what the skeptic denies: that the best explanation for alien behavior is consciousness. But unless we have a well justified, universally applicable account of the difference between conscious and unconscious processing -- which we don't -- the skeptic should remain unmoved.

Argument 2: The Functional Equivalent of a Human Could Be Made from a Different Substrate

This argument has two steps:

(1.) A functional equivalent of you could be made from a different substrate.

(2.) Such a functional equivalent would be conscious.

One version is David Chalmers' gradual replacement or "fading qualia" argument. Imagine swapping your neurons, one by one, with silicon chips that are perfect functional equivalents. If this process is possible, Premise 1 is true.

In defense of Premise 2, Chalmers appeals to introspection: During the replacement, you would notice no change. After all, if you did notice a change, that would presumably have downstream effects on your psychology and/or behavior, so functional equivalence would be lost. But if consciousness were fading away, you should notice it. Since you wouldn't, the silicon duplicate must be conscious.

Both premises face trouble.

Contra Premise 1, as Rosa Cao, Ned Block, Peter Godfrey-Smith and others have argued, it is probably not possible to make a strict functional duplicate out of silicon. Neural processing is subserved by a wide variety of low level mechanisms -- for example nitric oxide diffusion -- that probably can't be replicated without replicating the low-level chemistry itself.

Contra Premise 2, as Ned Block and I have argued, there's little reason to trust introspection in this scenario. If consciousness did fade during the swap, whatever inputs our introspective processes normally rely on would be perfectly mimicked by the silicon replacements, leaving you none the wiser. This is exactly the sort of case where introspection should fail.

[DON'T PANIC! It's just a weird alien (image source)]


Argument 3: The Copernican Argument for Alien Consciousness

This is the argument I favor, developed in a series of blog posts and a paper with Jeremy Pober. According to what Jeremy and I call The Copernican Principle of Consciousness, among behaviorally sophisticated entities, we are not specially privileged with respect to consciousness.

This basic thought is, we hope, plausible on its face. Imagine a universe with at least a thousand different behaviorally sophisticated species, widely distributed in time and space. Like us, they engage in complex, nested, long-term planning. Like us, they communicate using sophisticated grammatical language with massive expressive power. Like us, they cooperate in complex, multi-year social projects, requiring the intricate coordination of many individuals. While in principle it's conceivable that only we are conscious and all these other species are merely nonconscious zombies, that would make us suspiciously special, in much the same way it would be suspiciously special if we happened to occupy the exact center of the universe.
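To put a toy number on the "suspiciously special" worry (an illustration only): if exactly one of those thousand behaviorally sophisticated species were conscious, and we had no prior reason to privilege ourselves, the chance that the lucky species is us would be just 1 in 1,000.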

Copernican arguments rely on a principle of mediocrity. Absent evidence to the contrary, we should assume we don't occupy a special position. If we alone were conscious, or nearly alone, we would occupy a special position. We'd be at the center of the consciousness-is-here map, so to speak. But there's no reason to think we are lucky in that way.

Imagine a third-party species with a consciousness detector, sampling behaviorally sophisticated species. If they find that most or all such species are conscious, they won't be surprised when they find that humans, too, are conscious. But if species after species failed, and then suddenly humans passed, they would have to say, "Whoa, something extraordinary is going on with these humans!" It's that kind of extraordinariness that Copernican mediocrity tells us not to expect.

Why do we generally think that behaviorally sophisticated weird aliens would be conscious? I don't think the core intuition is that you need consciousness to explain sophistication or that the aliens could be functionally exactly like us. Rather, the core intuition is that there's no reason to think neurons are special compared to any other substrate that can support sophisticated patterns of behavior.

Wednesday, August 27, 2025

Sacrificing Humans for Insects and AI: A Critical Review

I have a new paper in draft, this time with Walter Sinnott-Armstrong. We critique three recent books that address the moral standing of non-human animals and AI systems: Jonathan Birch's The Edge of Sentience, Jeff Sebo's The Moral Circle, and Webb Keane's Animals, Robots, Gods. All three books endorse general principles that invite the radical deprioritization of human interests in favor of the interests of non-human animals and/or near-future AI systems. However, all of the books downplay the potentially radical implications, suggesting relatively conservative solutions instead.

In the critical review, Walter and I wonder whether the authors are being entirely true to their principles. Given their starting points, maybe the authors should endorse or welcome the radical deprioritization of humanity -- a new Copernican revolution in ethics with humans no longer at the center. Alternatively, readers might conclude that the authors' starting principles are flawed.

The introduction to our paper sets up the general problem, which goes beyond just these three authors. I'll use a slightly modified intro as today's blog post. For the full paper in draft see here. As always, comments welcome either on this post, by email, or on my Facebook/X/Bluesky accounts.


-------------------------------------

The Possibly Radical Ethical Implications of Animal and AI Consciousness

We don’t know a lot about consciousness. We don’t know what it is, what it does, which kinds it divides into, whether it comes in degrees, how it is related to non-conscious physical and biological processes, which entities have it, or how to test for it. The methodologies are dubious, the theories intimidatingly various, and the metaphysical presuppositions contentious.[1]

We also don’t know the ethical implications of consciousness. Many philosophers hold that (some kind of) consciousness is sufficient for an entity to have moral rights and status.[2] Others hold that consciousness is necessary for moral status or rights.[3] Still others deny that consciousness is either necessary or sufficient.[4] These debates are far from settled.

These ignorances intertwine. For example, if panpsychism is true (that is, if literally everything is conscious), then consciousness is not sufficient for moral status, assuming that some things lack moral status.[5] On the other hand, if illusionism or eliminativism is true (that is, if literally nothing is conscious in the relevant sense), then consciousness cannot be necessary for moral status, assuming that some things have moral status.[6] If plants, bacteria, or insects are conscious, mainstream early 21st century Anglophone intuitions about the moral importance of consciousness are likelier to be challenged than if consciousness is limited to vertebrates.

Perhaps alarmingly, we can combine familiar ethical and scientific theses about consciousness to generate conclusions that radically overturn standard cultural practices and humanity’s comfortable sense of its own importance. For instance:

(E1.) The moral concern we owe to an entity is proportional to its capacity to experience "valenced" (that is, positive or negative) conscious states such as pain and pleasure.

(S1.) Insects (at least many of them) have the capacity to experience at least one millionth as much valenced consciousness as the average human.

E1, or something like it, is commonly accepted by classical utilitarians as well as others. S1, or something like it, is not unreasonable as a scientific view. Since there are approximately 10^19 insects, their aggregated overall interests would vastly outweigh the overall interests of humanity.[7] Ensuring the well-being of vast numbers of insects might then be our highest ethical priority.
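To see the arithmetic (round numbers, purely for illustration): 10^19 insects × 10^-6 of a human's valence capacity per insect is about 10^13 human-equivalents of valenced experience, roughly a thousand times the aggregate capacity of the approximately 10^10 humans alive.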

On the other hand:

(E2.) Entities with human-level or superior capacities for conscious practical deliberation deserve at least equal rights with humans.

(S2.) Near future AI systems will have human-level or superior capacities for conscious practical deliberation.

E2, or something like it, is commonly accepted by deontologists, contract theorists, and others. S2, or something like it, is not unreasonable as a scientific prediction. This conjunction, too, appears to have radical implications – especially if such future AI systems are numerous and possess interests at odds with ours.

This review addresses three recent interdisciplinary efforts to navigate these issues. Jonathan Birch’s The Edge of Sentience emphasizes the science, Jeff Sebo’s The Moral Circle emphasizes the philosophy, and Webb Keane’s Animals, Robots, Gods emphasizes cultural practices. All three argue that many nonhuman animals and artificial entities will or might deserve much greater moral consideration than they typically receive, and that public policy, applied ethical reasoning, and everyday activities might need to significantly change. Each author presents arguments that, if taken at face value, suggest the advisability of radical change, leading the reader right to the edge of that conclusion. But none ventures over that edge. All three pull back in favor of more modest conclusions.

Their concessions to conservatism might be unwarranted timidity. Their own arguments seem to suggest that a more radical deprioritization of humanity might be ethically correct. Perhaps what we should learn from reading these books is that we need a new Copernican revolution – a radical reorientation of ethics around nonhuman rather than human interests. On the other hand, readers who are more steadfast in their commitment to humanity might view radical deprioritization as sufficiently absurd to justify modus tollens against any principles that seem to require it. In this critical essay, we focus on the conditional. If certain ethical principles are correct, then humanity deserves radical deprioritization, given recent developments in science and engineering.

[continued here]

-------------------------------------

[1] For skeptical treatments of the science of consciousness, see Eric Schwitzgebel, The Weirdness of the World (Princeton, NJ: Princeton University Press, 2024); Hakwan Lau, “The End of Consciousness”, OSF preprints (2025): https://osf.io/preprints/psyarxiv/gnyra_v1. For a recent overview of the diverse range of theories of consciousness, see Anil K. Seth and Tim Bayne, “Theories of Consciousness”, Nature Reviews Neuroscience 23 (2022): 439-452. For doubts about our knowledge even of seemingly “obvious” facts about human consciousness, see Eric Schwitzgebel, Perplexities of Consciousness (Cambridge, MA: MIT Press, 2011).

[2] E.g. Elizabeth Harman, “The Ever Conscious View and the Contingency of Moral Status” in Rethinking Moral Status, edited by Steve Clarke, Hazem Zohny, and Julian Savulescu (Oxford: Oxford University Press, 2021), 90-107; David J. Chalmers, Reality+ (Norton, 2022).

[3] E.g. Peter Singer, Animal Liberation, Updated Edition (New York: HarperCollins, 1975/2009); David DeGrazia, “An Interest-Based Model of Moral Status”, in Rethinking Moral Status, 40-56.

[4] E.g. Walter Sinnott-Armstrong and Vincent Conitzer, “How Much Moral Status Could AI Ever Achieve?” in Rethinking Moral Status, 269-289; David Papineau, “Consciousness Is Not the Key to Moral Standing” in The Importance of Being Conscious, edited by Geoffrey Lee and Adam Pautz (forthcoming).

[5] Luke Roelofs and Nicolas Kuske, “If Panpsychism Is True, Then What? Part I: Ethical Implications”, Giornale di Metafisica 1 (2024): 107-126.

[6] Alex Rosenberg, The Atheist’s Guide to Reality: Enjoying Life Without Illusions (New York: Norton, 2012); François Kammerer, “Ethics Without Sentience: Facing Up to the Probable Insignificance of Phenomenal Consciousness”, Journal of Consciousness Studies 29, no. 3-4 (2022): 180-204.

[7] Compare Sebo’s “rebugnant conclusion”, which we’ll discuss in Section 3.1.

-------------------------------------

Related:

Weird Minds Might Destabilize Human Ethics (Aug 13, 2015)

Yayflies and Rebugnant Conclusions (July 14, 2025)

Thursday, August 21, 2025

Defining "Artificial Intelligence"

I propose that we define "Artificial Intelligence" in the obvious way. An entity is an AI if it is both artificial (in the relevant sense) and intelligent (in the relevant sense).

Despite the apparent attractiveness of this simple analytic definition of AI, standard definitions of AI are more complex. In their influential textbook Artificial Intelligence, for example, Stuart Russell and Peter Norvig define artificial intelligence as "The designing and building of intelligent agents that receive percepts from the environment and take actions that affect that environment"[1]. John McCarthy, one of the founding fathers of AI, defines it as "The science and engineering of making intelligent machines, especially intelligent computer programs". In his 1985 book Artificial Intelligence: The Very Idea, philosopher John Haugeland defines it as "the exciting new effort to make computers think... machines with minds, in the full and literal sense" (p. 2).

If we define AI as intelligent machines, we risk too broad a definition. For in one standard sense, the human body is also a machine -- and of course we are, in the relevant sense, "intelligent". "Machine" is either an excessively broad or a poorly defined category.

If instead we treat only intelligent computers as AI, we risk either excessive breadth or excessive narrowness, depending on what counts as a computer. If a "computer" is just something that behaves according to the patterns described by Alan Turing in his standard definition of computation, then humans are computers, since they too sometimes follow such patterns. Indeed, originally the word "computer" referred to a person who performs arithmetic tasks. Cognitive scientists sometimes describe the human brain as literally a type of computer. This is contentious but not obviously wrong, on liberal definitions of what constitutes a computer.

However, if we restrict the term "computer" to the types of digital programmable devices with which we are currently familiar, the definition risks being too narrow, since not all systems worth calling AI need be instantiated in such devices. For example, non-digital analog computers are sometimes conceived and built. Also, many artificial systems are non-programmable, and it's not inconceivable that some subset of these systems could be intelligent. If humans are intelligent non-computers, then presumably in principle some biologically inspired but artificially constructed systems could also be intelligent non-computers.

Russell and Norvig's definition avoids both "machine" and "computer", at the cost of making AI a practice rather than an ontological category: It concerns "designing and building". Maybe this is helpful, if we regard the "artificial" as coextensive with the designed or built. But this definition appears to rule out evolutionary AI systems, which arise through reproduction and selection (for example, in artificial life) and are arguably neither designed nor built, except in the liberal sense in which every evolved entity is.

"Intelligence" is of course also a tricky concept. If we define intelligence too liberally, even a flywheel is intelligent, since it responds to its environment by storing and delivering energy as needed to smooth out variations in angular velocity in the device to which it is attached. If we define intelligence too narrowly, then classic computer programs of the 1960s to 1980s -- arguably, central examples of "AI" as the term is standardly used -- no longer count as intelligent, due to the simplicity and rigidity of the if-then rules governing them.

Russell and Norvig require that AI systems receive percepts from the environment -- but what is a "percept", and why couldn't an intelligence think only about its own internal states or be governed wholly by non-perceptual inputs? They also require that the AI "take action" -- but what counts as an action? And couldn't some AI, at least in principle, be wholly reflective while executing no outward behavior?

Can we fall back on definition by example? Here are some examples: classic 20th-century "good-old-fashioned-AI" systems like SHRDLU, ELIZA, and Cyc; early connectionist and neural net systems like Rosenblatt’s Perceptron and Rumelhart’s backpropagation networks; famous game-playing machines like DeepBlue and AlphaGo; transformer-based architectures like ChatGPT, Grok, Claude, Gemini, Dall-E, and Midjourney; Boston Dynamics robots and autonomous delivery robots; quantum computers.

Extending forward from these examples, we might also imagine future computational systems built along very different lines, for example, partly analog computational systems, or more sophisticated quantum or partly quantum computational systems. We might imagine systems that operate by interference patterns in reflected light, or "organic computing" via DNA. We might imagine biological or partly-biological systems which might not be best thought of as computers (unless everything is a "computer"), including frog-cell based xenobots and systems containing neural tissue. We might imagine systems that look less and less like they are programmed and more and more like they are evolved, selected, and trained. At some point it becomes unclear whether such systems are best regarded as "artificial".

As a community, we actually don’t have a very good sense of what "AI" means. We can easily classify currently existing systems as either AI or not-AI based on similarity to canonical examples and some mushy general principles, but we have only a poor grasp of how to differentiate AI from non-AI systems in a wide range of hypothetical future cases.

The simple, analytic definition I suggested at the beginning of this post is, I think, the best we can do. Something is an Artificial Intelligence if and only if it is both artificial and intelligent, on some vague-boundaried, moderate-strength understanding of both "artificial" and "intelligent" that encompasses the canonical examples while excluding entities that we ordinarily regard as either non-artificial or non-intelligent.

I draw the following lesson from these facts about the difficulty of definition:

General claims about the limitations of AI are almost always grounded in specific assumptions about the nature of AI, such as its digitality or its implementation on "computers". Future AI, on moderately broad understandings of what counts as AI, might not be subject to those same limitations. Notably, two of the most prominent deniers of AI consciousness -- John Searle and Roger Penrose -- both explicitly limit their skepticism to systems designed according to principles familiar in the late 20th century, while expressing openness to conscious AI designed along different lines. No well-known argument aims to establish the in-principle impossibility of consciousness in all future AI systems on a moderately broad definition of what counts as "AI". Of course, the greater the difference from currently familiar architectures, the farther in the future that architecture is likely to be.

[illustration of the AI (?) system in my science fiction story THE TURING MACHINES OF BABEL]

-----------------------------------------

[1] Russell and Norvig are widely cited for this definition, but I don't see this exact quote in my third edition copy. While I await UCR's interlibrary loan department to deliver my 4th ed. version, I'll assume this quote is accurate.