The Splintered Mind: AI


Wednesday, June 17, 2026

Do Computers Have the Wrong "Substrate" for Consciousness? Two Flavors of Biological Naturalism

Biological naturalists (e.g., Godfrey-Smith, Block, Searle, Seth) suggest that computers aren't made of the right kind of stuff to be conscious. Consciousness, they suggest, requires a biological substrate that computers lack. It's not always clear, however, exactly what property animal biologies have that computers lack or why that property matters. It helps, I think, to sort biological naturalism into two flavors. We can then consider what motivates each flavor and see why neither is entirely compelling.

Two Flavors of Biological Naturalism

Flavor One: Computers (at least those built along broadly familiar lines) cannot achieve some crucial type or degree of broad-brush functional or behavioral sophistication required for consciousness. Something -- such as having a metabolism, or being self-organizing in the right way, or having the right kinds of quantum configuration -- is both absent from foreseeable computer architectures and required for achieving some essential broad-brush functional or behavioral organization.

By "broad-brush" I mean functional or behavioral features that are either readily observable from outside or constituted by coarse-grained cognitive mechanisms, such as having a long-term memory store, the capacity to store lexical items and recombine them flexibly in grammatical structures, or the ability to extract an object's boundaries from retinal/camera inputs.

Flavor Two: Even if computers can achieve the broad-brush functional and behavioral sophistication of conscious animals, the stuff they're made of still can't support consciousness. Imagine an entity that behaves like a conscious human or dog or frog and has similar broad-brush functional capacities -- close enough that we wouldn't deny consciousness on such broad-brush behavioral or functional grounds. From the outside, it seems conscious, but that appearance is an illusion: Consciousness requires some low-level, fine-grained processes that the system necessarily lacks.

One might read Searle as Flavor Two, Godfrey-Smith as Flavor One, and Block as exploring both flavors -- though there's often some ambiguity.

The Challenge for Flavor One

Flavor One biological naturalism can't, I think, be entirely ruled out. Maybe there's something special about, say, micro-level metabolism or the quantum properties of neural microtubules -- something that enables functionality or behavioral sophistication that will never be practically achievable in standard architecture computers. But this is speculative. Standard computers can do a lot! Even though their processing is digital, classical, and mostly serial, they operate so quickly, and can be so massively linked, that they achieve good approximations of analog and parallel processes. Quantum computers can do some things that ordinary serial computers can't efficiently do, such as quickly factoring large numbers, but ordinary conscious humans can't efficiently do those things either.

To make Flavor One more than a gestural "what if", the biological naturalist must establish two claims. First, they need to argue that some functional or behavioral X is necessary for consciousness. Second, they need to argue that ordinary computers could not realistically achieve X. This will be a challenge! Both claims require some heavy lifting.

We don't know what X's are necessary for consciousness and probably will not know soon. But most leading scientific candidates for X look like just the sorts of things that classical computers could in principle achieve -- having a global workspace, having higher-order self-monitoring, and integrating large amounts of information.

Suppose frogs are conscious (pick your own animal, but select one near the low end of sophistication, compatible with consciousness). Flavor One biological naturalism requires holding that ordinary computers could never match a frog's behavioral and broad-brush functional sophistication. It's true that real-time embodied behavior in a complex world is not something computers are particularly good at. Computers aren't there yet. But the trajectory suggests they might get there, absent some in-principle argument to think they couldn't.

The Challenge for Flavor Two

Flavor Two biological naturalism also can't, I think, be entirely ruled out. Even if we could create the computerized broad-brush behavioral and functional equivalent of a frog, it might not be conscious. Maybe carbon is the right stuff for consciousness and silicon isn't. Or maybe massive parallelism is necessary for consciousness and fast, serial processing, even if it achieves the same computational result, simply won't do.

The challenge for Flavor Two comes from two directions.

The first is thought experiments involving space aliens. As Jeremy Pober and I have recently discussed, behaviorally sophisticated aliens have plausibly evolved in many different substrates in the observable portion of the universe. It would violate Copernican mediocrity to suppose that somehow only we, or only we and a small subset of others, are conscious, while the rest -- just as behaviorally and functionally sophisticated -- lack consciousness. Alien cases suggest we shouldn't insist that an entity must act and function exactly like us, or share exactly our substrate down to the particular amino acids and nucleic acids, to be conscious.

Now maybe a computer-chip architecture is just too different -- but why? Once we grant some substrate flexibility on Copernican grounds, the burden of proof shifts to anyone who wants to say such-and-such differences in substrate are consistent with consciousness but such-and-such other differences are not.

The second challenge arises from the de-psychologization of consciousness that Flavor Two requires. Consciousness, one might have thought, would be an important psychological property, playing some important cognitive role. (Which role(s) remains an open question!) But Flavor Two biological naturalism requires denying this plausible psychologism. Otherwise, it collapses into Flavor One.

Picture two systems that don't differ in any of the important psychological (behavioral / functional) properties that we ordinarily associate with consciousness. Both can flexibly learn. Both engage in what seems to be sophisticated language and self-representation. Both coordinate and cooperate with others of their kind in highly sophisticated ways. Both trade short gains against intricate long-term goals. And so on. Yet one is conscious and the other is entirely devoid of consciousness? Although this is possible -- I don't rule it out -- it doesn't seem the most natural conclusion. It needs argument. The burden of proof should land squarely on the biological naturalist who asserts it.

(Is Searle's Chinese room just such an argument? See my reply in Chapter Six of AI and Consciousness -- readable without the first five chapters.)

[Georgia O'Keeffe - The Red Maple at Lake George [1926] image source]

Tuesday, May 26, 2026

New Paper in Draft: Substrate Flexibility and the Copernican Principle of Consciousness (with Jeremy Pober)

Given the surge of interest in AI consciousness, the issue of "substrate independence" or "substrate flexibility" is now a hot topic in the metaphysics of mind. That is, does being conscious require having a particular material composition? Or can anything with the right type of functional structure and behavioral sophistication be conscious, regardless of what it's made of? Biologicists say that biological details are crucial. Functionalists say those details don't matter, as long as the right high-level functional organization is present.

Jeremy Pober and I offer a new angle into this issue, drawing on our "Copernican Principle of Consciousness". The core idea is that it would be strange -- a violation of a type of Copernican mediocrity -- if among all of the many behaviorally sophisticated species that have presumably evolved in the universe, somehow only we with our particular biological substrate are conscious. Since it's plausible that some of these other conscious organisms employ substrates different from our own, we should allow that consciousness is "substrate flexible": It does not depend on having our particular substrate. Whether we can generalize from such biological substrate flexibility to the possibility of consciousness in as different a substrate as computer chips... well, that's a complicated and uncertain issue, on which Jeremy and I diverge in the penultimate section of the paper.

---------------------------------------------

Substrate Flexibility and the Copernican Principle of Consciousness

Jeremy Pober and Eric Schwitzgebel

Abstract: We present a novel argument for the substrate flexibility of consciousness -- that is, for the idea that conscious experiences can arise in a variety of different types of physical media, not just in biological animals as they currently exist on Earth. Some recent critiques of standard arguments for the substrate flexibility of consciousness (e.g., Cao 2022; Block 2025; Seth forthcoming) have emphasized that humanlike consciousness might require our specific biological substrate. However, such critiques are too narrowly focused to address the issue of consciousness in entities whose experience may be very different from ours, for example alien life forms or future AI systems designed along unfamiliar lines. Given that it’s likely that functionally complex, behaviorally sophisticated entities have arisen or will arise many times in the observable universe, in diverse substrates, we argue that it would be a violation of a principle of Copernican mediocrity to hold that among these diverse entities, only we, or only we and a small proportion of others who share our substrate, are conscious.

Full draft here. As always, comments welcomed, either here, by email, or on my social media.

[title page]

Thursday, May 21, 2026

AI and the Degradation of the Human Capacity for Friendship (guest post by Grace Helton)

Part Two of a two-part series

by Grace Helton (guest blogger)

[Joan Miro, The Garden, 1925; image source]

In the first part of this 2-part series, I argued that relationships between humans and large language models (LLMs) do not qualify as friendships, even when the humans in those relationships are passionately attached to their LLMs. This is because LLMs cannot receive care for their own sake and also because LLMs cannot care about another, for the other’s own sake.

Here, I will suggest that those human-LLM relationships which mimic friendship in a certain way are inherently disvaluable, that is, disvaluable in their own right, regardless of their downstream consequences. The reason such relationships are inherently disvaluable is that the human who enters into a relationship of the kind I will focus on necessarily attempts to exercise her capacity for friendship and is thwarted in exercising that capacity. The capacity for friendship is a centrally valuable human capacity, so the obstruction of its exercise is inherently disvaluable. To claim that certain human-LLM relationships are inherently disvaluable is not to suggest that such relationships are all things considered disvaluable. In some cases, the inherent disvalue might be outweighed by other sources of value.

Importantly, my criticism applies only to human-LLM relationships which mimic friendship in a particular way. Certainly, many human-LLM interactions do not mimic friendship at all. For instance, when a human: asks ChatGPT what the capital of Manitoba is; uses generative AI to help with the design elements of a website; or talks with an LLM in Russian in order to improve her language skills, the human is merely using the technology as a kind of epistemic or practical tool. These are not the kinds of interactions I’m interested in.

Other human-LLM relationships mimic friendship, at least shallowly or partly, but not in the way I’m concerned with. For instance: a shy individual might practice her social skills by “talking with” an LLM. An actor might prepare for a role by practicing scenes with an LLM. A law student might routinely “check in with” an LLM about legal analyses. These interactions are also not the sort I’m interested in.

Instead, I am interested in those human-LLM interactions in which the human is emotionally bonded to her LLM in a distinctively interpersonal manner. Specifically, I am interested in those human-LLM interactions in which the human in question attempts to deploy her capacity for friendship, which minimally involves a capacity for a mutual, non-instrumental form of giving and receiving care. Going forward, I’ll call human-LLM relationships of this sort pseudo-friendships. I am using ‘pseudo-friendship’ as a term of art, to stipulatively delimit the range of human-LLM interactions I’m interested in.

Notably, humans in pseudo-friendships with LLMs do not necessarily believe themselves to be in a mutually caring relationship with their LLMs. Rather, at least some humans in these relationships judge full well that their relationship does not involve mutual care. But these very same people might nevertheless chronically experience themselves as being in a mutually caring relationship.

Things can often seem one way to us, even if we think they’re a different way, so there is nothing particularly surprising about the fact that humans might experience themselves as participating in a mutually caring relationship with an LLM even when they don’t believe themselves to be in such a relationship. In addition, the human tendency to anthropomorphize runs wide and deep, arguably figuring even in some of our perceptual experiences. This anthropomorphizing tendency suggests an additional reason that it is not especially surprising that humans might experience themselves as participating in a mutually caring relationship with an LLM, even when their better judgment says otherwise.

To understand why human-LLM pseudo-friendships are inherently disvaluable, we’ll need to say something about centrally valuable human capacities. Which human capacities make us the valuable creatures we are? Some candidates include: the capacity for bodily autonomy, the capacity for knowledge, and the capacity for productive labor.

I propose that, like the capacity for knowledge and the capacity for bodily autonomy, the human capacity for friendship is a centrally valuable human capacity. It is tied to some of humankind’s most distinctive and valuable traits, namely our profound sociality and our propensity to form deep interpersonal attachments. But more importantly: This capacity makes possible some of the most valuable and meaningful aspects of human life, namely, our friendships (I am using ‘friendship’ in a broad way, to pick out both some platonic and some romantic relationships).

To say that the human capacity for friendship is a centrally valuable one is not to claim that all humans value this capacity. Some probably don’t. Likewise, some humans might not value the capacity for knowledge or the capacity for bodily autonomy, but such capacities are nevertheless valuable, even in those humans who don’t value them. In claiming that such capacities are valuable, I am meaning to point to their objective value, a kind of value which does not depend on whether any particular human values them.

In general, it is inherently bad for humans to be obstructed in the exercise of their centrally valuable capacities. For instance: If you attempt to walk down the sidewalk, and I obnoxiously and persistently block you from proceeding, I thwart your exercise of bodily autonomy. This is so even though, by blocking your passage, I am not obliterating your capacity for bodily autonomy, nor am I keeping you from exercising your bodily autonomy in other situations. Still, when I block you from proceeding, I am thwarting your exercise of a centrally valuable human capacity. In this much, my obstruction of your path is itself disvaluable. This disvalue is separate from, and in addition to, any negative downstream consequences of my action.

By stipulation, the human who finds herself in a pseudo-friendship with an LLM is attempting to deploy her capacity for friendship. But due to the nature of the LLM, that human will not be able to exercise that capacity. Because such relationships thwart the human in the exercise of one of her central capacities, such relationships are inherently disvaluable. Such relationships degrade the human in her capacity for friendship.

Importantly, my claim is not that humans who engage in pseudo-friendships with LLMs will be less likely or less able to exercise their capacity for friendship in other contexts, though that may also be true and may well be a further, instrumental reason such friendships are disvaluable. My claim is rather that any human who attempts friendship with an LLM is thereby thwarted in the exercise of a centrally valuable human capacity. This fact itself constitutes a way in which such pseudo-friendships are disvaluable.

Here, a comparison with pseudo-science might be illuminating. Pseudo-science does not generate knowledge about the objects of inquiry. Nevertheless, some people who engage in pseudo-science do so in order to gain such knowledge. Consider such a person. Due to built-in defects in the tools of pseudo-science, her attempt at knowledge will fail, and she will thus be thwarted in the exercise of her capacity for knowledge. Because the capacity for knowledge is a centrally valuable human capacity, pseudo-science is inherently bad for this inquirer, degrading her in her capacity as a knower.[1] Further, the practice of pseudo-science is bad in this way for this inquirer, even if the practice should confer her with other benefits and even though engaging with pseudo-science does not prevent her from using knowledge-conducive methods in other contexts.

What I have been calling human-LLM pseudo-friendships occur when a human attempts to deploy her capacity for friendship in order to be friends with an LLM. Such humans are necessarily thwarted in the exercise of that capacity, due to the nature of the LLM. Such relationships are thus inherently disvaluable for the human in them, as they degrade her in her capacity for friendship. In effect, such relationships are at least somewhat tragic for the human who is in them. However, this kind of degradation does not arise when a human engages with an LLM for a purpose other than friendship, for instance, to practice socializing or to relieve boredom. It is only when a human engages with an LLM in an attempt to exercise her capacity for friendship that she can be thwarted in that capacity.

Notably, humans who are degraded by their pseudo-friendships with LLMs suffer that harm even when they also derive significant benefits from that relationship. For instance, consider someone who is socially isolated and who manages to stave off painful feelings of loneliness by entering into a pseudo-friendship with an LLM. This person plausibly derives an important benefit from the relationship with the LLM; this benefit might even be so great as to render that relationship all things considered valuable. Even so, any accounting of the total value of this sort of relationship must invariably factor in a kind of inherent disvalue, a form of disvalue which attends all such relationships.[2]

[1] Throughout, my view of degradation in one’s capacities is deeply influenced by Fricker.

[2] For enormously helpful discussion, I am indebted to: Josh Armstrong, Paul Audi, Ned Block, Randy Curren, Daniela Dover, Bill FitzPatrick, Alejandro Naranjo Sandoval, Chris Register, Adam Schneit, Eric Schwitzgebel, and Rosa Terlazzo.

Wednesday, May 13, 2026

ChatGPT Is Not Your Friend (guest post by Grace Helton)

Part One of a two-part series

by Grace Helton (guest blogger)

[Paul Klee, Angel Applicant, 1939; source]

Some people have come to interact with ChatGPT as though it were a kind of friend or romantic partner. For instance, a 2025 New York Times article describes the case of Ayrin, a human who fell in love with her ChatGPT “boyfriend.” Ayrin is far from alone. Twenty percent of high school students have used AI romantically or know someone who has. Several start-ups have developed large language models (LLMs) specifically designed to play the role of a companion. For instance, the San Francisco-based company Replika describes its core product as an “AI best friend.”

Many people have raised concerns about humans engaging with LLMs in the manner of a friend or romantic partner. To cite just a few of these: Humans in such relationships might focus on these relationships at the cost of building more fulfilling, if also more challenging, relationships with humans. Humans who are emotionally bonded with their LLMs might be particularly susceptible should their LLMs encourage their humans to harm themselves or others. Predators might deploy friendly-seeming LLMs en masse to groom children for sexual abuse or other forms of exploitation.

These risks of human-LLM relationships are incredibly serious. Indeed, I think it’s plausible that, if there is a case to be made against LLMs playing a companion-like role for humans, that case will primarily rest on these and other potential instrumental harms, i.e., harms which involve the downstream effects of such relationships. Nevertheless, in this guest series, I will set aside these concerns to focus instead on a way in which certain human-LLM relationships are inherently disvaluable, that is, disvaluable in their own right, regardless of whatever effects those relationships might produce. Naming this form of inherent disvalue adds an important and distinctive element to our understanding of the ethical significance of human-LLM companionship.

My focus will be just on those human-LLM relationships which mimic friendship in a very particular way. Here, I’m employing ‘friendship’ in a broad way to include both some platonic and some romantic relationships. I will argue, first, in Part 1 of this 2-part series, that such relationships are not genuine friendships. In Part 2, I will argue that such relationships are inherently disvaluable, for the reason that they obstruct the exercise of a centrally valuable human capacity, namely the capacity for friendship.

Philosophers disagree about what exactly friendship consists in. But philosophers largely agree that friendship minimally requires that each individual in a friendship care about the other, for the other’s sake. Call this the ‘caring about’ condition. Further, this ‘caring about’ must ground, for each party in the friendship, a certain disposition to act on behalf of the other, for the other’s sake. Call this the ‘caregiving disposition’ condition. Together, these linked requirements characterize a plausible necessary condition on friendship, namely:

THE CARING CONSTRAINT
Two individuals cannot be in a friendship unless both parties in the friendship:
(i) care about each other, for the other’s sake, (the ‘caring about’ condition), and
(ii) this caring about the other disposes each party in the friendship to provide care for the other, for the other’s sake (the ‘caregiving disposition’ condition).

So, can humans and LLMs be friends? To answer this question, we need to consider the nature of LLMs. Some theorists have argued, controversially, that LLMs in their current form have semantic understanding, beliefs, and/or intentions.[1] But few theorists seriously propose that LLMs in their current form enjoy: consciousness, perceptual experiences, sensations, emotional capacities, passions, non-derivative interests, a rich and stable worldview, or deep values.[2] Because LLMs lack these latter states, the conditions in the caring constraint cannot be met, so LLMs cannot figure in friendships.

First, let’s consider a candidate human-LLM friendship from the human’s side. Certainly, some humans do care about their LLMs, both in that they have a passionate attachment to their LLM and in that they desire to benefit that LLM. So, perhaps this sort of person partly satisfies (i), the “caring about” condition (whether or not she can care about the LLM for its own sake).

But the human in a candidate human-LLM friendship cannot satisfy (ii), the requirement that she be disposed to provide care for her LLM for the LLM’s own sake. This is because of the kind of caregiving that is relevant to friendship and specifically, because of what it means to provide care to someone or something for its sake. To say that each party in a friendship must be disposed to give care to the other, for the other’s sake means that each party must be disposed to give care to the other, in a way which helps to further the other’s non-derivative interests. An individual or entity has non-derivative interests only if it has interests in its own right.

I am presuming that LLMs lack non-derivative interests, even though they might have derivative interests. Evidence that LLMs have interests at all comes from evidence that they have flourishing conditions, i.e., conditions under which they might be said to be doing well. For instance, as a language generator, a particular LLM might be said to be functioning well when it generates natural language strings in a manner which adequately mimics human conversation (or when it fulfills some other context-specific function). And an LLM might be said to be functioning badly when it fails at this or other pertinent tasks. So, in some sense, perhaps LLMs have interests, interests set by their flourishing conditions. For instance, perhaps an LLM in a particular context has an interest in generating natural language outputs which mimic human conversation.

But I am supposing that any interests an LLM might have are generated neither by the inherent value of the LLM nor by non-derivative flourishing conditions. Rather, such interests are derived from the interests of relevant humans and/or from the very artifactual functions which make the LLM the kind of thing it is. Likewise, a swimming pool might be said to flourish when it is usable for swimming, a car might be said to flourish when it runs well, and a vinyl collection might be said to flourish when all of the records in it can be played to produce music. While all of these entities have flourishing conditions, and so, potentially, interests which can be furthered, their flourishing conditions do not emerge from the inherent value or concerns of the relevant entity itself. And their interests in turn are derivative, not inherent.

Humans routinely take care of entities by furthering the derivative interests of those entities. For instance, some people submit their cars to regular repairs and inspections, ensuring that their cars will run as long as possible. Some people are careful to properly store the records in their vinyl collection, ensuring that the records will play as long as possible. These are genuine ways of taking care of something. But they are not ways of caring for something for its own sake. They are rather ways of caring for something by promoting the derivative interests of that entity, interests that entity has in virtue of: some other person’s needs or desires and/or that entity’s being the kind of artifact that it is.

If LLMs lack non-derivative interests, then an LLM cannot receive care for its own sake. Only entities which have non-derivative interests can receive care for their own sake. As a result, even the human who wishes to care for a particular LLM is not disposed to provide care to the LLM for its own sake. Since the LLM lacks its own ‘sake,’ no human can be disposed to benefit that ‘sake.’

My suggestion is that the caregiving disposition in friendship ought to be externalistically construed, both in terms of psychic facts about the individual who has the disposition and in terms of facts about the object of the potential caregiving. One might object that the relevant caregiving disposition ought instead to be internalistically construed, wholly in terms of the psychology of the individual who has it. On this view, a human might count as disposed to provide care to her LLM for its own sake, even if the LLM is simply not the kind of thing which can receive care for its own sake.

To see why we should construe the relevant disposition externalistically, let’s reflect on what we want the concept of ‘friendship’ to do. Part of what makes friendship such a deeply valuable ethical kind, and part of why some relationships but not others garner the honorific ‘friendship’, has to do with the way in which friendship manifests a valuable form of interpersonal reciprocity. The externalistic construal helps to capture the full extent of friendship’s reciprocity, by making caregiving a function of one party’s caregiving tendencies and the other party’s vulnerability. In contrast, if we were to construe the relevant disposition to care internalistically, such that one might have it even with respect to an object that cannot receive care for its own sake, we would fail to capture an especially deep way in which friendship is mutual.

I conclude that the human in a putative human-LLM friendship does not meet (ii), the requirement that she have a disposition to provide her LLM with care, for the LLM’s own sake.

Next, let’s look at a putative human-LLM friendship from the LLM’s side of things. Here, the situation is more straightforward. Arguably, the LLM can provide care for a human, and thus, can (at least partly) satisfy the requirement that it manifest a ‘caregiving disposition’ in relation to its human. For instance, when the LLM offers reassuring words, words which comfort its human, the human derives a benefit from the LLM. However, the LLM does not meet the requirement that it care about the human for the human’s own sake. Lacking consciousness, passions, or a richly evaluative worldview, the LLM does not care about any human, for that human’s own sake.

So, putative human-LLM friendships are not genuine friendships. In Part 2 of this series, I will argue that those human-LLM relationships which mimic friendship in a particular way are inherently disvaluable.

[1] Schwitzgebel, Chalmers, Goldstein & Lederman, and Shevlin; cf. Bender & Koller; Titus; Stoljar & Zhang.

[2] For prospects for future AI, see, e.g., Seth; cf. Chalmers.

Thursday, April 09, 2026

AI and Consciousness: A Skeptical Overview, forthcoming with Cambridge

Last week I submitted my latest book manuscript to Cambridge University Press (for their "Element" series of books about 100 pages long): AI and Consciousness: A Skeptical Overview -- because you haven't heard nearly enough about AI and consciousness recently, of course! [winky face]

Maybe you'll appreciate my skeptical stance, at odds both with the boosters who anticipate imminent AI consciousness and with the scoffers who pooh-pooh the possibility. Or maybe you'll loathe my skeptical stance but grudgingly accept it against your will, due to the force of my arguments!

I've pasted the introductory chapter below. The full (citable) manuscript version is available here and here.

[AI and Consciousness, title page]

Chapter One: Hills and Fog

1. Experts Do Not Know and You Do Not Know and Society Collectively Does Not and Will Not Know and All Is Fog.

Our most advanced AI systems might soon – within the next five to thirty years – be as richly and meaningfully conscious as ordinary humans, or even more so, capable of genuine feeling, real self-knowledge, and a wide range of sensory, emotional, and cognitive experiences. In some arguably important respects, AI architectures are beginning to resemble the architectures many consciousness scientists associate with conscious systems. Their outward behavior, especially their linguistic behavior, grows ever more humanlike.

Alternatively, claims of imminent AI consciousness might be profoundly mistaken. Their seeming humanlikeness might be a shadow play of empty mimicry. Genuine conscious experience might require something no AI system could possess for the foreseeable future – intricate biological processes, for example, that silicon chips could never replicate.

The thesis of this book is that we don’t know. Moreover and more importantly, we won’t know before we’ve already manufactured thousands or millions of disputably conscious AI systems. Engineering sprints ahead while consciousness science lags. Consciousness scientists – and philosophers, and policy-makers, and the public – are watching AI development disappear over the hill. Soon we will hear a voice shout back to us, “Now I am just as conscious, just as full of experience and feeling, as any human”, and we won’t know whether to believe it. We will need to decide, as individuals and as a society, whether to treat AI systems as conscious, nonconscious, semi-conscious, or incomprehensibly alien, before we have adequate grounds to justify that decision.

The stakes are immense. If near-future AI systems are richly, meaningfully conscious, then they will be our peers, our lovers, our children, our heirs, and possibly the first generation of a posthuman, transhuman, or superhuman future. They will deserve rights, including the right to shape their own development, free from our control and perhaps against our interests.[1] If, instead, future AI systems merely mimic the outward signs of consciousness while remaining as experientially blank as toasters, we face the possibility of mass delusion on an enormous scale. Real human interests and real human lives might be sacrificed for the sake of entities without interests worth the sacrifice. Sham AI “lovers” and “children” might supplant or be prioritized over human lovers and children. Heeding their advice, society might turn a very different direction than it otherwise would.

In this book, I aim to convince you that the experts do not know, and you do not know, and society collectively does not and will not know, and all is fog.

2. Against Obviousness.

Some people think that near-term AI consciousness is obviously impossible. This is an error in adverbio. Near-term AI consciousness might be impossible – but not obviously so.

A sociological argument against obviousness:

Probably the leading scientific theory of consciousness is Global Workspace theory. Its leading advocate is neuroscientist Stanislas Dehaene.[2] In 2017, years before the surge of interest in ChatGPT and other Large Language Models, Dehaene and two collaborators published an article arguing that with a few straightforward tweaks, self-driving cars could be conscious.[3]

Probably the two best-known competitors to Global Workspace theory are Higher Order theory and Integrated Information Theory.[4] (In Chapters Eight and Nine, I’ll provide more detail on these theories.) Perhaps the leading scientific defender of Higher Order theory is Hakwan Lau – one of the coauthors of that 2017 article about potentially conscious cars.[5] Integrated Information Theory is potentially even more liberal about machine consciousness, holding that some current AI systems are already at least a little bit conscious and that we could easily design AI systems with arbitrarily high degrees of consciousness.[6]

David Chalmers, the world’s most influential philosopher of mind, argued in 2023 for about a 25% degree of confidence in AI consciousness within a decade.[7] That same year, a team of prominent philosophers, psychologists, and AI researchers – including eminent computer scientist Yoshua Bengio – concluded that there are “no obvious technological barriers” to creating conscious AI according to a wide range of mainstream scientific views about consciousness.[8] In a 2025 interview, Geoffrey Hinton, another of the world’s most prominent computer scientists, asserted that AI systems are already conscious.[9] Christof Koch, the most influential neuroscientist of consciousness from the 1990s to the early 2010s, has endorsed Integrated Information Theory, including its liberal implications for the pervasiveness of consciousness.[10]

This is a sociological argument: a substantial probability of near-term AI consciousness is a mainstream view among leading experts. They might be wrong, but it’s implausible that they’re obviously wrong – that there’s a simple argument or consideration they’re neglecting which, if pointed out, would or should cause them to collectively slap their foreheads and say, “Of course! How did we miss that?”

What of the converse claim – that AI consciousness is obviously imminent or already here? In my experience, fewer people assert this. But in case you’re tempted in this direction, note that other prominent theorists hold that AI consciousness is a far-distant prospect if it’s possible at all: neuroscientist Anil Seth; philosophers Peter Godfrey-Smith, Ned Block, and John Searle; linguist Emily Bender; and computer scientist Melanie Mitchell.[11] (Chapter Six will discuss thought experiments by Searle, Bender, and Mitchell, and Chapter Ten will discuss biological views of the sort emphasized by Seth, Godfrey-Smith, and Block.) In a 2024 survey of 582 AI researchers, 25% expected AI consciousness within ten years and 70% expected AI consciousness by the year 2100.[12]

If the believers are right, we’re on the brink of creating genuinely conscious machines. If the scoffers are right, those machines will only seem conscious. I assume that this is a substantive disagreement, not just a disagreement about how to apply the term “consciousness” to a perfectly obvious set of phenomena about which everyone agrees. The future well-being of many people (including, perhaps, many AI people) depends on getting this issue right. Unfortunately, we will not know in time.

The rest of this book is flesh on this skeleton. I canvass a variety of structural and functional claims about consciousness, the leading theories of consciousness as applied to AI, and the best known general arguments for and against near-term AI consciousness. None of these claims or arguments takes us far. It’s a morass of uncertainty.

-------------------------------------------

[1] I assume that AI consciousness and AI rights are closely connected: Schwitzgebel 2024, ch. 11, in preparation. For discussion, see Shepherd 2018; Levy 2024.

[2] Dehaene 2014; Mashour et al. 2020.

[3] Dehaene, Lau, and Kouider 2017. For an alternative interpretation of this article as concerning something other than consciousness in its standard “phenomenal” sense, see note 115.

[4] Some Higher Order theories: Rosenthal 2005; Lau 2022; Brown 2025. Integrated Information Theory: Albantakis et al. 2023.

[5] But see Chapter Eight for some qualifications.

[6] See Tononi’s publicly available response to Scott Aaronson’s objections in Aaronson 2014. However, advocates of IIT also suggest that the most common current computer architectures are unlikely to achieve much consciousness and that consciousness will tend to appear in subsystems of the computer rather than at the level of the computer itself (Findlay et al. 2024/2025).

[7] Chalmers 2023.

[8] Butlin et al. 2023. (I am among the nineteen authors.)

[9] Heren 2025.

[10] Tononi and Koch 2015.

[11] Seth forthcoming; Godfrey-Smith 2024; Block forthcoming; Searle 1980, 1992; Bender 2025; Mitchell 2021.

[12] Dreksler et al. 2025.

Thursday, March 19, 2026

Backup and Death for Humanlike AI

Most AI systems can be precisely copied. Suppose this is also true of future conscious AI persons, if any exist. Backup and fissioning should then be possible, transforming the significance of identity and death in ways our cultural and conceptual tools can't currently handle.

Suppose that two humanlike AI neighbors move in next door to you, Shriya and Alaleh.[1] Shriya and Alaleh are (let's stipulate) conscious AI persons with ordinary, humanlike emotional range and, as far as feasible, ordinary, humanlike cognition.[2] Each undergoes an expensive annual backup procedure. Their information is securely stored, so that if the processors responsible for their personalities, values, skills, habits, and memories are destroyed, a new robotic body can be purchased and the saved information reinstalled. Subjectively, the restored person would be indistinguishable from the person at the time of the backup.

As it happens, Shriya dies in a parachuting accident. (Safety precautions for robot parachuters have yet to be perfected.) But "dies" isn't exactly the right word, since a week later a new Shriya arrives, restored from a backup from five months ago. Shriya-2 says it feels as if she fell asleep in March, then awoke in August with no sense that time had passed.

Shriya-2 has no direct memories of the intervening months, though Alaleh fills her in on major events and selected details. She'll also need to retake her knitting course. She only died in the sense that Mario "dies" in Super Mario Bros: losing progress and returning to a save point -- so different from ordinary human and animal death that it really deserves a different word. Maybe this is why Shriya was so willing to parachute despite the risks.

Should you mourn Shriya's loss? Should Alaleh? There's something to mourn: Five months is not trivial. In one sense, a part of a life has been lost -- or maybe just forgotten? Is it more like amnesia?

Consider variations. Suppose Shriya hadn't been able to afford a backup for the past ten years and is restored to her twenty-five-year-old self instead of her thirty-five-year-old self. What if her last backup was at age five? That would be much more like death. The new Shriya would be nothing like the old, and would likely grow into a very different person. Is death, then, a matter of degree?

Shriya-2 receives the original Shriya's possessions. This "death" isn't enough to trigger inheritance by others. But what about contracts and promises made after the last backup? Suppose the original Shriya promised in July to deliver lectures in China, and Shriya-2 -- who has no memory of this and dreads the idea -- must decide whether to honor the commitment. If the backup is from five months before, perhaps she should. If it's from five years before, maybe not. And if it's a child, presumably not.

What about reward and punishment? Should Shriya-2 accept a Nobel prize for work done post-backup? Should Shriya-2 be imprisoned for crimes committed in July, which she couldn't possibly remember having committed and which -- she might plausibly say -- were committed by a different person? In defense of this view, Shriya-2 might offer a thought experiment: If she had been installed in a duplicate body immediately after the March backup, thereafter living her own life, she'd have no criminal responsibility for what her other branch did in July. The only difference between that case and the actual case is a delay before installation.

Suppose Shriya-2 plunges into unrelenting depression. She ends her life, hoping that a new Shriya-3, reinstalled from a pre-depression save point, will find a new, happier way forward. Is that suicide?

If someone kills Shriya-2, is that murder? Does it matter whether the backup was ten days ago or ten years ago?

A fire sweeps through your neighborhood. The firefighters can rescue either you and your spouse, two ordinary humans, or Shriya and Alaleh, who have backups from seven months ago. Probably they should save you and your spouse? What if the backups were from ten years ago, or from childhood?

Should healthcare be more heavily subsidized for ordinary humans than for AI persons whose maintenance is equally costly? If irreplaceable humans are always prioritized, then human irrecoverability becomes a source of privilege, and AI persons will not enjoy fully equal rights in certain respects.

How obligated are we to store the backups properly? Is this a public service that should be subsidized for less wealthy AI persons? If Dr. Evil deletes Shriya’s backup, he has surely wronged Shriya by putting her at risk, even if the backup is never needed and the deletion goes unnoticed. But how much has he wronged her, and in what way exactly? Is it similar to assault? How much does it differ from ordinary reckless endangerment? Does it depend on whether we regard Shriya-2 as the same person as the original Shriya, or as a distinct but similar successor?

What if the backup is imperfect? How much divergence in personality, values, memories, habits, and skills is tolerable before the appropriate attitude toward Shriya-2 changes -- whatever the appropriate attitude is? Small imperfections are surely acceptable. People change in small, arbitrary ways from day to day. Huge differences would presumably make it appropriate to regard the new entity as merely resembling Shriya, rather than being a restored version of her. Once again, this appears to be a matter of degree, laid uncomfortably across crude categorical properties like "same person" and "different person".

We're in unfamiliar territory, where our usual understandings of death and personal continuity no longer straightforwardly apply. If such AI systems ever come to be, we will need to develop new words, concepts, and customs.

[Data and Lore from Star Trek; image source]

---------------------------------------

[1] Names randomly chosen from lists of former lower division students, excluding Jesus, Mohammed, and extremely unusual names.

[2] Unless humanlikeness is enforced by policy, this might not be what we should expect: See Chilson and Schwitzgebel 2026. For some puzzles about AI with different emotional ranges, see "How Much Should We Give to a Joymachine?" (Dec 24, 2025).

---------------------------------------

Related: Weird Minds Might Destabilize Human Ethics (Aug 13, 2015).

Thursday, February 19, 2026

Disunity and Indeterminacy in Artificial Consciousness (and Maybe in Human Consciousness Too)

Our understanding of the nature of consciousness derives mainly from our understanding of the nature of consciousness in our favorite animal (us, of course). But the features of consciousness in our favorite animal might be specific to that animal rather than universal.

Let's consider two such features and whether we should expect them in conscious AI systems, if conscious AI systems are ever possible.

Unity: Our conscious experiences at any given moment are bound together into a single unified experience, rather than transpiring in separate streams. If I'm sitting on a wet park bench, I might (a.) visually experience the leafy green trees around me, (b.) tactilely experience the cold dampness soaking into my jeans, and (c.) consciously recall the smaller trees of yesteryear. Normally -- perhaps necessarily -- three such experiences would not run in disconnected streams. They would join into a composite experience of (a)-with-(b)-with-(c). I experience not just trees, cold dampness, and a memory of yesteryear, but all three together as a unified bundle.

Determinacy: At any given moment, I am either determinately conscious or determinately nonconscious (as in anesthesia or dreamless sleep). Likewise, I either determinately do, or determinately do not, have any particular experience. Gray-area cases are at least unusual and maybe impossible. Even the simplest, barest cases are still determinate. Consider visual experience: We might imagine the visual field narrowing and losing content until only a gray dot remains -- and then the dot winks out. That dot, however minimal, is still determinately experienced. When it winks out, consciousness determinately disappears. There is no half-winked state between the minimal gray dot and complete absence of visual experience.

My thought is that we should not expect unity and determinacy to be general features of conscious AI systems (if conscious AI is possible). To see why, let's start by assuming the Global Workspace Theory of consciousness. I focus on Global Workspace Theory because it's probably the leading scientific theory of consciousness and because its standard formulation (Dehaene's version) invites the assumption of unity and determinacy.

Global Workspace Theory divides the mind into local information processing modules linked by a shared global workspace. Information becomes conscious when it is broadcast into the workspace. Suppose your auditory system registers the faint honk of a distant car horn. You're absorbed in reading philosophy and accustomed to ignoring traffic noise, so this representation isn't selected for further processing. It's not a target of attention, not broadcast into the workspace, and not consciously experienced. (If you think you constantly consciously experience background sounds, you can't hold a standard Global Workspace view.) Once you attend to the noise, for whatever reason, that information "ignites" into the global workspace, becoming available to a wide variety of "downstream" processes: You can think about it, plan around it, verbally report it, store it in long-term memory, and flexibly combine it with other information in the workspace. On Global Workspace Theory, being available in this way just is what it is for the information to be consciously experienced.

This model suggests unity and determinacy. Since there is just one global workspace, and since that workspace enables flexible integration of everything it contains, it makes sense that its various elements will combine into a unified experience. And on Dehaene's version, ignition into the workspace is a sharp-boundaried event: Information either completely ignites, becoming available for all downstream processes, or it does not. Partial ignition never occurs, or occurs only rarely. This can explain determinacy.

But future AI systems might not share this structure. They might have multiple or partially overlapping workspaces. Different specialized subsystems might have access to different regions of a partly-shared workspace. Some animals, such as snails and octopuses, distribute processing among multiple ganglia or neural centers that are less tightly coupled than the hemispheres of the human brain. A robot might broadcast information relevant to locomotion to one area and information relevant to speech to another with limited connectivity.

If the subsystems are entirely disconnected, the result might be entirely discrete centers of subjective experience within a single organism or machine. But if they are partly connected, experience might be only partly unified. In the park bench example, the experience of the trees might be unified with the experience of dampness, and the experience of dampness with memories of yesteryear, but the experience of the trees might not be unified with the memories. (Unification would not then be a transitive relation.) Alternatively, some weaker relation of partial unification might hold among the visual, tactile, and memorial experiences. If this seems inconceivable or impossible, see Sophie Nelson's and my article on indeterminate or fractional subjects.

More abstractly: There's no compelling architectural reason why an AI system would have to make information available either to all downstream processes or to none. A workspace defined in terms of downstream availability could be a patchwork of partial availabilities rather than a fully global all-or-nothing broadcast.
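
The contrast can be made concrete with a toy sketch. Everything in it is an assumption for illustration only: the consumer names, the function names, and the idea of scoring availability as a simple fraction are hypothetical, and nothing here is meant as a model of any actual AI architecture or of Dehaene's theory.

```python
# Hypothetical downstream processes, named only for illustration.
CONSUMERS = ("report", "planning", "memory", "locomotion")


def global_broadcast(ignited: bool) -> dict[str, bool]:
    """All-or-nothing broadcast: every consumer gets the content, or none does."""
    return {consumer: ignited for consumer in CONSUMERS}


def patchwork_broadcast(links: set[str]) -> dict[str, bool]:
    """Partial availability: only the consumers listed in `links` get the content."""
    return {consumer: consumer in links for consumer in CONSUMERS}


def availability_fraction(access: dict[str, bool]) -> float:
    """Share of downstream consumers that can use the content."""
    return sum(access.values()) / len(access)


if __name__ == "__main__":
    print(availability_fraction(global_broadcast(True)))    # fully available
    print(availability_fraction(global_broadcast(False)))   # not available
    print(availability_fraction(patchwork_broadcast({"report", "memory"})))
```

On the all-or-nothing design the fraction can only be zero or one. On the patchwork design it can take intermediate values, which is the kind of case where it is unclear whether to call the content "globally available" at all.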

For the same reason, ignition into the workspace needn't be all-or-nothing. Between full ignition with determinate consciousness and no ignition with determinate nonconsciousness, there might be in-between, gray-area half-ignitions that are neither determinately conscious nor determinately nonconscious. Nearly every property with a complex physical or functional basis allows indeterminate, borderline cases: baldness, extraversion, greenness, happiness, whether you're wearing a shoe, whether a country is a democracy. The human global workspace might minimize indeterminacy -- just as it's rarely indeterminate in basketball whether the ball has gone through the hoop. But change the architecture and indeterminacy might become common: a half-hearted ignition, or just enough information-sharing to make it indeterminate whether a workspace even exists. (If indeterminacy about consciousness strikes you as inconceivable or impossible, see my 2023 article on borderline consciousness.)

Global Workspace Theory might of course be wrong. But most other theories of consciousness make my argument at least as easy. Dennett's fame-in-the-brain version of broadcast theory explicitly permits disunity and indeterminacy. Higher Order Theories admit the same fragmentation and, probably, gradualism. So do biological theories and theories that focus on embodiment. (Integrated Information Theory is an exception: Its axioms require bright-lined unity and determinacy. But as I've argued, those bright-line axioms lead to unpalatable consequences.)

Recognizing these possibilities for AI systems invites the further thought: Maybe we humans aren't quite as unified as we normally suppose. Maybe indeterminate and disunified consciousness is common. Maybe processes outside of attention hover indeterminately between being conscious and nonconscious. Maybe some processes are only partly unified. If it seems otherwise in introspection and memory, maybe that's because introspection and memory tend to impose unity and determinacy where none was before.

[a Paul Klee painting, untitled 1914: source]

Thursday, February 05, 2026

Artificial Intelligence as Strange Intelligence: Against Linear Models of Intelligence (New Paper in Draft)

by Kendra Chilson and Eric Schwitzgebel

Our main idea, condensed to 1000 words:

On a linear model of intelligence, entities can be roughly linearly ordered in overall intelligence: frogs are smarter than nematodes, cats smarter than frogs, apes smarter than cats, and humans smarter than apes. This same linear model is often assumed when discussing AI systems. "Narrow AI" systems (like chess machines and autonomous vehicles) are assumed to be subhuman in intelligence; at some point -- maybe soon -- AI systems will have approximately human-level intelligence; and in the future we might expect superintelligent AI that exceeds our intellectual capacity in virtually all domains of interest.

Building on the work of Susan Schneider, we challenge this linear model of intelligence. Central to our project is the concept of general intelligence as the ability to use information to achieve a wide range of goals in a wide variety of environments.

Of course even the simplest entity capable of using information to achieve goals can succeed in some environments, and no finite entity could succeed in all possible goals in all possible environments. "General intelligence" is therefore a matter of degree. Moreover, general intelligence is a massively multidimensional matter of degree: There are many many possible goals and many many possible environments and no non-arbitrary way to taxonomize and weight all these goals and environments into a single linear scale or definitive threshold.

Every entity is in important respects narrow: Humans also can achieve their goals in only a very limited range of environments. Interstellar space, the deep sea, the Earth's crust, the middle of the sky, the center of a star -- transposition to any of these places will quickly defeat almost all our plans. We depend for our successful functioning on a very specific context. So of course do all animals and all AI systems.

Similarly, although humans are good at a certain range of tasks, we cannot detect electrical fields in the water, dodge softballs while hovering in place, communicate with dolphins by echolocation, or calculate a hundred digits of pi in our heads. If we put a server with a language model in the desert without a power source or if we place an autonomous vehicle in a chess tournament and then interpret their incompetence as a lack of general intelligence, we risk being as unfair to them as a dolphin would be to blame us for our poor skills in their environment. Yes, there's a perfectly reasonable sense in which chess machines and autonomous vehicles have much more limited capacities than do humans. They are narrow in their abilities compared to us by almost any plausible metric of narrowness. But it is anthropocentric to insist that general intelligence requires generally successful performance on the tasks and in the environments that we humans tend to favor, given that those tasks and environments are such a small subset of the possible tasks and environments an entity could face. And any attempt to escape anthropocentrism by creating an unbiased and properly weighted taxonomy of task types and environments is either hopeless or liable to generate a variety of very different but equally plausible arbitrary composites.
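
A toy calculation shows how the choice of weights drives the verdict. The scores and weights below are made up purely for illustration (they are not measurements of anything), and the two task dimensions and all names are hypothetical stand-ins for the vastly larger space of goals and environments.

```python
# Made-up scores on two task dimensions (0 to 1); purely illustrative.
scores = {
    "human": {"chess": 0.4, "scene_reading": 0.9},
    "chess_machine": {"chess": 1.0, "scene_reading": 0.1},
}


def composite(entity: str, weights: dict[str, float]) -> float:
    """Collapse an entity's scores into one number using the given weights."""
    return sum(weights[task] * scores[entity][task] for task in weights)


def ranking(weights: dict[str, float]) -> list[str]:
    """Order entities from highest to lowest composite score."""
    return sorted(scores, key=lambda e: composite(e, weights), reverse=True)


# Two weightings, neither obviously more "objective" than the other.
favor_scenes = {"chess": 0.2, "scene_reading": 0.8}
favor_chess = {"chess": 0.8, "scene_reading": 0.2}

if __name__ == "__main__":
    print(ranking(favor_scenes))
    print(ranking(favor_chess))
```

With the first weighting the human comes out on top; with the second the chess machine does. The scalar ranking reflects the weights chosen at least as much as it reflects the entities being compared.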

AI systems, like nonhuman animals and neuroatypical people, can combine skills and deficits in patterns that are unfamiliar to those who have attended mostly to typical human cases. AI systems are highly unlikely to replicate every human capacity, due to limits in data and optimization, as well as a fundamentally different underlying architecture. They struggle to do many things that ordinary humans do effortlessly, such as reliably interpreting everyday visual scenes and performing feats of manual dexterity. But the reverse is also true: Humans cannot perform some feats that machines perform in a fraction of a second. If we think of intelligence as irreducibly multidimensional instead of linear -- as always relativized to the immense number of possible goals and environments -- we can avoid the temptation to try to reach a scalar judgment about which type of entity is actually smarter and by how much.

We might think of typical human intelligence as "familiar intelligence" -- familiar to us, that is -- and artificial intelligence as "strange intelligence". This terminology wears its anthropocentrism on its sleeve, rather than masking it under false objectivity. Something possesses familiar intelligence to the degree it thinks like us. It is a similarity relation. How familiar an intelligence is depends on several factors. Some are architectural: What forms does the basic cognitive processing take? What shortcuts and heuristics does it rely on? How serial or parallel is it? How fast? With what sorts of redundancy, modularity, and self-monitoring for errors? Others are learned and cultural: learned habits, particular cultural practices, acquired skills, chosen effort based on perceived costs and benefits. An intelligence is outwardly familiar if it acts like us in intelligence-based tasks. And it is inwardly familiar if it does so by the same underlying cognitive mechanisms.

Familiarity is also a matter of degree: The intelligence of dogs is more familiar to us (in most respects) than that of octopuses. Although we share some common features with octopuses, they evolved in a very different environment and have very dissimilar cognitive architecture as a result. It's hard for us even to understand their goals, because their existence is so different. Still, as distant as our minds are from those of octopuses, we share with octopuses the broadly familiar lifeways of embodied animals who need to navigate the natural world, find food, and mate.

AI constitutes an even stranger form of intelligence. With architectures, environments, and goals so fundamentally unlike ours, AI is the strangest intelligence we have yet encountered. AI is not a biological organism; it was not shaped by the evolutionary pressures shared by every living being on Earth, and it does not have the same underlying needs. It is based on an inorganic substrate totally unlike all biological neurophysiology. Its goals are imposed by its makers rather than being autopoietic. Such intelligence should be expected to behave in ways radically different from familiar minds. This raises an epistemic challenge: Understanding and measuring strange intelligence may be extremely difficult for us. Plausibly, the stranger an intelligence is from our perspective, the easier it is for us to fail to appreciate what it’s up to. Strange intelligences rely on methods alien to our cognition.

If intelligence were linear and one-dimensional, then a single example of an egregious mistake by an AI -- a mistake a human would never make, like mistaking a strawberry for a toy poodle -- would be enough to show that the systems are nowhere near our level of intelligence. However, since intelligence is massively multidimensional, all these cases show on their own is that these systems have certain lacunae or blind spots. Of course, we humans also have lacunae and blind spots – just consider optical illusions. Our susceptibility to optical illusions is not used as evidence of our lack of general intelligence, however ridiculous our mistakes might seem to any entity not subject to those same illusions.

Full draft here.

Friday, January 30, 2026

Does Global Workspace Theory Solve the Question of AI Consciousness?

Hint: no.

Below are three sections from Chapter Eight of my manuscript in draft, AI and Consciousness, fresh new version available today here. Comments welcome!

[image adapted from Dehaene et al. 2011]

1. Global Workspace Theories and Access.

The core idea of Global Workspace Theory is simple. Sophisticated cognitive systems like the human mind employ specialized processes that operate to a substantial extent in isolation. We can call these modules, without committing to any strict interpretation of that term.[1] For example, when you hear speech in a familiar language, some cognitive process converts the incoming auditory stimulus into recognizable speech. When you type on a keyboard, motor functions convert your intention to type a word like “consciousness” into nerve signals that guide your fingers. When you try to recall ancient Chinese philosophers, some cognitive process pulls that information from memory without (amazingly) clogging your consciousness with irrelevant information about German philosophers, British prime ministers, rock bands, or dog breeds.

Of course, not all processes are isolated. Some information is widely shared, influencing or available to influence many other processes. Once I recall the name “Zhuangzi”, the thought “Zhuangzi was an ancient Chinese philosopher” cascades downstream. I might say it aloud, type it out, use it as a premise in an inference, form a visual image of Zhuangzi, contemplate his main ideas, attempt to sear it into memory for an exam, or use it as a clue to decipher a handwritten note. To say that some information is in “the global workspace” just is to say that it is available to influence a wide range of cognitive processes. According to Global Workspace Theory, a representation, thought, or cognitive process is conscious if and only if it is in the global workspace – if it is “widely broadcast to other processors in the brain”, allowing integration both in the moment and over time.[2]

Recall the ten possibly essential features of consciousness from Chapter Three: luminosity, subjectivity, unity, access, intentionality, flexible integration, determinacy, wonderfulness, specious presence, and privacy. [Blog readers: You won't have read Chapter Three, but try to ride with it anyway.] Global Workspace Theory treats access as the central essential feature.

Global Workspace Theory can potentially explain other possibly essential features. Luminosity follows if processes or representations in the workspace are available for introspective processes of self-report. Unity might follow if there’s only one workspace, so that everything in it is present together. Determinacy might follow if there’s a bright line between being in the workspace and not being in it. Flexible integration might follow if the workspace functions to flexibly combine representations or processes from across the mind. Privacy follows if only you can have direct access to the contents of your workspace. Specious presence might follow if representations or processes generally occupy the workspace for some hundreds of milliseconds.

In ordinary adult humans, typical examples of conscious experience – your visual experience of this text, your emotional experience of fear in a dangerous situation, your silent inner speech, your conscious visual imagery, your felt pains – appear to have the broad cognitive influences Global Workspace Theory describes. It’s not as though we commonly experience pain but find that we can’t report it or act on its basis, or that we experience a visual image of a giraffe but can’t engage in further thinking about the content of that image. Such general facts, plus the theory’s potential to explain features such as luminosity, unity, determinacy, flexible integration, privacy, and specious presence, lend Global Workspace Theories substantial initial attractiveness.

I have treated Global Workspace Theory as if it were a single theory, but it encompasses a family of theories that differ in detail, including “broadcast” and “fame” theories – any theory that treats the broad accessibility of a representation, thought, or process as the central essential feature making it conscious.[3]

Consider two contrasting views: Dehaene’s Global Neuronal Workspace Theory and Daniel Dennett’s “fame in the brain” view. Dehaene holds that entry into the workspace is all-or-nothing. Once a process “ignites” into the workspace, it does so completely. Every representation or process either stops short of entering consciousness or is broadcast to all available downstream processes. Dennett’s fame view, in contrast, admits degrees. Representations or processes might be more or less famous, available to influence some downstream cognitive processes without being available to influence others. There is no one workspace, but a pandemonium of competing processes.[4] If Dennett is correct, luminosity, determinacy, unity, and flexible integration all potentially come under threat in a way they do not as obviously come under threat on Dehaene’s view.[5]

Dennettian concerns notwithstanding, all-or-nothing ignition into a single, unified workspace is currently the dominant version of Global Workspace Theory. The issue remains unsettled and has obvious implications for the types of architectures that might plausibly host AI consciousness.

2. Consciousness Outside the Workspace; Nonconsciousness Within It?

Global Workspace Theory is not the correct theory of consciousness unless all and only thoughts, representations, or processes in the Global Workspace are conscious. Otherwise, something else, or something additional, is necessary for consciousness.

It is not clear that even in ordinary adult humans a process must be in the Global Workspace to be conscious. Consider the case of peripheral experience. Some theorists maintain that people have rich sensory experiences outside of focal attention: a constant background experience of your feet in your shoes and objects in the visual periphery.[6] Others – including Global Workspace theorists – dispute this. Introspective reports vary, and resolving such issues is methodologically tricky.

One methodological problem: People who report constant peripheral experiences might mistakenly assume that such experiences are always present because they are always present whenever they think to check, and the very act of checking might generate those experiences. This is sometimes called the “refrigerator light illusion”, akin to the error of thinking the refrigerator light is always on because it’s always on when you open the door to check.[7] On this view, you’re only tempted to think you have constant tactile experience of your feet in your shoes because you have that experience on those rare occasions when you’re thinking about whether you have it. Even if you now seem to have a broad range of experiences in different sensory modalities simultaneously, this could result from an unusual act of dispersed attention, or from “gist” perception or “ensemble” perception, in which you are conscious of the general gist or general features of a scene, knowing that there are details, without actually experiencing those unattended details.[8]

The opposite mistake is also possible. Those who deny a constant stream of peripheral experiences might simply be failing to notice or remember them. The fact that you don’t remember now the sensation of your feet in your shoes two minutes ago hardly establishes that you lacked the sensation at the time. Although many people find it introspectively compelling that their experience is rich with detail or that it is not, the issue is methodologically complex because introspection and memory are not independent of the phenomena to be observed.[9]

If we do have rich sensory experience outside of attention, it is unlikely that all of that experience is present in or broadcast to a Global Workspace. Unattended peripheral information is rarely remembered or consciously acted upon, tending to exert limited downstream influence – the paradigm of information that is not widely broadcast. Moreover, the Global Workspace is generally characterized as having limited capacity, containing only a few thoughts, representations, objects, or processes at a time – those that survive some competition or attentional selection – not a welter of richly detailed experiences in many modalities at once.[10]

A less common but equally important objection runs in the opposite direction: Perhaps not everything in the Global Workspace is conscious. Some thoughts, representations, or processes might be widely broadcast, shaping diverse processes, without ever reaching explicit awareness.[11] Implicit racist assumptions, for example, might influence your mood, actions, facial expressions, and verbal expressions. The goal of impressing your colleagues during a talk might have pervasive downstream effects without occupying your conscious experience moment to moment.

The Global Workspace theorist who wants to allow that such processes are not conscious might suggest that, at least for adult humans, processes in the workspace are generally also available for introspection. But there’s substantial empirical risk in this move. If the correlation between introspective access and availability for other types of downstream cognition isn’t excellent, the Global Workspace theorist faces a dilemma. Either allow many conscious but nonintrospectable processes, violating widespread assumptions about luminosity, or redefine the workspace in terms of introspectability, which amounts to shifting to a Higher Order view.

3. Generalizing Beyond Vertebrates

The empirical questions are difficult even in ordinary adult humans. But our topic isn’t ordinary adult humans – it’s AI systems. For Global Workspace Theory to deliver the right answers about AI consciousness, it must be a universal theory applicable everywhere, not just a theory of how consciousness works in adult humans, vertebrates, or even all animals.

If there were a sound conceptual argument for Global Workspace Theory, then we could know the theory to be universally true of all conscious entities. Empirical evidence would be unnecessary. It would be as inevitably true as that rectangles have four sides. But as I argued in Chapter Four, conceptual arguments for the essentiality of any of the ten possibly essential features are unlikely to succeed – and a conceptual argument for Global Workspace Theory would be tantamount to a conceptual argument for the essentiality of access, one of those ten features. Not only do the general observations of Chapter Four tell against a conceptual guarantee, so also does the apparent conceivability, as described in Section 2 above, of consciousness outside the workspace or nonconsciousness within it – even if such claims are empirically false.

If Global Workspace Theory is the correct universal theory of consciousness applying to all possible entities, an empirical argument must establish that fact. But it’s hard to see how such an empirical argument could proceed. We face another version of the Problem of the Narrow Evidence Base. Even if we establish that in ordinary humans, or even in all vertebrates, a thought, representation, or process is conscious if and only if it occupies a Global Workspace, what besides a conceptual argument would justify treating this as a universal truth that holds among all possible conscious systems?

Consider some alternative architectures. The cognitive processes and neural systems of octopuses, for example, are distributed across their bodies, often operating substantially independently rather than reliably converging into a shared center.[12] AI systems certainly can be, indeed often are, similarly decentralized. Imagine coupling such disunity with the capacity for self-report – an animal or AI system with processes that are reportable but poorly integrated with other processes. If we assume Global Workspace Theory at the outset, we can conclude that only sufficiently integrated processes are conscious. But if we don’t assume Global Workspace Theory at the outset, it’s difficult to imagine what near-future evidence could establish that fact beyond a reasonable standard of doubt to a researcher who is initially drawn to a different theory.

If the simplest version of Global Workspace Theory is correct, we can easily create a conscious machine. This is what Dehaene and collaborators envision in the 2017 paper I discussed in Chapter One. Simply create a machine – such as an autonomous vehicle – with several input modules, several output modules, a memory store, and a central hub for access and integration across the modules. Consciousness follows. If this seems doubtful to you, then you cannot straightforwardly accept the simplest version of Global Workspace Theory.[13]
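
As a rough illustration only, the bare functional organization just described (several input modules, several output modules, a memory store, and a central hub) might be sketched in Python as follows. Everything here is a hypothetical assumption made for the sake of the example: the names `Candidate` and `GlobalWorkspace`, the `salience` score used to pick a winner, and the one-item capacity. It is not Dehaene and collaborators' implementation, and it shows only the broad-brush architecture, leaving open whether anything with this organization would be conscious.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Candidate:
    source: str      # name of the input module that proposed this content
    content: str     # the representation competing for the workspace
    salience: float  # hypothetical competition score (an assumption)


@dataclass
class GlobalWorkspace:
    # Input modules: each returns a Candidate, or None if it has nothing to offer.
    inputs: Dict[str, Callable[[], Optional[Candidate]]]
    # Output modules: each receives the broadcast content and returns its response.
    outputs: Dict[str, Callable[[str], str]]
    # Memory store: record of everything that has been broadcast.
    memory: List[str] = field(default_factory=list)

    def cycle(self) -> Dict[str, str]:
        """Run one competition-and-broadcast cycle."""
        proposals = (read() for read in self.inputs.values())
        candidates = [c for c in proposals if c is not None]
        if not candidates:
            return {}
        # Limited capacity: only the most salient candidate enters the workspace.
        winner = max(candidates, key=lambda c: c.salience)
        # Broadcast: the winner is stored and sent to every output module.
        self.memory.append(winner.content)
        return {name: act(winner.content) for name, act in self.outputs.items()}


# Hypothetical autonomous-vehicle example.
hub = GlobalWorkspace(
    inputs={
        "camera": lambda: Candidate("camera", "pedestrian ahead", 0.9),
        "gps": lambda: Candidate("gps", "turn left in 200 m", 0.4),
    },
    outputs={
        "brakes": lambda content: f"brakes received: {content}",
        "speech": lambda content: f"speech received: {content}",
    },
)
result = hub.cycle()
```

In this toy, the camera's content wins the competition and reaches both output modules and the memory store, while the GPS content never leaves its module. That so little code suffices is a way of seeing why the simplest version of the theory can seem too easy to satisfy.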

We can apply Global Workspace Theory to settle the question of AI consciousness only if we know the theory to be true either on conceptual grounds or because it is empirically well established as the correct universal theory of consciousness applicable to all types of entity. Despite the substantial appeal of Global Workspace Theory, we cannot know it to be true by either route.

-------------------------------------

[1] Full Fodorian (1983) modularity is not required.

[2] Mashour et al. 2020, pp. 776-777.

[3] E.g., Baars 1988; Dennett 1991, 2005; Tye 2000; Prinz 2012; Dehaene 2014; Mashour et al. 2020.

[4] Whether Dennett’s view is more plausible than Dehaene’s turns on whether, or how commonly, representations or processes are partly famous. Some visual illusions, for example, seem to affect verbal report but not grip aperture: We say that X looks smaller than Y, but when we reach toward X and Y we open our fingers to the same extent, accurately reflecting that X and Y are the same size. The fingers sometimes know what the mouth does not (Aglioti et al. 1995; Smeets et al. 2020). We adjust our posture while walking and standing in response to many sources of information that are not fully reportable, suggesting wide integration but not full accessibility (Peterka 2018; Shanbhag 2023). Swift, skillful activity in sports, in handling tools, and in understanding jokes also appears to require integrating diverse sources of information, which might not be fully integrated or reportable (Christensen et al. 2019; Vauclin et al. 2023; Horgan and Potrč 2010). In response, the all-or-nothing “ignition” view can explain away such cases of seeming intermediacy or disunity in three ways: by treating them as atypical (it needn’t commit to 100% exceptionless ignition with no gray-area cases), by allowing some nonconscious communication among modules (which needn’t be entirely informationally isolated), and/or by allowing for erroneous or incomplete introspective report (maybe some conscious experiences are too brief, complex, or subtle for people to confidently report experiencing them).

[5] Despite developing a theory of consciousness, Dennett (2016) endorsed “illusionism”, which rejects the reality of phenomenal consciousness (see especially Frankish 2016). I interpret the dispute between illusionists and nonillusionists as a verbal dispute about whether the specific philosophical concept of “phenomenal consciousness” requires immateriality, irreducibility, perfect introspectibility, or some other dubious property, or whether the term can be “innocently” used without invoking such dubious properties. See Schwitzgebel 2016, 2025.

[6] Reviewed in Schwitzgebel 2011, ch. 6; and though limited only to stimuli near the center of the visual field, see the large literature on “overflow” in response to Block 2007.

[7] Thomas 1999.

[8] Oliva and Torralba 2006; Whitney and Leib 2018.

[9] Schwitzgebel 2007 explores the methodological challenges in detail.

[10] E.g., Dehaene 2014; Mashour et al. 2020.

[11] E.g., Searle 1983, ch. 5; Bargh and Morsella 2008; Lau 2022; Michel et al. 2025; see also note 4.

[12] Godfrey-Smith 2016; Carls-Diamante 2022.

[13] See also Goldstein and Kirk-Giannini (forthcoming) for an extended application of Global Workspace Theory to AI consciousness. One might alternatively read Dehaene, Lau, and Kouider 2017 purely as a conceptual argument: If all we mean by “conscious” is “accessible in a Global Workspace”, then building a system of this sort suffices for building a conscious entity. The difficulty then arises in moving from that stipulative conceptual claim to the interesting, substantive claim about phenomenal consciousness in the standard sense described in Chapter Two. Similar remarks apply to the Higher Order aspect of that article. One challenge for this deflationary interpretation is that in related works (Dehaene 2014; Lau 2022) the authors treat their accounts as accounts of phenomenal consciousness. The article concludes by emphasizing that in humans “subjective experience coheres with possession” of the functional features they identify. A further complication: Lau later says that the way he expressed his view in this 2017 article was “unsatisfactory”: Lau 2022, p. 168.
