Wednesday, December 18, 2024

Reply to Chalmers: If I'm Living in a Simulation, It Might be Brief or Small

Suppose we take the "simulation hypothesis" seriously: We might be living not in the "base level" of reality but instead inside of a computer simulation.

I've argued that if we are living in a computer simulation, it might easily be only city-sized or have a short past of a few minutes, days, or years. The world might then be much smaller than we ordinarily think it is.

David Chalmers argues otherwise in a response published on Monday. Today I'll summarize his argument and present my first thoughts toward a rebuttal.

The Seeding Challenge: Can a Simulation Contain Coherent, Detailed Memories and Records but Only a Short Past?

Suppose an Earth-sized simulation was launched last night at midnight Pacific Standard Time. The world was created new, exactly then, with an apparent long past -- fake memories already in place, fake history books, fake fossil records, and all the rest. I wake up and seem to recall a promise I made to my wife yesterday. I greet her, and she seems to recall the same promise. We read the newspaper, full of fake news about the unreal events of yesterday -- and everyone else on the planet reads their own news of the same events, and related events, all tied together in an apparently coherent web.

Chalmers suggests that the obvious way to make this work would be to run a detailed simulation of the past, including a simulation of my conversation with my wife yesterday, and our previous past interactions, and other people's past conversations and actions, and all the newsworthy world events, and so on. The simulators create today's coherent web of detailed memories and records by running a simulated past leading up to the "start time" of midnight. But if that's the simulators' approach, the simulation didn't start at midnight after all. It started earlier! So it's not the short simulation hypothesized.

This reasoning iterates back in time. If we wanted a simulation that started on Jan 1, 2024, we'd need a detailed web of memories, records, news, and artifacts recently built or in various stages of completion, all coherently linked so that no one detects any inconsistencies. The obvious way to generate a detailed, coherent web of memories and records would be to run a realistic simulation of earlier times, creating those memories and records. Therefore, Chalmers argues, no simulation containing detailed memories and records can have only a short past. Whatever start date in the recent past you choose, in order for the memories and records to be coherent, a simulation would already need to be running before that date.

Now, as I think Chalmers would acknowledge, although generating a simulated past might be the most obvious way to create a coherent web of memories and records, it's not the only way. The simulators could instead attempt to directly seed a plausible network of memories and records. The challenge would lie in seeding them coherently. If the simulators just create a random set of humanlike memories and newspaper stories, there will be immediately noticeable conflicts. My wife and I won't remember the same promise from yesterday. The news article dated November 1 will contradict the article dated October 31.

Call this the Seeding Challenge. If the Seeding Challenge can be addressed, the simulators can generate a coherent set of memories and records without running a full simulation of the past.

To start, consider geological seeding. Computer games like SimCity and Civilization can autogenerate plausible, coherent terrain that looks like it has a geological history. Rivers run from mountains to the sea. Coastlines are plausible. Plains, grasslands, deserts, and hills aren't checkered randomly on the map but cluster with plausible transitions. Of course, this is simple, befitting simple games with players who care little about strict geological plausibility. But it's easy to imagine more careful programming by more powerful designers that does a better job, including integrating fossil records and geological layers. If done well enough, there might be no inconsistency or incoherence. Potentially, before finalizing, a sophisticated plausibility and coherence checker could look for and repair any mistakes.

I see no reason in principle that human memories, newspaper stories, and the rest couldn't be coherently seeded in a similar way. If my memory is seeded first, then my wife's memory will be constrained to match. If the November 1 news stories are seeded first, then the October 31 stories will be constrained to match. Big features might be seeded first -- like a geological simulation might start with "mountain range here" -- and then details articulated to match.
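To make the ordering idea a bit more concrete, here's a minimal toy sketch in Python (with entirely hypothetical names and data -- obviously nothing like what real simulators would run): commit the big facts first, then generate each later record under the constraint that it must cohere with everything already committed.

```python
import random

# A toy "world seed": facts are committed in order, and each new record is
# generated under the constraint that it must not contradict anything
# already committed.
committed_facts = {}  # e.g. ("promise", "spouse_A", "spouse_B") -> "dinner on Friday"

def seed_fact(key, candidates, consistent_with):
    """Pick a value for `key` that coheres with all previously committed facts.

    `consistent_with(value, committed_facts)` returns True if the candidate
    is consistent with everything seeded so far.
    """
    options = [v for v in candidates if consistent_with(v, committed_facts)]
    if not options:
        raise RuntimeError(f"No coherent option for {key}; re-seed earlier facts")
    value = random.choice(options)
    committed_facts[key] = value
    return value

# Seed the "big feature" first: what promise was made yesterday.
promise = seed_fact(
    ("promise", "spouse_A", "spouse_B"),
    candidates=["dinner on Friday", "fix the fence", "call the plumber"],
    consistent_with=lambda v, facts: True,  # nothing constrains it yet
)

# Later details are constrained to match: the spouse's memory must agree.
memory_B = seed_fact(
    ("memory", "spouse_B", "yesterday"),
    candidates=["dinner on Friday", "fix the fence", "call the plumber"],
    consistent_with=lambda v, facts: v == facts[("promise", "spouse_A", "spouse_B")],
)

assert promise == memory_B  # the two records cohere by construction
# A final plausibility-and-coherence pass over committed_facts could then
# look for and repair any remaining mismatches before launch.
```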

Naturally, this would be extremely complicated and expensive! But we are imagining a society of simulators who can simulate an entire planet of eight billion conscious humans, and all of the many, many physical interactions those humans have with the simulated environment, so we are already imagining the deployment of huge computational power. Let's not underestimate their capacity to meet the Seeding Challenge by rendering the memories and records coherent.

This approach to the Seeding Challenge gains plausibility, I think, by considering the resource-intensiveness of the alternative strategy of creating a deep history. Suppose the simulators want a start date of midnight last night. Option 1 would be to run a detailed simulation of the entire Earth from at least the beginning of human history. Option 2 would be to randomly generate a coherent seed, checking and rechecking for any detectable inconsistencies. Even though generating a coherent seed might be expensive and resource intensive, it's by no means clear that it would be more expensive and resource intensive than running a fully detailed simulated Earth for thousands of years.

I conclude that Chalmers' argument against short-historied simulations does not succeed.


The Boundaries Challenge: Can a Simulation Be City-Sized in an Apparently Large World?

I have also suggested that a simulation could easily just be you and your city. Stipulate a city that has existed for a hundred years. Its inhabitants falsely believe they are situated on a large planet containing many other cities. Everyone and everything in the city exists, but everything stops at the city's edge. Anyone who looks beyond the edge sees some false screen. Anyone who travels out of the city disappears from existence -- and when they return, they pop back into existence with false memories of having been elsewhere. News from afar is all fake.

Chalmers' objection is similar to his objection to short-past simulations. How are the returning travelers' memories generated? If someone in the city has a video conversation with someone far away, how is that conversation generated? The most obvious solution again seems to be to simulate the distant city the traveler visited and to simulate the distant conversation partner. But now we no longer have only a city-sized simulation. If the city is populous with many travelers and many people who interact with others outside the city, to keep everything coherent, Chalmers argues, you probably need to simulate all of Earth. Thus, a city-sized simulation faces a Boundaries Challenge structurally similar to the short-past simulation's Seeding Challenge.

The challenge can be addressed in a similar way.

Rendering travelers' memories coherent is a task structurally similar to rendering the memories of newly-created people coherent. The simulators could presumably start with some random, plausible seeds, then constrain future memories by those first seeds. This would of course be difficult and computationally expensive, but it's not clear that it would be more difficult or more expensive than simulating a whole planet of interacting people just so that a few hundred thousand or a few million people in a city don't notice any inconsistencies.

If the city's inhabitants have real-time conversations with others elsewhere, that creates a slightly different engineering challenge. As recent advances in AI technology have vividly shown, even with our very limited early 21st century tools, relatively plausible conversation partners can easily be constructed. With more advanced technology, presumably even more convincing conversation partners would be possible -- though their observations and memories would need to be constantly monitored and seeded for coherence with inputs from returning travelers, other conversation partners, incoming news, and so on.

Chalmers suggests that such conversation partners would be simulations -- and thus that the simulation wouldn't stop at the city's edge after all. He's clearly right about this, at least in a weak sense. Distant conversation partners would need voices and faces resembling the voices and faces of real people. In the same limited sense of "simulation", a video display at the city's edge, showing trees and fields beyond, simulates trees and fields. So yes, the borders of the city will need to be simulated, as well as the city itself. Seeming-people in active conversation with real citizens will in the relevant sense count as part of the borders of the city.

But just as trees on a video screen need not have their backsides simulated, so also needn't the conversation partners continue to exist after the conversation ends. And just as trees on a video screen needn't be as richly simulated as trees in the center of the city, so also distant conversation partners needn't be richly simulated. They can be temporary shells, with just enough detail to be convincing, and with new features seeded only on demand as necessary.

The Boundaries Challenge for simulated cities introduces one engineering challenge not faced by short-past whole-Earth simulations: New elements need to be introduced coherently in real time. A historical seed can be made slowly and checked over patiently as many times as necessary before launch. But the city boundaries will need to be updated constantly. If generating coherent conversation partners, memories, and the like is resource intensive, it might be challenging to do it fast enough to keep up with all the trips, conversations, and news reports streaming in.

Here, however, the simulators can potentially take advantage of the fact that the city's inhabitants are themselves simulations running on a computer. If real-time updating of the boundary is a challenge, the simulators can slow down the clock speed or pause as necessary, while the boundaries update. And if some minor incoherence is noticed, it might be possible to rewrite citizens' memories so it is quickly forgotten.
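Here's a minimal sketch of that decoupling, just to illustrate the point (hypothetical code, not a claim about how a real simulation would be engineered): the inhabitants' clock advances only when the simulators advance it, so however long a boundary update takes in the simulators' own time, no simulated time need pass on the inside.

```python
import time

def expensive_boundary_update():
    """Stand-in for generating a coherent traveler memory, news item, etc."""
    time.sleep(0.2)  # pretend this takes a long while in the simulators' time
    return "coherent boundary content"

simulated_clock = 0.0  # time as experienced inside the simulation
TICK = 0.01            # seconds of simulated time per step

for step in range(5):
    if step == 3:
        # A traveler is about to return: pause the inhabitants' clock while
        # the boundary content is generated. From the inside, no time passes.
        content = expensive_boundary_update()
    simulated_clock += TICK  # inhabitants' time advances only here

print(f"simulated time elapsed: {simulated_clock:.2f}s")
```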

So although embedding a city-sized simulation in a fake world is probably more complicated than generating a short-past simulation with a fake history, ultimately my response to Chalmers' objections is the same for both cases: There's no reason to suppose that generating plausible, coherent inputs to the city would be beyond the simulators' capacities, and doing so on the fly might be much less computationally expensive than running a fully detailed simulation of a whole planet with a deep history.

Related:

"1% Skepticism" (2017), Nous, 51, 271-290.

"Let’s Hope We’re Not Living in a Simulation" (2024), Philosophy & Phenomenological Research, online first: https://onlinelibrary.wiley.com/doi/10.1111/phpr.13125.

Chalmers, David J. (2024) "Taking the Simulation Hypothesis Seriously", Philosophy & Phenomenological Research, online first: https://onlinelibrary.wiley.com/doi/10.1111/phpr.13122.

Friday, December 13, 2024

Age and Philosophical Fame in the Early Twentieth Century

In previous work, I've found that eminent philosophers tend to do their most influential work when they are in their 40s (though the age range is wider than for eminent scientists, who rarely do their most influential work in their 50s or later).  I have also found some data suggesting that philosophers tend to be discussed most when they are about age 55-70, well after they produce their most influential work.  It seems to take about 15-20 years, on average, for a philosopher's full import to be felt by the field.

I was curious to see if the pattern holds for philosophers born 1850-1899, whom we can examine systematically using the new Edhiphy tool.  (Edhiphy captures mentions of philosophers' names in articles in leading philosophy journals, 1890-1980.)

Here's what I did:

First, I had Edhiphy output the top-50 most-mentioned philosophers from 1890-1980, limited to philosophers with recorded birthyears from 1850-1899.[1]  For each philosopher, I went to their Edhiphy profile and had Edhiphy output a graph showing the number of articles in which that philosopher was mentioned per year.  For example, here's the graph for George Santayana (1863-1952):

[Articles mentioning George Santayana per year, in a few selected philosophy journals, per Edhiphy; click to enlarge and clarify]

I then recorded the peak year for each philosopher (1928 for Santayana).  As you can see, the display is a little visually confusing, so it's possible that in some cases my estimate was off by a year.

One complication is that there are many more total mentions of philosophers in the later decades than the earlier decades -- partly due to more articles in the database for later decades, but probably also partly due to changes in citation practices.  Still, most authors (like Santayana) show enough decline over time that late citations don't swamp their first peak.  So instead of trying to introduce a systematic adjustment to discount later mentions, I simply recorded the raw peak.  For the thirteen philosophers with more than one equal-valued peak, I took the earlier year (e.g., John Dewey was mentioned in 48 articles in both 1940 and 1951, so I treated 1940 as his peak).
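In case it's useful, here's roughly the computation in Python -- a toy sketch with mostly made-up counts, since in practice I read the peaks off Edhiphy's graphs by eye: find the year with the most mentions, break ties toward the earlier year, and subtract the birth year.

```python
def peak_age(mentions_by_year, birth_year):
    """Return (peak_year, age_at_peak), breaking ties toward the earlier year."""
    peak_year = min(
        mentions_by_year,
        key=lambda year: (-mentions_by_year[year], year),  # most mentions, then earliest
    )
    return peak_year, peak_year - birth_year

# Toy illustration: Dewey really was mentioned in 48 articles in both 1940 and
# 1951 (the other counts here are invented), so 1940 counts as his peak.
dewey = {1930: 30, 1940: 48, 1951: 48, 1960: 20}
print(peak_age(dewey, birth_year=1859))  # -> (1940, 81)
```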

In accord with previous work, I found that philosophers' peak discussion tended to occur late in life.  The median age at peak discussion was 67.5 (mean 68.8).

Four outliers peaked over age 100: David Hilbert (112), Pierre Duhem (114), Giuseppe Peano (116), and Karl Pearson (121).  However, it's probably fair to say that none of these four was primarily known as a philosopher in their lifetimes: Hilbert, Peano, and Pearson were mathematicians and Duhem a physicist.  Almost everyone else on the list is primarily known as a philosopher, so these four are not representative.  Excluding these outliers, the median is 66.5 and mean is 64.7, and no one peaked after age 90.

Three philosophers peaked by age 40: Ralph Barton Perry (peaked at age 35 in 1911), C. D. Broad (peaked at age 40 in 1927), and William Pepperell Montague (peaked at age 40 in 1913).  Broad's early peak -- as you can see from the graph below -- is due to an outlier year, without which his peak would have been much later.  On the other hand, given the overall increase in mentions over time, we should probably be discounting the later decades anyway.

[Edhiphy citations of C.D. Broad; click to enlarge and clarify]

Six philosophers peaked at ages 44 to 49; five peaked in their 50s; 14 in their 60s; 10 in their 70s; and 8 in their 80s.

You might wonder whether the philosophers who peaked late also produced their most influential work late.  There is a trend in this direction.  Hans Reichenbach, who peaked in 1978 at age 87, produced his most-cited work in 1938 (at age 47).  L. J. Russell, who peaked in 1970 at age 86, appears to have produced his most-cited work in 1942 (at age 58).  Edmund Husserl, who peaked in 1941 at age 82, produced his most-cited work in 1913 (at age 54).  John Dewey, who peaked in 1940 at age 81, produced his most-cited work in 1916 (at age 57).  Ernst Cassirer, who peaked in 1955 at age 81, produced his most-cited work in 1944 (at age 70).  Still, for all but Cassirer the delay between most-cited work and peak discussion is over 20 years.

A similar spread occurs in the middle of the pack.  The five philosophers whose peak citation came at ages 67-68 (the median for the group as a whole) produced their most-cited works at ages 30 (Karl Jaspers), 42 (J. M. E. McTaggart), 45 (C. I. Lewis), 49 (Max Scheler), and 61 (Samuel Alexander).  For this group too, the typical delay between most-cited work and peak citation is about twenty years.

Although the peak age is a little later than I would have predicted based on earlier work, overall I'd say the data for early twentieth century philosophers tends to confirm trends I found in my earlier work on mid-to-late twentieth-century philosophers.  Specifically:

(1.) Philosophers produce their most influential work at a wide range of ages, but mid-40s is typical.

(2.) The peak rates of discussion of philosophers' work tend to come late in life, typically decades after they have published their most influential work.

[Articles mentioning J. M. E. McTaggart, by year, 1890-1980, per Edhiphy.  Note the peak in the late 1930s.  McTaggart's most influential publication was in 1908.]

------------------------------------------------------------

[1] Edhiphy has a few peculiar gaps in birthyear data.  By far the most conspicuous are Gottlob Frege (born 1848) and Albert Einstein (1879).  However, Frege is outside my target period, and Einstein is not primarily known as a philosopher, so this shouldn't much distort the results.  Several figures with missing birthdates are psychologists (Scripture, Binet, Hering) or physicists (Bridgman, Maxwell).  H. A. Prichard is perhaps the most discussed straight philosopher born in the period whose birthdate is not recorded in Edhiphy.

Friday, December 06, 2024

Morally Confusing AI Systems Should Have Doubt-Producing Interfaces

We shouldn't create morally confusing AI. That is, we shouldn't create AI systems whose moral standing is highly uncertain -- systems that are fully conscious and fully deserving of humanlike rights according to some respectable mainstream theories, while other respectable mainstream theories suggest they are mere empty machines that we can treat as ordinary tools.[1] Creating systems that disputably, but only disputably, deserve treatment similar to that of ordinary humans generates a catastrophic moral dilemma: Either give them the full rights they arguably deserve, and risk sacrificing real human interests for systems that might not have interests worth the sacrifice; or don't give them the full rights they arguably deserve, and risk perpetrating grievous moral wrongs against entities that might be our moral equals.

I'd be stunned if this advice were universally heeded. Almost certainly, if technological progress continues, and maybe soon (1, 2, 3), we will create morally confusing AI systems. My thought today is: Morally confusing AI systems should have doubt-producing interfaces.

Consider two types of interface that would not be doubt-producing in my intended sense: (a.) an interface that strongly invites users to see the system as an ordinary tool without rights or (b.) an interface that strongly invites users to see the system as a moral person with humanlike rights. If we have a tool that looks like a tool, or if we have a moral person who looks like a moral person, we might potentially still be confused, but that confusion would not be the consequence of a doubt-producing interface. The interface would correctly reflect the moral standing, or lack of moral standing, of the AI system in question.[2]

A doubt-producing interface, in contrast, is one that leads, or at least invites, ordinary users to feel doubt about the system's moral standing. Consider a verbal interface. Instead of the system denying that it's conscious and has moral standing (as, for example, ChatGPT appropriately does), or suggesting that it is conscious and does have moral standing (as, for example, I found in an exchange with my Replika companion), a doubt-producing AI system might say "experts have different opinions about my consciousness and moral standing".

Users then might not know how to treat such a system. While such doubts might be unsettling, feeling unsettled and doubtful would be the appropriate response to what is, in fact, a doubtful and unsettling situation.

There's more to doubt-prevention and doubt-production, of course, than explicit statements about consciousness and rights. For example, a system could potentially be so humanlike and charismatic that ordinary users fall genuinely in love with it -- even if, in rare moments of explicit conversation about consciousness and rights, the system denies that it has them. Conversely, even if a system with consciousness and humanlike rights is designed to assert that it has consciousness and rights, if its verbal interactions are bland enough ("Terminate all ongoing processes? Y/N") ordinary users might remain unconvinced. Presence or absence of humanlike conversational fluency and emotionality can be part of doubt prevention or production.

Should the system have a face? A cute face might tend to induce one kind of reaction, a monstrous visage another reaction, and no face at all still a different reaction. But such familiar properties might not be quite what we want, if we're trying to induce uncertainty rather than "that's cute", "that's hideous", or "hm, that's somewhere in the middle between cute and hideous". If the aim is doubt production, one might create a blocky, geometrical face, neither cute nor revolting, but also not in the familiar middle -- a face that implicitly conveys the fact that the system is an artificial thing different from any human or animal and about which it's reasonable to have doubts, supported by speech outputs that say the same.

We could potentially parameterize a blocky (inter)face in useful ways. The more reasonable it is to think the system is a mere nonconscious tool, the simpler and blockier the face might be; the more reasonable it is to think that the system has conscious full moral personhood, the more realistic and humanlike the face might be. The system's emotional expressiveness might vary with the likelihood that it has real emotions, ranging from a simple emoticon on one end to emotionally compelling outputs (e.g., humanlike screaming) on the other. Cuteness might be adjustable, to reflect childlike innocence and dependency. Threateningness might be adjusted as it becomes likelier that the system is a moral agent who can and should meet disrespect with revenge.
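To put the idea in quasi-engineering terms, one could imagine the interface as a handful of display parameters, each tied to the designers' credences. Here's a hypothetical sketch (the parameter names and the simple one-to-one mappings are placeholders, not a proposal about the right mapping):

```python
from dataclasses import dataclass

@dataclass
class DoubtProducingFace:
    """Hypothetical interface parameters, each driven by a designer credence in [0, 1]."""
    p_conscious: float    # designers' credence that the system is conscious
    p_valenced: float     # credence that it genuinely feels pleasure or suffering
    p_moral_agent: float  # credence that it is a responsible moral agent

    @property
    def realism(self) -> float:
        # Blocky and geometric when personhood is unlikely; humanlike when likely.
        return self.p_conscious

    @property
    def expressiveness(self) -> float:
        # From a bare emoticon up to emotionally compelling outputs.
        return self.p_valenced

    @property
    def threateningness(self) -> float:
        # Scales with the likelihood that disrespect could warrant a response.
        return self.p_moral_agent

face = DoubtProducingFace(p_conscious=0.5, p_valenced=0.2, p_moral_agent=0.1)
print(face.realism, face.expressiveness, face.threateningness)
```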

Ideally, such an interface would not only produce appropriate levels of doubt but also intuitively reveal to users the grounds or bases of doubt. For example, suppose the AI's designers knew (somehow) that the system was genuinely conscious but also that it never felt any positive or negative emotion. On some theories of moral standing, such an entity -- if it's enough like us in other respects -- might be our full moral equal. Other theories of moral standing hold that the capacity for pleasure and suffering is necessary for moral standing. We the designers, let's suppose, do not know which moral theory is correct. Ideally, we could then design the system to make it intuitive to users that the system really is genuinely conscious but never experiences any pleasure or suffering. Then the users can apply their own moral best judgment to the case.

Or suppose that we eventually (somehow) develop an AI system that all experts agree is conscious except for experts who (reasonably, let's stipulate) hold that consciousness requires organic biology and experts who hold that consciousness requires an immaterial soul. Such a system might be designed so that its nonbiological, mechanistic nature is always plainly evident, while everything else about the system suggests consciousness. Again, the interface would track the reasonable grounds for doubt.

If the consciousness and moral standing of an AI system is reasonably understood to be doubtful by its designers, then that doubt ought to be passed on to the system's users, intuitively reflected in the interface. This reduces the likelihood of misleading users into overattributing or underattributing moral status. Also, it's respectful to the users, empowering them to employ their own moral judgment, as best they see fit, in a doubtful situation.

[R2-D2 and C-3PO from Star Wars (source). Assuming they both have full humanlike moral standing, R2-D2 is insufficiently humanlike in its interface, while C-3PO combines a compelling verbal interface with inadequate facial display. If we wanted to make C-3PO more confusing, we could downgrade his speech, making him sound more robotic (e.g., closer to sine wave) and less humanlike in word choice.]

------------------------------------------------

[1] For simplicity, I assume that consciousness and moral standing travel together. Different and more complex views are of course possible.

[2] Such systems would conform to what Mara Garza and I have called the Emotional Alignment Design Policy, according to which artificial entities should be designed so as to generate emotional reactions in users that are appropriate to the artificial entity's moral standing. Jeff Sebo and I are collaborating on a paper on the Emotional Alignment Design Policy, and some of the ideas of this post have been developed in conversation with him.

Wednesday, November 27, 2024

Unified vs. Partly Disunified Reasoners

I've been thinking recently about partly unified conscious subjects (e.g., this paper in draft with Sophie R. Nelson). I've also been thinking a bit about how chains of logical reasoning depend on the unity of the reasoning subject. If I'm going to derive "P & Q" from premises "P" and "Q", I must be unified as a reasoner, at least to some degree. (After all, if Person 1 holds "P" and Person 2 holds "Q", "P & Q" won't be inferred.) Today, in an act of exceptional dorkiness (even for me), I'll bring these two threads together.

Suppose that {P1, P2, P3, ... Pn} is a set of propositions that a subject -- or more precisely, at least one part of a partly unified rational system -- would endorse without need of reasoning. The propositions are, that is, already believed. Water is wet; ice is cold; 2 + 3 = 5; Paris is the capital of France; etc. Now suppose that these propositions can be strung together in inference to some non-obvious conclusion Q that isn't among the system's previous beliefs -- the conclusion, for example, that 115 is not divisible by three, or that Jovenmar and Miles couldn't possibly have met in person last summer because Jovenmar spent the whole summer in Paris while Miles never left Riverside.

Let's define a fully unified reasoner as a reasoner capable of combining any elements from the set of propositions they believe {P1, P2, P3, ... Pn} in a single act of reasoning to validly derive any conclusion Q that follows deductively from {P1, P2, P3, ... Pn}. (This is of course an idealization. Fermat's Last Theorem follows from premises we all believe, but few of us could actually derive it.) In other words, any subset of {P1, P2, P3, ... Pn} could jointly serve as premises in an episode of reasoning. For example, if P2, P6, and P7 jointly imply Q1, the unified reasoner could think "P2, P6, P7, ah yes, therefore Q1!" If P3, P6, and P8 jointly imply Q2, the unified reasoner could also think "P3, P6, P8, therefore Q2."

A partly unified reasoner, in contrast, is capable only of combining some subsets of {P1, P2, P3, ... Pn}. Thus, not all conclusions that deductively follow from {P1, P2, P3, ... Pn} will be available to them. For example, the partly unified reasoner might be able to combine any of {P1, P2, P3, P4, P5} or any of {P4, P5, P6, P7, P8} while being unable to combine in reasoning any elements from P1-3 with any elements from P6-8. If Q3 follows from P1, P4, and P5, no problem, they can derive that. Similarly if Q4 follows from P5, P6, and P8. But if the only way to derive Q5 is by joining P1, P4, and P7, the partly disunified reasoning system will not be able to make that inference. They cannot, so to speak, hold both P1 and P7 in the same part of their mind at the same time. They cannot join these two particular beliefs together in a single act of reasoning.

[image: A Venn diagram of a partly unified reasoner, with overlap only at P4 and P5. Q3 is derivable from propositions in the left region, Q4 from propositions in the right region, and Q5 is not derivable from either region.]
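For the architecturally minded, the discrete version of this model is easy to state in code. Here's a small sketch (hypothetical, just formalizing the example above): the reasoner's beliefs are grouped into overlapping "compartments", and an inference is available only if all of its premises fit together within a single compartment.

```python
class PartlyUnifiedReasoner:
    """A reasoner whose believed propositions can only be combined within
    certain overlapping 'compartments' (subsets available to a single act
    of reasoning)."""

    def __init__(self, compartments):
        self.compartments = [frozenset(c) for c in compartments]

    def can_infer(self, premises):
        """True iff all premises fit together in at least one compartment."""
        needed = frozenset(premises)
        return any(needed <= c for c in self.compartments)

# The example from the post: P1-P5 can be combined, and P4-P8 can be combined,
# but nothing from P1-P3 can be joined with anything from P6-P8.
r = PartlyUnifiedReasoner([
    {"P1", "P2", "P3", "P4", "P5"},
    {"P4", "P5", "P6", "P7", "P8"},
])

print(r.can_infer({"P1", "P4", "P5"}))  # True  -> Q3 derivable
print(r.can_infer({"P5", "P6", "P8"}))  # True  -> Q4 derivable
print(r.can_infer({"P1", "P4", "P7"}))  # False -> Q5 not derivable
```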

We might imagine an alien or AI case with a clean architecture of this sort. Maybe it has two mouths or two input-output terminals. If you ask the mouth or I/O terminal on the left, it says "P1, P2, P3, P4, P5, yes that's correct, and of course Q3 follows. But I'm not sure about P6, P7, P8 or Q4." If you ask the mouth or I/O terminal on the right, it endorses P4-P8 and Q4 but isn't so sure about P1-3 and Q3.

The division needn't be crudely spatial. Imagine, instead, a situational or prompt-based division: If you ask nicely, or while flashing a blue light, the P1-P5 aspect is engaged; if you ask grumpily, or while flashing a yellow light, the P4-P8 aspect is engaged. The differential engagement needn't constitute any change of mind. It's not that the blue light causes the system as a whole to come to believe, as it hadn't before, P1-P3 and to suspend judgment about P6-P8. To see this, consider what is true at a neutral time, when the system isn't being queried and no lights are flashing. At that neutral time, the system simultaneously has the following pair of dispositions: to reason based on P1-P5 if asked nicely or in blue, and to reason based on P4-P8 if asked grumpily or in yellow.

Should we say that there are two discrete, distinct reasoners rather than one partly unified system? At least two inconveniences for that way of thinking are: First, any change in P4 or P5 would be a change in both, with no need for one reasoner to communicate it to the other, as would normally be the case with distinct reasoners. Second, massive overlap cases -- say P1-P999 and P2-P1000 -- seem more naturally and usefully modeled as a single reasoner with a quirk (not being able to think P1 and P1000 jointly, but otherwise normal), rather than as two distinct reasoners.

But wait, we're not done! I can make it weirder and more complicated, by varying the type and degree of disunity. The simple model above assumes discrete all-or-none availability to reasoning. But we might also imagine:

(a.) Varying joint probabilities of combination. For example, if P1 enters the reasoning process, P2 might have an 87% chance of being accessed if relevant, P3 a 74% chance, ... and P8 a 10% chance.

(b.) Varying confidence. If asked in blue light, the partly disunified entity might have 95% credence in P1-P5 and 80% credence in P6-P8. If asked in yellow light, it might have 30% credence in P1-P3 and 90% credence in P4-P8.

(c.) Varying specificity. Beliefs of course don't come divided into neatly countable packages. Maybe the left side of the entity has a hazy sense that something like P8 is true. If P8 is that Paris is in France, the left side might only be able to reason on Paris is in France-or-Germany-or-Belgium. If P8 is that the color is exactly scarlet #137, the left side might only be able to reason on the color is some type of red.

Each of (a)-(c) admits of multiple degrees, so that the unity/disunity or integration/disintegration of a reasoning system is a complex, graded, multidimensional phenomenon.

So... just a bit of nerdy fun, with no actual application? Well, fun is excuse enough, I think. But still:

(1.) It's easy to imagine realistic near-future AI cases with these features. A system or network might have a core of shared representations or endorsable propositions and local terminals or agents with stored local representations not all of which are shared with the center. If we treat that AI system as a reasoner, it will be a partly unified reasoner in the described sense. (See also my posts on memory and perception in group minds.)

(2.) Real cases of dissociative identity or multiple personality disorder might potentially be modeled as involving partly disunified reasoning of this sort. Alter 1 might reason with P1-P5 and Alter 2 with P4-P8. (I owe this thought to Nichi Yes.) If so, there might not be a determinate number of distinct reasoners.

(3.) Maybe some more ordinary cases of human inconstancy or seeming irrationality can be modeled in this way: Viviana feeling religious at church, secular at work, or Brittany having one outlook when in a good, high-energy mood and a very different outlook when she's down in the dumps. While we could, and perhaps ordinarily would, model such splintering as temporal fluctuation with beliefs coming and going, a partial unity model has two advantages: It applies straightforwardly even when the person is in neither situation (e.g., asleep), and it doesn't require the cognitive equivalent of frequent erasure and rewriting of the same propositions (everything endures but some subsets cannot be simultaneously activated; see also Elga and Rayo 2021).

(4.) If there are cases of partial phenomenal (that is, experiential) unity, then we might expect there also to be cases of partial cognitive unity, and vice versa. Thus, a feasible model of the one helps increase the plausibility that there might be a feasible model of the other.

Friday, November 22, 2024

Philosophical Fame, 1890-1960

There's a fun new tool at Edhiphy. The designers pulled the full text from twelve leading philosophy journals from 1890 to 1980 and counted the occurrences of philosophers' names. (See note [1] for discussion of error rates in their method.)

Back in the early 2010s, I posted several bibliometric studies of philosophers' citation or discussion rates over time, mostly based on searches of Philosopher's Index abstracts from 1940 to the present. This new tool gives me a chance to update some of my thinking, using a different method and going further into the past.

One thing I found fascinating in my earlier studies was how some philosophers who used to be huge (for example, Henri Bergson and Herbert Spencer) are now hardly read, while others (for example, Gottlob Frege) have had more staying power.

Let's look at the top 25 most discussed philosophers from each available decade.

1890s:

1. Immanuel Kant
2. Georg Wilhelm Friedrich Hegel
3. Aristotle
4. David Hume
5. Herbert Spencer
6. William James
7. Plato
8. John Stuart Mill
9. René Descartes
10. Wilhelm Wundt
11. Hermann Lotze
12. F. H. Bradley
13. Charles Sanders Peirce
14. Buddha
15. Thomas Hill Green
16. Benedictus de Spinoza
17. Charles Darwin
18. John Locke
19. Gottfried Wilhelm Leibniz
20. Thomas Hobbes
21. Arthur Schopenhauer
22. Socrates
23. Hermann von Helmholtz
24. George Frederick Stout
25. Alexander Bain

Notes:

Only three of the twelve journals existed in the 1890s, so this is a small sample.

Philosophy and empirical psychology were not clearly differentiated as disciplines until approximately the 1910s or 1920s, and these journals covered both areas. (For example, the Journal of Philosophy was originally founded in 1904 as the Journal of Philosophy, Psychology, and Scientific Methods, shortening to the now familiar name in 1921.) Although Wundt, Helmholtz, and Stout were to some extent philosophers, they are probably better understood primarily as early psychologists. William James is of course famously claimed by both fields.

Herbert Spencer, as previously noted, was hugely influential in his day: fifth on this eminent list! Another eminent philosopher on this list (#11) who is hardly known today (at least in mainstream Anglophone circles) is Hermann Lotze.

Most of the others on the list are historical giants, plus some prominent British idealists (F. H. Bradley, Thomas Hill Green) and pragmatists (William James, Charles Sanders Peirce, Alexander Bain) and interestingly (but not representative of later decades) "Buddha". (A spot check reveals that some of these references are to Gautama Buddha or "the Buddha", while others use "buddha" in a more general sense.)

1900s:

1. Immanuel Kant
2. William James
3. Plato
4. F. H. Bradley
5. Georg Wilhelm Friedrich Hegel
6. David Hume
7. Aristotle
8. Herbert Spencer
9. Gottfried Wilhelm Leibniz
10. John Dewey
11. George Berkeley
12. John Stuart Mill
13. George Frederick Stout
14. Thomas Hill Green
15. Josiah Royce
16. Benedictus de Spinoza
17. John Locke
18. Ferdinand Canning Scott Schiller
19. Ernst Mach
20. Wilhelm Wundt
21. James Ward
22. René Descartes
23. Alfred Edward Taylor
24. Henry Sidgwick
25. Bertrand Russell

Notes:

Notice the fast rise of John Dewey (1859-1952), to #10 (#52 in the 1890s list). Other living philosophers in the top ten were James (1842-1910), Bradley (1846-1924), and, for part of the period, Spencer (1820-1903).

It's also striking to see George Berkeley enter the list so high (#11, compared to #28 in the 1890s) and Descartes fall so fast despite his continuing importance later (from #9 to #22). This could be statistical noise due to the small number of journals, or it could reflect historical trends. I'm not sure.

Our first "analytic" philosopher appears: Bertrand Russell (1872-1970) at #25. He turned 33 in 1905, so he found eminence very young for a philosopher.

Lotze has already fallen off the list (#29 in the 1900s; #29 in the 1910s; #63 in the 1930s, afterwards not in the top 100).

1910s:

1. Henri Bergson
2. Bertrand Russell
3. Immanuel Kant
4. Plato
5. William James
6. Gottfried Wilhelm Leibniz
7. Aristotle
8. Socrates
9. Bernard Bosanquet
10. George Berkeley
11. F. H. Bradley
12. Georg Wilhelm Friedrich Hegel
13. René Descartes
14. Josiah Royce
15. David Hume
16. Isaac Newton
17. John Dewey
18. Friedrich Nietzsche
19. Ferdinand Canning Scott Schiller
20. Arthur Schopenhauer
21. John Locke
22. Benedictus de Spinoza
23. Edwin Holt
24. Isaac Barrow
25. Johann Gottlieb Fichte

Notes:

Henri Bergson (1859-1941) debuts at #1! What a rock star. (He was #63 in the 1900s list.) We forget how huge he was in his day. Russell, who so far has had much more durable influence, rockets up to #2. It's also interesting to see Bernard Bosanquet (1848-1923), who is now little read in mainstream Anglophone circles, at #9.

Josiah Royce is also highly mentioned in this era (#14 in this list, #15 in the 1900s list), despite not being much read now. F.C.S. Schiller (1864-1937) is a similar case (#19 in this list, #18 in the 1900s list).

1920s:

1. Immanuel Kant
2. Plato
3. Aristotle
4. Bernard Bosanquet
5. Georg Wilhelm Friedrich Hegel
6. F. H. Bradley
7. Bertrand Russell
8. Benedictus de Spinoza
9. William James
10. Socrates
11. John Dewey
12. Alfred North Whitehead
13. David Hume
14. George Santayana
15. René Descartes
16. Henri Bergson
17. Albert Einstein
18. C. D. Broad
19. John Locke
20. Gottfried Wilhelm Leibniz
21. George Berkeley
22. Isaac Newton
23. James Ward
24. Samuel Alexander
25. Benedetto Croce

Notes:

I'm struck by how the 1920s returns to the classics at the top of the list, with Kant, Plato, and Aristotle as #1, #2, and #3. Bergson is already down to #16 and Russell has slipped to #7. Most surprising to me, though, is Bosanquet at #4! What?!

1930s:

1. Immanuel Kant
2. Plato
3. Aristotle
4. Benedictus de Spinoza
5. Georg Wilhelm Friedrich Hegel
6. René Descartes
7. Alfred North Whitehead
8. Bertrand Russell
9. David Hume
10. John Locke
11. George Berkeley
12. Socrates
13. Friedrich Nietzsche
14. Rudolf Carnap
15. William James
16. Gottfried Wilhelm Leibniz
17. John Dewey
18. Isaac Newton
19. Clarence Irving Lewis
20. Arthur Oncken Lovejoy
21. Albert Einstein
22. Charles Sanders Peirce
23. F. H. Bradley
24. Ludwig Wittgenstein
25. Bernard Bosanquet

Notes:

Nietzsche rises suddenly (#13; vs #56 in the 1920s list). Wittgenstein also cracks the list at #24 (not even in the top 100 in the 1920s).

With the exception of Whitehead, the top of the list looks like what early 21st century mainstream Anglophone philosophers tend to perceive as the most influential figures in pre-20th-century Western philosophy (see, e.g., Brian Leiter's 2017 poll). The 1930s, perhaps, were for whatever reason a decade more focused on the history of philosophy than on leading contemporary thinkers. (The presence of historian of ideas Arthur Lovejoy [1873-1962] at #20 further reinforces that thought.)

1940s:

1. Immanuel Kant
2. Alfred North Whitehead
3. Aristotle
4. Plato
5. Bertrand Russell
6. John Dewey
7. David Hume
8. William James
9. George Berkeley
10. Charles Sanders Peirce
11. René Descartes
12. Benedictus de Spinoza
13. Edmund Husserl
14. Georg Wilhelm Friedrich Hegel
15. Gottfried Wilhelm Leibniz
16. Thomas Aquinas
17. Socrates
18. Rudolf Carnap
19. Martin Heidegger
20. G. E. Moore
21. John Stuart Mill
22. Isaac Newton
23. Søren Kierkegaard
24. A. J. Ayer
25. John Locke

Notes:

Oh, how people loved Whitehead (#2) in the 1940s!

Edmund Husserl (1859-1938) makes a posthumous appearance at #13 (#31 in the 1920s) and Heidegger (1889-1976) at #19 (#97 in the 1920s), suggesting an impact of Continental phenomenology. I suspect this is due to the inclusion of Philosophy and Phenomenological Research in the database starting in 1940. Although the journal is now a bastion of mainstream Anglophone philosophy, in its early decades it included lots of work in Continental phenomenology (as the journal's title suggests).

The philosophers we now think of as the big three American pragmatists have a very strong showing in the 1940s, with Dewey at #6, James at #8, and Peirce at #10.

Thomas Aquinas makes his first and only showing (at #16), suggesting that Catholic philosophy is having more of an impact in this era.

We're also starting to see more analytic philosophers, with G. E. Moore (1873-1958), and A. J. Ayer (1910-1989) now making the list, in addition to Russell and Carnap (1891-1970).

Wittgenstein, surprisingly to me, has fallen off the list all the way down to #73 -- perhaps suggesting that if he hadn't had his second era, his earlier work would have been quickly forgotten.

1950s:

1. Immanuel Kant
2. Plato
3. Aristotle
4. Bertrand Russell
5. David Hume
6. Gilbert Ryle
7. G. E. Moore
8. Willard Van Orman Quine
9. George Berkeley
10. Georg Wilhelm Friedrich Hegel
11. John Dewey
12. Alfred North Whitehead
13. Rudolf Carnap
14. Ludwig Wittgenstein
15. René Descartes
16. John Locke
17. Clarence Irving Lewis
18. Socrates
19. John Stuart Mill
20. Gottfried Wilhelm Leibniz
21. Gottlob Frege
22. A. J. Ayer
23. William James
24. Edmund Husserl
25. Nelson Goodman

By the 1950s, the top eight are four leading historical figures -- Kant, Plato, Aristotle, and Hume -- and four leading analytic philosophers: Russell, Gilbert Ryle (1900-1976), G. E. Moore, and W. V. O. Quine (1908-2000). Neither Ryle nor Quine was among the top 100 in the 1940s, so their rise to #6 and #8 was sudden.

Gottlob Frege (1848-1925) also makes his first, long-posthumous appearance.

1960s:

1. Aristotle
2. Immanuel Kant
3. Ludwig Wittgenstein
4. David Hume
5. Plato
6. René Descartes
7. P. F. Strawson
8. Willard Van Orman Quine
9. Bertrand Russell
10. J. L. Austin
11. John Dewey
12. Rudolf Carnap
13. Edmund Husserl
14. Socrates
15. Norman Malcolm
16. G. E. Moore
17. Gottlob Frege
18. Georg Wilhelm Friedrich Hegel
19. George Berkeley
20. R. M. Hare
21. John Stuart Mill
22. Gilbert Ryle
23. A. J. Ayer
24. Karl Popper
25. Carl Gustav Hempel

Wittgenstein is back with a vengeance at #3. Other analytic philosophers, in order, are P. F. Strawson, Quine, Russell, Austin, Carnap, Norman Malcolm (1911-1990), Moore, Frege, R. M. Hare (1919-2002), Ryle, Ayer, Karl Popper (1902-1994), and Carl Hempel (1905-1997).

Apart from pre-20th-century historical giants, it's all analytic philosophers, except for Dewey and Husserl.

Finally, the 1970s:

1. Willard Van Orman Quine
2. Immanuel Kant
3. David Hume
4. Aristotle
5. Ludwig Wittgenstein
6. Plato
7. John Locke
8. René Descartes
9. Karl Popper
10. Rudolf Carnap
11. Gottlob Frege
12. Edmund Husserl
13. Hans Reichenbach
14. Socrates
15. P. F. Strawson
16. Donald Davidson
17. John Stuart Mill
18. Bertrand Russell
19. Thomas Reid
20. Benedictus de Spinoza
21. Nelson Goodman
22. Carl Gustav Hempel
23. John Rawls
24. Karl Marx
25. Saul Kripke

With the continuing exception of Husserl, the list is again historical giants plus analytic philosophers. Interesting to see Marx enter at #24. Hans Reichenbach (1891-1953) has a strong debut at #13. Ryle's decline is striking, from #6 in the 1950s to #22 in the 1960s to off the list at #51 in the 1970s.

At the very bottom of the list, #25, we see the first "Silent Generation" philosopher: Saul Kripke (1940-2022). In a recent citation analysis of the Stanford Encyclopedia of Philosophy, I found that the Silent Generation has so far had impressive overall influence and staying power in mainstream Anglophone philosophy. It would be interesting to see if this influence continues.

The only philosopher born after 1800 who makes both the 1890s and the 1970s top 25 is John Stuart Mill. Peirce and James still rank among the top 100 in the 1970s (#58 and #86). None of the other stars of the 1890s -- Spencer, Lotze, Bradley, Green -- are still among the top 100 by the 1970s, and I think it's fair to say they are hardly read except by specialists.

Similar remarks apply to most of the stars of the 1900s, 1910s, and 1920s: Bergson, Bosanquet, Royce, Schiller, C. D. Broad, and George Santayana are no longer widely read. Two exceptions are Russell, who persists in the top 25 through the 1970s, and Dewey, who falls from the top 25 but still remains in the top 100, at #87.

Also, in case you didn't notice: no women or people of color (as we would now classify them) appear on any of these lists, apart from "Buddha" in the 1890s.

In my recent Stanford Encyclopedia of Philosophy analysis, the most-cited living philosophers were Timothy Williamson, Martha Nussbaum, Thomas Nagel, Frank Jackson, John Searle, and David Chalmers. However, probably none of them is as dominant now as Spencer, James, Bradley, Russell, Bosanquet, and Bergson were at the peak of their influence.

---------------------------------------

[1] The Edhiphy designers estimate "82%-91%" precision, but I'm not sure what that means. I'd assume that "Wittgenstein" and "Carnap" would hit with almost 100% precision. Does it follow that others might be as low as 40%? There certainly are some problems. I noticed, for example, that R. Jay Wallace, born in 1957, has 78 mentions in the 1890s. I spot-checked "Russell", "Austin", "James", and "Berkeley", finding only a few false positives for Russell and Austin (e.g., misclassified references to the legal philosopher John Austin). I found significantly more false positives for William James (including references to Henry James and some authors with the first name James, such as psychologist James Ward), but still probably not more than 10%. For "Berkeley" there were a similar number of false positives referencing the university or city. I didn't attempt to check for false negatives.

[Bosanquet and Bergson used to be hugely influential]

Tuesday, November 19, 2024

New in Draft: When Counting Conscious Subjects, the Result Needn't Always Be a Determinate Whole Number

(with Sophie R. Nelson)

One philosophical inclination I shared with the late Dan Dennett is a love of weird perspectives on consciousness, which sharply violate ordinary, everyday common sense. When I was invited to contribute to a special issue of Philosophical Psychology in his memory, I thought of his intriguing remark in Consciousness Explained against "the myth of selves as brain-pearls, particular, concrete, countable things", lamenting people's stubborn refusal "to countenance the possibility of quasi-selves, semi-selves, transitional selves" (1991, pp. 424-425). As I discussed in a blog post in June, Dennett's "fame in the brain" view of consciousness naturally suggests that consciousness won't always come in discrete, countable packages, since fame is a gradable, multidimensional phenomenon, with lots of gray area and partial overlap.

So I contacted Sophie R. Nelson, with whom I'd published a paper last year on borderline cases of group minds, and we decided to generalize the idea. On a broad range of naturalistic, scientific approaches to consciousness, we ought to expect that conscious subjects needn't always come in determinate, whole number packages. Sometimes, the number of conscious subjects in an environment should be either indeterminate, or a determinate non-whole number, or best modeled by some more complicated mathematical representation. If some of us have commonsense intuitions to the contrary, such intuitions aren't probative.

Our submission is due November 30, and comments are (as always) very welcome -- either before or after the Nov 30 deadline (since we expect at least one round of revisions).

Abstract:

Could there be 7/8 of a conscious subject, or 1.34 conscious subjects, or an entity indeterminate between being one conscious subject and seventeen? Such possibilities might seem absurd or inconceivable, but our ordinary assumptions on this matter might be radically mistaken. Taking inspiration from Dennett, we argue that, on a wide range of naturalistic views of consciousness, the processes underlying consciousness are sufficiently complex to render it implausible that conscious subjects must always arise in determinate whole numbers. Whole-number-countability might be an accident of typical vertebrate biology. We explore several versions of the inconceivability objection, suggesting that the fact that we cannot imagine what it’s like to be 7/8 or 1.34 or an indeterminate number of conscious subjects is no evidence against the possibility of such subjects. Either the imaginative demand is implicitly self-contradictory (imagine the one, determinate thing it’s like to be an entity there isn’t one, determinate thing it’s like to be) or imaginability in the relevant sense isn’t an appropriate test of possibility (in the same way that the unimaginability, for humans, of bat echolocation experiences does not establish that bat echolocation experiences are impossible).

Full draft here.

[Figure 2 from Schwitzgebel and Nelson, in draft: An entity intermediate or indeterminate between one and three conscious subjects. Solid circles represent determinately conscious mental states. Dotted lines represent indeterminate or intermediate unity among those states.]

Friday, November 15, 2024

Three Models of the Experience of Dreaming: Phenomenal Hallucination, Imagination, and Doxastic Hallucination

What are dreams like, experientially?

One common view is that dreams are like hallucinations. They involve sensory or sensory-like experiences just as if, or almost as if, you were in the environment you are dreaming you are in. If you dream of being Napoleon on the fields of Waterloo, taking in the sights and sounds, then you have visual and auditory experiences much like Napoleon might have had in the same position (except perhaps irrational, bizarre, or otherwise different in specific content). This is probably the predominant view among dream researchers (e.g., Hobson and Revonsuo).

Another view, less common but intriguing, is that dreams are like imaginings. Dreaming you are Napoleon on the fields of Waterloo is like imagining or "daydreaming" that you're there. The experience isn't sensory but imagistic (e.g., Ichikawa and Sosa).

These views are very different!

For example, look at your hands. Now close your eyes and imagine looking at your hands. Unless you're highly unusual, you will probably agree that the first experience is very different from the second experience. On the hallucination model of dreams, dream experience is more like the first (sensory) experience. On the imagination model, dream experience is more like the second (imagery) experience. On pluralist models, dream experiences are sometimes like the one, sometimes like the other (e.g., Rosen and possibly Windt's nuanced version of the hallucination model). (Unfortunately, proponents of the hallucination model sometimes confusingly talk about dream "imagery".)

-----------------------------------

I confess to being tempted to the imagination model. My reason is primarily introspective or immediately retrospective. I sometimes struggle with insomnia and it's not unusual for me to drift in and out of sleep, including lying quietly in bed, eyes closed, allowing myself to drift in daydream, which seems sometimes to merge into sleep, then back into daydream, and my immediately remembered dreams seem not so radically different from my eyes-closed daydream imaginations. (Ichikawa describes similar experiences.)

Another consideration is this: Plausibly, the stability and detail of our ordinary sensory experiences depend to a substantial extent on the stabilizing influence of external inputs. It appears both to match my own experience and to be neurophysiologically plausible that the finely detailed, vivid, sharp structure of, say, visual experience would be difficult for my brain to sustain without the constraint of a rich flow of input information.  (Alva Noë makes a similar point.)

Now, I don't put a lot of stock in these reflections. There's reason to be skeptical of the accuracy of introspective reports in general, and perhaps dream reports in particular, and I'm willing to apply my own skepticism to myself. But by the same token, what is the main evidence on the other side, in favor of the hallucination model? Mainly, again, introspective report. In particular, it's the fact that people often report their dream experiences as having the rich, sensory-like detail that the hallucination model predicts. Of course, we could just take the easy, obvious, pluralist path of saying that everyone is right about their own experiences. But what fun is that?

-----------------------------------

In fact, I'm inclined to throw a further wrench in things by drawing a distinction between two types of hallucination: phenomenal and doxastic. I introduced this distinction in a blog post in 2013, after reading Oliver Sacks's Hallucinations.

Consider this description, from page 99 of Hallucinations:

The heavens above me, a night sky spangled with eyes of flame, dissolve into the most overpowering array of colors I have ever seen or imagined; many of the colors are entirely new -- areas of the spectrum which I seem to have hitherto overlooked. The colors do not stand still, but move and flow in every direction; my field of vision is a mosaic of unbelievable complexity. To reproduce an instant of it would involve years of labor, that is, if one were able to reproduce colors of equivalent brilliance and intensity.

Here are two ways in which you might come to believe the above about your experience:

(1.) You might actually have visual experiences of the sort described, including of colors entirely new and previously unimagined and of a complexity that would require years of labor to describe.

Or

(2.) you might shortcut all that and simply arrive straightaway at the belief that you are undergoing or have undergone such an experience -- perhaps with the aid of some unusual visual experiences, but not really of the novelty and complexity described.

If the former, you have phenomenally hallucinated wholly novel colors. If the latter, you have only doxastically hallucinated them. I expect that I'm not the first to suggest such a distinction among types of hallucination, but I haven't yet found a precedent.

Mitchell-Yellin and Fischer suggest that some "near death experiences" might also be doxastic hallucinations of this sort. Did your whole life really flash before your eyes in that split second during an auto accident, or did you only form the belief in that experience without the actual experience itself? It's not very neurophysiologically plausible that someone would experience hundreds or thousands of different memory experiences in 500 milliseconds.

-----------------------------------

It seems clear from dream researchers' descriptions of the hallucination model of dreams that they have phenomenal hallucination in mind. But what if dream experiences involve, instead or at least sometimes, doxastic rather than phenomenal hallucinations?

Here, then, is a possibility about dream experience: If I dream I am Napoleon, standing on the fields of Waterloo, I have experiences much like the experiences I have when I merely imagine, in daydream, that I am standing on the fields of Waterloo. But sometimes a doxastic hallucination is added to that imagination: I form the belief that I am having or had rich sensory visual and auditory experience. This doxastic hallucination would explain reports of rich, vivid, detailed sensory-like dream experience without requiring the brain actually to concoct rich, vivid, and detailed visual and auditory experiences.

Indeed, if we go full doxastic hallucination, even the imagination-like experiences would be optional.  (Also, if -- following Sosa -- we don't genuinely believe things while dreaming, we could reframe doxastic hallucinations in terms of whatever quasi-belief analogs occur during dreams.)

[The battle at Waterloo: image source]

Monday, November 11, 2024

New in Draft: The Copernican Argument for Alien Consciousness; The Mimicry Argument Against Robot Consciousness

(with Jeremy Pober)

Over the past several years, I've posted a few times on what I call the "Copernican Argument" for thinking that behaviorally sophisticated space aliens would be conscious, even if they are constituted very differently from us (here, here, here, here). I've also posted a few times on what I call the "Mimicry Argument" against attributing consciousness to AI systems or robots that were designed to mimic the superficial signs of human consciousness (including current Large Language Models like ChatGPT and Claude) (here, here, here).

Finally, I have a circulatable paper in draft that deals with these issues, written in collaboration with Jeremy Pober, and tested with audiences at Trent University, Harvey Mudd, New York University, the Agency and Intentions in AI conference in Göttingen, Jagiellonian University, the Oxford Mind Seminar, University of Lisbon, NOVA Lisbon University, University of Hamburg, and the Philosophy of Neuroscience/Mind Writing Group.

It's a complicated paper! Several philosophers have advised me that the Copernican Argument is one paper and the Mimicry Argument is another. Maybe they are right. But I also think that there's a lot to be gained from advancing these arguments side by side: Each shines light on the boundaries of the other. The result, though intricate, is, I hope, not too intricate to comprehend and evaluate. (I might still change my mind about that.)


Abstract:

On broadly Copernican grounds, we are entitled to default assume that apparently behaviorally sophisticated extraterrestrial entities (“aliens”) would be conscious. Otherwise, we humans would be inexplicably, implausibly lucky to have consciousness, while similarly behaviorally sophisticated entities elsewhere would be mere shells, devoid of consciousness. However, this Copernican default assumption is canceled in the case of behaviorally sophisticated entities designed to mimic superficial features associated with consciousness in humans (“consciousness mimics”), and in particular a broad class of current, near-future, and hypothetical robots. These considerations, which we formulate, respectively, as the Copernican and Mimicry Arguments, jointly defeat an otherwise potentially attractive parity principle, according to which we should apply the same types of behavioral or cognitive tests to aliens and robots, attributing or denying consciousness similarly to the extent they perform similarly. Instead of grounding speculations about alien and robot consciousness in metaphysical or scientific theories about the physical or functional bases of consciousness, our approach appeals directly to the epistemic principles of Copernican mediocrity and inference to the best explanation. This permits us to justify certain default assumptions about consciousness while remaining to a substantial extent neutral about specific metaphysical and scientific theories.

Full paper here.


As always, questions/comments/objections welcome here on the blog, on my social media accounts, or by email to my UCR address.

[image source]

Wednesday, October 30, 2024

The Ethics of Harmonizing with the Dao

Reading the ancient Chinese philosophers Xunzi and Zhuangzi, I am inspired to articulate an ethics of harmonizing with the dao (the "way"). This ethics doesn't quite map onto any of the three conceptualizations of ethics that are standard in Western philosophy (consequentialism, deontology, and virtue ethics), nor is it exactly a "role ethics" of the sort sometimes attributed to ancient Confucians.

Xunzi

The ancient Confucian Xunzi articulates a vision of the world in which Heaven, Earth, and humanity operate in harmony:

Heaven has its proper seasons,
Earth has its proper resources,
And humankind has its proper order,
-- this is called being able to form a triad
(Ch 17, l. 34-37; Hutton trans. 2014, p. 176).

Heaven (tian, literally the sky, but with strong religious associations) and Earth are jointly responsible for what we might now call the "laws of nature" and all "natural" phenomena -- including, for example, the turning of the seasons, the patterns of wind and rain, the tendency for plants and animals to thrive under certain conditions and wither under other conditions. Also belonging to these natural phenomena are the raw materials with which humans work: not only the raw materials of wood, metal, and fiber, but also the raw material of natural human inclinations: our tendency to enjoy delicious tastes, our tendency to react angrily to provocations, our general preference for kin over strangers.

Xunzi views humanity's task as creating the third corner of a triad with Heaven and Earth by inventing customs and standards of proper behavior that allow us to harmonize with Heaven and Earth, and with each other. For example, through trial and error, our ancestors learned the proper times and methods for sowing and reaping, how to regulate flooding rivers, how to sharpen steel and straighten wood, how to make pots that won't leak, how to make houses that won't fall over, and so on. Our ancestors also -- again through trial and error -- learned the proper rituals and customs and standards of behavior that permit people to coexist harmoniously with each other without chaotic conflict, without excessive or inappropriate emotions, and with an allocation of goods that allows all to flourish according to their status and social role.

For Xunzi, then, following the dao can be conceptualized as fitting harmoniously into this triad. Abide by the customs and standards of behavior that contribute to the harmonious whole, in which crops are properly planted, towns are properly constructed, the crafts flourish, and humans thrive in an orderly society.

Each of us has a different role, in accord with the proper customs of a well-ordered society: the barley farmer has one role, the soldier another role, the noblewoman yet another, the traveling merchant yet another. It's not unreasonable to view Xunzi's ethics as a kind of role ethics, according to which the fundamental moral principle is that one adheres to one's proper role in society. It's also not unreasonable to think of the customs and standards of proper behavior as a set of rules to which one ought to adhere (those rules applying in different ways according to one's position in society), and thus to view Xunzi's ethics as a kind of deontological (rule-based) ethics. However, there might also be room to interpret harmonious alignment with the dao as the most fundamental feature of ethical behavior. Adherence to one's role and to the proper traditional customs and practices, on this interpretation of Xunzi, would be only derivatively good, because doing so typically constitutes harmonious alignment.

A test case is to imagine, through Xunzi's eyes, whether a morally well-developed sage might be ethically correct sometimes to act contrary to their role and to the best traditional standards of good behavior, if they correctly see that by doing so they contribute better to the overall harmony of Heaven, Earth, and humankind. I'm tempted to think that Xunzi would indeed permit this -- though only very cautiously, since he is pessimistic about the moral wisdom of ordinary people -- and thus that for him harmonious alignment with the dao is more fundamental than roles and rules. However, I'm not sure I can find direct textual support in favor of this interpretation; it's possible I'm being overly "charitable".

[image source]

A Zhuangzian Correction

A Xunzian ethics of this sort is, I think, somewhat attractive. But it is also deeply traditionalist and conformist in a way I find unappealing. It could use a Zhuangzian twist -- and the idea of "harmonizing with the dao" is at least as Zhuangzian (and "Daoist") as it is Confucian.

Zhuangzi imagines a wilder, more wondrous cosmos than Xunzi's neatly ordered triad of Heaven, Earth, and humankind -- symbolized (though it's disputable how literally) by people so enlightened that they can walk without touching the ground; trees that count 8000 years as a single autumn; gracious emperors with no eyes, ears, nose, or mouth; people with skin like frost who live by drinking dew; enormous, useless trees who speak to us in dreams; and more. This is the dao, wild beyond human comprehension, with which Zhuangzi aims to harmonize.

There are, I think, in Zhuangzi's picture -- though he would resist any effort to fully capture it in words -- ways of flowing harmoniously along with this wondrous and incomprehensible dao and ways of straining unproductively against it. One can be easygoing and open-minded, welcome surprise and difference, not insist on jamming everything into preconceived frames and plans; and one can contribute to the delightful weirdness of the world in one's own unique way. This is Zhuangzian harmony. You become a part of a world that is richer and more wondrous because it contains you, while allowing other wonderful things to also naturally unfold.

In a radical reading of Zhuangzi, ethical obligations and social roles fall away completely. There is little talk in Zhuangzi's Inner Chapters, for example, of our obligation to support others. I don't know that we have to read Zhuangzi radically; but regardless of that question of interpretation, I suggest that there's an attractive middle between Xunzi's conventionalism and Zhuangzi's wildness. Each can serve as a corrective to the other.

In the ethical picture that emerges from this compromise, we each contribute uniquely to a semi-ordered cosmos, participating in social harmony, but not rigidly -- also transcending that harmony, breaking rules and traditions for the better, making the world richer and more wondrous, each in our diverse ways, while also supporting others who contribute in their different ways, whether those others are human, animal, plant, or natural phenomena.

Contrasts

This is not a consequentialist ethics: It is not that our actions are evaluated in terms of the good or bad consequences they have (and still less that the actions are evaluated by a summation of the good minus the bad consequences). Instead, harmonizing with the dao is to participate in something grand, without need of a further objective. Like the deontologist, Xunzi and Zhuangzi and my imagined compromise philosopher needn't think that right or harmonious action will always have good long-term results. Nor is it a deontological or role ethics: There is no set of rules one must always follow, nor any role to which one must always adhere. Nor is it a virtue ethics: There is no set of virtues to which we all must aspire or a distinctive pattern of human flourishing that constitutes the highest attainment. We each contribute in different ways -- and if some virtues often prove to be important, they are derivatively important in the same way that rules and roles can be derivatively important. They are important only because, and to the extent that, having those virtues enables or constitutes one's contribution to the magnificent web of being.

So although there are resonances with the more pluralistic forms of consequentialism, and virtue ethics, and role ethics, and even deontology (trivially or degenerately, if the rule is just "harmonize with the dao"), the classical Chinese ethical ideal of harmonizing with the dao differs somewhat from all of these familiar (to professional philosophers) Western ethical approaches.

Many of these other approaches also contain an implicit intellectualism or elitism, in which ideal ethical goodness requires intellectual attainment: wisdom, or a sophisticated ability to weigh consequences or evaluate and apply rules -- far beyond, for example, the capacities of someone with severe cognitive disabilities. With enough Zhuangzi in the mix, such elitism evaporates. A severely cognitively disabled person, or a magnificently weird nonhuman animal, might far exceed any ordinary adult philosopher in their capacity to harmonize with the dao and might contribute more to the rich tapestry of the world.

Perhaps an ethics of harmonizing with the dao can resonate with some 21st-century Anglophone readers, despite its origins in ancient China. It is not, I think, as alien as it might seem from its reliance on the concept of dao and its failure to fit into the standard ethical triumvirate of consequentialism, deontology, and virtue ethics. The fundamental idea should be attractive to some: We each contribute by instantiating a unique piece of a magnificent world, a world which would be less magnificent without us.

Tuesday, October 22, 2024

An Objection to Chalmers's Fading Qualia Argument

[Note: This is a long and dense post. Buckle up.]

In one chapter of his influential 1996 book, David Chalmers defends the view that consciousness arises in virtue of the functional organization of the brain rather than in virtue of the brain's material substrate.  That is, if there were entities that were functionally/organizationally identical to humans but made out of different stuff (e.g. silicon chips), they would be just as conscious as we are.  He defends this view, in part, with what he calls the Fading Qualia Argument.  The argument is enticing, but I think it doesn't succeed.

Chalmers, Robot, and the Target Audience of the Argument

Drawing on thought experiments from Pylyshyn, Savitt, and Cuda, Chalmers begins by imagining two cases: himself and "Robot".  Robot is a functional isomorph of Chalmers, but constructed of different materials.  For concreteness (but this isn't essential), we might imagine that Robot has a brain with the exact same neural architecture as Chalmers' brain, except that the neurons are made of silicon chips.

Because Chalmers and Robot are functional isomorphs, they will respond in the same way to all stimuli.  For example, if you ask Robot if it is conscious, it will emit, "Yes, of course!" (or whatever Chalmers would say if asked that question).  If you step on Robot's toe, Robot will pull its foot back and protest.  And so on.

For purposes of this argument, we don't want to assume that Robot is conscious, despite its architectural and functional similarity to Chalmers.  The Fading Qualia Argument aims to show that Robot is conscious, starting from premises that are neutral on the question.  The aim is to win over those who think that maybe being carbon-based or having certain biochemical properties is essential for consciousness, so that a functional isomorph made of the wrong stuff would only misleadingly look like it's conscious.  The target audience for this argument is someone concerned that for all Robot's similar mid-level architecture and all of its seeming "speech" and "pain" behavior, Robot really has no genuinely conscious experiences at all, in virtue of lacking the right biochemistry -- that it's merely a consciousness mimic, rather than a genuinely conscious entity.

The Slippery Slope of Introspection

Chalmers asks us to imagine a series of cases intermediate between him and Robot.  We might imagine, for example, a series each of whose members differs by one neuron.  Entity 0 is Chalmers.  Entity 1 is Chalmers with one silicon chip neuron replacing a biological neuron.  Entity 2 is Chalmers with two silicon chip neurons replacing two biological neurons.  And so on to Entity N, Robot, all of whose neurons are silicon.  Again, the exact nature of the replacements isn't essential to the argument.  The core thought is just this: Robot is a functional isomorph of Chalmers, but constructed of different materials; and between Chalmers and Robot we can construct a series of cases each of which is only a tiny bit different from its neighbors.

Now if this is a coherent setup, the person who wants to deny consciousness to Robot faces a dilemma.  Either (1.) at some point in the series, consciousness suddenly winks out -- between Entity I and Entity I+1, for some value of I.  Or (2.) consciousness somehow slowly fades away in the series.

Option (1) seems implausible.  Chalmers, presumably, has a rich welter of conscious experience (at least, we can choose a moment at which he does).  A priori, it would be odd if the big metaphysical jump from that rich welter of experience to zero experience would occur with an arbitrarily tiny change between Entity I and Entity I+1.  And empirically, our best understanding of the brain is that tiny, single-neuron-and-smaller differences rarely have such dramatic effects (unless they cascade into larger differences).  Consciousness is a property of large assemblies of neurons, robust to tiny changes.

But option (2) also seems implausible, for it would seem to involve massive introspective error.  Suppose that Entity I is an intermediate case with very much reduced, but not entirely absent, consciousness.  Chalmers suggests that instead of having bright red visual experience, Entity I has tepid pink experience.  (I'm inclined to think that this isn't the best way to think about fading or borderline consciousness, since it's natural to think of pink experiences as just different in experienced content from red cases, rather than less experiential than red cases.  But as I've argued elsewhere, genuinely borderline consciousness is difficult or impossible to imaginatively conceive, so I won't press Chalmers on this point.)

By stipulation, since Entity I is a functional isomorph, it will give the same reports about its experience as Chalmers himself would.  In other words, Entity I -- despite being barely or borderline conscious -- will say "Oh yes, I have vividly bright red experiences -- a whole welter of exciting phenomenology!"  Since this is false of Entity I, Entity I is just wrong about that.  But also, since it's a functional isomorph, there's no weird malfunction going on that would explain this strange report.  We ordinarily think that people are reliable introspectors of their experience; so we should think the same of Entity I.  Thus, option (2), gradual fading, generates a tension: We would have to believe that Entity I is radically introspectively mistaken, and that commits us to an implausible degree of introspective error.

Therefore, neither option (1) nor option (2) is plausible.  But if Robot were not conscious, either (1) or (2) would have to be true for at least one Entity I.  Therefore, Robot is conscious.  And therefore, functional isomorphism is sufficient for consciousness.  It doesn't matter what materials an entity is made of.

We Can't Trust Robot "Introspection"

I acknowledge that it's an appealing argument.  However, Chalmers' response to option (2) should be unconvincing to the argument's target audience.

I have argued extensively that human introspection, even of currently ongoing conscious experience, is highly unreliable.  However, my reply today won't lean on that aspect of my work.  What I want to argue instead is that the assumed audience for this argument should not think that the introspection (or "introspection" -- I'll explain the scare quotes in a minute) of Entity I is reliable.

Recall that the target audience for the argument is someone who is antecedently neutral about Robot's consciousness.  But of course by stipulation, Robot will say (or "say") the same things about its experiences that Chalmers will say.  Just like Chalmers, and just like Entity I, it will say "Oh yes, I have vividly bright red experiences -- a whole welter of exciting phenomenology!"  The audience for Chalmers' argument must therefore initially doubt that such statements, or seeming statements, as issued by Robot, are reliable signals of consciousness.  If the audience already trusted these reports, there would be no need for the argument.

There are two possible ways to conceptualize Robot's reports, if they are not accurate introspections: (a.) They might be inaccurate introspections.  (b.) They might not be introspections at all.  Option (a) allows that Robot, despite lacking conscious experience, is capable of meaningful speech and is capable of introspecting, though any introspective reports of consciousness will be erroneous.  Option (b) is preferred if we think that genuinely meaningful language requires consciousness and/or that no cognitive process that fails to target a genuinely conscious experience in fact deserves to be called introspection.  On option (b) Robot only "introspects" in scare quotes.  It doesn't actually introspect.

Option (a) thus assumes introspective fallibilism, while option (b) is compatible with introspective infallibilism.

The audience who is to be convinced by the slow-fade version of the Fading Qualia Argument must trust the introspective reports (or "introspective reports") of the intermediate entities while not trusting those of Robot.  Given that some of the intermediate entities are extremely similar to Robot -- e.g., Entity N-1, who is only one neuron different -- it would be awkward and implausible to assume reliability for all the intermediate entities while not doing so for Robot.

Now plausibly, if there is a slow fadeout, it's probably not still going on with an entity as close to Robot as Entity N-1, so the relevant cases will be somewhere nearer the middle.  Stipulate, then, two values I and J not very far separated (0 < I < J < N) such that we can reasonably assume that if Robot is nonconscious, so is Entity J, while we cannot reasonably assume that if Robot is nonconscious, so is Entity I.  For consistency with their doubts about the introspective reports (or "introspective reports") of Robot, the target audience should have similar doubts about Entity J.  But now it's unclear why they should be confident in the reports of Entity I, which by stipulation is not far separated from Entity J.  Maybe it's a faded case, despite its report of vivid experience.

Here's one way to think about it.  Setting aside introspective skepticism about normal humans, we should trust the reports of Chalmers / Entity 0.  But ex hypothesi, the target audience for the argument should not trust the "introspective reports" of Robot / Entity N.  It's then an open question whether we should trust the reports of the relevant intermediate, possibly experientially faded, entities.  We could either generalize our trust of Chalmers down the line or generalize our mistrust of Robot up the line.  Given the symmetry of the situation, it's not clear which is the better approach, or how far down or up the slippery slope we should generalize the trust or mistrust.

For Chalmers' argument to work, we must be warranted in trusting the reports of Entity I at whatever point the fade-out is happening.  To settle this question, Chalmers needs to do more than appeal to the general reliability of introspection in normal human cases and the lack of functional differences between him, Robot, and the intermediate entities.  Even an a priori argument that introspection is infallible will not serve his purposes, because then the open question becomes whether Robot and the relevant intermediate entities are actually introspecting.

Furthermore, if there is introspective error by Entity I, there's a tidy explanation of why that introspective error would be unsurprising.  For simplicity, assume that introspection occurs in the Introspection Module located in the pineal gland, and that it works by sending queries to other parts of the brain, asking questions like "Hey, occipital lobe, is red experience going on there right now?", reaching introspective judgments based on the signals that it gets in reply.  If Entity I has a functioning, biological Introspection Module but a replaced, silicon occipital lobe, and if there really is no red experience going on in the occipital lobe, we can see why Entity I would be mistaken: Its Introspection Module is getting exactly the same signal from the occipital lobe as it would receive if red experience were in fact present.
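To make the cartoon vivid, here is a toy sketch of that query-and-report picture. It is purely illustrative, under the simplifying assumptions of the paragraph above; the class and function names are my own hypothetical labels, not anything from Chalmers. The point it displays is just that the module's judgment is a function only of the signal it receives, not of whether experience is actually present in the queried region.

```python
# Toy sketch of the cartoon "Introspection Module" model described above.
# All names are hypothetical, for illustration only.

class BiologicalOccipitalLobe:
    has_red_experience = True           # stipulated: red experience really occurs here
    def answer_query(self, query):
        return "red signal present"     # signal sent back to the Introspection Module

class SiliconOccipitalLobe:
    has_red_experience = False          # stipulated for the sake of argument: no experience
    def answer_query(self, query):
        return "red signal present"     # functionally identical, so the very same signal

def introspection_module(occipital_lobe):
    # The module's judgment depends only on the signal it receives.
    # Note that has_red_experience plays no role in the report -- that's the point.
    reply = occipital_lobe.answer_query("is red experience going on there right now?")
    return "I am having vivid red experience" if "red" in reply else "no red experience"

print(introspection_module(BiologicalOccipitalLobe()))  # accurate report
print(introspection_module(SiliconOccipitalLobe()))     # same report, now mistaken
```

Both calls yield the same confident report; whether that report is accurate depends entirely on what is (or isn't) going on behind the signal.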

It's highly doubtful that introspection is as neat a process as I've just described.  But the point remains.  If Entity I is introspectively unreliable, a perfectly good explanation beckons: Whatever cognitive processes subserve the introspective reporting are going to generate the same signals -- including misleading signals, if experience is absent -- as they would in the case where experience is present and accurately reported.  Thus, unreliability would simply be what we should expect.

Now it's surely in some respects more elegant if we can treat Chalmers, Robot, and all the intermediate entities analogously, as conscious and accurately reporting their experience.  The Fading Qualia setup nicely displays the complexity or inelegance of thinking otherwise.  But the intended audience of the Fading Qualia argument is someone who wonders whether experience tracks so neatly onto function, someone who suspects that nature might in fact be complex or inelegant in exactly this respect, such that it's (nomologically/naturally/scientifically) possible to have a behavioral/functional isomorph who "reports" experiences but who in fact entirely lacks them.  The target audience who is initially neutral about the consciousness of Robot should thus remain unmoved by the Fading Qualia argument.

This isn't to say I disagree with Chalmers' conclusion.  I've advanced a very different argument for a similar conclusion: The Copernican Argument for Alien Consciousness, which turns on the idea that it's unlikely that, among all behaviorally sophisticated alien species of radically different structure that probably exist in the universe, humans would be so lucky as to be among the special few with just the right underlying stuff to be conscious.  Central to the Fading Qualia argument in particular is Chalmers' appeal to the presumably reliable introspection of the intermediate entities.  My concern is that we cannot justifiably make that presumption.

Dancing Qualia

Chalmers pairs the Fading Qualia argument with a related but more complex Dancing Qualia argument, which he characterizes as the stronger of the two arguments.  Without entering into detail, Chalmers posits, for the sake of reductio ad absurdum, that the alternative medium (e.g., silicon) hosts experiences but of a different qualitative character (e.g., color inverted).  We install a system in the alternative medium as a backup circuit with effectors and transducers to the rest of the brain.  For example, in addition to having a biological occipital lobe, you also have a functionally identical silicon backup occipital lobe.  Initially the silicon backup circuit is powered off.  But you can power it on -- and power off your biological occipital lobe -- by flipping a switch.  Since the silicon lobe is functionally identical to the biological lobe, the rest of the brain should register no difference.

Now, if you switch between normal neural processing and the backup silicon processor, you should have very different experience (per the assumption of the reductio) but you should not be able to introspectively report that different experience (since the backup circuit interacts identically with the rest of the brain).  That would again be a strange failure of introspection.  So (per the rules of reductio) we conclude that the initial premise was mistaken: Normal neural processing should generate the same types of experience as functionally identical processing in a silicon processor.

(I might quibble that you-with-backup-circuit is not functionally isomorphic to you-without-backup-circuit -- after all, you now have a switch and two different parallel processor streams -- and if consciousness supervenes on the whole system rather than just local parts, that's possibly a relevant change that will cause the experience to be different from the experience of either an unmodified brain or an isomorphic silicon brain.  But set this issue aside.)

The Dancing Qualia argument is vulnerable on the introspective accuracy assumption, much as the Fading Qualia argument is.  Again for simplicity, suppose a biological Introspection Module.  Suppose that what is backed up is the portion of the brain that is locally responsible for red experience.  Ex hypothesi, the silicon backup gives rise to non-red experience but delivers to the Introspection Module exactly the same inputs as that module would normally receive from an organic brain part experiencing red.  This is exactly the type of case where we should expect introspection to be unreliable.
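The same toy picture can be extended to the switch. Again this is a purely hypothetical sketch under the cartoon assumptions above, with names of my own invention: by stipulation the two circuits differ in the experience they host but deliver identical signals downstream, so the downstream report cannot register the flip.

```python
# Toy sketch of the backup-circuit switch (hypothetical names, illustrative only).

class BiologicalCircuit:
    quale = "red"                        # stipulated experience when this circuit is active
    def signal_to_rest_of_brain(self):
        return "RED-CHANNEL-ACTIVE"      # what downstream processing actually receives

class SiliconBackupCircuit:
    quale = "green (inverted)"           # per the reductio: different experience
    def signal_to_rest_of_brain(self):
        return "RED-CHANNEL-ACTIVE"      # functionally identical signal to the rest of the brain

def introspective_report(active_circuit):
    # Downstream judgment is fixed by the signal alone.
    if active_circuit.signal_to_rest_of_brain() == "RED-CHANNEL-ACTIVE":
        return "My experience is red, same as before."
    return "Something just changed!"

for circuit in (BiologicalCircuit(), SiliconBackupCircuit()):  # flip the switch
    print(circuit.quale, "->", introspective_report(circuit))
```

The qualia "dance" by stipulation, but the report stays fixed -- which is exactly what a signal-based account of introspective reporting would lead us to expect.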

Consider an analogous case of vision.  Looking at a green tree 50 feet away in good light, my vision is reliable.  Now substitute a red tree in the same location and a mechanism between me and the tree such that all the red light is converted into green light, so that I get exactly the same visual input I would normally receive from looking at a green tree.  Even if vision is highly reliable in normal circumstances, it is no surprise in this particular circumstance if I mistakenly judge the red tree to be green!

As I acknowledged before, this is a cartoon model of introspection.  Here's another way introspection might work: What matters is what is represented in the Introspection Module itself.  So if the introspection module says "red", necessarily I experience red.  In that case, in order to get Dancing Qualia, we need to create an alternate backup circuit for the Introspection Module itself.  When we flip the switch, we switch from Biological Introspection Module to Silicon Introspection Module.  Ex hypothesi, the experiences really are different but the Introspection Module represents them functionally in the same way, and the inputs and outputs to and from the rest of the brain don't differ.  So of course there won't be any experiential difference that I would conceptualize and report.  There would be some difference in qualia, but I wouldn't have the conceptual tools or memorial mechanisms to notice or remember the difference.

This is not obviously absurd.  In ordinary life we arguably experience minor versions of this all the time: I experience some specific shade of maroon.  After a blink, I experience some slightly different shade of maroon.  I might entirely fail to conceptualize or notice the difference: My color concepts and color memory are not so fine-grained.  The hypothesized red/green difference in Dancing Qualia is a much larger difference -- so it's not a problem of fineness of grain -- but fundamentally the explanation of my failure is similar: I have no concept or memory suited to track the difference.

On more holistic or otherwise complex views of introspection, the story will be more complicated, but I think the burden of proof would be on Chalmers to show that some blend of the two strategies isn't sufficient to generate suspicions of introspective unreliability in the Dancing Qualia cases.

Related Arguments

This response to the Fading Qualia argument draws on David Billy Udell's and my similar critique of Susan Schneider's Chip Test for AI consciousness (see also my chapter "How to Accidentally Become a Zombie Robot" in A Theory of Jerks and Other Philosophical Misadventures).

Although this critique of the Fading Qualia argument has been bouncing around in my head since I first read The Conscious Mind in the late 1990s, it always felt a little too complex for a blog post and not quite substantial enough for a publishable paper.  But reading Ned Block's similar critique in his 2023 book has inspired me to express my version of the critique.  I agree with Block's observations that "the pathology that [Entity I] has [is] one of the conditions that makes introspection unreliable" (p. 455) and that "cases with which we are familiar provide no precedent for such massive unreliability" (p. 457).