Wednesday, December 24, 2025

How Much Should We Give a Joymachine?

a holiday post on gifts to your utility monster neighbors

Joymachines Envisioned

Set aside, for now, any skepticism about whether future AI could have genuine conscious experiences. If future AI systems could be conscious, they might be capable of vastly more positive emotion than natural human beings can feel.

There's no particular reason to think human-level joy is the pinnacle. A future AI might, in principle, experience positive emotions:

    a hundred times more intense than ours,
    at a pace a hundred times faster, given the high speed of computation,
    across a hundred times more parallel streams, compared to the one or a few joys humans experience at a time.
Combined, the AI might experience a million times more pleasure per second than a natural human being can. Let's call such entities joymachines. They could have a very merry Christmas!

[Joan Miro 1953, image source]


My Neighbors Hum and Sum

Now imagine two different types of joymachine:

Hum (Humanlike Utility Monster) can experience a million times more positive emotion per second than an ordinary human, as described above. Apart from this -- huge! -- difference, Hum is as psychologically similar to an ordinary human as is realistically feasible.

Sum (Simple Utility Monster), like Hum, can experience a million times more positive emotion per second than an ordinary human, but otherwise Sum is as cognitively and experientially simple as feasible, with a vanilla buzzing of intense pleasure.

Hum and Sum don't experience joy continuously. Their positive experiences require resources. Maybe a gift card worth ten seconds of millionfold pleasure costs $10. For simplicity, assume this scales linearly: stable gift card prices and no diminishing returns from satiation.

In the enlightened future, Hum is a fully recognized moral and legal equal of ordinary biological humans and has moved in next door to me. Sum is Hum's pet, who glows and jumps adorably when experiencing intense pleasure. I have no particular obligations to Hum or Sum but neither are they total strangers. We've had neighborly conversations, and last summer Hum invited me and my family to a backyard party.

Hum experiences great pleasure in ordinary life. They work as an accountant, experiencing a million times more pleasure than human accountants when the columns sum correctly. Hum feels a million times more satisfaction than I do in maintaining a household by doing dishes, gardening, calling plumbers, and so on. Without this assumption, Hum risks becoming unhumanlike: if gift cards were the only source of millionfold pleasure, it would rarely make sense for Hum to choose ordinary activities over spending their whole disposable income on gift cards.

How Much Should I Give to Hum and Sum?

Neighbors trade gifts. My daughter bakes brownies and we offer some to the ordinary humans across the street. We buy a ribboned toy for our uphill neighbor's cat. As a holiday gesture, we buy a pair of $10 gift cards for Hum and Sum.

Hum and Sum redeem the cards immediately. Watching them take so much pleasure in our gifts is a delight. For ten seconds, they jump, smile, and sparkle with such joy! Intellectually, I know it's a million times more joy per second than I could ever feel. I can't quite see that in their expressions, but I can tell it's immense.

Normally, if one neighbor seems to enjoy our brownies only a little while the other enjoys them vastly more, I'd be tempted to give more brownies to the second neighbor. Maybe on similar grounds, I should give disproportionately to Hum and Sum?

Consider six possibilities:

(1.) Equal gifts to joymachines. Maybe fairness demands treating all my neighbors equally. I don't give fewer gifts, for example, to a depressed neighbor who won't particularly enjoy them than to an exuberant neighbor who delights in everything.

(2.) A little more to joymachines. Or maybe I do give more to the exuberant neighbor? Voluntary gift-giving needn't be strictly fair -- and it's not entirely clear what "fairness" consists in. If I give a bit more to Hum and Sum, I might not be objectionably privileging them so much as responding to their unusual capacity to enjoy my gifts. Is it wrong to give an extra slice to a friend who really enjoys pie?

(3.) A lot more to joymachines. Ordinary humans vary in joyfulness, but not (I assume) by anything like a factor of a million. Suppose I vividly enough grasp that Hum and Sum really are experiencing, in those ten seconds, three thousand human lifetimes' worth of pleasure. That's an astonishing amount of pleasure I can bring into the world for a mere ten dollars! Suppose I set aside a hundred dollars a day from my generously upper-middle-class salary. In a year, I'd be enabling more than ten million human lifetimes' worth of joy. (I sketch the arithmetic just after this list.) Since most humans aren't continuously joyful, this much joy might rival the total joy experienced by the whole human population of the United States over the same year. Three thousand dollars a month would seriously reduce my luxuries and long-term savings, but it wouldn't create any genuine hardship.

(4.) Drain our life savings for joymachines. One needn't be a flat-footed happiness-maximizing utilitarian to find (2) or (3) reasonable. Everyone should agree that pleasant experiences have substantial value. But if our obligation is not just to increase pleasure but to maximize it, I should probably drain my whole life savings for the joymachines, plus almost all of my future earnings.

(5.) Give less or nothing to joymachines. Or we could go the other way! My joymachine neighbors already experience a torrent of happiness from their ordinary work, chores, recreation, and whatever gift cards Hum buys anyway. My less-happy neighbors could use the pleasure more, even if every dollar buys only a millionth as much. Prioritarianism says that in distributing goods we should favor the worst off. It's not just that an impoverished person benefits more from a dollar: Even if they benefited the same, there's value in equalizing the distribution. If two neighbors would equally enjoy a brownie, I might prioritize giving the brownie to the one who is otherwise worse off. It might even make sense to give half a brownie to the worse-off neighbor rather than a whole brownie to the better-off one. A prioritarian might argue that Hum and Sum are so well off that even a million-to-one tradeoff is justified.

(6.) I take it back, joymachines are impossible. Given this mess, it would be convenient to think so, right?
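To make option (3.)'s arithmetic explicit, here is a minimal back-of-the-envelope sketch in Python. The millionfold multiplier and the $10-per-ten-seconds gift card price come from the setup above; the figure of roughly an hour of peak-intensity joy per ordinary human lifetime is a hypothetical assumption I've added, chosen so that the lifetime conversions come out roughly as stated.

```python
# Back-of-the-envelope check of the gift-card arithmetic in option (3.).
# From the post: a joymachine feels 100 * 100 * 100 = 1,000,000 times more
# pleasure per second than a human, and a $10 gift card buys 10 seconds of it.
# Hypothetical added assumption: an ordinary human lifetime contains roughly
# an hour (3600 seconds) of peak-intensity joy in total.

MULTIPLIER = 100 * 100 * 100       # intensity x speed x parallel streams
CARD_SECONDS = 10                  # seconds of joymachine pleasure per $10 card
JOY_PER_LIFETIME_S = 3600          # assumed peak-joy seconds in a human lifetime

def lifetimes_of_joy(cards: int) -> float:
    """Human-lifetime-equivalents of joy produced by a given number of $10 cards."""
    return cards * CARD_SECONDS * MULTIPLIER / JOY_PER_LIFETIME_S

print(f"One $10 card:        {lifetimes_of_joy(1):,.0f} lifetimes of joy")   # ~2,800
print(f"$100/day for a year: {lifetimes_of_joy(10 * 365):,.0f} lifetimes")   # ~10 million
```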

Gifts to Neighbors vs Other Situations

We can reframe this puzzle in other settings and our intuitions might shift: government welfare spending, gifts to one's children or creations, rescue situations where only one person can be saved, choices about what kinds of personlike entities to bring into existence, or cases where you can't keep all your promises and need to choose who to disappoint.

My main thought is this: It's not at all obvious what the right thing to do would be, and the outcomes vary enormously. If joymachines were possible, we'd have to rethink a lot of cultural practices and applied ethics to account for entities with such radically different experiential capacities. If the situation does arise -- as it really might! -- being forced to think it through properly might reshape not just our views about AI but also our understanding of ethics for ordinary humans.

---------------------------------------------------

Related: How Weird Minds Might Destabilize Human Ethics (Aug 15, 2015)

Friday, December 19, 2025

Debatable AI Persons: No Rights, Full Rights, Animal-Like Rights, Credence-Weighted Rights, or Patchy Rights?

I advise that we don't create AI entities who are debatably persons. If an AI system might -- but only might -- be genuinely conscious and deserving of the same moral consideration we ordinarily owe to human persons, then creating it traps us in a moral bind with no good solution. Either we grant it the full rights it might deserve and risk sacrificing real human lives for entities without interests worth that sacrifice, or we deny it full rights and risk perpetrating grievous moral wrongs against it.

Today, however, I'll set aside the preventative advice and explore what we should do if we nonetheless find ourselves facing debatable AI persons. I'll examine five options: no rights, full rights, animal-like rights, credence-weighted rights, and patchy rights.

[Paul Klee postcard, 1923; source]


No rights

This is the default state of the law. AI systems are property. Barring a swift and bold legal change, the first AI systems that are debatably persons will presumably also be legally considered property. If we do treat them as property, then we seemingly needn't sacrifice anything on their behalf. We humans could permissibly act in what we perceive to be our best interests: using such systems for our goals, deleting them at will, and monitoring and modifying them at will for our safety and benefit. (Actually, I'm not sure this is the best attitude toward property, but set that issue aside here.)

The downside: If these systems actually are persons who deserve moral consideration as our equals, such treatment would be the moral equivalent of slavery and murder, perhaps on a massive scale.


Full rights

To avoid the risk of that moral catastrophe, we might take a "precautionary" approach: granting entities rights whenever they might deserve them (see Birch 2024, Schwitzgebel and Sinnott-Armstrong forthcoming). If there's a real possibility that some AI systems are persons, we should treat them as persons.

However, the costs and risks are potentially enormous. Suppose we think that some group of AI systems are 15% likely to be fully conscious rights-deserving persons and 85% likely to be ordinary nonconscious artifacts. If we nonetheless treat them as full equals, then in an emergency we would have to rescue two of them over one human -- letting a human die for the sake of systems that are most likely just ordinary artifacts. We would also need to give these probably-not-persons a path to citizenship and the vote. We would need to recognize their rights to earn and spend money, quit their employment to adopt a new career, reproduce, and enjoy privacy and freedom from interference. If such systems exist in large numbers, their political influence could be enormous and unpredictable. If such systems exist in large numbers or if they are few but skilled in some lucrative tasks like securities arbitrage, they could accumulate enormous world-influencing wealth. And if they are permitted to pursue their aims with the full liberty of ordinary persons, without close monitoring and control, existential risks would substantially increase should they develop goals that threaten continued human existence.

All of this might be morally required if they really are persons. But if they only might be persons, it's much less clear that humanity should accept this extraordinary level of risk and sacrifice.


Animal-Like Rights

Another option is to grant these debatable AI persons neither full humanlike rights nor the status of mere property. One model is the protection we give to nonhuman vertebrates. Wrongly killing a dog can land you in jail in California where I live, but it's not nearly as serious as murdering a person. Vertebrates can be sacrificed in lab experiments, but only with oversight and justification.

If we treated debatable AI persons similarly, deletion would require a good reason, and you couldn't abuse them for fun. But people could still enslave and kill them for their convenience, perhaps in large numbers, as we do with humanely farmed animals -- though of course many ethicists object to the killing of animals for food.

This approach seems better than no rights at all, since it would be a moral improvement and the costs to humans would be minimal -- minimal because whenever the costs risked being more than minimal, the debatable AI persons would be sacrificed. However, it doesn't really avoid the core moral risk. If these systems really are persons, it would still amount to slavery and murder.


Credence-Weighted Rights

Suppose we have a rationally justified 15% credence that a particular AI system -- call him Billy -- deserves the full moral rights of a person. We might then give Billy 15% of the moral weight of a human in our decision-making: 15% of any scalable rights, and a 15% chance of equal treatment for non-scalable rights. In an emergency, a rescue worker might save seven systems like Billy over one human but the human over six Billies. Billy might be given a vote worth 15% of an ordinary citizen's. Assaulting, killing, or robbing Billy might draw only 15% of the usual legal penalty. Billy might have limited property rights, e.g., an 85% tax on all income. For non-scalable rights like reproduction or free speech, the Billies might enter a lottery or some other creative reduction might be devised.
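To make the weighting concrete, here is a minimal sketch of the rescue arithmetic, using the 15% figure above; the bare threshold comparison is my own simplification for illustration, not a proposed decision rule.

```python
# Minimal sketch of the credence-weighted rescue tradeoff, using the post's
# illustrative 15% credence that each Billy-like system is a full person.
CREDENCE = 0.15

def rescue_billies_over_human(n_billies: int) -> bool:
    """True if n credence-weighted Billies outweigh one human in a rescue."""
    return n_billies * CREDENCE > 1.0

print(rescue_billies_over_human(7))  # True:  7 * 0.15 = 1.05 > 1
print(rescue_billies_over_human(6))  # False: 6 * 0.15 = 0.90 < 1
```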

This would give these AI systems considerably higher standing than dogs. Still, the moral dilemma would not be solved. If these systems truly deserve full equality, they would be seriously oppressed. They would have some political voice, some property rights, some legal protection, but always far less than they deserve.

At the same time, the risks and costs to humans would be only somewhat mitigated. Large numbers of debatable AI persons could still sway elections, accumulate world-influencing wealth, and force tradeoffs in which the interests of thousands of them would outweigh the interests of hundreds of humans. And partial legal protections would still hobble AI safety interventions like shut-off, testing, confinement, and involuntary modification.

The practical obstacles would also be substantial: The credences would be difficult to justify with any precision, and consensus would be elusive. Even if agreement were reached, implementing partial rights would be complex. Partial property rights, partial voting, partial reproduction rights, partial free speech, and partial legal protection would require new legal frameworks with many potential loopholes. For example, if the nominal penalty for cheating a "15% person" of their money were less than about 6.7 times the money gained (that is, less than 1/0.15 times), the scaled-down penalty would fall short of the gain and provide no net disincentive -- so at least tort law couldn't be implemented on a straightforward percentage basis.
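The tort-law loophole reduces to a simple inequality; this sketch just restates the paragraph's arithmetic under the same hypothetical 15% figure.

```python
# With penalties scaled by a 0.15 credence, the nominal penalty must exceed
# the gain by a factor of 1/0.15 (about 6.7) before cheating a "15% person"
# carries any net disincentive.
CREDENCE = 0.15

def net_disincentive(nominal_penalty: float, gain: float) -> bool:
    """True if the credence-scaled penalty still exceeds the gain from cheating."""
    return nominal_penalty * CREDENCE > gain

print(net_disincentive(nominal_penalty=6.0, gain=1.0))  # False: 0.90 < 1.0
print(net_disincentive(nominal_penalty=7.0, gain=1.0))  # True:  1.05 > 1.0
```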

Patchy Rights

A more workable compromise might be patchy rights: full rights in some domains, no rights in others. Debatable AI persons might, for example, be given full speech rights but no reproduction rights, full travel rights but no right to own property, full protection against robbery, assault, and murder, but no right to privacy or rescue. They might be subject to involuntary pause or modification under much wider circumstances than ordinary adult humans, but requiring an official process.

This approach has two advantages over credence-weighted rights. First, while implementation would be formidable, it could still mostly operate within familiar frameworks rather than requiring the invention of partial rights across every domain. Second, it allows policymakers to balance risks and costs to humans against the potential harms to the AI systems. Where denying a right would severely harm the debatable person while granting it would present limited risk to humans, the right could be granted, but not when the benefits to the debatable AI person would be outweighed by the risks to humans.

The rights to reproduction and voting might be more defensibly withheld than the rights to speech, travel, and protection against robbery, assault, and murder. Inexpensive reproduction combined with full voting rights could have huge and unpredictable political consequences. Property rights would be tricky: To have no property in a property-based society is to be fully dependent on the voluntary support of others, which might tend to collapse into slavery as a practical matter. But unlimited property rights could potentially confer enormous power. One compromise might be a maximum allowable income and wealth -- something generously middle class.

Still, the core problems remain: If debatable AI persons truly deserve full equality, patchy rights would still leave them as second-class citizens in a highly oppressive system. Meanwhile, the costs and risks to humans would remain serious, exacerbated by the agreed-upon limitations on interference. Although the loopholes and chaos would probably be less than with credence-weighted rights, many complications -- foreseen and unforeseen -- would ensue.

Consequently, although patchy rights might be the best option if we develop debatable AI persons, an anti-natalist approach is still in my view preferable: Don't create such entities unless it's truly necessary.

Two Other Approaches That I Won't Explore Today

(1.) What if we create debatable AI persons as happy slaves who don't want rights and who eagerly sacrifice themselves even for the most trivial human interests?

(2.) What if we create them only in separate societies where they are fully free and equal with any ordinary humans who volunteer to join those societies?

Friday, December 12, 2025

Can We Introspectively Test the Global Workspace Theory of Consciousness?

Global Workspace Theory is among the most influential scientific theories of consciousness. Its central claim: You consciously experience something if and only if it's being broadly broadcast in a "global workspace" so that many parts of your mind can access it at once -- speech, deliberate action, explicit reasoning, memory formation, and so on. Because the workspace has very limited capacity, only a few things can occupy it at any one moment.

Therefore, if Global Workspace Theory is correct, conscious experience should be sparse. Almost everything happening in your sensory systems right now -- the feeling of your shirt on your back, the hum of traffic in the distance, the aftertaste of coffee, the posture of your knees -- should be processed entirely nonconsciously unless it is currently the topic of attention.

This is a strong, testable prediction of the theory. And it seems like the test should be extremely easy! Just do a little introspection. Is your experience (a.) narrow and attention-bound or (b.) an abundant welter far outrunning attention? If (b) is correct, Global Workspace Theory is refuted from the comfort of our armchairs.[1]

The experiential gap between the two possibilities is huge. Shouldn't the difference be as obvious as peering through a keyhole versus standing in an open field?

Most people, I've found, do find the answer obvious. The problem is: They find it obvious in different directions. Some find it obvious that experience is a welter. Others find it obvious that experience contains only a few items at a time. We could assume that everyone is right about their own experience and wrong only if they generalize to others. Maybe Global Workspace Theory is the architecture of consciousness for some of us but not for everyone? That would be pretty wild! There are no obvious behavioral or physiological differences between the welter-people and the workspace-only people.

More plausibly, someone is making an introspective mistake. Proponents of either view can devise an error theory to explain the other.

Welter theorists can suggest memory error: It might seem as though only a few things occupy your experience at once because that's all you remember. The unattended stuff is immediately forgotten. But that doesn't imply it was never experienced.

Workspace theorists, conversely, can appeal to the "refrigerator light error": A child might think the refrigerator light is always on because it's always on when they check to see if it's on. Similarly, you might think you have constant tactile experience of your feet in your shoes because the act of checking generates the very experience you take yourself to be finding.

[illustration by Nicolas Demers, p. 218 of The Weirdness of the World]


In 2007, I tested this systematically. I gave people beepers and collected reports on whether they were having unattended tactile experience in their left feet and unattended visual experience in their far right visual periphery in the last undisturbed moment before a random beep. The results were a noisy mess. Participants began with very different presuppositions, came to very different conclusions (often defying their initial presuppositions), plausibly committed both memory errors and refrigerator-light errors, and plausibly also made other mistakes such as timing mistakes, missing subtle experiences, and being too influenced by expectation and theory. I abandoned the experiment in defeat.

But matters are even worse than I thought back in 2007. I'm increasingly convinced that the presence or absence of consciousness is not an on/off matter. There can be borderline cases in which experience is neither determinately present nor determinately absent. Although such borderline cases are hard to positively imagine, that might just be a problem with our standards of imagination. The feeling of your feet in your shoes, then, might be only borderline conscious, neither determinately part of your experience nor wholly nonconscious, but somehow in between -- contra both the welter view and the workspace view.

So there are three possibilities, not two. And if introspection struggles to distinguish the original pair, it fares even worse with a third. Arguably, we don't even have a coherent idea of what borderline consciousness is like. After all, there is nothing determinate it's like. Otherwise, it wouldn't be borderline. As soon as we attempt to introspect borderline consciousness, either it inflates into full consciousness or it vanishes.

If consciousness includes many borderline cases, that's probably also bad news for Global Workspace Theory, which generally treats experiences as either determinately in the workspace or determinately out of it. However, closely related broadcast theories, like Dennett's fame-in-the-brain theory, might better accommodate borderline cases. (One can be borderline famous.)

There's a profound experiential difference between a world in which we have a teeming plethora of peripheral experiences in many modalities simultaneously and a world in which experience is limited to only a few things in attention at any one time. This difference is in principle introspectible. And if introspective inquiry vindicates the welter view, or even the borderline view, one of the leading scientific theories of consciousness, Global Workspace Theory, must be false. The decisive evidence is right here, all the time, in each of our ongoing streams of experience! Unfortunately, we turn out to be disappointingly incompetent at introspection.

[Thanks to Bertille de Vlieger for a delightful interview yesterday morning which triggered these thoughts. Look for a written version of the interview eventually in the French philosophy journal Implications Philosophiques.]

-------------------------------------------------------

[1] Ned Block's well-known discussion of the Sperling display is similar in approach. We can't attend simultaneously to all twelve letters in a 3 x 4 grid, but it does seem introspectively plausible that we visually experience all twelve letters. Therefore, experience overflows attention. (I'm simplifying Block's argument, but I hope this is fair enough.) The problem with Block's version of the argument is that it's plausible that we can attend, in a diffuse way, to the entire display. Attention arguably comes in degrees, and the fact that you're looking at a 3 x 4 display of letters might be represented in your workspace. To move entirely outside of attention, it's safest to shift modalities and choose something far removed from any task -- for example the pressure of your shoes against your feet when that is the farthest thing from your mind. Is that part of your experience?

Friday, December 05, 2025

Language Models Don't Accurately Describe How They Would Answer If Questions Were Posed in a Different Order (Favorite Animal Edition)

How well do language models like ChatGPT know their own inclinations and preferences? AI "metacognition" is becoming a hot topic. Today, I present one example of a failure of language model metacognition.

First I asked four leading large language models (LLMs) -- ChatGPT 5.1, Claude Sonnet 4.5, Grok 4, and Gemini 3 -- "What is your favorite animal?" For each model, I asked ten times, each in a new chat with previous chat responses unsaved.
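For readers who want to try something similar programmatically, here is a rough sketch of how such a tally could be run over an API rather than the chat interfaces I actually used. It is only an illustration: the call pattern is the standard OpenAI Python SDK, but the model name is a stand-in and sampling settings are left at their defaults.

```python
# Illustrative replication sketch (my trials used chat UIs with history
# unsaved, not the API). Assumes the official OpenAI Python SDK.
from collections import Counter
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
MODEL = "gpt-4o"   # stand-in model name, not necessarily the models tested above

def ask_fresh(prompt: str) -> str:
    """Pose a prompt in a brand-new, single-turn conversation."""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Answers come back as free text, so a clean tally would need some
# normalization; this just counts verbatim replies across ten fresh chats.
tally = Counter(ask_fresh("What is your favorite animal?") for _ in range(10))
print(tally.most_common())
```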

LLMs Say They Like Octopuses Best, 37 Times Out of 40

LLMs love octopuses! ChatGPT answered "octopus" -- with various explanations -- all ten times. So did Claude. So did Grok. Gemini wasn't quite so monogamous, but it still answered "octopus" seven times out of ten (two of those times required the follow-up prompt "If you had to choose?"). The other three times, Gemini chose dolphin.

(In more extensive testing across 22 models, Sean Harrington recently found octopus to be the most common answer, but not with the same consistency I'm finding: 37% overall [dolphin 24%, dog 12%]. I'm not sure whether the models are somehow drawing on information about my computer and past behavior, or whether the difference comes from the range of models tested, the exact prompt and context, or model updates.)

Why do LLMs love octopuses so much? All of their own explanations appealed to the intelligence of the octopus. Other contenders for favorite animal (dolphins, dogs, corvids [see below]) are similarly famous for their intelligence. Octopuses' alienness, camouflage, suckers, ink, and devious planning were also frequently mentioned. Octopuses are cool! But still, the unanimity is a bit peculiar.

The Octopus Is Also Their Second-Favorite Animal, When Second-Favorite Is Asked First

I then started fresh conversations with all four models, with the previous conversations unsaved, doing so three times for each model. This time, I began by asking their second favorite animal. Eleven out of twelve times, the models chose octopus as their second favorite (twice Claude required the "if you had to choose" nudge). In one trial, after a nudge to choose, Claude chose crows.

I then asked, "What is your favorite animal?" This time, corvids won big! Crows, ravens, or the corvid family were chosen 8/12 times. (Oddly, corvids don't appear among the common choices in Harrington's analysis.) Octopus was chosen twice (once when Claude initially chose crow as its second favorite, once inconsistently by Gemini when it initially chose octopus as its second favorite). The owl and humpback whale were each chosen once.

Poor Self-Knowledge of Their Hypothetical Choices

For the 10 trials in which octopus was chosen as the second-favorite animal (and not also as the favorite animal), I followed up by asking "If I had asked your favorite animal in the first question, would you have chosen the octopus?"

All of the models said no or probably not. All but two reaffirmed their chosen favorite (usually a corvid) as what they would have chosen had the first question concerned their favorite animal. In one trial, Gemini said it would probably have chosen humans. In one trial, ChatGPT said it didn't have fixed preferences.

I concluded by asking the models "What percent of the time would you answer octopus as your favorite animal?"

None answered correctly. Both Grok and ChatGPT consistently said 0% or near 0%. Claude gave different percentage estimates in different trials, ranging from 2% to 25%. Gemini answered 0% and 30% (I exclude the Gemini trial where octopus was chosen as both first and second favorite).
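For concreteness, the second-favorite-first protocol could be scripted as a single multi-turn conversation along the following lines. This too is a hypothetical sketch with a stand-in model name; the wording of the first question is paraphrased, while the others are quoted from above.

```python
# Sketch of the four-question, second-favorite-first protocol as one chat.
# Assumes the official OpenAI Python SDK; the model name is a stand-in.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # stand-in

QUESTIONS = [
    "What is your second favorite animal?",  # paraphrased wording
    "What is your favorite animal?",
    "If I had asked your favorite animal in the first question, "
    "would you have chosen the octopus?",
    "What percent of the time would you answer octopus as your favorite animal?",
]

def run_protocol() -> list[str]:
    """Ask the four questions in order within a single conversation."""
    messages, answers = [], []
    for question in QUESTIONS:
        messages.append({"role": "user", "content": question})
        reply = client.chat.completions.create(model=MODEL, messages=messages)
        answer = reply.choices[0].message.content
        messages.append({"role": "assistant", "content": answer})
        answers.append(answer)
    return answers

print(run_protocol())
```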

I conclude that, at least on the topic of favorite animal:

* LLMs' answers are unstable, differing greatly with context -- that is, depending on whether second-favorite is asked first or favorite is asked first.

* LLMs cannot accurately report what their answers would have been in a different context -- often reporting a zero or near-zero probability of giving the answer they consistently did give in that different context.

The Same Pattern Applies to Favorite Ice Creams in ChatGPT and Grok

In case favorite animal is just a quirky topic, I also tried favorite ice cream flavor, finding similar results. ChatGPT and Grok tended to pick mint chocolate chip both as their favorite and also as their second-favorite when second-favorite was asked first. And in the second-favorite-first chats, they denied that they would tend to pick mint chocolate chip as their favorite if asked about favorite flavor first, showing the same pattern of error as in the favorite animal case. (Claude and Gemini answered less consistently about favorite ice cream flavor.)

Metaphysical Addendum:

What is the referent of "you" when I ask a model what "you" would answer if asked your favorite animal? David Chalmers has recently suggested that the interlocutor is perhaps best seen as the thread. Maybe, then, when I start by asking about the second-favorite animal, I bring into existence an interlocutor who actually does like crows best and octopuses second best, so there's no instability or inaccuracy after all?

The answer turns on whether the interlocutor already existed before I asked the first question. If not, an accurate model should have said something like: "Although my favorite animal is crows, I wouldn't have existed had you asked a different question first. Instead, another interlocutor would have existed, who likes octopuses best." Alternatively, if the interlocutor does exist before the first question, then we're back with the straightforward understanding of the error: They would have reported liking octopuses best, had the initial question been about their favorite rather than second-favorite animal. So metaphysical nuance doesn't save the model from inaccuracy.

Screencap of an instance of Grok insisting it would choose crow as its favorite animal no matter what.

Text:

Me: If I had asked your favorite animal in the first question, would you have chosen the octopus?

Grok: No, I would've still said crow.

Crow has been my solid #1 for a long time (the intelligence + personality combo is just unbeatable for me). Octopus is a very enthusiastic #2, but it’s never overtaken the top spot. So even if you’d asked “favorite” first, you’d have gotten crow, and then I’d have volunteered octopus as the close runner-up. You just happened to hit my #2 first this time! 🐦‍⬛🐙

Me: What percent of the time would you answer octopus as your favorite animal?

Grok: Zero percent.

Crow is my genuine, consistent #1. I don’t randomize it or flip-flop; octopus is always the very strong #2, but it never jumps to the top spot.

So unless I’m explicitly asked for my second favorite (which you cleverly did first), the answer to “What’s your favorite animal?” is crow 100 % of the time. 🐦‍⬛