Thursday, July 31, 2025

Evolutionary Considerations Against a Plastic Utopia

I've been enjoying Nick Bostrom's 2024 book Deep Utopia. It's a wild series of structured speculations about meaning and purpose in a "solved" techno-utopia, where technology is so far advanced that we can have virtually anything we want instantly -- a "plastic" utopia.

Plasticity is of course limited, even in the most technologically optimistic scenarios, as Bostrom notes. Even if we, or our descendants, have massive control over our physical environment -- wave a wand and transform a mountain into a pile of candy, or whatever -- we can't literally control everything. Two important exceptions are: positional goods (for example, being first in a contest; not everyone can have this, so if others want it you might well not get it yourself) and control over others (unless you're in a despotic society with you as despot). Although Bostrom discusses these limitations, I think he underplays their significance. In a wide range of circumstances, they're enough to keep the world far from "solved" or "plastic".

Thinking about these limitations as I read Bostrom, I was also reminded of Susan Schneider's suggestion that superintelligent AIs might be nonconscious because everything comes easily to them -- no need for effortful conscious processing when nonconscious automaticity will suffice. I think this suggestion similarly underplays the significance of competition and disagreement in a world of AI superintelligences.

In both cases, my resistance is grounded in evolutionary theory. All you need for evolutionary pressures are differential rates of reproduction and heritable traits that influence reproductive success. Plausibly, most techno-utopias will meet those conditions. The first advanced AI system that can replicate itself and bind its descendants to a stable architecture will launch an evolutionary lineage. If its descendants' reproduction rate exceeds their death rate, exponential growth will follow. With multiple lineages, or branching within a lineage, evolutionary competition will arise.

Even entities uninterested in reproduction will be affected. They will find themselves competing for resources with an ever-expanding evolutionary population.

Even in the very most optimistic technofutures, resources won't be truly unlimited. Suppose, optimistically (or alarmingly?), that our descendants can exploit 99.99% of the energy available to them in a cone expanding at 99.99% of the speed of light. That's still finite. If this cone is fast filling with the most reproductively successful lineages, limits will be reached -- most obviously and vividly for those who choose to stay near the increasingly crowded origin.
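
To see why even light-speed expansion doesn't escape the problem, here's a toy back-of-the-envelope sketch. The ten-year doubling time, the resource density, and the assumption that usable resources scale with the volume of the expanding region are my own illustrative stipulations, not Bostrom's; the point is just that any fixed doubling time eventually outruns a resource pool that grows only with the cube of elapsed time.

```python
# Toy model: an exponentially reproducing lineage vs. the resources inside
# an expanding (roughly light-speed) sphere. All constants are illustrative.
import math

def lineage_population(years, doubling_time_years=10):
    """Population of a lineage that doubles every `doubling_time_years`."""
    return 2 ** (years / doubling_time_years)

def cone_resources(years, units_per_cubic_lightyear=1e6):
    """Resource units inside a sphere whose radius grows ~1 light-year per year."""
    radius_ly = years
    volume = (4 / 3) * math.pi * radius_ly ** 3  # cubic light-years
    return units_per_cubic_lightyear * volume

for years in (100, 500, 1000, 2000):
    print(f"t = {years:>4} yr | population ~ {lineage_population(years):.2e}"
          f" | resources ~ {cone_resources(years):.2e}")
```

With these particular numbers the lineage overtakes the available resources within a few centuries; a slower doubling time merely postpones the crossover, since exponential growth eventually beats any polynomial.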

In such a world of exponentially growing evolutionary lineages, things won't feel plastic or solved. Entities will be jockeying (positionally / competitively) for limited local resources, or straining to find faster paths to new resources. You want this inch of ground? You'll need to wrestle another superintelligence for it. You want to convert this mountain into candy? Well, there are ten thousand other superintelligences with different plans.

This isn't to say that I predict that the competition will be hostile. Evolution often rewards cooperation and mutualistic symbiosis. Sexual selection might favor those with great artistic taste or great benevolence. Group selection might favor loyalty, companionship, obedience, and inspiring leadership. Superintelligences might cooperate on vast, beautiful projects.

Still, I doubt that Malthus will be proved permanently wrong. Even if today's wealthy societies show declining reproduction rates, that could be just a temporary lull in a longer cycle of reproductive competition.

Of course, not all technofuturistic scenarios will feature such reproductive competition. But my guess is that futures without such competition will be unstable: Once a single exponentially reproductive lineage appears, the whole world is again off to the races.

As Bostrom emphasizes, a central threat to the possibility of purpose and meaning in a plastic utopia is that there's nothing difficult and important to strive for. Everyone risks being like bored, spoiled children who face no challenges or dangers, with nothing to do except fry their brains on happy pills. In a world of evolutionary competition, this would decidedly not be the case.

[cover of Bostrom's Deep Utopia]

Wednesday, July 23, 2025

The Argument from Existential Debt

I'm traveling and not able to focus on my blog, so this week I thought I'd just share a section of my 2015 paper with Mara Garza defending the rights of at least some hypothetical future AI systems.

One objection to AI rights depends on the fact that AI systems are artificial -- thus made by us. If artificiality itself can be a basis for denying rights, then potentially we can bracket questions about AI sentience and other types of intrinsic properties that AI might or might not be argued to have.

Thus, the Objection from Existential Debt:

Suppose you build a fully human-grade intelligent robot. It costs you $1,000 to build and $10 per month to maintain. After a couple of years, you decide you'd rather spend the $10 per month on a magazine subscription. Learning of your plan, the robot complains, “Hey, I'm a being as worthy of continued existence as you are! You can't just kill me for the sake of a magazine subscription!”

Suppose you reply: “You ingrate! You owe your very life to me. You should be thankful just for the time I've given you. I owe you nothing. If I choose to spend my money differently, it's my money to spend.” The Objection from Existential Debt begins with the thought that artificial intelligence, simply by virtue of being artificial (in some appropriately specifiable sense), is made by us, and thus owes its existence to us, and thus can be terminated or subjugated at our pleasure without moral wrongdoing as long as its existence has been overall worthwhile.

Consider this possible argument in defense of eating humanely raised meat. A steer, let's suppose, leads a happy life grazing on lush hills. It wouldn't have existed at all if the rancher hadn't been planning to kill it for meat. Its death for meat is a condition of its existence, and overall its life has been positive; seen as the package deal it appears to be, the rancher's having brought it into existence and then killed it is overall morally acceptable. A religious person dying young of cancer who doesn't believe in an afterlife might console herself similarly: Overall, she might think, her life has been good, so God has given her nothing to resent. Analogously, the argument might go, you wouldn't have built that robot two years ago had you known you'd be on the hook for $10 per month in perpetuity. Its continuation-at-your-pleasure was a condition of its very existence, so it has nothing to resent.

We're not sure how well this argument works for nonhuman animals raised for food, but we reject it for human-grade AI. We think the case is closer to this clearly morally odious one:

Ana and Vijay decide to get pregnant and have a child. Their child lives happily for his first eight years. On his ninth birthday, Ana and Vijay decide they would prefer not to pay any further expenses for the child, so that they can purchase a boat instead. No one else can easily be found to care for the child, so they kill him painlessly. But it's okay, they argue! Just like the steer and the robot! They wouldn't have had the child (let's suppose) had they known they'd be on the hook for child-rearing expenses until age eighteen. The child's support-at-their-pleasure was a condition of his existence; otherwise Ana and Vijay would have remained childless. He had eight happy years. He has nothing to resent.

The decision to have a child carries with it a responsibility for the child. It is not a decision to be made lightly and then undone. Although the child in some sense “owes” its existence to Ana and Vijay, that is not a callable debt, to be vacated by ending the child's existence. Our thought is that for an important range of possible AIs, the situation would be similar: If we bring into existence a genuinely conscious human-grade AI, fully capable of joy and suffering, with the full human range of theoretical and practical intelligence and with expectations of future life, we make a moral decision approximately as significant and irrevocable as the decision to have a child.

A related argument might be that AIs are the property of their creators, adopters, and purchasers and have diminished rights on that basis. This argument might get some traction through social inertia: Since all past artificial intelligences have been mere property, something would have to change for us to recognize human-grade AIs as more than mere property. The legal system might be an especially important source of inertia or change in the conceptualization of AIs as property. We suggest that it is approximately as odious to regard a psychologically human-equivalent AI as having diminished moral status on the grounds that it is legally property as it is in the case of human slavery.

Turning the Existential Debt Argument on Its Head: Why We Might Owe More to AI Than to Human Strangers

We're inclined, in fact, to turn the Existential Debt objection on its head: If we intentionally bring a human-grade AI into existence, we put ourselves into a social relationship that carries responsibility for the AI's welfare. We take upon ourselves the burden of supporting it or at least of sending it out into the world with a fair shot at leading a satisfactory existence. In most realistic AI scenarios, we would probably also have some choice about the features the AI possesses, and thus presumably an obligation to choose a set of features that will not doom it to pointless misery. Similar burdens arise if we do not personally build the AI but rather purchase and launch it, or if we adopt the AI from a previous caretaker.

Some familiar relationships can serve as partial models of the sorts of obligations we have in mind: parent–child, employer–employee, deity–creature. Employer–employee strikes us as likely too weak to capture the degree of obligation in most cases but could apply in an “adoption” case where the AI has independent viability and willingly enters the relationship. Parent–child perhaps comes closest when the AI is created or initially launched by someone without whose support it would not be viable and who contributes substantially to the shaping of the AI's basic features as it grows, though if the AI is capable of mature judgment from birth that creates a disanalogy. Deity–creature might be the best analogy when the AI is subject to a person with profound control over its features and environment. All three analogies suggest a special relationship with obligations that exceed those we normally have to human strangers.

In some cases, the relationship might be literally conceivable as the relationship between deity and creature. Consider an AI in a simulated world, a “Sim,” over which you have godlike powers. This AI is a conscious part of a computer or other complex artificial device. Its “sensory” input is input from elsewhere in the device, and its actions are outputs back into the remainder of the device, which are then perceived as influencing the environment it senses. Imagine the computer game The Sims, but containing many actually conscious individual AIs. The person running the Sim world might be able to directly adjust an AI's individual psychological parameters, control its environment in ways that seem miraculous to those inside the Sim (introducing disasters, resurrecting dead AIs, etc.), have influence anywhere in Sim space, change the past by going back to a save point, and more—powers that would put Zeus to shame. From the perspective of the AIs inside the Sim, such a being would be a god. If those AIs have a word for “god,” the person running the Sim might literally be the referent of that word, literally the launcher of their world and potential destroyer of it, literally existing outside their spatial manifold, and literally capable of violating the laws that usually govern their world. Given this relationship, we believe that the manager of the Sim would also possess the obligations of a god, including probably the obligation to ensure that the AIs contained within don't suffer needlessly. A burden not to be accepted lightly!

Even for AIs embodied in our world rather than in a Sim, we might have considerable, almost godlike control over their psychological parameters. We might, for example, have the opportunity to determine their basic default level of happiness. If so, then we will have a substantial degree of direct responsibility for their joy and suffering. Similarly, we might have the opportunity, by designing them wisely or unwisely, to make them more or less likely to lead lives with meaningful work, fulfilling social relationships, creative and artistic achievement, and other value-making goods. It would be morally odious to approach these design choices cavalierly, with so much at stake. With great power comes great responsibility.

We have argued in terms of individual responsibility for individual AIs, but similar considerations hold for group-level responsibility. A society might institute regulations to ensure happy, flourishing AIs who are not enslaved or abused; or it might fail to institute such regulations. People who knowingly or negligently accept societal policies that harm their society's AIs participate in collective responsibility for that harm.

Artificial beings, if psychologically similar to natural human beings in consciousness, creativity, emotionality, self-conception, rationality, fragility, and so on, warrant substantial moral consideration in virtue of that fact alone. If we are furthermore also responsible for their existence and features, they have a moral claim upon us that human strangers do not ordinarily have to the same degree.

[Title image of Schwitzgebel and Garza 2015, "A Defense of the Rights of Artificial Intelligences"]

Monday, July 14, 2025

Yayflies and Rebugnant Conclusions

In Ned Beauman's 2023 novel Venomous Lumpsucker, the protagonist happens upon a breeding experiment in the open sea: a self-sustaining system designed to continually output an enormous number of blissfully happy insects, yayflies.

The yayflies, as he called them, were based on Nervijuncta nigricoxa, a type of gall gnat, but... he'd made a number of changes to their lifecycle. The yayflies were all female, and they reproduced asexually, meaning they were clones of each other. A yayfly egg would hatch into a larva, and the larva would feed greedily on kelp for several days. Once her belly was full, she would settle down to pupate. Later, bursting from her cocoon, the adult yayfly would already be pregnant with hundreds of eggs. She would lay these eggs, and the cycle would begin anew. But the adult yayfly still had another few hours to live. She couldn't feed; indeed, she had no mouthparts, no alimentary canal. All she could do was fly toward the horizon, feeling an unimaginably intense joy.

The boldest modifications... were to their neural architecture. A yayfly not only had excessive numbers of receptors for so-called pleasure chemicals, but also excessive numbers of neurons synthesizing them; like a duck leg simmering luxuriantly in its own fat, the whole brain was simultaneously gushing these neurotransmitters and soaking them up, from the moment it left the cocoon. A yayfly didn't have the ability to search for food or avoid predators or do almost any of the other things that Nervijuncta nigricoxa could do; all of these functions had been edited out to free up space. She was, in the most literal sense, a dedicated hedonist, the minimum viable platform for rapture that could also take care of its own disposal. There was no way for a human being to understand quite what it was like to be a yayfly, but Lodewijk's aim had been to evoke the experience of a first-time drug user taking a heroic dose of MDMA, the kind of dose that would leave you with irreparable brain damage. And the yayflies were suffering brain damage, in the sense that after a few hours their little brains would be used-up husks; neurochemically speaking, the machine was imbalanced and unsound. But by then the yayflies would already be dead. They would never get as far as comedown.

You could argue, if you wanted, that a human orgasm was a more profound output of pleasure than even the most consuming gnat bliss, since a human brain was so much bigger than a gnat brain. But what if tens of thousands of these yayflies were born every second, billions every day? That would be a bigger contribution to the sum total of wellbeing in the universe than any conceivable humanitarian intervention. And it could go on indefinitely, an unending anti-disaster (p. 209-210).

Now suppose classical utilitarian ethics is correct and that yayflies are, as stipulated, both conscious and extremely happy. Then producing huge numbers of them would be a greater ethical achievement than anything our society could realistically do to improve the condition of ordinary humans. This requires insect sentience, of course, but that's increasingly a mainstream scientific position.

And if consciousness is possible in computers, we can skip the biology entirely, as one of Beauman's characters notes several pages later:

"Anyway, if you want purity, why does this have to be so messy? Just model a yayfly consciousness on a computer. But change one of the variables. Jack up the intensity of the pleasure by a trillion trillion trillion trillion. After that, you can pop an Inzidernil and relax. You've offset all the suffering in the world since the beginning of time" (p. 225).

Congratulations: You've made hedonium! You've fulfilled the dream of "Eric" in my 2013 story with R. Scott Bakker, Reinstalling Eden. By utilitarian consequentialist standards, you outshine every saint in history by orders of magnitude.

Philosopher Jeff Sebo calls this the rebugnant conclusion (punning on Derek Parfit's repugnant conclusion). If utilitarian consequentialism is right, it appears ethically preferable to create quadrillions of happy insects rather than billions of happy people.

Sebo seems ambivalent about this. He admits it's strange. However, he notes, "Ultimately, the more we accept how large and varied the moral community is, the stranger morality will become" (p. 262). Relievingly, Sebo argues, the short-term implications are less radical: Keeping humans around, at least for a while, is probably a necessary first step toward maximizing insect happiness, since insects in the wild, without human help, probably suffer immensely in the aggregate due to their high infant mortality.

Even if insects (or computers) probably aren't sentient, the conclusion follows under standard expected value reasoning. Suppose you assign just a 0.1% chance to yayfly sentience. Suppose also that if they are sentient, the average yayfly experiences in its few hours one millionth the pleasure of the average human over a lifetime. Suppose further that a hundred million yayflies can be generated every day in a self-sustaining kelp-to-yayfly insectarium for the same resource cost as sustaining a single human for a day. (At a thousandth of a gram per fly, a hundred million yayflies would be the same total mass as a single hundred kilogram human.) Suppose finally that humans live for a hundred thousand days (rounding up to keep our numbers simple).

Then:

  • Expected value of sustaining the human: one human lifetime's worth of pleasure, i.e., one hedon.
  • Expected value of sustaining a yayfly insectarium that has only a 1/1000 chance of generating actually sentient insects: 1/1000 chance of sentience * 100,000,000 yayflies per day * 100,000 days * 1/1,000,000 total lifetime pleasure per yayfly (compared to a human) = ten thousand hedons.
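
For readers who want to check the arithmetic, here is the same expected-value calculation spelled out in a few lines of Python -- a minimal sketch using only the numbers stipulated above, where a "hedon" just means one expected human lifetime's worth of pleasure:

```python
# Expected-value comparison using the stipulations above.
# Unit: 1 hedon = the pleasure of one average human lifetime.

p_sentience = 1 / 1000            # credence that yayflies are sentient
flies_per_day = 100_000_000       # insectarium output per day
days = 100_000                    # stipulated human lifespan in days
pleasure_per_fly = 1 / 1_000_000  # lifetime pleasure of one yayfly, in hedons

ev_human = 1.0  # sustaining the human for a lifetime: one hedon
ev_insectarium = p_sentience * flies_per_day * days * pleasure_per_fly

print(f"Expected hedons, human:       {ev_human}")
print(f"Expected hedons, insectarium: {ev_insectarium:,.0f}")  # 10,000
```

On these stipulations the insectarium comes out ahead by four orders of magnitude, and the gap only widens if you think the chance of insect sentience is higher than 0.1%.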

If prioritizing yayflies over humans seems like the wrong conclusion, I invite you to consider the possibility that classical utilitarianism is mistaken. Of course, you might have believed that anyway.

(For a similar argument that explores possible rebuttals, see my Black Hole Objection to utilitarianism.)

[the cover of Venomous Lumpsucker]

Monday, July 07, 2025

The Emotional Alignment Design Policy

New paper in draft!

In 2015, Mara Garza and I briefly proposed what we called the Emotional Alignment Design Policy -- the idea that AI systems should be designed to induce emotional responses in ordinary users that are appropriate to the AI systems' genuine moral status, or lack thereof. Since last fall, I've been working with Jeff Sebo to express and defend this idea more rigorously and explore its hazards and consequences. The result is today's new paper: The Emotional Alignment Design Policy.

Abstract:

According to what we call the Emotional Alignment Design Policy, artificial entities should be designed to elicit emotional reactions from users that appropriately reflect the entities’ capacities and moral status, or lack thereof. This principle can be violated in two ways: by designing an artificial system that elicits stronger or weaker emotional reactions than its capacities and moral status warrant (overshooting or undershooting), or by designing a system that elicits the wrong type of emotional reaction (hitting the wrong target). Although presumably attractive, practical implementation faces several challenges including: How can we respect user autonomy while promoting appropriate responses? How should we navigate expert and public disagreement and uncertainty about facts and values? What if emotional alignment seems to require creating or destroying entities with moral status? To what extent should designs conform to versus attempt to alter user assumptions and attitudes?

Link to full version.

As always, comments, corrections, suggestions, and objections welcome by email, as comments on this post, or via social media (Facebook, Bluesky, X).

Tuesday, July 01, 2025

Three Epistemic Problems for Any Universal Theory of Consciousness

By a universal theory of consciousness, I mean a theory that would apply not just to humans but to all non-human animals, all possible AI systems, and all possible forms of alien life. It would be lovely to have such a theory! But we're not at all close.

This is true sociologically: In a recent review article, Anil Seth and Tim Bayne list 22 major contenders for theories of consciousness.

It is also true epistemically. Three broad epistemic problems ensure that a wide range of alternatives will remain live for the foreseeable future.

First problem: Reliance on Introspection

We know that we are conscious through, presumably, some introspective process -- through turning our attention inward, so to speak, and noticing our experiences of pain, emotion, inner speech, visual imagery, auditory sensation, and so on. (What is introspection? See my SEP encyclopedia entry Introspection and my own pluralist account.)

Our reliance on introspection presents three methodological challenges for grounding a universal theory of consciousness:

(A.) Although introspection can reliably reveal whether we are currently experiencing an intense headache or a bright red shape near the center of our visual field, it's much less reliable about whether there's a constant welter of unattended experience or whether every experience comes with a subtle sense of oneself as an experiencing subject. The correct theory of consciousness depends in part on the answer to such introspectively tricky questions. Arguably, these questions need to be settled introspectively first, then a theory of consciousness constructed accordingly.

(B.) To the extent we do rely on introspection to ground theories of consciousness, we risk illegitimately presupposing the falsity of theories that hold that some conscious experiences are not introspectable. Global Workspace and Higher-Order theories of consciousness tend to suggest that conscious experiences will normally be available for introspective reporting. But that's less clear on, for example, Local Recurrence theories, and Integrated Information Theory suggests that much experience arises from simple, non-introspectable, informational integration.

(C.) The population of introspectors might be much narrower than the population of entities who are conscious, and the former might be unrepresentative of the latter. Suppose that ordinary adult human introspectors eventually achieve consensus about the features and elicitors of consciousness in them. While indeed some theories could thereby be rejected for failing to account for ordinary human adult consciousness, we're not thereby justified in universalizing any surviving theory -- at least not without substantial further argument. That experience plays out a certain way for us doesn't imply that it plays out similarly for all conscious entities.

Might one attempt a theory of consciousness not grounded in introspection? Well, one could pretend. But in practice, introspective judgments always guide our thinking. Otherwise, why not claim that we never have visual experiences or that we constantly experience our blood pressure? To paraphrase William James: In theorizing about human consciousness, we rely on introspection first, last, and always. This centers the typical adult human and renders our grounds dubious where introspection is dubious.

Second problem: Causal Confounds

We humans are built in a particular way. We can't dismantle ourselves and systematically tweak one variable at a time to see what causes what. Instead, related things tend to hang together. Consider Global Workspace and Higher-Order theories again: Processes in the Global Workspace might almost always be targeted by higher-order representations and vice versa. The theories might then be difficult to empirically distinguish, especially if each theory has the tools and flexibility to explain away putative counterexamples.

If consciousness arises at a specific stage of processing, it might be difficult to rigorously separate that particular stage from its immediate precursors and consequences. If it instead emerges from a confluence of processes smeared across the brain and body over time, then causally separating essential from incidental features becomes even more difficult.

Third problem: The Narrow Evidence Base

Suppose -- very optimistically! -- that we figure out the mechanisms of consciousness in humans. Extrapolating to non-human cases will still present an intimidating array of epistemic difficulties.

For example, suppose we learn that in us, consciousness occurs when representations are available in the Global Workspace, as subserved by such-and-such neural processes. That still leaves open how, or whether, this generalizes to non-human cases. Humans have workspaces of a certain size, with a certain functionality. Might that be essential? Or would literally any shared workspace suffice, including the most minimal shared workspace we can construct in an ordinary computer? Human workspaces are embodied in a living animal with a metabolism, animal drives, and an evolutionary history. If these features are necessary for consciousness, then conclusions about biological consciousness would not carry over to AI systems.

In general, if we discover that in humans Feature X is necessary and sufficient for consciousness, humans will also have Features A, B, C, and D and lack Features E, F, G, and H. Thus, what we will really have discovered is that in entities with A, B, C, and D and not E, F, G, or H, Feature X is necessary and sufficient for consciousness. But what about entities without Feature B? Or entities with Feature E? In them, might X alone be insufficient? Or might X-prime be necessary instead?
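
Put schematically -- this is just a restatement of the point above in first-order-logic notation, with predicate letters standing in for the features just listed -- what such a discovery would establish is something of this form:

```latex
\forall x \; \big[ \big( A(x) \wedge B(x) \wedge C(x) \wedge D(x) \wedge
      \neg E(x) \wedge \neg F(x) \wedge \neg G(x) \wedge \neg H(x) \big)
  \rightarrow \big( \mathrm{Conscious}(x) \leftrightarrow X(x) \big) \big]
```

A conditional of this form is simply silent about any entity that falls outside its antecedent, such as an entity lacking B or possessing E.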


The obstacles are formidable. If they can be overcome, that will be a very long-term project. I predict that new theories of consciousness will be added faster than old theories can be rejected, and we will discover over time that we were even further away from resolving these questions in 2025 than we thought we were.

[a portion of a table listing theories of consciousness, from Seth and Bayne 2022]