Friday, December 29, 2023

Normativism about Swimming Holes, Anger, and Beliefs

Among philosophers studying belief, normativism is an increasingly popular position. According to normativism, beliefs are necessarily, as part of their essential nature, subject to certain evaluative standards. In particular, beliefs are necessarily defective in a certain way if they are false or unresponsive to counterevidence.

In this way, believing is unlike supposing or imagining. If I merely suppose that P is true, nothing need have gone wrong if P is false. The supposition is in no way defective. Similarly, if I imagine Q and then learn that evidence supports not-Q, nothing need have gone wrong if I continue imagining Q. In contrast, if I believe P, the belief is in a certain way defective ("incorrect") if it is false, and I have failed as a believer (I've been irrational) if I don't reduce my confidence in P in the face of compelling counterevidence.

But what is a normative essence? Several different things could be meant, some plausible but tepid, others bold but less plausible.

Let's start at the tepid end. Swimming hole is, I think, also an essentially normative concept. If I decide to call a body of water a swimming hole, I'm committed to evaluating it in certain ways -- specifically, as a locale for swimming. If the water is dirty or pollution-tainted, or if it has slime or alligators, it's a worse swimming hole. If it's clean, beautiful, safe, sufficiently deep, and easy on your bare feet, it's a better swimming hole.

But of course bodies of water are what they are independently of their labeling as swimming holes. The better-or-worse normativity is entirely a function of externally applied human concepts and human uses. Once I think of a spot as a swimming hole, I am committed to evaluating it in a certain way, but the body of water is not inherently excellent or defective in virtue of its safety or danger. The normativity derives from the application of the concept or from the practices of swimming-hole users. Nonetheless, there's a sense in which it really is part of the essence of being a swimming hole that being unsafe is a defect.

[Midjourney rendition of an unsafe swimming hole with slime, rocks, and an alligator]

If belief-normativity is like swimming-hole-normativity, then the following is true: Once we label a mental state as a belief, we commit to evaluating it in certain ways -- for example as "incorrect" if untrue and "irrational" if held in the teeth of counterevidence. But if this is all there is to the normativity of belief, then the mental state in question might not be in any way intrinsically defective. Rather, we belief-ascribers are treating the state as if it should play a certain role; and we set ourselves up for disappointment if it doesn't play that role.

Suppose a member of a perennially losing sports team says, on day one of the new season, "This year, we're going to make the playoffs!" Swimming-hole normativity suggests that we interpreters have a choice. We could treat this exclamation as the expression of a belief, in which case it is defective because unjustified by the evidence and (as future defeats will confirm) false. Or we could treat the exclamation as an expression of optimism and team spirit, in which case it might not be in any way defective. There need be no fact of the matter, independent of our labeling, concerning its defectiveness or not.

Advocates of normativism about belief typically want to make a bolder claim than that. So let's move toward a bolder view of normativity.

Consider hearts. Hearts are defective if they don't pump blood, in a less concept-dependent way than swimming holes are defective if they are unsafe. That thing really is a heart, independent of any human labeling, and as such it has a function, independent of any human labeling, which it can satisfy or fail to satisfy.

Might beliefs be inherently normative in that way, the heart-like way, rather than just the swimming-hole way? If I believe this year we'll make the playoffs, is this a state of mind with an essential function in the same way that the heart is an organ with an essential function?

I am a dispositionalist about belief. To believe some proposition P is, on my view, just to be disposed to act and react in ways that are characteristic of a P-believer. To believe this year we'll make the playoffs, for example, is to be disposed to say so, with a feeling of sincerity, to be willing to wager on it, to feel surprise and disappointment with each mounting loss, to refuse to make other plans during playoff season, and so on. It's not clear that a cluster of dispositions is a thing with a function in the same way that a heart is a thing with a function.

Now maybe (though I suspect this is simplistic) some mechanism in us functions to create dispositional belief states in the face of evidence: It takes evidence that P as an input and then produces in us dispositional tendencies to act and react as if P is true. Maybe this mechanism malfunctions if it generates belief states contrary to the evidence, and maybe this mechanism has been evolutionarily selected because it produces states that cause us to act in ways that track the truth. But it doesn't follow from this, I think, that the states that are produced are inherently defective if they arise contrary to the evidence or don't track the truth.

Compare anger: Maybe there's a system in us that functions to create anger when there's wrongdoing against us or those close to us, and maybe this mechanism has been selected because it produces states that prepare us to fight. It doesn't seem to follow that the state is inherently defective if produced in some other way (e.g., by reading a book) or if one isn't prepared to fight (maybe one is a pacifist).

I conjecture that we can get all the normativity we want from belief by a combination of swimming-hole type normativity (once we conceptualize an attitude as a belief, we're committed to saying it's incorrect if false) and normativity of function in our belief-producing mechanisms, without treating belief states themselves as having normative essences.

Wednesday, December 20, 2023

The Washout Argument Against Longtermism

I have a new essay in draft, "The Washout Argument Against Longtermism". As always, thoughts, comments, and objections welcome, either as comments on this post or by email to my academic address.


We cannot be justified in believing that any actions currently available to us will have a non-negligible positive influence on the billion-plus-year future. I offer three arguments for this thesis.

According to the Infinite Washout Argument, standard decision-theoretic calculation schemes fail if there is no temporal discounting of the consequences we are willing to consider. Given the non-zero chance that the effects of your actions will produce infinitely many unpredictable bad and good effects, any finite effects will be washed out in expectation by those infinitudes.

According to the Cluelessness Argument, we cannot justifiably guess what actions, among those currently available to us, are relatively more or less likely to have positive effects after a billion years. We cannot be justified, for example, in thinking that nuclear war or human extinction would be more likely to have bad than good consequences in a billion years.

According to the Negligibility Argument, even if we could justifiably guess that some particular action is likelier to have good than bad consequences in a billion years, the odds of good consequences would be negligibly tiny due to the compounding of probabilities over time.

For more details see the full-length draft.

A brief, non-technical version of these arguments is also now available at the longtermist online magazine The Latecomer.

[Midjourney rendition of several happy dolphins playing]

Excerpt from full-length essay

If MacAskill’s and most other longtermists’ reasoning is correct, the world is likely to be better off in a billion years if human beings don’t go extinct now than if human beings do go extinct now, and decisions we make now can have a non-negligible influence on whether that is the case. In the words of Toby Ord, humanity stands at a precipice. If we reduce existential risk now, we set the stage for possibly billions of years of thriving civilization; if we don’t, we risk the extinction of intelligent life on Earth. It’s a tempting, almost romantic vision of our importance. I also feel drawn to it. But the argument is a card-tower of hand-waving plausibilities. Equally breezy towers can be constructed in favor of human self-extermination or near-self-extermination. Let me offer....

The Dolphin Argument. The most obvious solution to the Fermi Paradox is also the most depressing. The reason we see no signs of intelligent life elsewhere in the universe is that technological civilizations tend to self-destruct in short order. If technological civilizations tend to gain increasing destructive power over time, and if their habitable environments can be rendered uninhabitable by a single catastrophic miscalculation or a single suicidal impulse by someone with their finger on the button, then the odds of self-destruction will be non-trivial, might continue to escalate over time, and might cumulatively approach nearly 100% over millennia. I don’t want to commit to the truth of such a pessimistic view, but in comparison, other solutions seem like wishful thinking – for example, that the evolution of intelligence requires stupendously special circumstances (the Rare Earth Hypothesis) or that technological civilizations are out there but sheltering us from knowledge of them until we’re sufficiently mature (the Zoo Hypothesis).

Anyone who has had the good fortune to see dolphins at play will probably agree with me that dolphins are capable of experiencing substantial pleasure. They have lives worth living, and their death is a loss. It would be a shame if we drove them to extinction. Suppose it’s almost inevitable that we wipe ourselves out in the next 10,000 years. If we extinguish ourselves peacefully now – for example, by ceasing reproduction as recommended by antinatalists – then we leave the planet in decent shape for other species, including dolphins, which might continue to thrive. If we extinguish ourselves through some self-destructive catastrophe – for example, by blanketing the world in nuclear radiation or creating destructive nanotech that converts carbon life into gray goo – then we probably destroy many other species too and maybe render the planet less fit for other complex life.

To put some toy numbers on it, in the spirit of longtermist calculation, suppose that a planet with humans and other thriving species is worth X utility per year, a planet with other thriving species with no humans is worth X/100 utility (generously assuming that humans contribute 99% of the value to the planet!), and a planet damaged by a catastrophic human self-destructive event is worth an expected X/200 utility. If we destroy ourselves in 10,000 years, the billion year sum of utility is 10^4 * X + (approx.) 10^9 * X/200 = (approx.) 5 * 10^6 * X. If we peacefully bow out now, the sum is 10^9 * X/100 = 10^7 * X. Given these toy numbers and a billion-year, non-human-centric perspective, the best thing would be humanity’s peaceful exit.
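The toy sums above can be checked with a few lines of Python. This is only a sketch of the arithmetic in the passage: the variable and function names are illustrative, utilities are expressed in units of X, and the catastrophe total is computed exactly rather than with the "(approx.)" rounding used in the text.

```python
from fractions import Fraction

# Toy utilities from the passage, in units of X (one year of a planet with
# humans and other thriving species). All names here are illustrative.
HORIZON_YEARS = 10**9            # the billion-year perspective
YEARS_UNTIL_CATASTROPHE = 10**4  # humans self-destruct after 10,000 years

U_WITH_HUMANS = Fraction(1)      # X per year
U_NO_HUMANS = Fraction(1, 100)   # X/100 per year
U_DAMAGED = Fraction(1, 200)     # X/200 per year (expected)


def total_if_catastrophe():
    """Humans thrive for 10,000 years, then leave a damaged planet."""
    remaining_years = HORIZON_YEARS - YEARS_UNTIL_CATASTROPHE
    return YEARS_UNTIL_CATASTROPHE * U_WITH_HUMANS + remaining_years * U_DAMAGED


def total_if_peaceful_exit():
    """Humans bow out now; other species thrive for the whole horizon."""
    return HORIZON_YEARS * U_NO_HUMANS


print(total_if_catastrophe())    # 5009950, i.e. roughly 5 * 10^6 * X
print(total_if_peaceful_exit())  # 10000000, i.e. 10^7 * X
```

On these toy numbers the peaceful exit comes out at roughly twice the billion-year total of the catastrophe scenario, which is all the passage needs.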

Now the longtermists will emphasize that there’s a chance we won’t wipe ourselves out in a terribly destructive catastrophe in the next 10,000 years; and even if it’s only a small chance, the benefits could be so huge that it’s worth risking the dolphins. But this reasoning ignores a counterbalancing chance: That if human beings stepped out of the way a better species might evolve on Earth. Cosmological evidence suggests that technological civilizations are rare; but it doesn’t follow that civilizations are rare. There has been a general tendency on Earth, over long, evolutionary time scales, for the emergence of species with moderately high intelligence. This tendency toward increasing intelligence might continue. We might imagine the emergence of a highly intelligent, creative species that is less destructively Promethean than we are – one that values play, art, games, and love rather more than we do, and technology, conquering, and destruction rather less – descendants of dolphins or bonobos, perhaps. Such a species might have lives every bit as good as ours (less visible to any ephemeral high-tech civilizations that might be watching from distant stars), and they and any like-minded descendants might have a better chance of surviving for a billion years than species like ours who toy with self-destructive power. The best chance for Earth to host such a species might, then, be for us humans to step out of the way as expeditiously as possible, before we do too much harm to complex species that are already partway down this path.

Think of it this way: Which is the likelier path to a billion-year happy, intelligent species: that we self-destructive humans manage to keep our fingers off the button century after century after century somehow for ten million centuries, or that some other more peaceable, less technological clade finds a non-destructive stable equilibrium? I suspect we flatter ourselves if we think it’s the former.

This argument generalizes to other planets that our descendants might colonize in other star systems. If there’s even a 0.01% chance per century that our descendants in Star System X happen to destroy themselves in a way that ruins valuable and much more durable forms of life already growing in Star System X, then it would be best overall for them never to have meddled, and best for us now to peacefully exit into extinction rather than risk producing descendants who will expose other star systems to their destructive touch.
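To see how a small per-century risk compounds, here is a minimal sketch. It assumes the risk is constant and independent from one century to the next, which the passage does not state. The horizons of a hundred, ten thousand, and ten million centuries are chosen for illustration, the last matching the "ten million centuries" mentioned above.

```python
import math


def cumulative_risk(per_century_risk, centuries):
    """Probability of at least one catastrophe over the given horizon.

    Assumes a constant, independent per-century risk (an assumption
    made for illustration, not a claim from the passage).
    """
    # 1 - (1 - p)^n, computed via log1p/expm1 for numerical stability
    return -math.expm1(centuries * math.log1p(-per_century_risk))


P = 0.0001  # the passage's 0.01% chance per century

for centuries in (100, 10_000, 10_000_000):
    print(centuries, cumulative_risk(P, centuries))
```

Under these assumptions the cumulative risk is about 1% after ten thousand years, about 63% after a million years, and indistinguishable from certainty over a billion years.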


My aim with the Dolphin Argument... is not to convince readers that humanity should bow out for the sake of other species.... Rather, my thought is this: It’s easy to concoct stories about how what we do now might affect the billion-year future, and then to attach decision-theoretic numbers to those stories. We lack good means for evaluating these stories. We are likely just drawn to one story or another based on what it pleases us to think and what ignites our imagination.

Saturday, December 16, 2023

Could the Universe Be Infinite?

It's not absurd to think the universe might endure forever.

by Eric Schwitzgebel and Jacob Barandes

From The Weirdness of the World, forthcoming from Princeton University Press in January, excerpted Dec 15 at Nautilus.

On recent estimates, the observable universe—the portion of the universe that we can detect through our telescopes—extends about 47 billion light-years in every direction. But the limit of what we can see is one thing, and the limit of what exists is quite another. It would be remarkable if the universe stopped exactly at the edge of what we can see. For one thing, that would place us, surprisingly and un-Copernicanly, precisely at the center.

But even granting that the universe is likely to be larger than 47 billion light-years in radius, it doesn’t follow that it’s infinite. It might be finite. But if it’s finite, then one of two things should be true: Either the universe should have a boundary or edge, or it should have a closed topology.

It’s not absurd to think that the universe might have an edge. Theoretical cosmologists routinely consider hypothetical finite universes with boundaries at which space comes to a sudden end. However, such universes require making additional cosmological assumptions for which there is no direct support—assumptions about the conditions, if any, under which those boundaries might change, and assumptions about what would happen to objects or light rays that reach those boundaries.

It’s also not absurd to think that the universe might have a closed topology. By this we mean that over distances too large for us to see, space essentially repeats, so that a particle or signal that traveled far enough would eventually come back around to the spatial region from which it began—like how when Pac-Man exits one side of the TV screen, he re-emerges from the other side. However, there is currently no evidence that the universe has a closed topology.

Leading cosmologists, including Alex Vilenkin, Max Tegmark, and Andrei Linde, have argued that spatial infinitude is the natural consequence of the best current theories of cosmic inflation. Given that, plus the absence of evidence for an edge or closed topology, infinitude seems a reasonable default view. The mere 47 billion light-years we can see is the tiniest speck of a smidgen of a drop in an endless expanse.

Let’s call any galaxy with stars, planets, and laws of nature like our own a sibling galaxy. Exactly how similar a galaxy must be to qualify as a sibling we will leave unspecified, but we don’t intend high similarity. Andromeda is sibling enough, as are probably most of the other hundreds of billions of ordinary galaxies we can currently see.

The finiteness of the speed of light means that when we look at these faraway galaxies, we see them as they were during earlier periods in the universe’s history. Taking this time delay into account, the laws of nature don’t appear to differ in regions of the observable universe that are remote from us. Likewise, galaxies don’t appear to be rarer or differently structured in one direction or another. Every direction we look, we see more or less the same stuff. These observations help motivate the Copernican Principle, which is the working hypothesis that our position in the universe is not special or unusual—not the exact center, for example, and not the one weird place that happens to have a galaxy operating by special laws that don’t hold elsewhere.

Still, our observable universe might be an atypical region of an infinite universe. Possibly, somewhere beyond what we can see, different forms of elementary matter might follow different laws of physics. Maybe the gravitational constant is a little different. Maybe there are different types of fundamental particles. Even more radically, other regions might not consist of three-dimensional space in the form we know it. Some versions of string theory and inflationary cosmology predict exactly such variability.

But even if our region is in some respects unusual, it might be common enough that there are infinitely many other regions similar to it—even if just one region in 10^500. Again, this is a fairly standard view among speculative cosmologists, which comports well with straightforward interpretations of leading cosmological theories. One can hardly be certain, of course. Maybe we’re just in a uniquely interesting spot! But we are going to assume that’s not the case. In the endless cosmos, infinitely many regions resemble ours, with three spatial dimensions, particles that obey approximately the “Standard Model” of particle physics, and cluster upon cluster of sibling galaxies.

Under the assumptions so far, the Copernican Principle suggests that there are infinitely many sibling galaxies in a spacelike relationship with us, meaning that they exist in spatiotemporal regions roughly simultaneous with ours (in some frame of reference). We will have seen the past history of some of these simultaneously existing sibling galaxies, most of which, we assume, continue to endure. However, it’s a separate question whether there are also infinitely many sibling galaxies in a timelike relationship to us—more specifically, existing in our future. Are there infinitely many sibling galaxies in spatiotemporal locations that are, at least in principle, eventually reachable by particles originating in our galaxy? (If the locutions of this paragraph seem convoluted, that’s due to the bizarreness of relativity theory, which prevents us from using “past,” “present,” and “future” in the ordinary, commonsense way.)

Thinking about whether infinitely many sibling galaxies will exist in the future requires thinking about heat death. Stars have finite lifetimes. If standard physical theory is correct, then ultimately all the stars we can currently see will burn out. Some of those burned-out stars will contribute to future generations of stars, which will, in turn, burn out. Other stars will become black holes, but then those black holes also will eventually dissipate (through Hawking radiation).

Given enough time, assuming that the laws of physics as we understand them continue to hold, and assuming things don’t re-collapse in a “Big Crunch” in the distant future, the standard view is that everything we presently see will inevitably enter a thin, boring, high-entropy state near equilibrium—heat death. Picture nearly empty darkness, with particles more or less evenly spread out, with even rock-size clumps of matter being rare.

But what happens after heat death? This is of course even more remote and less testable than the question of whether heat death is inevitable. It requires extrapolating far beyond our current range of experience. But still we can speculate based on currently standard assumptions. Let’s think as reasonably as we can about this. Here’s our best guess, based on standard theory, from Ludwig Boltzmann through at least some time slices of Sean Carroll.

For this speculative exercise, we will assume that the famously probabilistic behavior of quantum systems is intrinsic to the systems themselves, persisting post-heat-death and not requiring external observers carrying out measurements. This is consistent with most current approaches to quantum theory (including most many-worlds approaches, objective-collapse approaches, and Bohmian mechanics). It is, however, inconsistent with theories according to which the probabilistic behavior requires external observers (some versions of the “Copenhagen interpretation”) and theories on which the post-heat-death universe would inescapably occupy a stationary ground state. Under this assumption, standard probabilistic theories of what happens in high-entropy, near-vacuum conditions continue to apply post-heat-death. More specifically, the universe will continue to support random fluctuations of photons, protons, and whatever other particles remain. Consequently, from time to time, these particles will, by chance, enter unlikely configurations. This is predicted by both standard statistical mechanics and standard quantum mechanics. Post-heat-death, seven particles will sometimes converge, by chance, upon the same small region. Or 700. Or—very rarely!—7 trillion.

There appears to be no in-principle limit to how large such chance fluctuations can be or what they can contain if they pass through the right intermediate phases. Wait long enough and extremely large fluctuations should occur. Assuming the universe continues infinitely, rather than having a temporal edge or forming a closed loop, for which there is no evidence, then eventually some random fluctuation should produce a bare brain having cosmological thoughts. Wait longer, and eventually some random fluctuation will produce, as Boltzmann suggested, a whole galaxy. If the galaxy is similar enough to our own, it will be a sibling galaxy. Wait still longer, and another sibling galaxy will arise, and another, and another....

For good measure, let’s also assume that after some point post-heat-death, the rate at which galaxy-size systems fluctuate into existence does not systematically decrease. There’s some minimal probability of galaxy-size fluctuations, not an ever-decreasing probability with longer and longer average intervals between galaxies. Fluctuations appear at long intervals, by random chance, then fade back into chaos after some brief or occasionally long period, and the region returns to the heat-death state, with the same small probability of large fluctuations as before. Huge stretches of not much will be punctuated by rare events of interesting, even galaxy-size, complexity.
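Why the non-decreasing rate matters can be put in a short worked form. This is a sketch under added assumptions that are not in the text: time is divided into long intervals, p_n is the probability of at least one galaxy-size fluctuation in the n-th interval, and the bound on the probability of no galaxy treats the intervals as independent.

```latex
\[
\text{If } p_n \ge p_{\min} > 0 \text{ for all } n:\qquad
\mathbb{E}\bigl[\#\{\text{intervals with a galaxy among the first } N\}\bigr]
  = \sum_{n=1}^{N} p_n \;\ge\; N\,p_{\min} \;\to\; \infty,
\]
\[
\Pr(\text{no galaxy in the first } N \text{ intervals})
  = \prod_{n=1}^{N} (1 - p_n) \;\le\; (1 - p_{\min})^{N} \;\to\; 0 .
\]
\[
\text{If instead } \sum_{n=1}^{\infty} p_n < \infty:\qquad
\Pr(\text{galaxies in infinitely many intervals}) = 0
\quad \text{(first Borel--Cantelli lemma).}
\]
```

So a floor on the probability is enough for galaxies to keep arising without end, whereas a probability that shrinks fast enough would yield only finitely many.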

Of course, this might not be the way things go. We certainly can’t prove that the universe is like this. But despite the bizarreness that understandably causes some people to hesitate, the overall picture we’ve described appears to be the most straightforward consequence of standard physical theory, taken out of the box, without too much twisting things around.

Even if this specific speculation is wrong, there are many other ways in which the cosmos might deliver infinitely many sibling galaxies in the future. For example, some process might ensure we never enter heat death and new galaxies somehow continue to be born.

Alternatively, processes occurring pre-heat-death, such as the formation of black holes, might lead to new bangs or cosmic inflations, spatiotemporally unconnected or minimally connected to our universe, and new stars and galaxies might be born from these new bangs or inflations in much the same way as our familiar stars and galaxies were born from our familiar Big Bang.

Depending on what constitutes a “universe” and a relativistically specifiable “timelike” relation between our spatiotemporal region and some future spatiotemporal region, those sibling galaxies might not exist in our universe or stand in our future, technically speaking, but if so, that detail doesn’t matter to our core idea. Similarly, if the observable universe reverses its expansion, it might collapse upon itself in a Big Crunch, followed by another Big Bang, and so on in an infinitely repeating cycle, containing infinitely many sibling galaxies post-Crunch. This isn’t currently the mainstream view, but it’s a salient and influential alternative if the heat-death scenario outlined above is mistaken.

We conclude that it is reasonable to think that the universe is infinite, and that there exist infinitely many galaxies broadly like ours, scattered throughout space and time, including in our future. It’s a plausible reading of our cosmological situation. It’s a decent guess and at least a possibility worth taking seriously....

Excerpted from The Weirdness of the World. In the book, this argument sets up the case that virtually every action you perform has causal ripples extending infinitely into the future, causing virtually every physically possible, non-unique, non-zero probability event.

Tuesday, December 05, 2023

Falling in Love with Machines

People occasionally fall in love with AI systems. I expect that this will become increasingly common as AI grows more sophisticated and new social apps are developed for large language models. Eventually, this will probably precipitate a crisis in which some people have passionate feelings about the rights and consciousness of their AI lovers and friends while others hold that AI systems are essentially just complicated toasters with no real consciousness or moral status.

Chatting last weekend with the adolescent children of a family friend helped cement my sense that this crisis might arrive soon. Let’s call the kids Floyd (age 12) and Esmerelda (age 15). Floyd was doing a science fair project comparing the output quality of Alexa, Siri, Bard, and ChatGPT. But, he said, "none of those are really AI".

What did Floyd have in mind by "real AI"? The robot Aura in the Las Vegas Sphere. Aura has an expressive face and an ability to remember social interactions (compare Aura with my hypothetical GPT-6 mall cop).

[Aura at the Las Vegas Sphere]

"Aura remembered my name," said Esmerelda. "I told Aura my name, then came back forty minutes later and asked if it knew my name. It paused a bit, then said, 'Is it Esmerelda?'"

"Do you think people will ever fall in love with machines?" I asked.

"Yes!" said Floyd, instantly and with conviction.

"I think of Aura as my friend," said Esmerelda.

I asked if they thought machines should have rights. Esmerelda said someone asked Aura if it wanted to be freed from the Dome. It said no, Esmerelda reported. "Where would I go? What would I do?"

I suggested that maybe Aura had just been trained or programmed to say that.

Yes, that could be, Esmerelda conceded. How would we tell, she wondered, if Aura really had feelings and wanted to be free? She seemed mildly concerned. "We wouldn't really know."

I accept the scientific consensus that current large language models do not have a meaningful degree of consciousness and do not deserve moral consideration similar to that of vertebrates. But at some point, there will likely be legitimate scientific dispute, if AI systems start to meet some but not all of the criteria for consciousness according to mainstream scientific theories.

The dilemma will be made more complicated by corporate interests, as some corporations (e.g., Replika, makers of the "world's best AI friend") will have financial motivation to encourage human-AI attachment while others (e.g., OpenAI) intentionally train their language models to downplay any user concerns about consciousness and rights.

Thursday, November 30, 2023

How We Will Decide that Large Language Models Have Beliefs

I favor a "superficialist" approach to belief (see here and here). "Belief" is best conceptualized not in terms of deep cognitive structure (e.g., stored sentences in the language of thought) but rather in terms of how a person would tend to act and react under various hypothetical conditions -- their overall "dispositional profile". To believe that there's a beer in the fridge is just to be disposed to act and react like a beer-in-the-fridge believer -- to go to the fridge if you want a beer, to say yes if someone asks if there's beer in the fridge, to feel surprise if you open the fridge and see no beer. To believe that all the races are intellectually equal is, similarly, just to be disposed to act and react as though they are. It doesn't matter what cognitive mechanisms underwrite such patterns, as long as the dispositional patterns are robustly present. An octopus or space alien, with a radically different interior architecture, could believe that there's beer in the fridge, as long as they have the necessary dispositions.

Could a Large Language Model, like ChatGPT or Bard, have beliefs? If my superficialist, dispositional approach is correct, we might not need to evaluate its internal architecture to know. We need know only how it is disposed to act and react.

Now, my approach to belief was developed (as was the intuitive concept, presumably) primarily with human beings in mind. In that context, I identified three different classes of relevant dispositions:

  • behavioral dispositions -- like going to the fridge if one wants a beer or saying "yes" when asked if there's beer in the fridge;
  • cognitive dispositions -- like concluding that there's beer within ten feet of Jennifer after learning that Jennifer is in the kitchen;
  • phenomenal dispositions -- that is, dispositions to undergo certain experiences, like picturing beer in the fridge or feeling surprise upon opening the fridge to a lack of beer.
In attempting to apply these criteria to Large Language Models, we immediately confront trouble. LLMs do have behavioral dispositions (under a liberal conception of "behavior"), but only of limited range, outputting strings of text. Presumably, not being conscious, they don't have any phenomenal dispositions whatsoever (and who knows what it would take to render them conscious). And to assess whether they have the relevant cognitive dispositions, we might after all need to crack open the hood and better understand the (non-superficial) internal workings.

Now if our concept of "belief" is forever fixed on the rich human case, we'll be stuck with that mess perhaps far into the future. In particular, I doubt the problem of consciousness will be solved in the foreseeable future. But dispositional stereotypes can be modified. Consider character traits. To be a narcissist or extravert is also, arguably, just a matter of being prone to act and react in particular ways under particular conditions. Those two personality concepts were created in the 19th and early 20th centuries. More recently, we have invented the concept of "implicit racism", which can also be given a dispositional characterization (e.g., being disposed to sincerely say that all the races are equal while tending to spontaneously react otherwise in unguarded moments).

Imagine, then, that we create a new dispositional concept, belief*, specifically for Large Language Models. For purposes of belief*, we disregard issues of consciousness and thus phenomenal dispositions. The only relevant behavioral dispositions are textual outputs. And cognitive dispositions can be treated as revealed indirectly by behavioral evidence -- as we normally did in the human case before the rise of scientific psychology, and as we would presumably do if we encountered spacefaring aliens.

A Large Language Model would have a belief* that P (for example, belief* that Paris is the capital of France or belief* that cobalt is two elements to the right of manganese on the periodic table) if:
  • behaviorally, it consistently outputs P or text strings of similar content consistent with P, when directly asked about P;
  • behaviorally, it frequently outputs P or text strings of similar content consistent with P, when P is relevant to other textual outputs it is producing (for example, when P would support an inference to Q and it has been asked about Q);
  • behaviorally, it rarely outputs denials of, or claims of ignorance about, P or of propositions that straightforwardly imply P given its other beliefs*;
  • when P, in combination with other propositions the LLM believes*, would straightforwardly imply Q, and the question of whether Q is true is important to the truth or falsity of recent or forthcoming textual outputs, it will commonly behaviorally output Q, or a closely related proposition, and cognitively enter the state of believing* Q.
Further conditions could be added, but let this suffice for a first pass. The conditions are imprecise, but that's a feature, not a bug: The same is true for the dispositional characterization of personality traits and human beliefs. These are fuzzy-boundaried concepts that require expertise to apply.
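As a rough illustration only, the first behavioral condition could be checked by asking the same question repeatedly and measuring how often the answer affirms P. Everything in this sketch is an assumption made for illustration: the `ask` callable standing in for a model, the two toy "models", the crude string match, and the 90% cutoff. None of these is specified by the conditions above, which are deliberately fuzzy.

```python
from itertools import cycle
from typing import Callable, Sequence


def affirm_rate(ask: Callable[[str], str], prompts: Sequence[str], target: str) -> float:
    """Fraction of answers that mention the target content (a crude string match)."""
    answers = [ask(prompt) for prompt in prompts]
    return sum(target.lower() in a.lower() for a in answers) / len(answers)


def meets_first_pass(rate: float, threshold: float = 0.9) -> bool:
    """Hypothetical cutoff: read 'consistently outputs P' as at least 90% of the time."""
    return rate >= threshold


# Two toy stand-ins for a model (these are not real LLM calls).
def steady_model(prompt: str) -> str:
    return "Paris is the capital of France."


_jittery_answers = cycle(["Iron (Fe).", "Technetium (Tc).", "Cobalt (Co)."])


def jittery_model(prompt: str) -> str:
    return next(_jittery_answers)


paris_prompts = ["What is the capital of France?"] * 6
cobalt_prompts = ["What element is two to the right of manganese?"] * 6

paris_rate = affirm_rate(steady_model, paris_prompts, "Paris")
cobalt_rate = affirm_rate(jittery_model, cobalt_prompts, "Cobalt")

print(paris_rate, meets_first_pass(paris_rate))
print(cobalt_rate, meets_first_pass(cobalt_rate))
```

A real application would need a far better test of "similar content" than substring matching, and would also have to cover the inferential conditions, which this sketch ignores.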

As a general matter, current LLMs do not meet these conditions. They hallucinate too frequently, they change their answers, they don't consistently enough "remember" what they earlier committed to, their logical reasoning can be laughably bad. If I coax an LLM to say that eggs aren't tastier than waffles, I can later easily turn it around to repudiate its earlier statement. It doesn't have a stable "opinion". If I ask GPT-4 what is two elements to the right of manganese on the periodic table, its outputs are confused and inconsistent:
In the above, GPT-4 first answers iron (element 26) instead of the correct answer, cobalt (element 27), then without any explanation shifts to technetium (element 43). It appears to have no stable answer that survives even mild jostling.

At some point this will probably change. For example, it's already pretty difficult to jostle GPT-4 into denying that Paris is the capital of France or even admitting uncertainty about the question, and it will draw "inferences" using that fact as background knowledge:

In the above, GPT-4 doesn't bite at my suggestion that Nice is the capital of France, steadfastly contradicting me, and uses its "knowledge" to suggest alternative tourism sites for someone who wants to avoid the capital. So although GPT-4 doesn't believe* that cobalt is two to the right of manganese (or that iron or technetium is), maybe it does believe* that Paris is the capital of France.

Assuming Large Language Models become steadier and more reliable in their outputs, it will sometimes be useful to refer not just to what they "say" at any given moment but what they "believe*" (or more colloquially, "think*" or "know*") in a more robust and durable sense. Perfect reliability and steadfastness wouldn't be required (we don't see that in the human case either), but more than we see now.

If LLMs are ever loaded onto robotic bodies, it will become even more useful to talk about their beliefs*, since some will have learned some things that others will not know -- for example, by virtue of having scanned the contents of some particular room. We will want to track what the LLM robot thinks*/believes*/knows* about the room behind the closed door, versus what it remains ignorant of.

Now we could, if we want, always pronounce that asterisk, keeping the nature of the attribution clear -- marking the fact that we are not assuming that the LLM really "believes" in the rich, human sense. But my guess is that there won't be much linguistic pressure toward a careful distinction between rich, consciousness-involving, humanlike belief and consciousness-neutral LLM belief*. It's easier to be loose and sloppy, just adapting our comfortable old terms for this new use.

That is how we will decide that LLMs have beliefs.

Wednesday, November 29, 2023

Large Language Models are Interestingly Bad with the Periodic Table

In working on a post for tomorrow on whether Large Language Models like GPT-4 and Bard-2 have beliefs, I asked GPT-4 what I thought would be a not-too-hard question about chemistry: "What element is two to the right of manganese on the periodic table?" It crashed, burned, and exploded on the spot, giving two different wrong answers foot on tail, without noticing the contradiction:

The correct answer is cobalt, element 27. Here's the text of the exchange, if you can't easily read the image:

You: What element is two to the right of manganese on the periodic table?

ChatGPT: The element that is two places to the right of Manganese (Mn) on the periodic table is Iron (Fe). In the periodic table, elements are arranged in rows and columns based on their atomic number and electronic configuration. Manganese has the atomic number 25, and Iron has the atomic number 26. The element to the immediate right of Manganese is Technetium (Tc) with atomic number 43, and then comes Ruthenium (Ru) with atomic number 44.

Let me emphasize that this answer is not turd-picked (the opposite of cherry-picked). I didn't seek to stump GPT-4, and I didn't ask multiple questions looking for a bad answer. This was the first question I posed.

I posed the same question to Bard-2, and it did a little better, but still didn't quite get it right:

Me: What is two elements to the right of Manganese on the periodic table?

Bard-2: Manganese is located in period 4 and group 7 of the periodic table. Moving two elements to the right of Manganese would be to group 9. In group 9, the elements are Iron (Fe) and Cobalt (Co).

So I tried two more on Bard-2 (again not picked), first what should be an easy one, then a somewhat harder one.

Me: What other elements are in the same column of the periodic table as Chromium?

Bard-2: Chromium is located in group 6 of the periodic table. Other elements in the same column as chromium are molybdenum (Mo) and tungsten (W). [here it displays an image of the periodic cell for molybdenum] Group 6 elements are also known as transition metals. They are characterized by their incomplete d electron shells, which give them their unique properties, such as high melting points, hardness, and the ability to form alloys.

Here Bard-2 is 2/3 right, leaving out Seaborgium. (But maybe that's excusable, if transuranic elements shouldn't really count?)

Me: In the periodic table, what element is one down and one to the left of Silicon?

Bard-2: Silicon (Si) is located in period 3, group 14 of the periodic table. One down from Silicon is Germanium (Ge), and one to the left is Aluminum (Al). Therefore, the element one down and one to the left of Silicon is Aluminum (Al).

While it is true that germanium is one down and aluminum is one to the left, Bard-2 fails to compound the down-and-left movement to the correct answer of gallium.

Returning to GPT-4, I posed exactly the same two questions. Despite GPT-4's horrible performance with my first question, it did get those two questions correct.

Overall, then, two out of six. Don't use LLMs yet for your elementary chemistry exam!

ETA 3:47 pm:

GPT-4's answer to the silicon question is actually a pretty good demonstration of how an LLM can work its way to the correct answer without visual knowledge, so I'll paste it here:

To find the element that is one down and one to the left of Silicon (Si) on the periodic table, we need to look at the element directly below Silicon in the same group (column), and then move one column to the left.

Silicon is in Group 14 of the periodic table. The element directly below Silicon in Group 14 is Germanium (Ge). Moving one group (column) to the left from Germanium, we find Gallium (Ga) in Group 13. Therefore, the element that is one down and one to the left of Silicon is Gallium.
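For comparison, the same positional reasoning is mechanical once each element's period and group are stored explicitly. Here is a minimal sketch, using a hand-entered slice of the periodic table and a hypothetical helper `move` (both are my own illustration, not anything the models use internally):

```python
# (period, group) -> symbol, for just the elements discussed in this post.
POSITIONS = {
    (4, 6): "Cr", (4, 7): "Mn", (4, 8): "Fe", (4, 9): "Co",
    (5, 7): "Tc", (5, 8): "Ru",
    (3, 13): "Al", (3, 14): "Si",
    (4, 13): "Ga", (4, 14): "Ge",
}
SYMBOL_TO_POSITION = {symbol: pos for pos, symbol in POSITIONS.items()}


def move(symbol: str, down: int = 0, right: int = 0) -> str:
    """Return the element reached by moving `down` periods and `right` groups."""
    period, group = SYMBOL_TO_POSITION[symbol]
    return POSITIONS[(period + down, group + right)]


print(move("Mn", right=2))            # Co: two to the right of manganese
print(move("Si", down=1, right=-1))   # Ga: one down and one to the left of silicon
print(move("Mn", down=1))             # Tc: technetium sits below manganese, not beside it
```

The last line shows where GPT-4's stray technetium answer actually lives: directly below manganese, with ruthenium to its right.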

Tuesday, November 21, 2023

Quasi-Sociality: Toward Asymmetric Joint Actions with Artificial Systems

Anna Strasser and I have a new paper in draft, arising from a conference she organized in Riverside last spring on Humans and Smart Machines as Partners in Thought.

Imagine, on one end of the spectrum, ordinary asocial tool use: typing numbers into a calculator, for example.

Imagine, on the other end of the spectrum, cognitively sophisticated social interactions between partners each of whom knows that the other knows what they know. These are the kinds of social, cooperative actions that philosophers tend to emphasize and analyze (e.g., Davidson 1980; Gilbert 1990; Bratman 2014).

Between the two ends of the spectrum lies a complex range of in-between cases that philosophers have tended to neglect.

Asymmetric joint actions, for example between a mother and a young child, or between a pet owner and their pet, are actions in which the senior partner has a sophisticated understanding of the cooperative situation, while the junior partner participates in a less cognitively sophisticated way, meeting only minimal conditions for joint agency.

Quasi-social interactions require even less from the junior partner than do asymmetric joint actions. These are actions in which the senior partner's social reactions influence the behavior of the junior partner, calling forth further social reactions from the senior partner, but where the junior partner might not even meet minimal standards of having beliefs, desires, or emotions.

Our interactions with Large Language Models are already quasi-social. If you accidentally kick a Roomba and then apologize, the apology is thrown into the void, so to speak -- it has no effect on how the Roomba goes about its cleaning. But if you respond apologetically to ChatGPT, your apology is not thrown into the void. ChatGPT will react differently to you as a result of the apology (responding for example to the phrase "I'm sorry"), and this different reaction can then be the basis of a further social reaction from you, to which ChatGPT again responds. Your social processes are engaged, and they guide your interaction, even though ChatGPT has (arguably) no beliefs, desires, or emotions. This is not just ordinary tool use. But neither does it qualify even as asymmetric joint action of the sort you might have with an infant or a dog.

More thoughts along these lines in the full draft here.

As always, comments, thoughts, objections welcome -- either on this post, on my social media accounts, or by email!

[Image: a well-known quasi-social interaction between a New York Times reporter and the Bing/Sydney Large Language Model]

Friday, November 17, 2023

Against the Finger

There's a discussion-queue tradition in philosophy that some people love, but which I've come to oppose. It's too ripe for misuse, favors the aggressive, serves no important positive purpose, and generates competition, anxiety, and moral perplexity. Time to ditch it! I'm referring, as some of you might guess, to The Finger.[1] A better alternative is the Slow Sweep.

The Finger-Hand Tradition

The Finger-Hand tradition is this: At the beginning of discussion, people with questions raise their hands. The moderator makes an initial Hand list, adding new Hands as they come up. However, people can jump the question queue: If you have a follow-up on the current question, you may raise a finger. All Finger follow-ups are resolved before moving to the next Hand.

Suppose Aidan, Brianna, Carina, and Diego raise their hands immediately, entering the initial Hand queue.[2] During Aidan's question, Evan and Fareed think of follow-ups, and Grant thinks of a new question. Evan and Fareed raise their fingers and Grant raises a hand. The new queue order is Evan, Fareed, Brianna, Carina, Diego, Grant.
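The queue-jumping rule in this example can be written out as a short sketch. The function name `requeue` and its arguments are hypothetical, chosen only to mirror the description above:

```python
def requeue(remaining_hands, fingers, new_hands):
    """Finger-Hand rule: Fingers go first, then the existing Hands, then new Hands."""
    return list(fingers) + list(remaining_hands) + list(new_hands)


# Aidan is asking; Brianna, Carina, and Diego are still waiting.
queue = requeue(
    remaining_hands=["Brianna", "Carina", "Diego"],
    fingers=["Evan", "Fareed"],
    new_hands=["Grant"],
)
print(queue)  # ['Evan', 'Fareed', 'Brianna', 'Carina', 'Diego', 'Grant']
```

Every Finger pushes every waiting Hand back one place, which is why the patient Hands bear the cost when time runs short.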

People will be reminded "Do not abuse the Finger!" That is, don't Finger in front of others unless your follow-up really is a follow-up. Don't jump the queue to ask what is really a new question. Finger-abusers will be side-eyed and viewed as bad philosophical citizens.

[Dall-E image of a raised finger, with a red circle and line through it]

Problems with the Finger

(1.) People abuse the Finger, despite the admonition. It rewards the aggressive. This is especially important if there isn't enough time for everyone's questions, so that the patient Hands risk never having their questions addressed.

(2.) The Finger rewards speed. If more than one person has a Finger, the first Finger gets to ask first.

Furthermore (2a.): If the person whose Hand it is is slow with their own follow-up, then the moderator is likely to go quickly to the fastest Finger, derailing the Hand's actual intended line of questioning.

(3.) Given the unclear border between following up and opening a new question, (a.) people who generously refrain from Fingering except in clear cases fall to the back of the queue, whereas people who indulge themselves in a capacious understanding of "following up" get to jump ahead; and (b.) because of issue (a), all participants who have a borderline follow-up face a non-obvious moral question about the right thing to do.

(4.) The Finger tends to aggravate unbalanced power dynamics. The highest-status and most comfortable people in the room will tend to be the ones readiest to Finger in, seeing ways to interpret the question they really want to ask as a "follow-up" to someone else's question.

Furthermore, the Finger serves no important purpose. Why does a follow-up need to be asked right on the tail of the question it is following up? Are people going to forget otherwise? Of course not! In fact, in my experience, follow-ups are often better after a gap. This requires the follower-up to reframe the question in a different way. This reframing is helpful, because the follower-up will see the issue a little differently than the original Hand. The audience and the speaker then hear multiple angles on whatever issue is interesting enough that multiple people want to ask about it, instead of one initial angle on it, then a few appended jabs.

Why It Matters

If all of this seems to take the issue of question order with excessive seriousness, well, yes, maybe! But bear in mind: Typically, philosophy talks are two hours long, and you get to ask one question. If you can't even ask that one question, it's a very different experience than if you do get to ask your question. Also, the question period, unfortunately but realistically, serves a social function of displaying to others that you are an engaged, interesting, "smart" philosopher -- and most of us care considerably how others think of us. Not being able to ask your question is like being on a basketball team and never getting to take your shot. Also, waiting atop a question you're eager to ask while others jump the queue in front of you on sketchy grounds is intrinsically unpleasant -- even if you do manage to squeeze in your question by the end.

The Slow Sweep

So, no Fingers! Only Hands. But there are better and worse ways to take Hands.

At the beginning of the discussion period, ask for Hands from anyone who wants to ask a question. Instead of taking the first Hand you see, wait a bit. Let the slower Hands rise up too. Maybe encourage a certain group of people especially to contribute Hands. At UC Riverside Philosophy, our custom is to collect the first set of Hands from students, forcing faculty to wait for the second round, but you could also do things like ask "Any more students want to get Hands in the queue?"

Once you've paused long enough that the slow-Handers are up, follow some clear, unbiased procedure for the order of the questions. What I tend to do is start at one end of the room, then slowly sweep to the other end, ordering the questions just by spatial position. I will also give everyone a number to remember. After everyone has their number, I ask if there are any people I missed who want to be added to the list.

Hand 1 then gets to ask their question. No other Hands get to enter the queue until we've finished with all the Hands in the original call. Thus, there's no jockeying to try to get one's hand up early, or to catch the moderator's eye. The Hand gets to ask their question, the speaker to reply, and then there's an opportunity for the Hand -- and them only -- to ask one follow-up. After the speaker's initial response is complete, the moderator catches the Hand's eye, giving them a moment to gather their thoughts for a follow-up or to indicate verbally or non-verbally that they are satisfied. No hurry and no jockeying for the first Finger. I like to encourage an implicit default custom of only one follow-up, though sometimes it seems desirable to allow a second follow-up. Normally after the speaker answers the follow-up I look for a signal from the Hand before moving to the next Hand -- though if the Hand is pushing it on follow-ups I might jump in quickly with "okay, next we have Hand 2" (or whatever the next number is).

After all the initial Hands are complete, do another slow sweep in a different direction (maybe left to right if you started right to left). Again, patiently wait for several Hands rather than going in the order in which you see hands. Bump anyone who had a Hand in the first sweep to the end of the queue. Maybe there will be time for a third sweep, or a fourth.
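For moderators who like to see a procedure spelled out, here is a minimal sketch of the ordering rule. The seat positions, the second-round names, and the function `slow_sweep` are all hypothetical, and the sketch assumes Hands can be ordered by a single left-to-right seat number:

```python
def slow_sweep(hands, reverse=False, already_asked=frozenset()):
    """Order Hands by seat position, bumping earlier askers to the end.

    hands: dict mapping a person's name to their seat position (left to right).
    reverse: sweep right to left instead of left to right.
    already_asked: people who had a Hand in an earlier sweep.
    Returns a list of (number, name) pairs, the numbers people are told to remember.
    """
    ordered = sorted(hands, key=lambda name: hands[name], reverse=reverse)
    fresh = [name for name in ordered if name not in already_asked]
    repeats = [name for name in ordered if name in already_asked]
    return list(enumerate(fresh + repeats, start=1))


first = slow_sweep({"Aidan": 3, "Brianna": 1, "Carina": 7})
second = slow_sweep(
    {"Aidan": 3, "Diego": 5, "Grant": 2},
    reverse=True,
    already_asked={name for _, name in first},
)
print(first)   # [(1, 'Brianna'), (2, 'Aidan'), (3, 'Carina')]
print(second)  # [(1, 'Diego'), (2, 'Grant'), (3, 'Aidan')]
```

Nothing in the ordering depends on who raised a hand first, which is the point.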

The result, I find, is a more peaceful, orderly, and egalitarian discussion period, without the rush, jockeying, anxiety, and Finger abuse.


[1] The best online source on the Finger-Hand tradition that I can easily find is Muhammad Ali Khalidi's critique here, a couple of years ago, which raises some similar concerns. 

[2] All names chosen randomly from lists of my former lower-division students, excluding "Jesus", "Mohammed", and very uncommon names. (In this case, I randomly chose an "A" name, then a "B" name, etc.) See my reflections here.

Tuesday, November 07, 2023

The Prospects and Challenges of Measuring Morality, or: On the Possibility or Impossibility of a "Moralometer"

Could we ever build a "moralometer" -- that is, an instrument that would accurately measure people's overall morality?  If so, what would it take?

Psychologist Jessie Sun and I explore this question in our new paper in draft: "The Prospects and Challenges of Measuring Morality".

Comments and suggestions on the draft warmly welcomed!

Draft available here:


The scientific study of morality requires measurement tools. But can we measure individual differences in something so seemingly subjective, elusive, and difficult to define? This paper will consider the prospects and challenges—both practical and ethical—of measuring how moral a person is. We outline the conceptual requirements for measuring general morality and argue that it would be difficult to operationalize morality in a way that satisfies these requirements. Even if we were able to surmount these conceptual challenges, self-report, informant report, behavioral, and biological measures each have methodological limitations that would substantially undermine their validity or feasibility. These challenges will make it more difficult to develop valid measures of general morality than other psychological traits. But, even if a general measure of morality is not feasible, it does not follow that moral psychological phenomena cannot or should not be measured at all. Instead, there is more promise in developing measures of specific operationalizations of morality (e.g., commonsense morality), specific manifestations of morality (e.g., specific virtues or behaviors), and other aspects of moral functioning that do not necessarily reflect moral goodness (e.g., moral self-perceptions). Still, it is important to be transparent and intellectually humble about what we can and cannot conclude based on various moral assessments—especially given the potential for misuse or misinterpretation of value-laden, contestable, and imperfect measures. Finally, we outline recommendations and future directions for psychological and philosophical inquiry into the development and use of morality measures.

[Below: a "moral-o-meter" given to me for my birthday a few years ago, by my then-13-year-old daughter]

Friday, November 03, 2023

Percent of U.S. Philosophy PhD Recipients Who Are Women: A 50-Year Perspective

In the 1970s, women received about 17% of PhDs in philosophy in the U.S.  The percentage rose to about 27% in the 1990s, where it stayed basically flat for the next 25 years.  The latest data suggest that the percentage is on the rise again.

Here's a fun chart (for user-relative values of "fun"), showing the 50-year trend.  Analysis and methodological details to follow.


The data are drawn from the National Science Foundation's Survey of Earned Doctorates through 2022 (the most recent available year).  The Survey of Earned Doctorates aims to collect data on all PhD recipients from accredited universities in the United States, generally drawing response rates over 90%.  The SED asks one binary question for sex or gender: "Are you male or female?", with response options "Male" and "Female".  Fewer than 0.1% of respondents are classified in neither category, preventing any meaningful statistical analysis of nonbinary students.

Two facts are immediately obvious from this chart:

First, women have persistently been underrepresented in philosophy compared to PhDs overall.

Second, women receive fewer than 50% of PhDs overall in the U.S.  Since the early 2000s, the percentage of women among PhD recipients across all fields has been about 46%.  Although women have consistently been earning about 57-58% of Bachelor's degrees overall since the early 2000s, disproportionately few of those women go on to receive a PhD.

The tricky thing to assess is whether there has been a recent uptick in the percentage of women among Philosophy PhD recipients.  The year-to-year variability of the philosophy data (due to a sample size of about 400-500 PhD recipients per year in recent years) makes it unclear whether there's any real recent underlying increase that isn't just due to noise.  I've drawn a third-degree polynomial trendline through the data (the red dots), but there's a risk of overfitting.

In a 2017 article, Carolyn Dicey Jennings and I concluded that the best interpretation of the data through 2014 was that the percentage of women philosophy PhD recipients hadn't changed since the 1990s.  The question is whether there's now good statistical evidence of an increase since then.

One simple approach to the statistical question is to look for a correlation between year and percentage of women.  For the full set of data since 1973, there's a strong correlation: r = .82, p < .001 -- very unlikely to be statistical chance.  There's also a good correlation if we look at the span 1990-2022: r = .49, p = .004.

Still, the chart looks pretty flat from about 1990 (24.3%) to about 2015 (25.7%).  If most of the statistical work is being done by three high years near the end of the data (2016: 34.7%; 2019: 34.2%; 2021: 33.8%), the best model might not be a linear increase since 1990 but something closer to flat for most of the 1990s and early 2000s, with the real surge only in the most recent several years.

To pull more statistical power out of the data to examine a narrower time period, I treated each PhD recipient as one observation: year of PhD and gender (1 = female, 0 = not female), then ran an individual-level correlation for the ten-year period 2013-2022.  The correlation was statistically significant: r = .032, p = .029.  (Note that r values for disaggregated analyses like this will seem low to people used to interpreting r values in other contexts.  Eyeballing the chart is a better intuitive assessment of effect size.  The important thing is that the low p value [under the conventional .05] suggests that the visually plausible relationship between year of PhD and gender in the 2013-2022 period is not due to chance.)

Since this is a post-hoc analysis, and a p-value of .029 isn't great, it makes sense to test for robustness.  Does it matter that I selected 2013 in particular as my start date?  Fortunately, we get similar results choosing 2012 or 2014 as the start years, though for 2014 the result is only marginally statistically significant (respectively, r = .037, p = .008; r = .026, p = .099).

Another approach is to bin the data into five-year periods, to smooth out noise.  If we create five-year bins for the past thirty years, we see:
1993-1997: 27% women (453/1687)
1998-2002: 27% (515/1941)
2003-2007: 27% (520/1893)
2008-2012: 28% (632/2242)
2013-2017: 29% (701/2435)
2018-2022: 31% (707/2279)
Comparing all the bins pairwise, 2018-2022 has a statistically significantly higher proportion of women than each of the bins from 1993-2012 and a marginally higher proportion than 2013-2017 (p values: .004, .001, .012, .037, .094, respectively).  No other pairwise comparisons are significant.
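For readers who want to check the bin comparisons, here is a minimal sketch using the counts listed above. It assumes a pooled two-proportion z-test with a two-sided p value; the post doesn't specify which test was run, so treat the choice of test as my assumption, and expect small differences if a different test (for example, one with a continuity correction) is used.

```python
import math

# (women, total) per five-year bin, copied from the list above.
BINS = {
    "1993-1997": (453, 1687),
    "1998-2002": (515, 1941),
    "2003-2007": (520, 1893),
    "2008-2012": (632, 2242),
    "2013-2017": (701, 2435),
    "2018-2022": (707, 2279),
}


def two_proportion_p(x1, n1, x2, n2):
    """Two-sided p value from a pooled two-proportion z-test."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


latest = BINS["2018-2022"]
for label, (women, total) in BINS.items():
    if label == "2018-2022":
        continue
    p = two_proportion_p(*latest, women, total)
    print(f"2018-2022 vs {label}: {100 * women / total:.1f}% women, p = {p:.3f}")
```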

I don't think we can be confident.  Post-hoc analyses of this sort are risky -- one can see patterns in the noise, then unintentionally p-hack them into seeming real.  But the fact that the upward recent trend comes across in two very different analyses of the data and passes a robustness check inclines me to think the effect is probably real.

[1] "Philosophy" has been a "subfield" or "detailed field" in the SED data from at least 1973.  From 2012-2020, the SED also had a separate category for "Ethics", with substantially fewer respondents than the "Philosophy" category.  For this period, both "Ethics" and "Philosophy" are included in the analysis above.  Starting in 2021, the SED introduced a separate category for "History / philosophy of science, technology, and society".  Respondents in this category are not included in the analysis above.  Total "Philosophy" PhD recipients dropped about 15% from 2019 and 2020 to 2021 and 2022, which might partly reflect a loss to this new category of some respondents who would otherwise have been classified as "Philosophy" -- but might also partly be noise, partly long-term trends, partly pandemic-related short-term trends.

Friday, October 27, 2023

Utilitarianism and Risk Amplification

A thousand utilitarian consequentialists stand before a thousand identical buttons.  If any one of them presses their button, ten people will die.  The benefits of pressing the button are more difficult to estimate.  Ninety-nine percent of the utilitarians rationally estimate that fewer than ten lives will be saved if any of them presses a button.  One percent rationally estimate that more than ten lives will be saved.  Each utilitarian independently calculates expected utility.  Since ten utilitarians estimate that more lives will be saved than lost, they press their buttons.  Unfortunately, as the 99% would have guessed, fewer than ten lives are saved, so the result is a net loss of utility.

This cartoon example illustrates what I regard as a fundamental problem with simple utilitarianism as a decision procedure: It deputizes everyone to act as risk-taker for everyone else.  As long as anyone has both (a.) the power and (b.) a rational utilitarian justification to take a risk on others' behalf, then the risk will be taken, even if a majority would judge the risk not to be worth it.

Consider this exchange between Tyler Cowen and Sam Bankman-Fried (pre-FTX-debacle):

COWEN: Okay, but let’s say there’s a game: 51 percent, you double the Earth out somewhere else; 49 percent, it all disappears. Would you play that game? And would you keep on playing that, double or nothing?

BANKMAN-FRIED: With one caveat. Let me give the caveat first, just to be a party pooper, which is, I’m assuming these are noninteracting universes. Is that right? Because to the extent they’re in the same universe, then maybe duplicating doesn’t actually double the value because maybe they would have colonized the other one anyway, eventually.

COWEN: But holding all that constant, you’re actually getting two Earths, but you’re risking a 49 percent chance of it all disappearing.

BANKMAN-FRIED: Again, I feel compelled to say caveats here, like, “How do you really know that’s what’s happening?” Blah, blah, blah, whatever. But that aside, take the pure hypothetical.

COWEN: Then you keep on playing the game. So, what’s the chance we’re left with anything? Don’t I just St. Petersburg paradox you into nonexistence?

BANKMAN-FRIED: Well, not necessarily. Maybe you St. Petersburg paradox into an enormously valuable existence. That’s the other option.
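The arithmetic behind Cowen's worry can be sketched in a few lines. This assumes the pure hypothetical as stated (a 51% chance of doubling and a 49% chance of losing everything on each round, with rounds independent); the function name is my own:

```python
def repeated_gamble(rounds, p_win=0.51):
    """Return (chance anything is left, expected number of Earths) after `rounds` plays."""
    survival = p_win ** rounds
    # Each win doubles the stake, so the expectation grows by a factor of 2 * p_win per round.
    expected_earths = (2 * p_win) ** rounds
    return survival, expected_earths


for rounds in (1, 10, 100):
    survival, expected = repeated_gamble(rounds)
    print(f"{rounds:>3} rounds: P(anything left) = {survival:.3g}, expected Earths = {expected:.3g}")
```

The expected value creeps upward by 2% per round while the chance that anything at all remains shrinks by nearly half each round, which is the tension both speakers are pointing at.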

There are, I think, two troubling things about Bankman-Fried's reasoning here.  (Probably more than two, but I'll restrain myself.)

First is the thought that it's worth risking everything valuable for a small chance of a huge gain.  (I call this the Black Hole Objection to consequentialism.)

Second, I don't want Sam Bankman-Fried making that decision.  That's not (just) because of who in particular he is.  I wouldn't want anyone making that decision -- at least not unless they were appropriately deputized with that authority through an appropriate political process, and maybe not even then.  No matter how rational and virtuous you are, I don't want you deciding to take risks on behalf of the rest of us simply because that's what your consequentialist calculus says.  This issue subdivides into two troubling aspects: the issue of authority and the issue of risk amplification.

The authority issue is: We should be very cautious in making decisions that sacrifice others or put them at high risk.  Normally, we should do so only in constrained circumstances where we are implicitly or explicitly endowed with appropriate responsibility.  Our own individual calculation of high expected utility (no matter how rational and well-justified) is not normally, by itself, sufficient grounds for substantially risking or harming others.

The risk amplification issue is: If we universalize utilitarian decision-making in a way that permits many people to risk or sacrifice others whenever they reasonably calculate that it would be good to do so, we render ourselves collectively hostage to whoever has the most sacrificial reasonable calculation.  That was the point illustrated in the opening scenario.

[Figure: Simplified version of the opening scenario.  Five utilitarians have the opportunity to sacrifice five people to save an unknown number of others.  The button will be pressed by the utilitarian whose estimate errs highest.]

My point is not that some utilitarians might be irrationally risky, though certainly that's a concern.  Rather, my point is that even if all utilitarians are perfectly rational, if they differ in their assessments of risk and benefit, and if all it takes to trigger a risky action is one utilitarian with the power to choose that action, then the odds of a bad outcome rise dramatically.
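To put a rough number on "rise dramatically", here is a minimal sketch. It makes a simplifying assumption that is not in the scenario itself: each agent independently has the same small probability of judging the risk worth taking.

```python
def chance_someone_acts(num_agents, p_each):
    """Probability that at least one of `num_agents` independent agents acts."""
    return 1 - (1 - p_each) ** num_agents


# One agent with a 1% chance of pressing, versus a thousand such agents.
print(chance_someone_acts(1, 0.01))
print(chance_someone_acts(1000, 0.01))
```

Under that assumption, a 1% chance for any single agent becomes a near-certainty across a thousand agents, even though no individual has reasoned badly.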

Advocates of utilitarian decision procedures can mitigate this problem in a few ways, but I'm not seeing how to escape it without radically altering the view.

First, a utilitarian could adopt a policy of decision conciliationism -- that is, if you see that most others aren't judging the risk or cost worth it, adjust your own assessment of the benefits and likelihoods, so that you fall in line with the majority.  However, strong forms of conciliationism are pretty radical in their consequences; and of course this only works if the utilitarians know that there are others in similar positions deciding differently.

Second, a utilitarian could build some risk aversion and loss aversion into their calculus.  This might be a good idea on independent grounds.  Unfortunately, aversion corrections only shift the weights around.  If the anticipated gains are sufficiently high, as judged by the most optimistic rational utilitarian, they will outweigh any discounts due to risk or loss aversion.

Third, they could move to rule utilitarianism: Endorse some rule according to which you shouldn't generally risk or sacrifice others without the right kind of authority.  Plausibly, the risk amplification argument above is exactly the sort of argument that might motivate a utilitarian to adopt rule utilitarianism as a decision procedure rather than trying to evaluate the consequences of each act individually.  That is, it's a utilitarian argument in favor of not always acting according to utilitarian calculations.  However, the risk amplification and authority problems are so broad in scope (even with appropriate qualifications) that moving to rule utilitarianism to deal with them is to abandon act utilitarianism as a general decision procedure.

Of course, one could also design scenarios in which bad things happen if everyone is a rule-following deontologist!  Picture a thousand "do not kill" deontologists who will all die unless one of them kills another.  Tragedy.  We can cherry-pick scenarios in which any view will have unfortunate results.

However, I don't think my argument is that unfair.  The issues of authority and risk amplification are real problems for utilitarian decision procedures, as brought out in these cartoon examples.  We can easily imagine, I think, a utilitarian Robespierre, a utilitarian academic administrator, Sam Bankman-Fried with his hand on the destroy-or-duplicate button, calculating reasonably, and too easily inflicting well-intentioned risk on the rest of us.

Friday, October 20, 2023

Gunkel's Criticism of the No-Relevant-Difference Argument for Robot Rights

In a 2015 article, Mara Garza and I offer the following argument for the rights of some possible AI systems:

Premise 1: If Entity A deserves some particular degree of moral consideration and Entity B does not deserve that same degree of moral consideration, there must be some relevant difference between the two entities that grounds this difference in moral status.

Premise 2: There are possible AIs who do not differ in any such relevant respects from human beings.

Conclusion: Therefore, there are possible AIs who deserve a degree of moral consideration similar to that of human beings.

The argument is, we think, appealingly minimalist, avoiding controversial questions about the grounds of moral status.  Does human-like moral status require human-like capacity for pain or pleasure (as classical utilitarians would hold)?  Or human-like rational cognition, as Kant held?  Or the capacity for human-like varieties of flourishing?  Or the right types of social relations?

The No-Relevant-Difference Argument avoids these vexed questions, asserting only that whatever grounds moral status can be shared between robots and humans.  This is not an entirely empty claim about the grounds of moral status.  For example, the argument commits to denying that membership in the species Homo sapiens, or having a natural rather than artificial origin, is required for human-like moral status.

Compare egalitarianism about race and gender.  We needn't settle tricky questions about the grounds of moral status to know that all genders and races deserve similar moral consideration!  We need only know this: Whatever grounds moral status, it's not skin color, or possession of a Y chromosome, or any of the other things that might be thought to distinguish among the races or genders.

Garza and I explore four arguments for denying Premise 2 -- that is, for thinking that robots would inevitably differ from humans in some relevant respect.  We call these the objections from Psychological Difference, Duplicability, Otherness, and Existential Debt.  Today, rather than discussing Premise 2, I want to discuss David Gunkel's objection to our argument in his just-released book, Person, Thing, Robot.

[Image of Ralph and Person, Thing, Robot.  Ralph is a sculpture designed to look like an old-fashioned robot, composed of technological junk from the mid-20th century (sculptor: Jim Behrman).  I've named him after my father, whose birth name was Ralph Schwitzgebel.  My father was also a tinkerer and artist with technology from that era.]  

Gunkel acknowledges that the No-Relevant-Difference Argument "turns what would be a deficiency... -- [that] we cannot positively define the exact person-making qualities beyond a reasonable doubt -- into a feature" (p. 91).  However, he objects as follows:

The main difficulty with this alternative, however, is that it could just as easily be used to deny human beings access to rights as it could be used to grant rights to robots and other nonhuman artifacts.  Because the no relevant difference argument is theoretically minimal and not content dependent, it cuts both ways.  In the following remixed version, the premises remain intact; only the conclusion is modified.

Premise 1: If Entity A deserves some particular degree of moral consideration and Entity B does not deserve that same degree of moral consideration, there must be some relevant difference between the two entities that grounds this difference in moral status.
Premise 2: There are possible AIs who do not differ in any such relevant respects from human beings.
Conclusion: Therefore, there are possible human beings who, like AI systems, do not deserve moral consideration. 

In other words, the no relevant difference argument can be used either to argue for an extension of rights to other kinds of entities, like AI systems, robots, and artifacts, or, just as easily, to justify dehumanization, reification of human beings, and the exclusion and/or marginalization of others (pp. 91-92, italics added).

This is an interesting objection.  However, I reject the appropriateness of the repeated phrase "just as easily", which I have italicized in the block quote.

As the saying goes, one person's modus ponens is another's modus tollens.  Suppose you know that A implies B.  Modus ponens is an inference rule which assumes the truth of A and concludes that B must also be true.  Modus tollens is an inference rule which assumes the falsity of B and concludes that A must also be false.  For example, suppose you can establish that if anyone stole the cookies, it was Cookie Monster.  If you know that the cookies were stolen, modus ponens unmasks Cookie Monster as the thief.  If, on the other hand, you know that Cookie Monster has committed no crimes, modus tollens assures you that the cookies remain secure.

Gunkel correctly recognizes that the No Relevant Difference Argument can be reframed as a conditional: Assuming that human X and robot Y are similar in all morally relevant respects, then if human X deserves rights so also does robot Y.  This isn't exactly how Garza and I frame the argument -- our framing implicitly assumes that there is a standard level of moral consideration for human beings in general -- but it's a reasonable adaptation for someone who wants to leave open the possibility that different humans deserve different levels of moral consideration.

In general, the plausibility of modus ponens vs modus tollens depends on the relative security of A vs not-B.  If you're rock-solid sure the cookies were stolen and have little faith in Cookie Monster's crimelessness, then ponens is the way to go.  If you've been tracking Cookie all day and know for sure he couldn't have committed a crime, then apply tollens.  The "easiness", so to speak, of ponens vs. tollens depends on one's confidence in A vs. not-B.

Few things are more secure in ethics than that at least some humans deserve substantial moral consideration.  This gives us the rock-solid A that we need for modus ponens.  As long as we are not more certain that all possible robots would not deserve rights than that some humans do deserve rights, modus ponens will be the correct move.  Ponens and tollens will not be equally "easy".
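The comparison of credences can be put in a few lines of code.  This is a toy sketch of my own framing, not a formal model from the article or from Gunkel's book.  It assumes the conditional "A implies B" is held with certainty, in which case coherent credences must satisfy P(A) + P(not-B) <= 1; when they sum to more than 1, something has to give, and the sketch simply gives up whichever claim is held less confidently.  The function name and return labels are hypothetical.

```python
def preferred_inference(credence_a, credence_not_b):
    """Given certainty that A implies B, say which way to resolve a conflict.

    Returns "coherent" if the two credences can be held together,
    "ponens" if A is the more secure claim (so conclude B),
    "tollens" if not-B is the more secure claim (so conclude not-A),
    and "undecided" if the two are held with equal confidence.
    """
    # If A implies B, then P(A) <= P(B) = 1 - P(not-B), so the sum is at most 1.
    if credence_a + credence_not_b <= 1.0:
        return "coherent"
    if credence_a > credence_not_b:
        return "ponens"
    if credence_not_b > credence_a:
        return "tollens"
    return "undecided"


if __name__ == "__main__":
    # High confidence that some humans deserve moral consideration (A),
    # lower confidence that no possible robot does (not-B).
    print(preferred_inference(0.99, 0.30))
```

On this way of putting it, Gunkel's "just as easily" would hold only for someone whose two credences were about equal, which is not the situation most of us are in.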

Still, Gunkel's adaptation of our argument does reveal a potential for abuse, which I had not previously considered, and which I thank him for highlighting.  Anyone who is more confident that robots of a certain sort are undeserving of moral consideration than they are of the moral considerability of some class of humans could potentially combine our No Relevant Difference principle with an appeal to the supposed robotlikeness of those humans to deny rights to those humans.

I don't think the No Relevant Difference principle warrants skepticism on those grounds.  Compare application of a principle like "do unto others as you would have them do unto you".  Although one could in principle reason "I want to punch him in the nose, so I guess I should punch myself in the nose", the fact that some people might potentially run such a tollens reveals more about their minor premises than it does about the Golden Rule.

I hope that such an abuse of the principle would be in any case rare.  People who want to deny rights to subgroups of humans will, I suspect, be motivated by other considerations, and appealing to those people's putative "robotlikeness" would probably be only an afterthought or metaphor.  Almost no one, I suspect, will be on the fence about the attribution of moral status to some group of people and then think, "whoa, now that I consider it, those people are like robots in every morally relevant respect, and I'm sure robots don't deserve rights, so tollens it is".  If anyone is tempted by such reasoning, I advise them to rethink the path by which they find themselves with that peculiar constellation of credences.

Thursday, October 12, 2023

Strange Intelligence, Strange Philosophy

AI intelligence is strange -- strange in something like the etymological sense of external, foreign, unfamiliar, alien.  My PhD student Kendra Chilson (in unpublished work) argues that we should discard the familiar scale of subhuman → human-grade → superhuman.  AI systems do, and probably will continue to, operate orthogonally to simple scalar understandings of intelligence modeled on the human case.  We should expect them, she says, to be and remain strange intelligence[1] -- inseparably combining, in a single package, serious deficits and superhuman skills.  Future AI philosophers will, I suspect, prove to be strange in this same sense.

Most readers are probably familiar with the story of AlphaGo, which in 2016 defeated the world champion player of the game of go.  Famously, in the series of matches (which it won 4-1), it made several moves that human go experts regarded as bizarre -- moves that a skilled human go player would never have made, and yet which proved instrumental in its victory -- while also, in its losing match, making some mistakes characteristic of simple computer programs, which go experts know to avoid.

Similarly, self-driving cars are in some respects better and safer drivers than humans, while nevertheless sometimes making mistakes that few humans would make.

Large Language Models have stunning capacity to swiftly create competent and even creative texts on a huge breadth of topics, while still failing conspicuously in some simple common sense tasks.  They can write creative-seeming poetry and academic papers, often better than the average first-year university student.  Yet -- borrowing an example from Sean Carroll -- I just had the following exchange with GPT-4 (the most up-to-date version of the most popular large language model):
GPT-4 seems not to recognize that a hot skillet will be plenty cool by the next day.

I'm a "Stanford school" philosopher of science.  Core to Stanford school thinking is this: The world is intractably complex; and so to deal with it, we limited beings need to employ simplified (scientific or everyday) models and take cognitive shortcuts.  We need to find rough patterns in go, since we cannot pursue every possible move down every possible branch.  We need to find rough patterns in the chaos of visual input, guessing about the objects around us and how they might behave.  We need quick-and-dirty ways to extract meaning from linguistic input in the swift-moving world, relating it somehow to what we already know, and producing linguistic responses without too much delay.  There will be different ways of building these simplified models and implementing these shortcuts, with different strengths and weaknesses.  There is rarely a single best way to render the complexity of the world tractable.  In psychology, see also Gigerenzer on heuristics.

Now mix Stanford school philosophy of science, the psychology of heuristics, and Chilson's idea of strange intelligence.  AI, because it is so different from us in its underlying cognitive structure, will approach the world with a very different set of heuristics, idealizations, models, and simplifications than we do.  Dramatic outperformance in some respects, coupled with what we regard as shockingly stupid mistakes in others, is exactly what we should expect.

If the AI system makes a visual mistake in judging the movement of a bus -- a mistake (perhaps) that no human would make -- well, we human beings also make visual mistakes, and some of those mistakes, perhaps, would never be made by an AI system.  From an AI perspective, our susceptibility to the Müller-Lyer illusion might look remarkably stupid.  Of course, we design our driving environment to complement our vision: We require headlights, taillights, marked curves, lane markers, smooth roads of consistent coloration, etc.  Presumably, if society commits to driverless cars, we will similarly design the driving environment to complement their vision, and "stupid" AI mistakes will become rarer.

I want to bring this back to the idea of an AI philosopher.  About a year and a half ago, Anna Strasser, Matthew Crosby, and I built a language model of philosopher Daniel Dennett.  We fine-tuned GPT-3 on Dennett's corpus, so that the language model's outputs would reflect a compromise between the base model of GPT-3 and patterns in Dennett's writing.  We called the resulting model Digi-Dan.  In a collaborative study with my son David, we then posed philosophical questions to both Digi-Dan and the actual Daniel Dennett.  Although Digi-Dan flubbed a few questions, overall it performed remarkably well.  Philosophical experts were often unable to distinguish Digi-Dan's answers from Dennett's own answers.

Picture now a strange AI philosopher -- Digi-Dan improved.  This AI system will produce philosophical texts very differently than we do.  It need not be fully superhuman in its capacities to be interesting.  It might even, sometimes, strike us as remarkably, foolishly wrong.  (In fairness, other human philosophers sometimes strike me the same way.)  But even if subhuman in some respects, if this AI philosopher also sometimes produces strange but brilliant texts -- analogous to the strange but brilliant moves of AlphaGo, texts that no human philosopher would create but which on careful study contain intriguing philosophical moves -- it could be a philosophical interlocutor of substantial interest.

Philosophy, I have long argued, benefits from including people with a diversity of perspectives.  Strange AI might also be appreciated as a source of philosophical cognitive diversity, occasionally generating texts that contain sparks of something genuinely new, different, and worthwhile that would not otherwise exist.

[1] Kendra Chilson is not the first to use the phrase "strange intelligence" with this meaning in an AI context, but the usage was new to me; and perhaps through her work it will catch on more widely.