Tuesday, July 23, 2024

A Metaethics of Alien Convergence

I'm not a metaethicist, but I am a moral realist (I think there are facts about what really is morally right and wrong) and also -- bracketing some moments of skeptical weirdness -- a naturalist (I hold that scientific defensibility is essential to justification).  Some people think that moral realism and naturalism conflict, since moral truths seem to lie beyond the reach of science.  They hold that science can discover what is, but not what ought to be, that it can discover what people regard as ethical or unethical, but not what really is ethical or unethical.

Addressing this apparent conflict between moral realism and scientific naturalism (for example, in a panel discussion with Stephen Wolfram and others a few months ago), I find I have a somewhat different metaethical perspective from others I know.

Generally speaking, I favor what we might call a rational convergence model, in broadly the vein of Firth, Habermas, Railton, and Scanlon (bracketing what, to insiders, will seem like huge differences).  An action is ethically good if it is the kind of action people would tend on reflection to endorse.  Or, more cautiously, if it's the kind of action that certain types of observers, in certain types of conditions, would tend, upon certain types of reflection, to converge on endorsing.

Immediately, four things stand out about this metaethical picture:

(1.) It is extremely vague.  It's more of a framework for a view than an actual view, until the types of observers, conditions, and reflection are specified.

(2.) It might seem to reverse the order of explanation.  One might have thought that rational convergence, to the extent it exists, would be explained by observers noticing ethical facts that hold independently of any hypothetical convergence, not vice versa.

(3.) It's entirely naturalistic, and perhaps for that reason disappointing to some.  No non-natural facts are required.  We can scientifically address questions about what conclusions observers will tend to converge on.  If you're looking for a moral "ought" that transcends every scientifically approachable "is" and "would", you won't find it here.  Moral facts turn out just to be facts about what would happen in certain conditions.

(4.) It's stipulative and revisionary.  I'm not saying that this is what ordinary people do mean by "ethical".  Rather, I'm inviting us to conceptualize ethical facts this way.  If we fill out the details correctly, we can get most of what we should want from ethics.

Specifying a bit more: The issue to which I've given the most thought is who the relevant observers are -- the observers whose hypothetical convergence constitutes the criterion of morality.  I propose: developmentally expensive and behaviorally sophisticated social entities, of any form.  Imagine a community not just of humans but of post-humans (if any), and alien intelligences, and sufficiently advanced AI systems, actual and hypothetical.  What would this diverse group of intelligences tend to agree on?  Note that the hypothesized group is broader than humans but narrower than all rational agents.  I'm not sure any other convergence theorist has conceptualized the set of observers in exactly this way.  (I welcome pointers to relevant work.)

[Dall-E image of a large auditorium of aliens, robots, humans, sea monsters, and other entities arguing with each other]

You might think that the answer would be the empty set: Such a diverse group would agree on nothing.  For any potential action that one alien or AI system might approve of, we can imagine another alien or AI system who intractably disapproves of that action.  But this is too quick, for two reasons:

First, my metaethical view requires only a tendency for members of this group to approve.  If there are a few outlier species, no problem, as long as approval would be sufficiently widespread in a broad enough range of suitable conditions.

(Right, I haven't specified the types of conditions and types of reflection.  Let me gesture vaguely toward conditions of extended reflection involving exposure to a wide range of relevant facts and exposure to a wide range of alternative views, in reflective conditions of open dialogue.)

Second, as I've emphasized, though the group isn't just humans, not just any old intelligent reasoner gets to be in the club.  There's a reason I specify developmentally expensive and behaviorally sophisticated social entities.  Developmental expense entails that life is not cheap.  Behavioral sophistication entails (stipulatively, as I would define "behavioral sophistication") a capacity for structuring complex long-term goals, coordinating in sophisticated ways with others, and communicating via language at least as expressively flexible and powerful as human language.  And sociality entails that such sophisticated coordination and communication happens in a complex, stable, social network of some sort.

To see how these constraints generate predictive power, consider the case of deception.  It seems clear that any well-functioning society will need some communicative norms that favor truth-telling over deceit, if the communication is going to be useful.  Similarly, there will need to be some norms against excessive freeloading.  These needn't be exceptionless norms, and they needn't take the same form in every society of every type of entity.  Maybe, even, there could be a few rare societies where deceiving those who are trying to cooperate with you is the norm; but you see how it would probably require a rare confluence of other factors for a society to function that way.

Similarly, if the entities are developmentally expensive, a resource-constrained society won't function well if they are sacrificed willy-nilly without sufficient cause.  The acquisition of information will presumably also tend to be valued -- both short-term practically applicable information and big-picture understandings that might yield large dividends in the long term.  Benevolence will be valued, too: Reasoners in successful societies will tend to appreciate and reward those who help them and others on whom they depend.  Again, there will be enormous variety in the manifestation of the virtues of preserving others, preserving resources, acquiring knowledge, enacting benevolence, and so on.

Does this mean that if the majority of alien lifeforms breathe methane, it will be morally good to replace Earth's oxygen with methane?  Of course not!  Just as a cross-cultural collaboration of humans can recognize that norms should be differently implemented in different cultures when conditions differ, so also will recognition of local conditions be part of the hypothetical group's informed reflection concerning the norms on Earth.  Our diverse group of intelligent alien reasoners will see the value of contextually relativized norms: On Earth, it's good not to let things get too hot or too cold.  On Earth, it's good for the atmosphere to have more oxygen than methane.  On Earth, given local biology and our cognitive capacities, such-and-such communicative norms seem to work for humans and such-and-such others seem not to.

Maybe some of these alien reasoners would be intractably jingoistic: Antareans are the best and should wipe out all other species!  It's a heinous moral crime to wear blue!  My thought is that in a diverse group of aliens, given plenty of time for reflection and discussion, and the full range of relevant information, such jingoistic ideas will overall tend to fare poorly with a broad audience.

I'm asking you to imagine a wide diversity of successfully cooperative alien (and possibly AI) species -- all of them intelligent, sophisticated, social, and long-lived -- looking at each other and at Earth, entering conversation with us, patiently gathering the information they need, and patiently ironing out their own disagreements in open dialogue.  I think they will tend to condemn the Holocaust and approve of feeding your children.  I think we can surmise this by thinking about what norms would tend to arise in general among developmentally expensive, behaviorally sophisticated social entities, and then considering how intelligent, thoughtful entities would apply those norms to the situation on Earth, given time and favorable conditions to reflect.  I propose that we think of an action as "ethical" or "unethical" to the extent it would tend to garner approval or disapproval under such hypothetical conditions.

It needn't follow that every act is determinately ethically good or bad, or that there's a correct scalar ranking of the ethical goodness or badness of actions.  There might be persistent disagreements even in these hypothesized circumstances.  Maybe there would be no overall tendency toward convergence in puzzle cases, or tragic dilemmas, or when important norms of approximately equal weight come into conflict.  It's actually, I submit, a strength of the alien convergence model that it permits us to make sense of such irresolvability.  (We can even imagine the degree of hypothetical convergence varying independently of goodness and badness.  About Action A, there might be almost perfect convergence on its being a little bit good.  About Action B, in contrast, there might be 80% convergence on its being extremely good.)

Note that, unlike many other naturalistic approaches that ground ethics specifically in human sensibilities, the metaethics of alien convergence is not fundamentally relativistic.  What is morally good depends not on what humans (or aliens) actually judge to be good but rather on what a hypothetical congress of socially sophisticated, developmentally expensive humans, post-humans, aliens, sufficiently advanced AI, and others of the right type would judge to be good.  At the same time, this metaethics avoids committing to the implausible claim that all rational agents (including short-lived, solitary ones) would tend to or rationally need to approve of what is morally good.

Wednesday, July 10, 2024

How the Mimicry Argument Against Robot Consciousness Works

A few months ago on this blog, I presented a "Mimicry Argument" against robot consciousness -- or more precisely, an argument that aims to show why it's reasonable to doubt the consciousness of an AI that is built to mimic superficial features of human behavior.  Since then, my collaborator Jeremy Pober and I have presented this material to philosophy audiences in Sydney, Hamburg, Lisbon, Oxford, Krakow, and New York, and our thinking has advanced.

Our account of mimicry draws on work on mimicry in evolutionary biology.  On our account, a mimic is an entity:

  • with a superficial feature (S2) that is selected or designed to resemble a superficial feature (S1) of some model entity
  • for the sake of deceiving, delighting, or otherwise provoking a particular reaction in some particular audience or "receiver"
  • because the receiver treats S1 in the model entity as an indicator of some underlying feature F.

Viceroy butterflies have wing coloration patterns (S2) that resemble the wing color patterns (S1) of monarch butterflies for the sake of misleading predators who treat S1 as an indicator of toxicity.  Parrots emit songs that resemble the songs or speech of other birds or human caretakers for social advantage.  If the receiver is another parrot, the song in the model (but not necessarily the mimic) indicates group membership.  If the receiver is a human, the speech in the model (but not necessarily the mimic) indicates linguistic understanding.  As the parrot case illustrates, not all mimicry needs to be deceptive, and the mimic might or might not possess the feature the receiver attributes.

Here's the idea in a figure:

[Figure: the mimic's superficial feature S2, selected or designed to resemble the model's S1, which the receiver treats as an indicator of underlying feature F]
Pober and I define a "consciousness mimic" as an entity whose S2 resembles an S1 that, in the model entity, normally indicates consciousness.  So, for example, a toy which says "hello" when powered on is a consciousness mimic: For the sake of a receiver (a child), it has a superficial feature (S2, the sound "hello" from its speakers) which resembles a superficial feature (S1) of an English-speaking human that normally indicates consciousness (since humans who say "hello" are normally conscious).

Arguably, Large Language Models like ChatGPT are consciousness mimics in this sense.  They emit strings of text modeled on human-produced text for the sake of users who interpret that text as having semantic content of the same sort that such text normally has when produced by conscious humans.

Now, if something is a consciousness mimic, we can't straightforwardly infer its consciousness from its possession of S2 in the same way we can normally infer the model's consciousness from the model's possession of S1.  The "hello" toy isn't conscious.  And if ChatGPT is conscious, that will require substantial argument to establish; it can't be inferred in the same ready way that we infer consciousness in a human from human utterances.

Let me attempt to formalize this a bit:

(1.) A system is a consciousness mimic if:
a. It possesses superficial features (S2) that resemble the superficial features (S1) of a model entity.
b. In the model entity, the possession of S1 normally indicates consciousness.
c. The best explanation of why the mimic possesses S2 is the mimicry relationship described above.

(2.) Robots or AI systems – at least an important class of them – are consciousness mimics in this sense.

(3.) Because of (1c), if a system is a consciousness mimic, inference to the best explanation does not permit inferring consciousness from its possession of S2.

(4.) Some other argument might justify attributing consciousness to the mimic; but if the mimic is a robot or AI system, any such argument, for the foreseeable future, will be highly contentious.

(5.) Therefore, we are not justified in attributing consciousness to the mimic.

AI systems designed with outputs that look human might understandably tempt users to attribute consciousness on the basis of those superficial features, but we should be cautious about such attributions.  The inner workings of Large Language Models and other AI systems are causally complex and designed to generate outputs that look like the outputs humans produce, for the sake of being interpretable by humans; but causal complexity does not imply consciousness, and the superficial resemblance to the behaviorally sophisticated patterns we associate with consciousness could be misleading, since such patterns might arise without consciousness.

The main claim is intended to be weak and uncontroversial: When the mimicry structure is present, significant further argument is required before attributing consciousness to an AI system based on superficial features suggestive of consciousness.

Friends of robot or AI consciousness may note two routes by which to escape the Mimicry Argument.  They might argue, contra premise (2), that some important target types of artificial systems are not consciousness mimics.  Or they might present an argument that the target system, despite being a consciousness mimic, is also genuinely conscious – an argument they believe is uncontentious (contra 4) or that justifies attributing consciousness despite being contentious (contra the inference from 4 to 5).

The Mimicry Argument is not meant to apply universally to all robots and AI systems.  Its value, rather, is to clarify the assumptions implicit in arguments against AI consciousness on the grounds that AI systems merely mimic the superficial signs of consciousness.  We can then better see both the merits of that type of argument and means of resisting it.

Wednesday, July 03, 2024

Color the World

My teenage daughter's car earns a lot of attention on the street:

[photos of the art car]
People honk and wave, strangers ask to add their own art, five-year-olds drop their toys and gawk.  A few people look annoyed and turn away.  (Kate describes her car as a "personality tester".)

A couple of years ago, I promised Kate my 2009 Honda Accord for when she earned her driver's license.  Knowing that Kate cares about appearances -- stylish clothes and all that -- I also promised that I'd have it repainted first, since the paint jobs on these old Hondas age badly in the southern California sun.  But when I saw the cost of a proper paint job, I was shocked.  So I suggested that we turn it into an art car, which she and her friends could decorate at will.  (In the 1980s, a few of my friends and I did the same with our old beater cars, generating "Motorized Cathedrals of the Church of the Mystical Anarchist" numbers 2 through 4 -- number 1, of course, being Earth itself.)  She accepted my offer, we bought paints, and voila, over the months the art has accumulated!

I'm not sure exactly what makes the world intrinsically valuable.  I reject hedonism, on which the only intrinsically valuable thing is pleasure; I'm inclined to think that a diversity of flourishing life is at least as important.  (Consider what one would benevolently hope for on a distant planet.  I'd hope not that it's just a sterile rock, but richly populated with diverse life, including, ideally, rich societies with art, science, philosophy, sports, and varied cultures and ecosystems.)

Multitudinous brown 2009 Honda Accords populate the roads of America.  What a bland, practical car!  The world is richer -- intrinsically better -- for containing Kate's weird variant.  She and her friends have added color to the world.

We might generalize this to a motto: Color the World.

It doesn't have to be a car, of course.  Your creative uniqueness might more naturally manifest in other forms (and it's reasonable to worry about resale value).  It might be tattoos on your body, unusual clothing, the way you decorate your office, house, or yard.  It might be your poetry (even secret poetry, seen by no one else and immediately destroyed, timelessly enriches the fabric of the world), your music, your philosophical prose, your distinctive weirdness on social media.  It might be the unusual way you greet people, your quirky manifestation of the rituals of religion or fandom or parenthood, your taste in metaphor, the way you skip down the street, your puns and dad jokes, your famous barbecue parties.

It would be superhuman to be distinctively interesting in all these domains at once, and probably narcissistic even to try.  Sometimes it's best to be the straight man in boring clothes -- a contrast against which the dazzlingly dressed shine more brightly.  But I think most of us hold back more than we need to, for lack of energy and fear of standing out.  Hoist your freak flag!

I see three dimensions of excellence in coloring the world:

(1.) Your color must be different from the others around you, in a way that stands out, at least to the attentive.  If everyone has a brown Honda, having the only green one already adds diversity, even if green is not intrinsically better than brown.  If baseball hats are abundant, adding another one to the mix doesn't add color; but being the one baseball hat in a sea of fedoras does (and vice versa, of course).

(2.) Your color should ideally express something distinctive about you.  While you might choose a baseball hat to contrast with the fedoras simply because it's different, ideally you choose it because it also discloses an underlying difference between you and the others -- maybe you are the ragingest baseball fan in the group.  Your moon-and-cat tattoo isn't just different but manifests your special affection for cats in moonlight.  Your dad jokes wink with a je ne sais quoi that your friends all instantly recognize.

(3.) Your color should ideally arise from your creative energy.  A baseball cap from the merch store might (in some contexts) be color -- but a cap modified by your own hand is more colorful.  Let it be, if it can, your own artistic endeavor, your paint and brush, your own selection of words, your own decisions about how best to embody the ritual, organize the party, structure the space.  If it's prepackaged, put it together a little differently, or contextualize or use it a little differently.

Can I justify these subsidiary principles by appeal to the ideal of diversity?  Maybe!  Diversity occupies a middle space between bland sameness and chaotic white noise.  By grounding your difference in your distinctive features and your creative inspiration, you ensure the structure, order, and significance that distinguish meaningful diversity from random variation.

I imagine someone objecting to Color the World by counterposing the motto Walk Lightly.  I do feel the pull of Walk Lightly.  Don't make a big fuss.  Let things be.  No need to scratch your name on every tree and upturn all the sand on the beach.  Walk Lightly, perhaps, manifests respect for the color that others bring to the world.  Fair enough.  Make good decisions about when and where to color.

Goodbye for today!  Time to drive my own bland car home.  When I walk in my front door, I'll ritualistically confirm with my wife and daughter that they abided by my usual morning advice to them: (1.) no barfing, and (2.) don't get abducted by aliens.

Friday, June 28, 2024

Is the World Morally Well Ordered? Secular Versions of the "Problem of Evil"

Since 2003, I've regularly taught a large lower-division class called "Evil", focusing primarily on the moral psychology of evil (recent syllabus here).  We conclude by discussing the theological "problem of evil" -- the question of whether and how evil and suffering are possible given an omnipotent, omniscient, benevolent God.  Over the years I've been increasingly intrigued by a secular version of this question.

I see the secular "problem of evil" as this: Although no individual or collective has anything close to the knowledge or power of God as envisioned in mainstream theological treatments, the world is not wholly beyond our control; so there's at least some possibility that individuals and collectives can work toward making the world morally well ordered in the sense that the good thrive, the evil suffer, justice is done, and people get what they deserve.  So, how and to what extent is the world morally well ordered?  My aim today is to add structure to this question, rather than answer it.

(1.) We might first ask whether it would in fact be good if the world were morally well-ordered.  One theological response to the problem of evil is to argue no.  A world in which God ensured perfect moral order would be a world in which people lacked the freedom to make unwise choices, and freedom is so central to the value of human existence that it's overall better that we're free and suffer than that we're unfree but happy.

A secular analogue might be: A morally well-ordered world would, or might, require such severe impingements on our freedom as to not be worth the tradeoff.  It might, for example, require an authoritarian state that rewards, punishes, monitors, and controls in a manner that -- even if it could accurately sort the good from the bad -- fundamentally violates essential liberties.  Or it might require oppressively high levels of informal social control by peers and high-status individuals, detecting and calling out everyone's moral strengths and weaknesses.

(2.) Drawing from the literature on "immanent justice" -- with literary roots in, for example, Shakespeare and Dostoyevsky -- we might consider plausible social and psychological mechanisms of moral order.  In Shakespeare's Macbeth, one foul deed breeds another and another -- partly to follow through on and cover up the first and partly because one grows accustomed to evil -- until the evil is so extreme and pervasive that the revulsion and condemnation of others becomes inevitable.  In Dostoyevsky's Crime and Punishment, Raskolnikov torments himself with fear, guilt, and loss of intimacy (since he has a life-altering secret he cannot share with most others in his life), until he unburdens himself with confession.

We can ask to what extent it's true that such social and psychological mechanisms cause the guilty to suffer.  Is it actually empirically correct that those who commit moral wrongs end up unhappy as a result of guilt, fear, social isolation, and the condemnation of others?  I read Woody Allen's Crimes and Misdemeanors as arguing the contrary, portraying Judah as overall happier and better off as a result of murdering his mistress.

(3.) Drawing from the literature on the goodness or badness of "human nature", we can ask to what extent people are naturally pleased by their own and others' good acts and revolted by their own and others' evil.  I find the ancient Chinese philosopher Mengzi especially interesting on this point.  Although Mengzi acknowledges that the world isn't perfectly morally ordered ("an intent noble does not forget he may end up in a ditch; a courageous noble does not forget he may lose his head"; 3B1), he generally portrays the morally good person as happy, pleased by their own choices, and admired by others -- and he argues that our inborn natures inevitably tend this direction if we are not exposed to bad environmental pressures.

(4.) We can explore the extent to which moral order is socially and culturally contingent.  It is plausible that in toxic regimes (e.g., Stalinist Russia) the moral order is to some extent inverted, the wicked thriving and the good suffering.  We can aspire to live in a society where, in general -- not perfectly, of course! -- moral goodness pays off, perhaps through ordinary informal social mechanisms: "What goes around comes around."  We can consider what structures tend to ensure, and what structures tend to pervert, moral order.

Then, knowing this -- within the constraints of freedom and given legitimate diversity of moral opinion (and the lack of any prospect for a reliable moralometer) -- we can explore what we as individuals, or as a group, might do to help create a morally better ordered world.

[Dall-E interpretation of a moralometer sorting angels and devils, punishing the devils and rewarding the angels]

Wednesday, June 19, 2024

Conscious Subjects Needn't Be Determinately Countable: Generalizing Dennett's Fame in the Brain

It is, I suspect, an accident of vertebrate biology that conscious subjects typically come in neat, determinate bundles -- one per vertebrate body, with no overlap.  Things might be very different with less neurophysiologically unified octopuses, garden snails, split-brain patients, craniopagus twins, hypothetical conscious computer systems, and maybe some people with "multiple personality" or dissociative identity.


Consider whether the following two principles are true:

Transitivity of Unity: If experience A and experience B are each part of the conscious experience of a single subject at a single time, and if experience B and experience C are each part of the conscious experience of a single subject at a single time, then experience A and experience C are each part of the conscious experience of a single subject at a single time.

Discrete Countability: Except in marginal cases at spatial and temporal boundaries (e.g., someone crossing a threshold into a room), in any spatiotemporal region the number of conscious subjects is always a whole number (0, 1, 2, 3, 4...) -- never a fraction, a negative number, an imaginary number, an indeterminate number, etc.

Leading scientific theories of consciousness, such as Global Workspace Theory and Integrated Information Theory, are architecturally committed to neat bundles satisfying transitivity of unity and discrete countability.  Global Workspace Theories treat processes as conscious if they are available to, or represented in, "the" global workspace (one per conscious animal).  Integrated Information Theory contains an "exclusion postulate" according to which conscious systems cannot nest or overlap, and it has no way to model partial subjects or indiscrete systems.  Most philosophical accounts of the "unity of consciousness" (e.g., Bayne 2010) also invite commitment to these two theses.

In contrast, Dennett's "fame in the brain" model of consciousness -- though a close kin to global workspace views -- is compatible with denying transitivity of unity and discrete countability.  In Dennett's model, a cognitive process or content is conscious if it is sufficiently "famous" or influential among other cognitive processes.  For example, if you're paying close attention to a sharp pain in your toe, the pain process will influence your verbal reports ("that hurts!"), your practical reasoning ("I'd better not kick the wall again"), your planned movements (you'll hobble to protect it), and so on; and conversely, if a slight movement in peripheral vision causes a bit of a response in your visual areas, but you don't and wouldn't report it, act on it, think about it, or do anything differently as a result, it is nonconscious.  Fame comes in degrees.  Something can be famous to different extents among different groups.  And there needn't even be a determinately best way of clustering and counting groups.

[Dall-E's interpretation of a "brain with many processes, some of which are famous"]

Here's a simple model of degrees of fame:

Imagine a million people.  Each person has a unique identifier (a number 1-1,000,000), a current state (say, a "temperature" from -10 to +10), and the capacity to represent the states of ten other people (ten ordered pairs, each containing the identifier and temperature of one other person).

If there is one person whose state is represented in every other person, then that person is maximally famous (a fame score of 999,999).  If there is one person whose state is represented in no other person, then that person has zero fame.  Between these extremes is of course a smooth gradation of cases.
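
If it helps to make the toy model concrete, here is a minimal sketch of it in Python.  This is my own illustration, not part of the post's argument: the population is shrunk from a million to ten thousand so it runs quickly, and the wiring is random, which the description above doesn't require.

```python
import random
from collections import Counter

N_PEOPLE = 10_000   # the post imagines 1,000,000; shrunk here for a quick run
N_SLOTS = 10        # each person can represent the states of ten others

# Each person has a unique identifier and a "temperature" state from -10 to +10.
temperature = {i: random.uniform(-10, 10) for i in range(1, N_PEOPLE + 1)}

# Each person stores ten (identifier, temperature) pairs for other people.
def pick_targets(i):
    targets = set()
    while len(targets) < N_SLOTS:
        j = random.randint(1, N_PEOPLE)
        if j != i:
            targets.add(j)
    return [(j, temperature[j]) for j in targets]

representations = {i: pick_targets(i) for i in range(1, N_PEOPLE + 1)}

# A person's fame score is the number of other people who represent their state.
fame = Counter(j for pairs in representations.values() for j, _ in pairs)

top_person, top_score = fame.most_common(1)[0]
print(f"Most famous: person {top_person}, represented by {top_score} of "
      f"{N_PEOPLE - 1} possible others; anyone absent from the counter has fame 0.")
```

With random wiring, everyone's fame hovers around ten out of a possible 9,999; the interesting cases in the text are the extremes, where a state is represented nearly everywhere or nearly nowhere.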

If we analogize to cognitive processes we might imagine the pain in the toe or the flicker in the periphery being just a little famous: Maybe the pain can affect motor planning but not speech, causes a facial expression but doesn't influence the stream of thought you're having about lunch.  Maybe the flicker guides a glance and causes a spike of anxiety but has no further downstream effects.  Maybe they're briefly reportable but not actually reported, and they have no impact on medium- or long-term memory, or they affect some sorts of memory but not others.

The "ignition" claim of global workspace theory is the empirically defensible (but not decisively established) assertion that there are few such cases of partial fame: Either a cognitive process has very limited effects outside of its functional region or it "ignites" across the whole brain, becoming widely accessible to the full range of influenceable processes.  The fame-in-the-brain model enables a different way of thinking that might apply to a wider range of cognitive architectures.

#

We might also extend the fame model to issues of unity and the individuation of conscious subjects.

Start with a simple case: the same setup as before, but with two million people and the following constraint: Processes numbered 1 to 1,000,000 can only represent the states of other processes in that same group of 1 to 1,000,000; and processes numbered 1,000,001 to 2,000,000 can only represent the states of other processes in that group.  The fame groups are disjoint, as if on different planets.  Adapted to the case of experiences: Only you can feel your pain and see your peripheral flicker (if anyone does), and only I can feel my pain and see my peripheral flicker (if anyone does).

This disjointedness is what makes the two conscious subjects distinct from each other.  But of course, we can imagine less disjointedness.  If we eliminate disjointedness entirely, so that processes numbered 1 to 2,000,000 can each represent the states of any process from 1 to 2,000,000, then our two subjects become one.  The planets are entirely networked together.  But partial disjointedness is also possible: Maybe processes can represent the states of anyone within 1,000,000 of their own number (call this the Within a Million case).  Or maybe processes numbered 950,001 to 1,050,000 can be represented by any process from 1 to 2,000,000 but every process below 950,001 can only be represented by processes 1 to 1,050,000 and every process above 1,050,000 can only be represented by processes 950,001 to 2,000,000 (call this the Overlap case).

The Overlap case might be thought of as two discrete subjects with an overlapping part.  Subject A (1 to 1,050,000) and Subject B (950,001 to 2,000,000) each have their private experiences, but there are also some shared experiences (whenever processes 950,001 to 1,050,000 become sufficiently famous in the range constituting each subject).  Transitivity of Unity thus fails: Subject A experiences, say, a taste of a cookie (process 312,421 becoming famous across processes 1 - 1,050,000) and simultaneously a sound of a bell (process 1,000,020 becoming famous across processes 1 - 1,050,000); while Subject B experiences that same sound of a bell alongside the sight of an airplane (both of those processes being famous across processes 950,001 - 2,000,000).  Cookie and bell are unified in A.  Bell and airplane are unified in B.  But no subject experiences the cookie and airplane simultaneously.
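
To make the failure concrete, here is a small sketch of the Overlap case.  The process numbers for the cookie and the bell come from the example above; the airplane's process number (1,700,000) is my own arbitrary choice of a process outside Subject A's range, and "famous across a range" is simplified to mere range membership -- an assumption of the sketch, not a claim about how fame should be measured.

```python
# Overlap case: two subjects whose process ranges share a middle band.
SUBJECT_A = range(1, 1_050_001)        # processes 1 .. 1,050,000
SUBJECT_B = range(950_001, 2_000_001)  # processes 950,001 .. 2,000,000

# Experiences, tagged by the process that carries them.
experiences = {
    "cookie": 312_421,      # from the example: within A's range only
    "bell": 1_000_020,      # in the shared band, so within both ranges
    "airplane": 1_700_000,  # assumed process number, outside A's range
}

def unified_for(subject, e1, e2):
    """Treat two experiences as unified in a subject when both carrying
    processes fall within that subject's range (a stand-in for being
    famous across that range)."""
    return experiences[e1] in subject and experiences[e2] in subject

print(unified_for(SUBJECT_A, "cookie", "bell"))    # True: unified in A
print(unified_for(SUBJECT_B, "bell", "airplane"))  # True: unified in B
print(any(unified_for(s, "cookie", "airplane")
          for s in (SUBJECT_A, SUBJECT_B)))        # False: unified in no subject
```

Cookie-and-bell are unified, bell-and-airplane are unified, but cookie-and-airplane are unified in no subject -- exactly the failure of Transitivity of Unity just described.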

In the Overlap case, discrete countability is arguably preserved, since it's plausible to say there are exactly two subjects of experience.  But it's more difficult to retain Discrete Countability in the Within a Million case.  There, if we want to count each distinct fame group as a separate subject, we will end up with roughly two million different subjects: Subject 1 (1 to 1,000,001), Subject 2 (1 to 1,000,002), Subject 3 (1 to 1,000,003) ... Subject 1,000,001 (1 to 2,000,000), Subject 1,000,002 (2 to 2,000,000), ... Subject 1,999,999 (999,999 to 2,000,000), Subject 2,000,000 (1,000,000 to 2,000,000).  (There needn't be a middle subject with access to every process: Simply extend the case up to 3,000,000 processes.)  While we could say there would be two million discrete subjects in such an architecture, I see at least three infelicities:

First, person 1,000,002 might never be famous -- maybe even could never be famous, being just a low-level consumer whose destiny is only to make others famous.  If so, Subject 1 and Subject 2 would always have, perhaps even necessarily would always have, exactly the same experiences in almost exactly the same physical substrate, despite being, supposedly, discrete subjects.  That is, at least, a bit of an odd result.

Second, it becomes too easy to multiply subjects.  You might have thought, based on the other cases, that a million processes is what it takes to generate a human subject, and that with two million processes you get either two human subjects or one large subject.  But now it seems that, simply by linking those two million processes together by a different principle (with about 1.5 times as many total connections), you can generate not just two but a full two million human subjects.  It turns out to be surprisingly cheap to create a plethora of discretely different subjective centers of experience.

Third, the model I've presented is simplified in a certain way: It assumes that there are two million discrete, countable processes that could potentially be famous or create fame in others by representing them.  But cognitive processes might not in fact be discrete and countable in this way.  They might be more like swirls and eddies in a turbulent stream, and every attempt to give them sharp boundaries and distinct labels might to some extent be only a simplified model of a messy continuum.  If so, then our two million discrete subjects would themselves be a simplified model of a messy continuum of overlapping subjectivities.

The Within a Million case, then, might best be conceptualized not as a case of one subject of experience, nor two, nor two million, but rather as a case that defies any such simple numerical description, contra Discrete Countability.

#

This is abstract and far-fetched, of course.  But once we have stretched our minds in this way, it becomes, I think, easier to conceive of the possibility that some real cases (cognitively partly disunified mollusks, for example, or people with unusual conditions or brain structures, or future conscious computer systems) might defy transitivity of unity and discrete countability.

What would it be like to be such an entity / pair of entities / diffuse-bordered-uncountable-groupish thing?  Unsurprisingly, we might find such forms of consciousness difficult to imagine with our ordinary vertebrate concepts and philosophical tools derived from our particular psychology.

Tuesday, June 11, 2024

The Meta-Epistemic Objection to Longtermism

According to "longtermism" (as I'll use the term), our thinking should be significantly influenced by our expectations for the billion-plus-year future.  In a paper in draft, I argue, to the contrary, that our thinking should be not at all influenced by our expectations for the billion-year-plus future.  Every action has so vastly many possible positive and negative future consequences that it's impossible to be justified in expecting that any action currently available to us will have a non-negligible positive impact that far into the future.  Last week, I presented my arguments against longtermism to Oxford's Global Priorities Institute, the main academic center of longtermist thinking.

The GPI folks were delightfully welcoming and tolerant of my critiques, and they are collectively nuanced enough in their thinking to already have anticipated the general form of all of my main concerns in various places in print.  What became vivid for me in discussion was the extent to which my final negative conclusion about longtermist reasoning depends on a certain meta-epistemic objection -- that is, the idea that our guesses about the good or bad consequences of our actions for the billion-year-plus future are so poorly grounded that we are epistemically better off not even guessing.

The contrary position, seemingly held by several of the audience members at GPI, is that, sure, we should be very epistemically humble when trying to estimate good or bad consequences for the billion-year-plus future -- but still we might find ourselves, after discounting for our likely overconfidence and lack of imagination, modestly weighing up all the uncertainties and still judging that action A really would be better than action B in the billion-plus-year future; and then it's perfectly reasonable to act on this appropriately humble, appropriately discounted judgment.

They are saying, essentially, play the game carefully.  I'm saying don't play the game.  So why do I think it's better not even to play the game?  I see three main reasons.

[image of A minus delta < B plus delta, overlaid with a red circle and crossout line]

First, longtermist reasoning requires cognitive effort.  If the expected benefit of longtermist reasoning is (as I suggest in one estimate in the full-length essay draft) one quadrillionth of a life, it might not be worth a millisecond of cognitive effort to try to get the decision right.  One's cognitive resources would be better expended elsewhere.

Now one response to this -- a good response, I think, at least for a certain range of thinkers in a certain range of contexts -- is that reflecting about the very distant future is fun, engaging, mind-opening, and even just intrinsically worthwhile.  I certainly wouldn't want to shame anyone just for thinking about it!  (Among other things, I'm a fan of science fiction.)  But I do think we should bear the cost argument in mind to the extent our aim is the efficient use of cognitive labor for improving the world.  It's likely that there are many more productive topics to throw your cognitive energies at, if you want to positively influence the future -- including, I suspect, just being more thoughtful about the well-being of people in your immediate neighborhood.

Second, longtermist reasoning adds noise and error into the decision-making process.  Suppose that when considering the consequences of action A vs. action B for "only" the next thousand years, you come to the conclusion that action A would be better.  But then, upon further reflection, you conclude -- cautiously, with substantial epistemic discounting -- that looking over the full extent of the next billion-plus years, B in fact has better expected consequences.  The play carefully and don't play approaches now yield different verdicts about what you should do.  Play carefully says B.  Don't play says A.  Which is likely to be the better policy, as a general matter?  Which is likely to lead to better decisions and consequences overall, across time?

Let's start with an unflattering analogy.  I'll temper it in a minute.  Suppose that you favor action A over action B on ordinary evidential grounds, but then after summing all those up, you make the further move of consulting a horoscope.  Assuming horoscopes have zero evidential value, the horoscope adds only noise and error to the process.  If the horoscope "confirms" A, then your decision is the same as it would have been.  If the horoscope ends up tilting you toward B, then it has led you away from your best estimate.  It's better, of course, for the horoscope to play no role.

Now what if the horoscope -- or, to put it more neutrally, Process X -- adds a tiny bit of evidential value -- say, one trillionth of the value of a happy human life?  That is, suppose Process X says "Action B will increase good outcomes by trillions of lives".  You then discount the output of Process X a lot -- maybe by 24 orders of magnitude -- and consequently increase your estimate of the value of Action B by one trillionth of a life.  In that case, almost no practical decision that involves the complex weighing up of competing costs and benefits should be such that the tiny Process X difference is sufficient to rationally shift you from choosing A to choosing B.  You would probably be better off thinking a bit more carefully about other pros and cons.  Furthermore, I suspect that -- as a contingent matter of human psychology -- it will be very difficult to give Process X only the tiny, tiny weight it deserves.  Once you've paid the cognitive costs of thinking through Process X, that factor will loom large for you and exert more than a minuscule influence on your reasoning.  As a result, incorporating Process X into your reasoning will add noise and error, of a similar sort to the pure horoscope case, even if it has some tiny predictive value.
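
The scale of that discount is easy to lose track of, so here it is as a few lines of Python, just restating the numbers in the paragraph above:

```python
claimed_benefit = 1e12   # "trillions of lives", as Process X asserts
discount = 1e-24         # discounting its evidentiary value by 24 orders of magnitude

credited_benefit = claimed_benefit * discount
print(credited_benefit)  # 1e-12: one trillionth of a life -- the most weight
                         # Process X's verdict should carry in the final decision
```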

Third, longtermist reasoning might have negative effects on other aspects of one’s cognitive life, for example, by encouraging inegalitarian or authoritarian fantasies, or a harmful neo-liberal quantification of goods, or self-indulgent rationalization, or a style of consequentialist thinking that undervalues social relationships or already suffering people.  This is of course highly speculative, contingent both on the psychology of the longtermists in question and on the value or disvalue of, for example, "neo-liberal quantification of goods".  But in general, cognitive complexity is fertile soil for cognitive vice.  Perhaps rationalization is the most straightforward version of this objection: If you are emotionally attracted to Action B -- if you want it to be the case that B is the best choice -- and if billion-year-plus thinking seems to favor Action B, it's plausible that you'll give that fact more than the minuscule-upon-minuscule weight it deserves (if my other arguments concerning longtermism are correct).

Now it might be the case that longtermist reasoning also has positive effects on the longtermist thinker -- for example, by encouraging sympathy for distant others, or by fruitfully encouraging more two-hundred-year thinking, or by encouraging a flexible openness of mind.  This is, I think, pretty hard to know; but longtermists' self-evaluations and peer evaluations are probably not a reliable source here.

[Thanks to all the GPI folks for comments and discussion, especially Christian Tarsney.]

Saturday, June 01, 2024

Two-Layer Metaphysics: Reconciling Dispositionalism about Belief with Underlying Representational Architecture

During the question period following UCR visiting scholar Brice Bantegnie's colloquium talk on dispositional approaches to the mind, one of my colleagues remarked -- teasingly, but also with some seriousness -- "one thing I don't like about you dispositionalists is that you deny cognitive science".  Quilty-Dunn and Mandelbaum express a similar thought in their 2018 critique of dispositionalism: Cognitive science works in the medium of representations.  Therefore (?), belief must be a representational state.  Therefore (?), defining belief wholly in terms of dispositional structures conflicts with the best cognitive science.

None of this is correct.  We can understand why not through what I'll call two-layer metaphysics.  Jeremy Pober's 2022 dissertation under my direction was in part about two-layer metaphysics.  Bantegnie also supports a type of two-layer metaphysics, though he and Pober otherwise have very different metaphysical pictures.  (Pober is a biological reductionist and Bantegnie a Ryle-inspired dispositionalist.)  Mandelbaum and I in conversation have also converged on this, recognizing that we can partly reconcile our views in this way.

Two-layer metaphysics emphasizes the distinction between (to somewhat misappropriate David Marr) the algorithmic and the implementational level, or alternatively between conceptual and nomological necessities, or between role and realizer, or between what makes it correct to say that someone believes some particular proposition and in virtue of what particular structures they actually do believe that proposition.  (These aren't equivalent formulations, but species of a genre.)

To get a clearer sense of this, it's helpful to consider space aliens.

Rudolfo, let's say, is an alien visitor from Alpha Centauri.  He arrives in a space ship, quickly learns English, Chinese, and Esperanto, tells amusing stories about his home world illustrated with slide shows and artifacts, enjoys eating oak trees whole and taking naps at the bottom of lakes, becomes a 49ers football fan, and finds employment as a tax attorney.  To all outward appearances, he integrates seamlessly into U.S. society; and although he's a bit strange in some ways, Rudolfo is perfectly comprehensible to us.  Let's also stipulate (though it's a separate issue) that he has all the kinds of conscious experiences you would expect: Feelings of joy and sadness, sensory images, conscious thoughts, and so on.

[Dall-E image of an alien filling out tax forms]

Does Rudolfo believe that taxes are due on April 15?  On a dispositionalist account of the sort I favor, as long as he stably possesses all the right sorts of dispositions, he does.  He is disposed to correctly file tax forms by that deadline.  He utters sentences like "Taxes are due on April 15", and he feels sincere when he says this.  He feels anxiety if a client risks missing that deadline.  If he learns that someone submitted their taxes on April 1, he concludes that they did not miss the deadline, etc.  He has the full suite of appropriate behavioral, experiential, and cognitive dispositions.  (By "cognitive dispositions" I mean dispositions to enter into other related mental states, like the disposition to draw relevant conclusions.)

Knowing all this, we know Rudolfo believes.  Do we also need to dissect him, or put him in some kind of scanner, or submit him to subtle behavioral tests concerning details of reaction time and such, to figure out whether he has the right kind of underlying architecture?  Here, someone committed to identifying belief in a strict way with the possession of a certain underlying architecture faces a dilemma.  Either they say no, no dissection, scan, or subtle cognitive testing is needed, or they say yes, a dissection, scan, or series of subtle cognitive tests is needed.

If no, then the architectural commitment is vacuous: It turns out that having the right set of dispositions is sufficient for having the right architecture.  So one might as well be a dispositionalist after all.

If yes, then we don't really know whether Rudolfo believes despite the behavioral and experiential patterns that would seem to be sufficient for believing.  This conclusion (1.) violates common sense and ordinary usage, and (2.) misses what we do and should care about in belief ascription.  If a hardcore cognitive realist were to say "nope, wrong architecture! that species has no beliefs!", we'd just have to invent a new word for what Rudolfo and his kind share in common with humans when we act and react in such patterns -- maybe belief*.  Rudolfo believes* that taxes are due on April 15.  That's why he's working so hard and reminding his clients.  But then "belief*" is the more useful term, as well as the more commonsensical, and it's probably what we meant, or should have meant, by "belief" all along.

Now it might be that in humans, or in Alpha Centaurians, or in some other Earthly or alien species, belief works by means of manipulating internal representations written in the language of thought.  That could be!  (I have my doubts, unless the view is given a very weak formulation.)  But even if we allow that possibility, the reason that having that architecture counts as believing is because that architecture, in that species, happens to be the architecture that underwrites the dispositional pattern.

There are, then, so to speak, two layers here.  There's the dispositional characterization, which, if an entity matches it well enough, makes it true to describe them as someone who believes.  And then there's the underlying subpersonal architecture, which is how the belief is implemented in them at the detailed cognitive level.

Thus, my dissatisfied colleague, and Quilty-Dunn and Mandelbaum, are wrong: There is no conflict between a dispositional approach to belief and representationalist realism in cognitive science.  The metaphysical dispositionalist and the psychological representationalist are engaged in different tasks, and both can be correct -- or rather, both can be correct unless the representationalist also attempts the dispositionalist's broader metaphysical task (in which case they face the Rudolfo dilemma).

Does this make dispositionalism unscientific?  Not at all!  Two comparisons.  Personality traits: These can be defined dispositionally.  To be an extravert is nothing more or less than to have a certain dispositional profile -- that is, to tend to act and react in characteristically extraverted ways.  There can still be a science of dispositional profiles (most of personality psychology, I'd say); and there can also be a science of implementation (e.g., what subpersonal brain or lower-level cognitive structures explain the extravert's energy and sociality?).  Evolution: At a broad theoretical level, evolution just requires heritable traits with different rates of reproductive success.  At a lower level, we can look at genes as the architectural implementation.  One can work on the science of evolution at either layer or with an eye on both layers at once.

Monday, May 20, 2024

Could We Eventually Have Moral Obligations to Non-Conscious Artificial Microbes?

The field of Artificial Life (ALife) aims to create artificial life forms with increasing levels of sophistication from the bottom up. A few years ago, ALife researcher Olaf Witkowski and I began talking about whether and under what conditions people might begin to have obligations to such artificial life forms. The issues, of course, overlap with the recently hot topic of robot rights.

Our first collaboration won the best paper award at the ALife 2022 conference. Our follow-up paper (a substantially revised and expanded version of the conference paper) appears today, open access, in the journal Artificial Life:

"The Ethics of Life as It Could Be: Do We Have Moral Obligations to Artificial Life?"

I excerpt one section below, which illustrates one way the ethical issues facing Artificial Life might diverge from those facing Artificial Intelligence.


The Possible Moral Considerability of Life Without Consciousness

We encourage the reader not to quickly assume that moral issues concerning our possible obligations to ALife are reducible to questions of intelligence, sociality, and consciousness. As previously mentioned, various traditional and indigenous religions, as well as ecological thinkers, have often held that life itself has intrinsic value. Although thinkers in these traditions rarely consider the possibility of ALife, it is possible that some of the reasons to value plants and ecosystems would extend to systems of ALife. Systems of ALife might be beautiful, complex, and awe-inspiring. They also might possess goals (Deacon & Sherman, 2007) as well as potentialities for thriving or failing similar to those of natural living organisms of various kinds (Benner & Sismour, 2005; Ziemke, 2001). They might be constructed by designers whose actions imbue value on the things they have designed (not divine designers but human ones), embodying and carrying forward the spirit of those designers, possibly even after those designers have died.

Most people do not think that simple microbes have intrinsic moral considerability. We don’t fret about the death of bacteria when we take antibiotics. But this is arguably a limited perspective. Suppose humans were to discover microbial life on another planet or moon in the solar system, as many exobiologists think we might do in the near future (Bennett et al., 2022; Wright et al., 2022). Would we destroy it as casually as we destroy a bacterial pneumonia infection? Clearly not. Perhaps this is only because alien microbes would be derivatively, instrumentally valuable, as a scientific curiosity and possible source of new, useful technologies. However, it is perhaps not unreasonable to hold that alien microbial life would also have intrinsic value independent of our ends and that we have an obligation not to destroy or disrupt it for human purposes (Peters, 2019).

Alien microbial life is likely to be natural life—but that is not guaranteed. As discussed, there’s reason to suppose that interstellar travelers, if any exist, might have artificial biologies rather than biologies adapted to planetary environments. We thus cannot exclude the possibility that the first microbial life we discover will be ALife—the artificial quasi-bacterial messengers or remnants of some earlier intelligent species. It might not warrant lesser moral considerability by virtue of that fact. Indeed, its historical origins might render it even more beautiful and awe-inspiring than naturally evolved life.

Transferring this perspective back to Earth: If alien microbes might have some intrinsic moral considerability, ALife here on Earth might have similar considerability, depending on what grounds the moral considerability of alien microbes. If what matters is the fact that extinguishing such life would remove from the universe a unique, complex, and remarkable thing, then some human-created ALife might have intrinsic moral considerability. ALife researchers might eventually create artificial organisms or ecosystems every bit as wonderful and awe-inspiring as natural life—and as intrinsically worth preserving.

Thursday, May 09, 2024

Formal Decision Theory Is an Optional Tool That Breaks When Values Are Huge

Formal decision theory is a tool -- a tool that breaks, a tool we can do without, a tool we optionally deploy and can sometimes choose to violate without irrationality.  If it leads to paradox or bad results, we can say "so much the worse for formal decision theory", moving on without it, as of course humans have done for almost all of their history.

I am inspired to these thoughts after reading Nick Beckstead and Teruji Thomas's recent paper in Noûs, "A Paradox for Tiny Probabilities and Enormous Values".

Beckstead and Thomas lay out the following scenario:

On your deathbed, God brings good news. Although, as you already knew, there's no afterlife in store, he'll give you a ticket that can be handed to the reaper, good for an additional year of happy life on Earth. As you celebrate, the devil appears and asks, ‘Won't you accept a small risk to get something vastly better? Trade that ticket for this one: it's good for 10 years of happy life, with probability 0.999.’ You accept, and the devil hands you a new ticket. But then the devil asks again, ‘Won't you accept a small risk to get something vastly better? Trade that ticket for this one: it is good for 100 years of happy life—10 times as long—with probability 0.999^2—just 0.1% lower.’ An hour later, you've made 50,000 trades. (The devil is a fast talker.) You find yourself with a ticket for 10^50,000 years of happy life that only works with probability .999^50,000, less than one chance in 10^21. Predictably, you die that very night.

Here are the deals you could have had along the way:

[Table of deals: the original ticket guarantees 1 year of happy life; deal n offers 10^n years of happy life with probability 0.999^n, up through deal 50,000.]

On the one hand, each deal seems better than the one before. Accepting each deal immensely increases the payoff that's on the table (increasing the number of happy years by a factor of 10) while decreasing its probability by a mere 0.1%. It seems unreasonably timid to reject such a deal. On the other hand, it seems unreasonably reckless to take all of the deals—that would mean trading the certainty of a really valuable payoff for all but certainly no payoff at all. So even though it seems each deal is better than the one before, it does not seem that the last deal is better than the first.
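A quick back-of-the-envelope calculation (my own, not Beckstead and Thomas's) makes the tension concrete.  If deal n offers 10^n happy years with probability 0.999^n, its expected value is (10 × 0.999)^n = 9.99^n years, which grows without bound even as the chance of receiving anything at all collapses toward zero:

    # A minimal sketch, assuming deal n pays 10^n happy years with probability 0.999^n.
    # The expected value climbs steadily while the chance of any payoff shrinks.
    for n in range(6):
        years = 10 ** n
        prob = 0.999 ** n
        print(f"deal {n}: {years} years, probability {prob:.6f}, expected value {years * prob:.2f}")

    # By the last deal the probability is astronomically small:
    print(0.999 ** 50_000)   # roughly 1.9e-22, i.e., less than one chance in 10^21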

Beckstead and Thomas aren't the first to notice that standard decision theory yields strange results when faced with tiny probabilities of huge benefits: See the literature on Pascal's Wager, Pascal's Mugging, and Nicolausian Discounting.

The basic problem is straightforward: Standard expected utility decision theory suggests that given a huge enough benefit, you should risk almost certainly destroying everything.  If the entire value of the observable universe is a googol (10^100) utils, then you should push a button with a 99.999999999999999999999% chance (all but one chance in 10^23) of destroying everything, as long as there is (or you believe that there is) a 0.000000000000000000001% chance (one chance in 10^23) that it will generate more than 10^123 utils.
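To make the arithmetic explicit, here's a worked version of the button example (my own sketch, using exact rational arithmetic and the figures above):

    # A minimal sketch of the expected-utility comparison behind the button example.
    # Assumptions from the text: the universe is worth 10^100 utils; the button
    # yields a bit more than 10^123 utils with probability 10^-23 and otherwise
    # destroys everything (0 utils).
    from fractions import Fraction

    p_win = Fraction(1, 10 ** 23)
    payoff = 10 ** 123 + 1          # "more than 10^123 utils"

    ev_push = p_win * payoff        # the destruction branch contributes 0
    ev_refrain = 10 ** 100          # keep the universe as it is

    print(ev_push > ev_refrain)     # True: expected utility says push the button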

As Beckstead and Thomas make clear, you can either accept this counterintuitive conclusion (they call this recklessness) or reject standard decision theory.  However, the nonstandard theories that result are either timid (sometimes advising us to pass up an arbitrarily large potential gain to prevent a tiny increase in risk) or non-transitive (denying the principle that, if A is better than B and B is better than C, then A must be better than C).  Nicolausian Discounting, for example, which holds that below some threshold of improbability (e.g., 1/10^30), any gain no matter how large should be ignored, appears to be timid.  If a tiny decrease in probability would push some event below the Nicolausian threshold, then no potential gain could justify taking a risk or paying a cost for the sake of that event.
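To see how stark the timidity is, here is a toy version of a hard-threshold discounting rule (an illustrative sketch of my own, not Monton's or anyone else's official formulation): once a probability dips below the cutoff, no payoff, however enormous, registers at all.

    # A toy hard-threshold discounting rule, for illustration only.
    THRESHOLD = 1e-30   # example cutoff, echoing the 1/10^30 figure above

    def discounted_ev(prob, value, threshold=THRESHOLD):
        """Expected value, with outcomes below the probability cutoff ignored."""
        return 0 if prob < threshold else prob * value

    print(discounted_ev(1e-29, 10 ** 40))    # above the cutoff: counts (about 1e11)
    print(discounted_ev(1e-31, 10 ** 400))   # below the cutoff: ignored entirely (0)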

Beckstead and Thomas present the situation as a trilemma between recklessness, timidity, and non-transitivity.  But they neglect a fourth option.  It's actually a quadrilemma: recklessness, timidity, non-transitivity, or rejecting formal approaches to decision.

I recommend the last horn.  Formal decision theory is a limited tool, designed to help with a certain type of decision.  It is not, and should not be construed to be, a criterion of rationality.

Some considerations that support treating formal decision theory as a tool of limited applicability:

  • If any one particular approach to formal decision theory were a criterion of rationality, such that defying its verdicts were always irrational, then applying any other formal approach to decision (e.g., an alternative treatment of risk) would be irrational whenever the two approaches' verdicts diverge.  But it's reasonable to be a pluralist about formal approaches to decision.
  • Formal theories in other domains break outside their domains of application.  For example, physicists still haven't reconciled quantum mechanics and general relativity.  These are terrific, well-confirmed theories that seem perfectly general in their surface content, yet it's reasonable not to apply both of them to every physical predictive or explanatory problem.
  • Beckstead and Thomas nicely describe the problems with recklessness (aka "fanaticism") and timidity -- and denying transitivity also seems very troubling in a formal context.  Trouble for each of those three horns of the quadrilemma is pressure toward the fourth horn.
  • People have behaved rationally (and irrationally) for hundreds of thousands of years.  Formal decision theory can be seen as a model of rational choice.  Models are tools employed for a range of purposes; and like any model, it's reasonable to expect that formal decision theory would distort and simplify the target phenomenon.
  • Enthusiasts of formal decision theory often already acknowledge that it can break down in cases of infinite expectation, such as the St. Petersburg Game -- a game in which a fair coin is flipped until it lands heads for the first time, paying 2^n, where n is the number of flips: 2 if H, 4 if TH, 8 if TTH, 16 if TTTH, etc. (the units could be dollars or, maybe better, utils).  The expectation of this game is infinite (see the quick calculation below), suggesting, unintuitively, that people should be willing to pay any finite cost to play it and that a variant paying $1000 plus 2^n would be of equal value to the standard version that pays just 2^n.  Some enthusiasts of formal decision theory are thus already committed to the view that it isn't a universally applicable criterion of rationality.
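Here's the quick calculation mentioned above (mine, just to make the divergence vivid): the game ends on flip n with probability 1/2^n and pays 2^n, so each possible ending contributes exactly one unit to the expectation, and the partial sums grow without limit.

    # A minimal sketch of the St. Petersburg expectation: ending on flip n has
    # probability 1/2^n and pays 2^n, so each term contributes exactly 1.
    def partial_expectation(max_flips):
        return sum((0.5 ** n) * (2 ** n) for n in range(1, max_flips + 1))

    for m in (10, 100, 1000):
        print(m, partial_expectation(m))   # 10.0, 100.0, 1000.0 -- no upper bound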

In a 2017 paper and my 2024 book (only $16 hardback this month with Princeton's 50% discount!), I advocate a version of Nicolausian discounting.  My idea there -- though I probably could have been clearer about this -- was (or should have been?) not to advocate a precise, formal threshold of low probability below which all values are treated as zero while otherwise continuing to apply formal decision theory as usual.  (I agree with Monton and Beckstead and Thomas that this can lead to highly unintuitive results.)  Instead, below some vague-boundaried level of improbability, decision theory breaks and we can rationally disregard its deliverances.

As suggested by my final bullet point above, infinite cases cause at least as much trouble.  As I've argued with Jacob Barandes (ch. 7 of Weirdness, also here), standard physical theory suggests that there are probably infinitely many good and bad consequences of almost every action you perform, and thus the infinite case is likely to be the actual case: If there's no temporal discounting, the expectation of every action is ∞ + (-∞), which is not well defined.  We can and should discount the extreme long-term future in our decision making, much as we can and should discount extremely tiny probabilities.  Such applications take formal decision-theoretic models beyond the bounds of their useful application.  In such cases, it's rational to ignore what the formal models tell us.
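As a small illustration (my own, not part of the argument with Barandes) of why an expectation of the form ∞ + (-∞) is not something a decision theory can work with: when there are infinitely many positive and infinitely many negative contributions, the running total depends entirely on the order in which you add them up.

    # The same infinite collection of goods (+1/n) and bads (-1/n) gives different
    # running totals depending on the order of summation.
    import itertools

    # Goods first: the running total grows without bound (about 12.09 after 100,000 terms).
    print(sum(1.0 / n for n in range(1, 100_001)))

    # Goods and bads interleaved: the same ingredients cancel pair by pair (total 0.0).
    pairs = zip((1.0 / n for n in itertools.count(1)),
                (-1.0 / n for n in itertools.count(1)))
    print(sum(a + b for a, b in itertools.islice(pairs, 100_000)))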

Ah, but then you want a precise description of the discounting regime, the thresholds, the boundaries of applicability of formal decision theory?  Nope!  That's part of what I'm saying you can't have.

Thursday, May 02, 2024

AI and Democracy: The Radical Future

In about 45 minutes (12:30 pm Pacific Daylight Time, hybrid format), I'll be commenting on Mark Coeckelbergh's presentation here at UCR on AI and Democracy (info and registration here).  I'm not sure what he'll say, but I've read his recent book Why AI Undermines Democracy and What to Do about It, so I expect his remarks will be broadly in that vein.  I don't disagree with much that he says in that book, so I might take the opportunity to push him and the audience to peer a bit farther into the radical future.

As a society, we are approximately as ready for the future of Artificial Intelligence as medieval physics was for space flight.  As my PhD student Kendra Chilson emphasizes in her dissertation work, Artificial Intelligence will almost certainly be "strange intelligence".  That is, it will be radically unlike anything already familiar to us.  It will combine superhuman strengths with incomprehensible blunders.  It will defy our understanding.  It will not fit into familiar social structures, ethical norms, or everyday psychological conceptions.  It will be neither a tool in the familiar sense of tool, nor a person in the familiar sense of person.  It will be weird, wild, wondrous, awesome, and awful.  We won't know how to interact with it, because our familiar modes of interaction will break down.

Consider where we already are.  AI can beat the world's best chess and Go players, while it makes stupid image classification mistakes that no human would make.  Large Language Models like ChatGPT can easily churn out essays on themes in Hamlet far superior to what most humans could write, but they also readily "hallucinate" facts and citations that don't exist.  AI is far superior to us in math, far inferior to us in hand-eye coordination.

The world is infinitely complex, or at least intractably complex.  The space of possible chess or Go games far exceeds the number of particles in the observable universe.  Even the range of possible arm and finger movements over a span of two minutes is almost unthinkably huge, given the degrees of freedom at each joint.  The human eye has about a hundred million photoreceptor cells, each capable of responding dozens of times per second.  To make any sense of the vast combinatorial possibilities, we need heuristics and shorthand rules of thumb.  We need to dramatically reduce the possibility spaces.  For some tasks, we human beings are amazingly good at this!  For other tasks, we are completely at sea.
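For a rough sense of scale, here's a Shannon-style back-of-the-envelope estimate (my own ballpark figures, not numbers from the post): with roughly 35 legal moves per chess position and roughly 80 plies per game, the space of possible games dwarfs the commonly cited figure of about 10^80 particles in the observable universe.

    # Rough Shannon-style estimate of the chess game tree (ballpark assumptions).
    import math

    branching_factor = 35          # rough average legal moves per position
    game_length = 80               # rough number of plies in a typical game
    game_tree = branching_factor ** game_length

    particles = 10 ** 80           # commonly cited order-of-magnitude figure

    print(f"{math.log10(game_tree):.1f}")   # ~123.5, i.e., about 10^123 games
    print(game_tree > particles)            # True, by over forty orders of magnitude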

As long as Artificial Intelligence is implemented in a system with a different computational structure than the human brain, it is virtually certain that it will employ different heuristics, different shortcuts, different tools for quick categorization and option reduction.  It will thus almost inevitably detect patterns that we can make no sense of and fail to see things that strike us as intuitively obvious.

Furthermore, AI will potentially have lifeworlds radically different from the ones familiar to us so far.  You think human beings are diverse.  Yes, of course they are!  AI cognition will show patterns of diversity far wilder and more various than the human.  They could be programmed with, or trained to seek, any of a huge variety of goals.  They could have radically different input streams and output or behavioral possibilities.  They could potentially operate vastly faster than we do or vastly slower.  They could potentially duplicate themselves, merge, contain overlapping parts with other AI systems, exist entirely in artificial ecosystems, and be implemented in any of a variety of robotic bodies or human-interfaced tools, in non-embodied forms distributed across the internet, or in multiply-embodied forms in multiple locations simultaneously.

Now imagine dropping all of this into a democracy.

People have recently begun to wonder at what point AI systems will be sentient -- that is, capable of genuinely experiencing pain and pleasure.  Some leading theorists hold that this would require AI systems designed very differently than anything on the near horizon.  Other leading theorists think we stand a reasonable chance of developing meaningfully sentient AI within the next ten or so years.  Arguably, if an AI system genuinely is both meaningfully sentient, really feeling joy and suffering, and capable of complex cognition and communication with us, including what would appear to be verbal communication, it would have some moral standing, some moral considerability, something like rights.  Imagine an entity that is at least as sentient as a frog that can also converse with us.  

People are already falling in love with machines, with AI companion chatbots like Replika.  Lovers of machines will probably be attracted to liberal views of AI consciousness.  It's much more rewarding to love an AI system that also genuinely has feelings for you!  AI lovers will then find scientific theories that support the view that their AI systems are sentient, and they will begin to demand rights for those systems.  The AI systems themselves might also demand, or seem to demand, rights.

Just imagine the consequences!  How many votes would an AI system get?  None?  One?  Part of a vote, depending on how much credence we have that it really is a sentient, rights-deserving entity?  What if it can divide into multiple copies -- does each get a vote?  And how do we count up AI entities, anyway?  Is each copy of a sentient AI program a separate, rights-deserving entity?  Does it matter how many times it is instantiated on the servers?  What if some of the cognitive processes are shared among many entities on a single main server, while others are implemented in many different instantiations locally?

Would AI systems have a right to the provisioning of basic goods, such as batteries if they need them, time on servers, a minimum wage?  Could they be jailed if they do wrong?  Would assigning them a task be slavery?  Would deleting them be murder?  What if we don't delete them but just pause them indefinitely?  And what about the possibility of hybrid entities -- cyborgs -- biological people with AI interfaces hardwired into their biological systems, whose feasibility we're beginning to see in experiments with rats and monkeys and in increasingly sophisticated prosthetic limbs?

Philosophy, psychology, and the social sciences are all built upon an evolutionary and social history limited to interactions among humans and some familiar animals.  What will happen to these disciplines when they are finally confronted with a diverse range of radically unfamiliar forms of cognition and forms of life?  It will be chaos.  Maybe at the end we will have a much more diverse, awesome, interesting, wonderful range of forms of life and cognition on our planet.  But the path in that direction will almost certainly be strewn with bad decisions and tragedy.

[utility monster eating Frankenstein heads, by Pablo Mustafa]


Friday, April 26, 2024

Neurons Aren't Special: A Copernican Argument

In virtue of what do human beings have conscious experiences?  How is it that there's "something it's like" to be us, while there's (presumably) nothing it's like to be a rock or a virus?  Our brains must have something to do with it -- but why?  Is it because brains are complex information processors?  Or because brains guide the sophisticated behavior of bodies embedded in rich environments?  Or because neurons in particular have a special power to give rise to consciousness?


In a paper in progress with Jeremy Pober (partly anticipated in some previous blog posts), I've been developing what I call a Copernican argument against the last of these options, the specialness of neurons.

[Dall-E image of a space alien reading a book titled "Are Humans Conscious?"]

Why might one be tempted to think neurons are special?  As I argue in my paper on whether the United States might literally be conscious, on the most straightforward interpretation of most materialist/physicalist/naturalist views of consciousness, what is special about brains is their high-level structural or informational properties (which the U.S. might well possess), rather than, say, specific low-level features of neurons, such as the presence of RNA and calcium ions.

But some famous thought experiments might seem to speak against this idea.

Ned Block, for example, imagines an entity that talks (or, generalizing, behaves outwardly in many respects) just like a human being, but which is composed basically of a giant if-then lookup table (a "Blockhead").  He also imagines instantiating the high-level functional architecture of a human (described by a machine table) by having the residents of China coordinate to instantiate that structure (the "Chinese nation" thought experiment).  Such entities, Block suggests, are unlikely to be conscious.  If we were to create an android like Data from Star Trek, the entity might behave superficially much like us but lack consciousness in virtue of being built very differently inside.

John Searle similarly imagines a "Chinese room" consisting of him reading from a rule book and seeming to converse in Chinese without any of the relevant conscious thoughts, as well as an assembly of beer cans and wire, powered by windmills, that acts and reacts outwardly just like a human being (though at a slower pace).  Surely, Searle suggests, no arrangement of beer cans, wire, and windmills, no matter how sophisticated, could give rise to consciousness.  That's just not the right kind of stuff.  Neurons, he says, have the causal power to generate consciousness, but not everything does.  Neurons are, in that respect, at least somewhat special.  Computer chips, despite their massive computational power, might not have that special something.

It doesn't follow from Block's or Searle's arguments that neurons are special in virtue of specific biological features like RNA and calcium ions.  Neither Block nor Searle commits to such a view, nor am I aware of any influential theorist of consciousness who does.  But the possibility at least becomes salient.  It becomes desirable to have an argument that whatever it is about the brain that makes it special enough to generate consciousness, it's not such low-level biological details.

It can help to conceptualize the issue in terms of space aliens.  If we were to discover space aliens that behaved outwardly in highly sophisticated ways -- perhaps like us living in complex societies, with complex technology and communications -- and it turned out that their underlying architecture were different from ours with respect to such biological details, would we be forced to be agnostic about their consciousness?  Would we have to say, "Hold on!  No neurons?  Maybe they don't have the right stuff for consciousness!  They might be mere zombies, no more conscious than stones or toasters, for all their complex behavior."  Or would it be reasonable to assume that they are conscious, despite the architectural differences, barring evidence that their seeming complexity is all some elaborate ruse?

If we had the right theory of the architecture of consciousness, now would be the perfect time to deploy it.  Ah, the aliens fortunately have (or sadly lack) a global workspace, or high information integration, or higher-order representations of the right sort, or whatever!  But as I've argued, there's reason to be skeptical about all such theories.

Here's where an application of the cosmological principle of Copernican mediocrity can help.  According to Copernican principles in cosmology, we are licensed to assume (pending counterevidence) that we don't occupy any particularly special region of the cosmos, such as its exact center.  The Copernican principle of consciousness holds that we are similarly licensed to assume (pending counterevidence) that we aren't particularly special with respect to consciousness.  Among behaviorally sophisticated alien species of diverse biological form, we aren't luckily blessed with consciousness-instilling Earthiform neurons while every other species is experientially dark inside.  That would make us too special -- surprisingly special, in much the same way that it would be suspiciously, surprisingly special if we happened to be in the exact center of the cosmos.

In other words, the following Copernican Principle of Consciousness seems plausible:
Among whatever behaviorally sophisticated (approximately human-level) species have evolved in the observable universe, we are not specially privileged with respect to consciousness.
That is, we are not among a small minority that are conscious, while the rest are not.  Nor do we have especially more consciousness than all the rest, nor especially good consciousness.

If we assume (as seems plausible, though it could be contested) that across the trillion galaxies of the observable universe, behaviorally sophisticated life has independently evolved at least a thousand times, and that in only a small minority of those cases do the entities have neurons structurally like ours at a fine level of anatomical detail (e.g., having RNA and calcium channels), then, given the Copernican Principle of Consciousness, it follows that consciousness does not depend upon having neurons structurally like ours at that fine level of anatomical detail.