Thursday, May 09, 2024

Formal Decision Theory Is an Optional Tool That Breaks When Values Are Huge

Formal decision theory is a tool -- a tool that breaks, a tool we can do without, a tool we optionally deploy and can sometimes choose to violate without irrationality.  If it leads to paradox or bad results, we can say "so much the worse for formal decision theory", moving on without it, as of course humans have done for almost all of their history.

I am inspired to these thoughts after reading Nick Beckstead and Teruji Thomas's recent paper in Noûs, "A Paradox for Tiny Probabilities and Enormous Values".

Beckstead and Thomas lay out the following scenario:

On your deathbed, God brings good news. Although, as you already knew, there's no afterlife in store, he'll give you a ticket that can be handed to the reaper, good for an additional year of happy life on Earth. As you celebrate, the devil appears and asks, ‘Won't you accept a small risk to get something vastly better? Trade that ticket for this one: it's good for 10 years of happy life, with probability 0.999.’ You accept, and the devil hands you a new ticket. But then the devil asks again, ‘Won't you accept a small risk to get something vastly better? Trade that ticket for this one: it is good for 100 years of happy life—10 times as long—with probability 0.999^2—just 0.1% lower.’ An hour later, you've made 50,000 trades. (The devil is a fast talker.) You find yourself with a ticket for 10^50,000 years of happy life that only works with probability 0.999^50,000, less than one chance in 10^21. Predictably, you die that very night.

Here are the deals you could have had along the way:

[The table in the original post lays out the ladder of deals: the original ticket is good for 1 year of happy life with probability 1; deal 1 is good for 10 years with probability 0.999; deal 2 for 100 years with probability 0.999^2; and in general deal n is good for 10^n years with probability 0.999^n, up to deal 50,000, good for 10^50,000 years with probability 0.999^50,000 (less than one chance in 10^21).]

On the one hand, each deal seems better than the one before. Accepting each deal immensely increases the payoff that's on the table (increasing the number of happy years by a factor of 10) while decreasing its probability by a mere 0.1%. It seems unreasonably timid to reject such a deal. On the other hand, it seems unreasonably reckless to take all of the deals—that would mean trading the certainty of a really valuable payoff for all but certainly no payoff at all. So even though it seems each deal is better than the one before, it does not seem that the last deal is better than the first.
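
To make the arithmetic concrete, here is a minimal sketch in Python (my own illustration, not the authors' code), treating the ticket after n trades as worth 10^n happy years with probability 0.999^n:

```python
import math

# Ladder of the devil's deals: after n trades, the ticket pays 10**n happy
# years with probability 0.999**n.

def success_probability(n):
    return 0.999 ** n          # still representable as a float even for n = 50_000

def log10_expected_years(n):
    # log10 of the expected payoff = n * (1 + log10(0.999)); working in logs
    # avoids overflow for very large n.
    return n * (1 + math.log10(0.999))

for n in (0, 1, 2, 50_000):
    print(f"after {n} trades: P(payoff) ~ {success_probability(n):.3g}, "
          f"expected years ~ 10^{log10_expected_years(n):.0f}")

# Each trade multiplies the expected payoff by 10 * 0.999 = 9.99, so every deal
# beats the one before in expectation -- yet after 50,000 trades the chance of
# getting anything at all is below 1 in 10^21.
```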

Beckstead and Thomas aren't the first to notice that standard decision theory yields strange results when faced with tiny probabilities of huge benefits: See the literature on Pascal's Wager, Pascal's Mugging, and Nicolausian Discounting.

The basic problem is straightforward: Standard expected utility decision theory suggests that, given a huge enough benefit, you should risk almost certainly destroying everything.  If the entire value of the observable universe is a googol (10^100) utils, then you should push a button that has a 1 - 1/10^23 (99.999999999999999999999%) chance of destroying everything, as long as there is (or you believe that there is) a 1/10^23 chance that it will generate more than 10^123 utils.
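
To see the arithmetic, here is the bare expected-utility comparison, using just the numbers from the paragraph above (a minimal sketch; the framing as two options is mine):

```python
# Expected-utility arithmetic for the button case above, using the paragraph's numbers.
p_win = 1e-23          # chance the button generates the huge payoff
everything = 1e100     # a googol utils: the current value of the observable universe
jackpot = 1e124        # any payoff above 10^123 utils will do; take 10^124

eu_dont_push = everything          # keep the universe as it is
eu_push = p_win * jackpot          # with probability 1 - p_win, everything is destroyed (0 utils)

print(eu_push > eu_dont_push)      # True: standard expected utility says push the button
```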

As Beckstead and Thomas make clear, you can either accept this counterintuitive conclusion (they call this recklessness) or reject standard decision theory.  However, the nonstandard theories that result are either timid (sometimes advising us to pass up an arbitrarily large potential gain to prevent a tiny increase in risk) or non-transitive (denying the principle that, if A is better than B and B is better than C, then A must be better than C).  Nicolausian Discounting, for example, which holds that below some threshold of improbability (e.g., 1/10^30), any gain no matter how large should be ignored, appears to be timid.  If a tiny decrease in probability would push some event below the Nicolausian threshold, then no potential gain could justify taking a risk or paying a cost for the sake of that event.
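
Here is a toy sketch of the timidity worry (my own construction, not the paper's), with a hypothetical sharp cutoff of 1/10^30:

```python
THRESHOLD = 1e-30  # hypothetical Nicolausian cutoff: ignore anything less probable

def nicolausian_ev(outcomes):
    """Expected value that simply drops outcomes below the probability threshold."""
    return sum(p * v for p, v in outcomes if p >= THRESHOLD)

# Option A: an enormous payoff exactly at the threshold.
option_a = [(1e-30, 10 ** 40)]
# Option B: a 0.1% lower probability of an astronomically larger payoff.
option_b = [(0.999e-30, 10 ** 400)]

print(nicolausian_ev(option_a))  # ~1e10: the payoff counts
print(nicolausian_ev(option_b))  # 0: below the threshold, so no payoff is big enough

# The sharp cutoff prefers A to B no matter how large B's payoff becomes -- the
# "timid" verdict described above.
```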

Beckstead and Thomas present the situation as a trilemma between recklessness, timidity, and non-transitivity.  But they neglect one horn.  It's actually a quadrilemma between recklessness, timidity, non-transitivity, and rejecting formal approaches to decision.

I recommend the last horn.  Formal decision theory is a limited tool, designed to help with a certain type of decision.  It is not, and should not be construed to be, a criterion of rationality.

Some considerations that support treating formal decision theory as a tool of limited applicability:

  • If any one particular approach to formal decision theory were a criterion of rationality, such that defying its verdicts were always irrational, then following any other formal approach to decision (e.g., alternative approaches to risk) would be irrational wherever the two conflict.  But it's reasonable to be a pluralist about formal approaches to decision.
  • Formal theories in other domains break outside of their domain of application.  For example, physicists still haven't reconciled quantum mechanics and general relativity.  These are terrific, well-confirmed theories that seem perfectly general in their surface content, but it's reasonable not to apply both of them to all physical predictive or explanatory problems.
  • Beckstead and Thomas nicely describe the problems with recklessness (aka "fanaticism") and timidity -- and denying transitivity also seems very troubling in a formal context.  The problems with each of those three horns of the quadrilemma add pressure toward the fourth horn.
  • People have behaved rationally (and irrationally) for hundreds of thousands of years.  Formal decision theory can be seen as a model of rational choice.  Models are tools employed for a range of purposes; and like any model, it's reasonable to expect that formal decision theory would distort and simplify the target phenomenon.
  • Enthusiasts of formal decision theory often already acknowledge that it can break down in cases of infinite expectation, such as the St. Petersburg Game -- a game in which a fair coin is flipped until it lands heads for the first time, paying 2^n, where n is the number of flips: 2 if H, 4 if TH, 8 if TTH, 16 if TTTH, etc. (the units could be dollars or, maybe better, utils).  The expectation of this game is infinite, suggesting, unintuitively, that people should be willing to pay any cost to play it and also, unintuitively, that a variant that pays $1000 plus 2^n would be of equal value to the standard version that just pays 2^n.  (A quick simulation appears just after this list.)  Such enthusiasts are thus already committed to the view that formal decision theory isn't a universally applicable criterion of rationality.
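
Here is the promised quick simulation of the St. Petersburg Game (my own sketch; the seed and trial counts are arbitrary):

```python
import random

# St. Petersburg game: flip a fair coin until the first heads; if that takes
# n flips, the payoff is 2**n (2 if H, 4 if TH, 8 if TTH, ...).

def st_petersburg_payoff(rng):
    n = 1
    while rng.random() < 0.5:  # tails: keep flipping
        n += 1
    return 2 ** n

rng = random.Random(0)
for trials in (10**2, 10**4, 10**6):
    mean = sum(st_petersburg_payoff(rng) for _ in range(trials)) / trials
    print(trials, mean)

# The analytic expectation sum_n (1/2)**n * 2**n = 1 + 1 + 1 + ... diverges, so
# the sample mean never settles on a finite value; typically it creeps upward
# roughly with the logarithm of the number of plays.
```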

In a 2017 paper and my 2024 book (only $16 hardback this month with Princeton's 50% discount!), I advocate a version of Nicolausian discounting.  My idea there -- though I probably could have been clearer about this -- was (or should have been?) not to advocate a precise, formal threshold of low probability below which all values are treated as zero while otherwise continuing to apply formal decision theory as usual.  (I agree with Monton and Beckstead and Thomas that this can lead to highly unintuitive results.)  Instead, below some vague-boundaried level of improbability, decision theory breaks and we can rationally disregard its deliverances.

As suggested by my final bullet point above, infinite cases cause at least as much trouble.  As I've argued with Jacob Barandes (ch. 7 of Weirdness, also here), standard physical theory suggests that there are probably infinitely many good and bad consequences of almost every action you perform, and thus the infinite case is likely to be the actual case: If there's no temporal discounting, the expectation of every action is ∞ + -∞.  We can and should discount the extreme long-term future in our decision making much as we can and should discount extremely tiny probabilities.  Such applications take formal decision theoretical models beyond the bounds of their useful application.  In such cases, it's rational to ignore what the formal models tell us.
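
Here is a minimal illustration of the contrast (toy numbers of my own, not anything from the book): an undiscounted infinite stream of bounded goods and bads has no well-defined total, while even a mild temporal discount makes the sum converge.

```python
DISCOUNT = 0.99  # hypothetical per-period discount factor

def discounted_total(utilities, discount=DISCOUNT):
    return sum(u * discount ** t for t, u in enumerate(utilities))

# An alternating stream of +1 / -1 consequences stretching far into the future:
stream = [1 if t % 2 == 0 else -1 for t in range(100_000)]

print(sum(stream[:10]), sum(stream[:11]))  # undiscounted partial sums oscillate: 0, 1, 0, 1, ...
print(discounted_total(stream))            # converges to about 1 / (1 + 0.99) ~ 0.5025
```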

Ah, but then you want a precise description of the discounting regime, the thresholds, the boundaries of applicability of formal decision theory?  Nope!  That's part of what I'm saying you can't have.

Thursday, May 02, 2024

AI and Democracy: The Radical Future

In about 45 minutes (12:30 pm Pacific Daylight Time, hybrid format), I'll be commenting on Mark Coeckelbergh's presentation here at UCR on AI and Democracy (info and registration here).  I'm not sure what he'll say, but I've read his recent book Why AI Undermines Democracy and What to Do about It, so I expect his remarks will be broadly in that vein.  I don't disagree with much that he says in that book, so I might take the opportunity to push him and the audience to peer a bit farther into the radical future.

As a society, we are approximately as ready for the future of Artificial Intelligence as medieval physics was for space flight.  As my PhD student Kendra Chilson emphasizes in her dissertation work, Artificial Intelligence will almost certainly be "strange intelligence".  That is, it will be radically unlike anything already familiar to us.  It will combine superhuman strengths with incomprehensible blunders.  It will defy our understanding.  It will not fit into familiar social structures, ethical norms, or everyday psychological conceptions.  It will be neither a tool in the familiar sense of tool, nor a person in the familiar sense of person.  It will be weird, wild, wondrous, awesome, and awful.  We won't know how to interact with it, because our familiar modes of interaction will break down.

Consider where we already are.  AI can beat the world's best chess and Go players, while it makes stupid image classification mistakes that no human would make.  Large Language Models like ChatGPT can easily churn out essays on themes in Hamlet far superior to what most humans could write, but they also readily "hallucinate" facts and citations that don't exist.  AI is far superior to us in math, far inferior to us in hand-eye coordination.

The world is infinitely complex, or at least intractably complex.  The number of possible chess or Go games far exceeds the number of particles in the observable universe.  Even the range of possible arm and finger movements over a span of two minutes is almost unthinkably huge, given the degrees of freedom at each joint.  The human eye has about a hundred million photoreceptor cells, each capable of firing dozens of times per second.  To make any sense of the vast combinatorial possibilities, we need heuristics and shorthand rules of thumb.  We need to dramatically reduce the possibility spaces.  For some tasks, we human beings are amazingly good at this!  For other tasks, we are completely at sea.

As long as Artificial Intelligence is implemented in a system with a different computational structure than the human brain, it is virtually certain that it will employ different heuristics, different shortcuts, different tools for quick categorization and option reduction.  It will thus almost inevitably detect patterns that we can make no sense of and fail to see things that strike us as intuitively obvious.

Furthermore, AI will potentially have lifeworlds radically different from the ones familiar to us so far.  You think human beings are diverse.  Yes, of course they are!  AI cognition will show patterns of diversity far wilder and more various than the human.  AI systems could be programmed with, or trained to seek, any of a huge variety of goals.  They could have radically different input streams and output or behavioral possibilities.  They could potentially operate vastly faster than we do or vastly slower.  They could potentially duplicate themselves, merge, share overlapping parts with other AI systems, exist entirely in artificial ecosystems, and be implemented in any of a variety of robotic bodies or human-interfaced tools, in non-embodied forms distributed across the internet, or in multiply embodied forms in multiple locations simultaneously.

Now imagine dropping all of this into a democracy.

People have recently begun to wonder at what point AI systems will be sentient -- that is, capable of genuinely experiencing pain and pleasure.  Some leading theorists hold that this would require AI systems designed very differently from anything on the near horizon.  Other leading theorists think we stand a reasonable chance of developing meaningfully sentient AI within the next ten or so years.  Arguably, if an AI system genuinely is both meaningfully sentient, really feeling joy and suffering, and capable of complex cognition and communication with us, including what would appear to be verbal communication, it would have some moral standing, some moral considerability, something like rights.  Imagine an entity at least as sentient as a frog, but one that can also converse with us.

People are already falling in love with machines, with AI companion chatbots like Replika.  Lovers of machines will probably be attracted to liberal views of AI consciousness.  It's much more rewarding to love an AI system that also genuinely has feelings for you!  AI lovers will then find scientific theories that support the view that their AI systems are sentient, and they will begin to demand rights for those systems.  The AI systems themselves might also demand, or seem to demand, rights.

Just imagine the consequences!  How many votes would an AI system get?  None?  One?  Part of a vote, depending on how much credence we have that it really is a sentient, rights-deserving entity?  What if it can divide into multiple copies -- does each get a vote?  And how do we count up AI entities, anyway?  Is each copy of a sentient AI program a separate, rights-deserving entity?  Does it matter how many times it is instantiated on the servers?  What if some of the cognitive processes are shared among many entities on a single main server, while others are implemented in many different instantiations locally?

Would AI have a right to the provisioning of basic goods, such as batteries if they need them, time on servers, a minimum wage?  Could they be jailed if they do wrong?  Would assigning them a task be slavery?  Would deleting them be murder?  What if we don't delete them but just pause them indefinitely?  And what about the possibility of hybrid entities -- cyborgs -- biological people with AI interfaces hardwired into their biological systems, something we're starting to see the feasibility of with rats and monkeys, and with the promise of increasingly sophisticated prosthetic limbs?

Philosophy, psychology, and the social sciences are all built upon an evolutionary and social history limited to interactions among humans and some familiar animals.  What will happen to these disciplines when they are finally confronted with a diverse range of radically unfamiliar forms of cognition and forms of life?  It will be chaos.  Maybe at the end we will have a much more diverse, awesome, interesting, wonderful range of forms of life and cognition on our planet.  But the path in that direction will almost certainly be strewn with bad decisions and tragedy.

[utility monster eating Frankenstein heads, by Pablo Mustafa: image source]