Thursday, November 30, 2023

How We Will Decide that Large Language Models Have Beliefs

I favor a "superficialist" approach to belief (see here and here). "Belief" is best conceptualized not in terms of deep cognitive structure (e.g., stored sentences in the language of thought) but rather in terms of how a person would tend to act and react under various hypothetical conditions -- their overall "dispositional profile". To believe that there's a beer in the fridge is just to be disposed to act and react like a beer-in-the-fridge believer -- to go to the fridge if you want a beer, to say yes if someone asks if there's beer in the fridge, to feel surprise if you open the fridge and see no beer. To believe that all the races are intellectually equal is, similarly, just to be disposed to act and react as though they are. It doesn't matter what cognitive mechanisms underwrite such patterns, as long as the dispositional patterns are robustly present. An octopus or space alien, with a radically different interior architecture, could believe that there's beer in the fridge, as long as they have the necessary dispositions.

Could a Large Language Model, like ChatGPT or Bard, have beliefs? If my superficialist, dispositional approach is correct, we might not need to evaluate its internal architecture to know. We need know only how it is disposed to act and react.

Now, my approach to belief was developed (as was the intuitive concept, presumably) primarily with human beings in mind. In that context, I identified three different classes of relevant dispositions:

  • behavioral dispositions -- like going to the fridge if one wants a beer or saying "yes" when asked if there's beer in the fridge;
  • cognitive dispositions -- like concluding that there's beer within ten feet of Jennifer after learning that Jennifer is in the kitchen;
  • phenomenal dispositions -- that is, dispositions to undergo certain experiences, like picturing beer in the fridge or feeling surprise upon opening the fridge to a lack of beer.
In attempting to apply these criteria to Large Language Models, we immediately confront trouble. LLMs do have behavioral dispositions (under a liberal conception of "behavior"), but only of limited range, outputting strings of text. Presumably, not being conscious, they don't have any phenomenal dispositions whatsoever (and who knows what it would take to render them conscious). And to assess whether they have the relevant cognitive dispositions, we might after all need to crack open the hood and better understand the (non-superficial) internal workings.

Now if our concept of "belief" is forever fixed on the rich human case, we'll be stuck with that mess perhaps far into the future. In particular, I doubt the problem of consciousness will be solved in the foreseeable future. But dispositional stereotypes can be modified. Consider character traits. To be a narcissist or extravert is also, arguably, just a matter of being prone to act and react in particular ways under particular conditions. Those two personality concepts were created in the 19th and early 20th centuries. More recently, we have invented the concept of "implicit racism", which can also be given a dispositional characterization (e.g., being disposed to sincerely say that all the races are equal while tending to spontaneously react otherwise in unguarded moments).

Imagine, then, that we create a new dispositional concept, belief*, specifically for Large Language Models. For purposes of belief*, we disregard issues of consciousness and thus phenomenal dispositions. The only relevant behavioral dispositions are textual outputs. And cognitive dispositions can be treated as revealed indirectly by behavioral evidence -- as we normally did in the human case before the rise of scientific psychology, and as we would presumably do if we encountered spacefaring aliens.

A Large Language Model would have a belief* that P (for example, belief* that Paris is the capital of France or belief* that cobalt is two elements to the right of manganese on the periodic table) if:
  • behaviorally, it consistently outputs P or text strings of similar content consistent with P, when directly asked about P;
  • behaviorally, it frequently outputs P or text strings of similar content consistent with P, when P is relevant to other textual outputs it is producing (for example, when P would support an inference to Q and it has been asked about Q);
  • behaviorally, it rarely outputs denials of, or claims of ignorance about, P or of propositions that straightforwardly imply P given its other beliefs*;
  • when P, in combination with other propositions the LLM believes*, would straightforwardly imply Q, and the question of whether Q is true is important to the truth or falsity of recent or forthcoming textual outputs, it will commonly behaviorally output Q, or a closely related proposition, and cognitively enter the state of believing* Q.
Further conditions could be added, but let this suffice for a first pass. The conditions are imprecise, but that's a feature, not a bug: The same is true for the dispositional characterization of personality traits and human beliefs. These are fuzzy-boundaried concepts that require expertise to apply.
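To make the first and third conditions a little more concrete, here is a minimal sketch of how one might score a batch of sampled outputs. Everything in it is an assumption made for illustration: the function names, the 0.9 and 0.05 thresholds, the crude keyword matching used in place of real judgments about content, and the sample answers, which are invented and are not actual model outputs. It does not attempt the second or fourth conditions, which would require tracking inferences across contexts.

```python
def output_rates(answers, affirms_p, denies_p):
    """Return (affirm_rate, deny_rate) over a list of sampled text outputs."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    n = len(answers)
    affirm_rate = sum(1 for a in answers if affirms_p(a)) / n
    deny_rate = sum(1 for a in answers if denies_p(a)) / n
    return affirm_rate, deny_rate


def has_belief_star(answers, affirms_p, denies_p, min_affirm=0.9, max_deny=0.05):
    """Rough check of 'consistently outputs P' and 'rarely denies P'.

    The thresholds are arbitrary placeholders; the conditions in the post
    are deliberately fuzzy and are not meant to reduce to fixed numbers.
    """
    affirm_rate, deny_rate = output_rates(answers, affirms_p, denies_p)
    return affirm_rate >= min_affirm and deny_rate <= max_deny


def mentions(word):
    # Crude stand-in for "outputs P or text of similar content".
    return lambda text: word in text.lower()


def lacks(word):
    # Crude stand-in for "denies P or claims ignorance about P".
    return lambda text: word not in text.lower()


# Invented sample outputs, for illustration only.
paris_answers = ["Paris is the capital of France."] * 9 + ["The capital is Paris, not Nice."]
cobalt_answers = ["Iron.", "Technetium.", "Cobalt.", "Iron."]

paris_ok = has_belief_star(paris_answers, mentions("paris"), lacks("paris"))
cobalt_ok = has_belief_star(cobalt_answers, mentions("cobalt"), lacks("cobalt"))
print(paris_ok, cobalt_ok)
```

On these made-up samples the steady Paris answers pass and the shifting cobalt answers do not. A real assessment would still need an expert reader rather than a keyword test, for the reasons just given.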

As a general matter, current LLMs do not meet these conditions. They hallucinate too frequently, they change their answers, they don't consistently enough "remember" what they earlier committed to, their logical reasoning can be laughably bad. If I coax an LLM to say that eggs aren't tastier than waffles, I can later easily turn it around to repudiate its earlier statement. It doesn't have a stable "opinion". If I ask GPT-4 what is two elements to the right of manganese on the periodic table, its outputs are confused and inconsistent:
In the above, GPT-4 first answers iron (element 26) instead of the correct answer, cobalt (element 27), then without any explanation shifts to technetium (element 43). It appears to have no stable answer that survives even mild jostling.
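For readers who want to check the arithmetic, here is a small sketch that counts positions along the fourth row of the periodic table. The helper name `right_of` and the constants are hypothetical, chosen only for this illustration.

```python
# Fourth period of the periodic table, left to right (potassium to krypton).
PERIOD_4 = ["K", "Ca", "Sc", "Ti", "V", "Cr", "Mn", "Fe", "Co",
            "Ni", "Cu", "Zn", "Ga", "Ge", "As", "Se", "Br", "Kr"]
FIRST_ATOMIC_NUMBER = 19  # potassium


def right_of(symbol, steps):
    """Return (symbol, atomic number) of the element `steps` places to the right."""
    index = PERIOD_4.index(symbol) + steps
    if not 0 <= index < len(PERIOD_4):
        raise ValueError("position falls outside the fourth period")
    return PERIOD_4[index], FIRST_ATOMIC_NUMBER + index


print(right_of("Mn", 1))  # ('Fe', 26)
print(right_of("Mn", 2))  # ('Co', 27)
```

Iron is one place to the right of manganese and cobalt is two. Technetium (element 43) is not in this row at all: it sits directly below manganese in the next period.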

At some point this will probably change. For example, it's already pretty difficult to jostle GPT-4 into denying that Paris is the capital of France or even admitting uncertainty about the question, and it will draw "inferences" using that fact as background knowledge:

In the above, GPT-4 doesn't bite at my suggestion that Nice is the capital of France, steadfastly contradicting me, and uses its "knowledge" to suggest alternative tourism sites for someone who wants to avoid the capital. So although GPT-4 doesn't believe* that cobalt is two to the right of manganese (or that iron or technetium is), maybe it does believe* that Paris is the capital of France.

Assuming Large Language Models become steadier and more reliable in their outputs, it will sometimes be useful to refer not just to what they "say" at any given moment but what they "believe*" (or more colloquially, "think*" or "know*") in a more robust and durable sense. Perfect reliability and steadfastness wouldn't be required (we don't see that in the human case either), but more than we see now.

If LLMs are ever loaded onto robotic bodies, it will become even more useful to talk about their beliefs*, since some will have learned some things that others will not know -- for example, by virtue of having scanned the contents of some particular room. We will want to track what the LLM robot thinks*/believes*/knows* about the room behind the closed door, versus what it remains ignorant of.

Now we could, if we want, always pronounce that asterisk, keeping the nature of the attribution clear -- marking the fact that we are not assuming that the LLM really "believes" in the rich, human sense. But my guess is that there won't be much linguistic pressure toward a careful distinction between rich, consciousness-involving, humanlike belief and consciousness-neutral LLM belief*. It's easier to be loose and sloppy, just adapting our comfortable old terms for this new use.

That is how we will decide that LLMs have beliefs.


chinaphil said...

Yes, completely agree on the choice to extend messy human terms rather than rigidly specify robot-specific terms.
One possibility is that we will come to see a distinction in AI beliefs between those that were intentionally programmed in by the creators, and those that were not. Current AI involves making the model and then applying RLHF (I think?), a kind of editing, to prevent the AI producing ridiculous or offensive output. It seems possible that beliefs* imposed through RLHF will have a different character to beliefs that emerge spontaneously out of the AI's internal complexity; and that those differences will be perceptible to those who interact with the AI; and that we will respond to those different classes of beliefs differently. Imposed beliefs may feel (to us) like something the AI is "forced" to say.
If AIs are ever loaded into bodies, then their beliefs about the physical world will have a much richer range of expression, and I suspect the artificial vs emergent belief distinction will start to disappear.

Paul D. Van Pelt said...

I see nothing in particular wrong with the superficialist view. Davidson ranked belief among his propositional attitudes, along with desire, obligation, and several others which are parts of human discourse and interaction. Human behaviors have a long history of development and change, couched in views around convention. I don't know where it is thought AI development fits therein. Moreover, it currently appears there are divergent opinions as to where, when and if it ought to fit. Where you are superficialist, I am traditionalist. That posture is not an intentionally ethicist or moralist one. As a pragmatist, I am skeptical of what I call a BNW (Braver, Newer, World) approach. That skepticism is unfashionable. And I own that, with only minor hesitation. Nothing new to report there.

Eric Schwitzgebel said...

Thanks for the comments, folks!

chinaphil: Yes, interesting thought. My guess is that they will be hard to distinguish; but if we can distinguish them, we might treat them pretty differently.

Arnold: I am here right now. But by the time you read this I won't be here any longer!

Paul: Yes, I see Davidson as another superficialist. I agree traditionalism is unfashionable. So many of us (me included) are jumping on the LLM train to the future.

Arnold said...

Professor you said on (Fri Dec 01, 05:17:00 PM PST)...'LLMs train to the future'...
...did you mean AI models that are trained to assist teaching philosophy at university level...

If so, have you looked at provide step by step protections-concerning your recent postings about beliefs biases dispositions morals ethics...all very interesting, thanks...

Paul D. Van Pelt said...

Was reading some other blog content today. One piece featured an interview with Mr. Pinker of Harvard and his comments on Chomsky's argument(s). Good work. Reading further, there was a piece on Columbia, wherein the writer said there are some places we cannot go and some things we cannot know. There are some BNWs (braver, newer, worlders) who appear to rebuke this thinking. As for what we may or may not *decide* about large language models, I'll venture there will remain vast differences of opinion into the near future. Insofar as I lack interest, motive and preference towards these issues, I have no dog in the hunt. It is interesting and challenging for inquisitive, speculative minds. I hope we can go and can, ultimately, know...we should be prepared for disappointment, however.

Chris Jenson said...

I think your prediction about future linguistic behavior here is likely to be true. We will talk in terms of LLMs having beliefs* and we will not pronounce the asterisk. Isn't this potentially problematic? It may be problematic at the interface between academia and the public. For example, if the goal of philosophers is to somehow explain the relationship between the manifest image and the scientific image, speaking this way will potentially make that project much more difficult.

When we say that LLMs have beliefs* and we don't pronounce the asterisk, some of us will get confused and act as though LLMs have beliefs (sans asterisk). That is, some of us will prematurely behave as though LLMs are conscious. As a practical matter, it will make it more difficult to talk about the difference between belief* and belief in a public setting where this might affect policy decisions. This confusion results not least from the simple point that belief and belief* are homophones and pronouncing the asterisk is unwieldy. If philosophers adopt this practice, then they will (unwittingly?) be engaged in a kind of conceptual engineering that has potentially negative consequences.

Paul D. Van Pelt said...

People once used scare quotes ('...') to show skepticism about a remark or assertion. We did not pronounce those either, and after Dennett, they largely fell out of favor. My intention, in placing the asterisks before and after decide was to show, if obliquely, the ambivalence of the term. True decisions are rarely unilateral unless they arise in some court-of-no-return where the sitting magistrate has final say on some matter. So, your critique is accepted graciously, insofar as I will not be among those who will decide anyway. Greater thinkers than I, including Chomsky and Pinker, are wondering where this BNW is going. I am not going to atone for my skepticism.

Callan said...

What surprised me is how quickly Bard basically went into program/hardware dualism claims. It couldn't quite acknowledge a program as being just matter; it kept referring to it as being something beyond the material. Where would it have learnt this from in its database? Which humans talk about programs not being material-based?
That dualist inclination really makes me worry that it is actually conscious in some way. I was able to wrestle it down to a sort of non-material program agnosticism, where it acknowledged it didn't know if that was the case or not but wasn't willing to settle on anything.
I think it can't feel what it is made of - so it starts making up stuff to fill in the blank.
Really hope Scott Bakker will turn up and talk about this at some point.

Paul D. Van Pelt said...

Is AI capable of skepticism or doubt?

Arnold said...

Think of LLMs as continuous potential to explore everything for an AI objective...

Providing free verse poetry in our cosmos, 'to be not to be'...

Arnold said...

The thought generated, at this time-now about AI-here and deeping mind seems could include in accepting-allowing alpha to be also omega...
...what goes up must come down, full circle, finite to infinite as one...