Friday, June 06, 2025

Types and Degrees of Turing Indistinguishability; Thinking and Consciousness

Types and Degrees of Indistinguishability

The Turing test (introduced by Alan Turing in a 1950 article) treats linguistic indistinguishability from a human as sufficient grounds to attribute thought (alternatively, consciousness) to a machine. Indistinguishability, of course, comes in degrees.

In the original setup, a human and a machine, through a text-only interface, each try to convince a human judge that they are human. The machine passes if the judge cannot tell which is which. More broadly, we might say that a machine "passes the Turing test" if its textual responses strike users as sufficiently humanlike to make the distinction difficult.

[Image: Alan Turing in 1952]

Turing tests can be set with a relatively low or high bar. Consider a low-bar test:

* The judges are ordinary users, with no special expertise.
* The interaction is relatively brief -- maybe five minutes.
* The standard of indistinguishability is relaxed -- maybe if 20% of users guess wrong, that suffices.

Contrast that with a high-bar test:

* The judges are experts in distinguishing humans from machines.
* The interaction is relatively long -- an hour or more.
* The standard of indistinguishability is stringent -- if even 55% of judges guess correctly, the machine fails.

The best current language models already pass a low-bar test. But it will be a long time before language models pass this high-bar test, if they ever do. So let's not talk about whether machines do or do not pass "the" Turing test. There is no one Turing test.

The better question is: What type and degree of Turing-indistinguishability does a machine possess? Indistinguishability to experts or non-experts? Over five minutes or five hours? With what level of reliability?

We might also consider topic-based or tool-relative Turing indistinguishability. A machine might be Turing indistinguishable (to some judges, for some duration, to some standard) when discussing sports and fashion, but not when discussing consciousness, or vice versa. It might fool unaided judges but fail when judges employ AI detection tools.

Turing himself seems to have envisioned a relatively low bar:

I believe that in about fifty years' time it will be possible to programme computers... to make them play the imitation game so well that **an average interrogator** will not have more than **70 per cent chance** of making the right identification after **five minutes** of questioning (Turing 1950, p. 442).

I've bolded Turing's implied standards of judge expertise, indistinguishability threshold, and duration.

What bar should we adopt? That depends on why we care about Turing indistinguishability. For a customer service bot, indistinguishability by ordinary people across a limited topic range for brief interactions might suffice. For an "AI girlfriend", hours of interaction might be expected, with occasional lapses tolerated or even welcomed.

Turing Tests for Real Thinking and Consciousness?

But maybe you're interested in the metaphysics, as I am. Does the machine really think? Is it really conscious? What kind and degree of Turing indistinguishability would establish that?

For thinking, I propose that when it becomes practically unavoidable to treat the machine as if it has a particular set of beliefs and desires that are stable over time, responsive to its environment, and idiosyncratic to its individual state, then we might as well say that it does have beliefs and desires, and that it thinks. (My own theory of belief requires consciousness for full and true belief, but in such a case I don't think it will be practical to insist on this.)

Current language models aren't quite there. Their attitudes lack sufficient stability and idiosyncrasy. But a language model integrated into a functional robot that tracks its environment and has specific goals would be a thinker in this sense. For example: Nursing Bot A thinks the pills are in Drawer 1, but Nursing Bot B, who saw them moved, knows that they're in Drawer 2. Nursing Bot A would rather take the long, safe route than the short, riskier route. We will want to attribute sometimes-true, sometimes-false environment-tracking beliefs and differing stable goal weightings. Belief, desire, and thought attribution will be too useful to avoid.

For consciousness, however, I think we should abandon a Turing test standard.

Note first that it's not realistic to expect any machine ever to pass the very highest-bar Turing test. No machine will reliably fool experts who specialize in catching them out, armed with unlimited time and tools, needing to exceed 50% accuracy by only the slimmest margin. To insist on such a high standard is to guarantee that no machine could ever prove itself conscious, contrary to the original spirit of the Turing test.

On the other hand, given enough training and computational power, machines have proven to be amazing mimics of the superficial features of human textual outputs, even without the type of underlying architecture likely to support a meaningful degree of consciousness. So too low a bar is equally unhelpful.

Is there reason to think that we could choose just the right mid-level bar -- high enough to rule out superficial mimicry, low enough not to be a ridiculously unfair standard?

I see no reason to think there must be some "right" level of Turing indistinguishability that reliably tests for consciousness. The past five years of language-model achievements suggest that with clever engineering and ample computational power, superficial fakery might bring a nonconscious machine past any reasonable Turing-like standard.

Turing never suggested that his test was a test of consciousness. Nor should we. Turing indistinguishability has potential applications, as described above. But for assessing consciousness, we'll want to look beyond outward linguistic behavior -- for example, to interior architecture and design history.

5 comments:

Anonymous said...

Under which condition would you ascribe conscious mental states to a computer or robot?

James of Seattle said...

To what extent is it important that the judge know they are being a judge? Would the machine intelligence expert consider the possibility when they drive up to the McDonald’s order window? (Soon they will, of course, but what about other situations?) What if there is a social penalty for being wrong?

Richard Baron said...

On the Turing test, I think that there is a big difference between the computer answering questions and its holding a conversation in which both parties ask and answer questions, develop points made, and introduce new topics. Conversation would set a much higher bar. There may also be relevance to consciousness. One thing we expect from conscious beings is a consistent personality. That can show in a fluent conversation, and an incoherent conversation would suggest its absence.

Arnold said...

Is meta philosophy AI limited to thinking consciousness...
...While meta physics AI would include thinking and feeling consciousness...What do we want consciousness to be

Arnold said...

Maybe Process Ontology...wiki