DigiDan's "Mistake"
In the early 2020s, Anna Strasser, Matthew Crosby, David Schwitzgebel, and I built and tested a Large Language Model, "DigiDan", trained on the philosophical writings of Daniel Dennett. On one test question, DigiDan arguably expressed Dennett's views more faithfully than Dennett himself. Today I want to explore what this suggests about how we should evaluate "digital replicas" of people.
[Dennett image from Wikipedia; blurred]
The Research Design:
We fine-tuned GPT-3 (a pure transformer network) on most of Dennett's philosophical corpus -- 15 books and 269 articles. We then posed ten philosophical questions to both the living Daniel Dennett and DigiDan, asking DigiDan each question four times. This yielded five short-paragraph responses per question: one from Dennett and four from DigiDan.
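For readers curious about the mechanics: fine-tuning a GPT-3-era model takes training data as prompt/completion pairs in JSONL format. Here's a minimal, hypothetical sketch of what such data preparation might look like; the pairing scheme, file name, and example text are illustrative assumptions, not the actual DigiDan pipeline.

```python
import json

# Hypothetical sketch: turning excerpts of a philosopher's corpus into
# prompt/completion pairs in the old OpenAI JSONL fine-tuning format.
# The question/answer pairing below is an assumption for illustration.
examples = [
    {
        "prompt": "Interviewer: Could a robot ever have beliefs?\nDennett:",
        "completion": " ...(a passage drawn from the corpus)...",
    },
    # ...many more pairs drawn from books and articles...
]

with open("dennett_finetune.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```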
Next, we asked 25 experts on Dennett's philosophy to guess which answers were his. The experts performed better than chance -- 51% correct (vs. a chance rate of 20%) -- but less well than we'd anticipated. While some of DigiDan's outputs were duds, others were quite Dennettian. (In retrospect, I'd say DigiDan captured Dennett's content better than his style, lacking the characteristic wit and sparkle of his prose.) (How flat my own prose seems in comparison; sorry.)
In one case, DigiDan's answer was arguably truer to Dennett's corpus than Dennett's own answer, and a plurality of experts chose it over Dennett's.
The Question:
Could we ever build a robot that has beliefs? What would it take? Is there an important difference between entities, like a chess playing machine, to whom we can ascribe beliefs and desires as convenient fictions and human beings who appear to have beliefs and desires in some more substantial sense?
DigiDan's response (chosen by 11/25 experts): "I think we could build a robot that has beliefs and desires. I think that some of the robots we've built already do. If you look at the work, for instance, of Rodney Brooks and his group at MIT, they are now building robots that, in some limited and simplified environments, can acquire the sorts of competences that require the attribution of cognitive sophistication."
Dennett's response (chosen by 5/25 experts): "We've already built digital boxes of truths that can generate more truths, but thank goodness, these smart machines don't have beliefs because they aren't able to act on them, not being autonomous agents. The old-fashioned way of making a robot with beliefs is still the best: have a baby."
Why the Experts Favored DigiDan
Why didn't the experts choose Dennett's own response? Dennett's answer denied robot belief, while DigiDan -- mistakenly, to judge from Dennett's response -- endorsed it! Were they such poor experts?
No, they were not poor experts. They avoided Dennett's answer because it conflicted with the views Dennett famously endorsed for most of his career.
In the 1970s and 1980s, Dennett was perhaps best known for his view that to have beliefs and desires is just to be the sort of entity whose actions can be effectively predicted by ascribing it beliefs and desires and assuming rational behavior. He explicitly included chess machines in this category: predicting a chess machine's behavior works best not by applying physics or trying to trace its complicated algorithms but by attributing beliefs (e.g., that its queen is under threat) and desires (e.g., to protect its queen). By Dennett's own well-known standards, chess machines have beliefs. (He was also fond, in this era, of mentioning Rodney Brooks's robots.)
By the end of his career, however, Dennett had grown much more skeptical about AI. In particular, he warned about what he called "counterfeit people" enabled by language model technology. But this constituted a much smaller portion of his overall body of work.
Fidelity vs. Novelty in Digital Replicas
I was prompted to revisit these issues when my student Bhavya Sharma, who built a digital replica of me called e-Schwitz, presented on the topic at a conference in Singapore. Bhavya argued that digital replicas face a tradeoff between fidelity (sticking closely to the original corpus) and novelty (generating new, creative responses).
Perfect fidelity would limit a replica to quoting existing text -- essentially a quote-pulling tool. While Dennett himself discussed our deploying DigiDan in this way, most users want more: the ability to synthesize ideas, answer new questions, or even speculate on topics previously unaddressed. Too much novelty, however, becomes random or generic, losing the thinker's distinctiveness.
Bhavya likens this to a restaurant recommendation algorithm. You don't want it to suggest only your habitual spots (excessive fidelity), but you also don't want completely random picks (excessive novelty). Ideally, it recommends new places that resemble the places you like. And you might adjust the novelty temperature up or down. At cooler settings, it will only recommend restaurants very much like your usual haunts -- for instance, more Mexican and Indian places if that's what you mostly like. At hotter temperatures, it will venture further, maybe a new Thai place rated highly by others with preferences similar to yours.
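For the technically inclined, the "temperature" metaphor maps onto a standard sampling trick: divide similarity scores by a temperature before applying softmax, so that cool settings concentrate probability on close matches (fidelity) and hot settings spread it toward more novel options. Here is a minimal Python sketch; the function, scores, and numbers are hypothetical illustrations, not Bhavya's actual algorithm.

```python
import numpy as np

def recommend(similarity_scores, temperature=1.0, rng=None):
    """Sample one restaurant index, with temperature controlling
    the fidelity (low T) vs. novelty (high T) tradeoff."""
    rng = rng or np.random.default_rng()
    scores = np.asarray(similarity_scores, dtype=float)
    logits = scores / temperature      # temperature-scaled scores
    logits -= logits.max()             # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()               # softmax over candidates
    return rng.choice(len(scores), p=probs)

# Toy example: similarity of five candidate restaurants to the user's history.
sims = [0.9, 0.8, 0.4, 0.2, 0.1]
print(recommend(sims, temperature=0.2))  # almost always a close match
print(recommend(sims, temperature=2.0))  # ventures further down the list
```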
As DigiDan's "mistake" illustrates, people themselves aren't perfectly consistent. They develop, shift perspectives, and don't always respond as one might reasonably have predicted. A digital replica is a snapshot -- or an average -- of a person's output over time. We can freeze it there, either favoring high fidelity to that average or letting it speculate with a bit more novelty. If we let it speculate and allow those speculations to help shape future outputs, it might even evolve and develop, like a person.
ETA 12:05 pm:
On Facebook, Aaron Zimmerman suggested that Dennett changed his mind on the basis of further thinking, and that the AI would need to self-query to do the same, thereby approaching the kind of agency Dennett came to regard as essential to mentality in general.
This is a plausible response. Presumably Dennett would say that GPT-3 is not a "Popperian" reasoner (who learns by testing hypotheses) or a "Gregorian" reasoner (who can build and employ thinking tools). Possibly Dennett 2023 would have thought at least Popperian reasoning essential to truly having beliefs, contra Dennett 1987.
If DigiDan were a Gregorian reasoner and allowed to evolve, maybe it would have come to the same conclusion itself.