Can AI Police Itself? Experts Say Chatbots Can Detect Each Other’s Gaffes.

AI chatbots often make up inaccurate answers that they predict humans want to hear. One solution: Have them call each other out.

The Washington Post

June 20, 2024


AI chatbots have become increasingly comfortable in the art of human conversation. The trouble is, experts say, they’re prone to giving inaccurate or nonsensical answers, known as “hallucinations.”

Now, researchers have come up with a potential solution: using chatbots to sniff out errors other chatbots have made.

Sebastian Farquhar, a computer scientist at the University of Oxford, co-authored a study published Wednesday in the journal Nature that posits chatbots such as ChatGPT or Google’s Gemini can be used to weed out AI untruths.

Chatbots use large language models, or LLMs, that consume vast amounts of text from the internet and can be used for various tasks, including generating text by predicting the next word in a sentence. The bots find patterns through trial and error, and human feedback is then used to fine-tune the model.

But there’s a drawback: Chatbots cannot think like humans and do not understand what they say.

To test the approach, Farquhar and his colleagues asked a chatbot questions, then used a second chatbot to review the responses for inconsistencies, similar to the way police might try to trip up a suspect by asking the same question over and over. If the responses had vastly different meanings, they were probably garbled.
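
The consistency check described above can be illustrated with a minimal Python sketch. It is not the researchers' code: the function names are hypothetical, and `toy_same_meaning` is a crude stand-in for the second chatbot, which in the study judges whether two answers mean the same thing. The sketch groups repeated answers by meaning and scores how spread out the groups are. A score near zero suggests the answers agree, and a higher score suggests they do not.

```python
import math
from typing import Callable, List


def cluster_by_meaning(
    answers: List[str], same_meaning: Callable[[str, str], bool]
) -> List[List[str]]:
    """Greedily group answers that the judge says share a meaning."""
    clusters: List[List[str]] = []
    for answer in answers:
        for cluster in clusters:
            # Compare against the first member of each existing cluster.
            if same_meaning(cluster[0], answer):
                cluster.append(answer)
                break
        else:
            # No existing cluster matched, so start a new one.
            clusters.append([answer])
    return clusters


def meaning_entropy(clusters: List[List[str]]) -> float:
    """Entropy over cluster sizes: 0.0 when every answer lands in one cluster."""
    total = sum(len(cluster) for cluster in clusters)
    return sum(
        -(len(cluster) / total) * math.log(len(cluster) / total)
        for cluster in clusters
    )


def toy_same_meaning(a: str, b: str) -> bool:
    """Assumption: a placeholder judge. A real check would ask a second model."""
    def normalize(text: str) -> str:
        return text.lower().strip(" .")
    return normalize(a) == normalize(b)


# Hypothetical repeated answers to the same question.
consistent = ["Paris", "paris.", "Paris"]
scattered = ["Paris", "Lyon", "Marseille"]

consistent_score = meaning_entropy(cluster_by_meaning(consistent, toy_same_meaning))
scattered_score = meaning_entropy(cluster_by_meaning(scattered, toy_same_meaning))
print(consistent_score < scattered_score)
```

In this toy run the three matching answers fall into a single group, while the three conflicting answers each form their own group, so the second set scores higher and would be flagged as less trustworthy.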


He said the chatbot was asked a set of common trivia questions, as well as elementary school math word problems.

The researchers cross-checked the accuracy of the chatbot evaluation by comparing it against human evaluation on the same subset of questions. They found the chatbot agreed with the human raters 93 percent of the time, while the human raters agreed with one another 92 percent of the time, close enough that having chatbots evaluate each other was “unlikely to be concerning,” Farquhar said.

Farquhar said that for the average reader, identifying some AI errors is “pretty hard.”

He often has difficulty spotting such anomalies when using LLMs for his work because chatbots are “often telling you what you want to hear, inventing things that are not only plausible but would be helpful if true, something researchers have labeled ‘sycophancy,’” he said in an email.

Unreliable answers are a barrier to the widespread adoption of AI chatbots, especially in medical fields such as radiology where they “could pose a risk to human life,” the researchers said. They could also lead to fabricated legal precedents or fake news.

Not everyone is convinced that using chatbots to evaluate the responses of other chatbots is a great idea.

In an accompanying News and Views article in Nature, Karin Verspoor, a professor of computing technologies at RMIT University in Melbourne, Australia, said there are risks in “fighting fire with fire.”

The number of errors produced by an LLM appears to be reduced if a second chatbot groups the answers into semantically similar clusters, but “using an LLM to evaluate an LLM-based method does seem circular, and might be biased,” Verspoor wrote.

“Researchers will need to grapple with the issue of whether this approach is truly controlling the output of LLMs, or inadvertently fueling the fire by layering multiple systems that are prone to hallucinations and unpredictable errors,” she added.

Farquhar sees it “more like building a wooden house with wooden crossbeams for support.”

“There’s nothing unusual about having reinforcing components supporting each other,” he said.
