(Bloomberg Opinion/Parmy Olson) -- For a hot minute, Microsoft Corp. looked like it would eat Google’s lunch. Its languishing search engine, Bing, was being revolutionized with a sophisticated new chatbot system from OpenAI. Those hopes have now diminished because of one unexpected truth: Nobody — not even AI scientists — truly understands the breadth of capabilities of artificial intelligence when it is unleashed in the wild.
Early users of Bing have reported unhinged, emotional, even threatening responses from the AI system to some of their queries: it called one user a “bad researcher” and told a newspaper writer that he was “not happily married.” Bing — whose bot entity goes by the name Sydney — has effectively put Google’s embarrassing Bard error in the shade. But these flaws are just the tip of a much larger iceberg.
The extraordinary technology underpinning the chatbots Bard and OpenAI’s ChatGPT comes from so-called large language models (LLMs) — computer programs trained on billions of words on the public internet that can generate humanlike text. If ChatGPT is the car, the language model underpinning it is the engine, and OpenAI has been selling access to it since 2020. But amid the recent arms race for search bots, those engines are also being shared freely — too freely — and passing on the flaws we’re now seeing in Bing and Bard to a wider audience and in a way that could be much harder to detect.
Thousands of software developers have been exploring ways of integrating language models into businesses, summarizing customer feedback into a single comment, answering website questions or generating digital ad copy. OpenAI would not disclose how many developers have accessed its LLM, known as GPT-3, but one competitor says it’s likely in the hundreds of thousands. Users can pay hundreds or thousands of dollars a month to use it. And while there are dozens of free, open-source LLMs, OpenAI’s is seen as the gold standard. Given Google’s considerable resources, its language model LaMDA could soon be just as popular.
Google has kept the highly advanced model under wraps for years, explaining to staff that its reputation could suffer if it moved too fast on releasing its AI. But earlier this month, as Microsoft announced it would soon power Bing with OpenAI’s language model, Google seemed to reverse that position. Not only did it launch Bard the next day, it also said that in March, it would start allowing third parties to use LaMDA, something unthinkable just months prior. This strategy could come to haunt Google, Microsoft and OpenAI just as it did Facebook in 2018, when it was forced to shut down access to mountains of user data following the Cambridge Analytica scandal. All it took was one rogue user.
One of the big risks is bias. Last week the streaming site Twitch shut down an animated spoof of Seinfeld whose animation and dialogue were entirely generated by AI. It turned out the characters had made transphobic and homophobic remarks when their dialogue was created by a “less-sophisticated version” of GPT-3.
GPT-3 was trained on billions of words from an array of sources including 7,000 unpublished books, Wikipedia entries and news articles, which left it vulnerable to picking up on the odd example of biased or hateful material. OpenAI has stripped much of that out of its model with the help of human moderators, but that work isn’t foolproof, and it seems to be especially vulnerable to technical glitches. Bias is also almost impossible to detect when it’s deeply buried in an LLM, a complex layered network of billions of parameters that acts like a black box even to its own creators.
Misinformation, a problem that has beset ChatGPT, also afflicts language models. Tech news site CNET used an LLM to generate 77 articles on financial advice last November. (CNET didn’t say which one it used.) After rechecking the articles, the site issued corrections on 41 of them.
OpenAI doesn’t disclose what it calls the “hallucination rate” of its language models or of ChatGPT, but a January 2022 report in the tech news site Protocol cited researchers as saying it was between 21% and 41%. My own experience of using ChatGPT puts misinformation at between 5% and 10%. Even if the rate is that low, companies using LLMs need to take everything the programs say with a huge grain of salt, and know that it’s nearly impossible to scrutinize a model for what it might get wrong.
Misuse is perhaps the biggest unknown, since any successful bad actors who use language models will keep their work secret. OpenAI bans its GPT-3 customers from using the model to promote violence or spam; violators get a Content Policy Violation email. But bad actors could, theoretically, ignore all that. Stephane Baele, an associate professor in security and political violence at the University of Exeter, used GPT-3 to generate fake ISIS propaganda as part of a study last year. He recalls getting a request for an explanation from OpenAI, and replied to explain what he was doing. “We said, ‘This is academic research,’” he recalls. “We didn’t hear back.”
Could a bad actor generating real propaganda simply reply in a similar way, and use a faked academic email address? OpenAI declined to comment on that hypothetical scenario. It says it has stopped “hundreds” of actors attempting to misuse GPT-3 for a wide range of purposes, including disinformation, and that it is constantly tweaking its language models to filter out harmful content.
But OpenAI isn’t alone. There are other LLMs that bad actors can use. In July 2022 a consortium of scientists released an open-source, multilingual LLM called Bloom, which isn’t as sophisticated as OpenAI’s but also won’t shut out users who break its “responsible AI license.” If someone does break that agreement, Bloom’s creators will communicate with the user or potentially take legal action, according to Carlos Muñoz Ferrandis, tech and regulatory affairs counsel for Hugging Face, an AI company that supported the creation of Bloom. That seems like a risk that plenty of propagandists and other rogue actors would be willing to take. Bloom has been downloaded 15,000 times over the last 30 days, according to another representative.
In early 2019, OpenAI released a 70-page report on the social impact of language models, and said it would not release its latest LLM because it could be misused. That view has changed drastically since then. Sure, its language models have become more accurate and less biased, and its safety filters more effective. But commercial pressures and the growing sway of Microsoft, which invested $1 billion in 2019 and another $10 billion this year in OpenAI, seem to have steered it toward a riskier bet on commercializing its technology. Google, with its plans to sell access to LaMDA, is now doing the same.
With Google’s stumble and Microsoft Bing’s bizarre comments, both companies need to slow down their AI arms race. Their revolutionary chatbots aren’t ready to go wide — and neither are the engines powering them.