• | 9:00 am

Hallucinations, data leaks, toxic language: How Arthur AI is draining the ChatGPT cesspool

It’s building a firewall for large language models as companies race to put generative AIs on the market, despite dangerous consequences.

Hallucinations, data leaks, toxic language: How Arthur AI is draining the ChatGPT cesspool
[Source photo: Ivana Cajina/Unsplash, rawpixel.com]

When ChatGPT took the world by storm last November, enthusiastic internet users who had never seen anything like it flooded OpenAI’s website to test the generative artificial intelligence. Tech pundits used it to measure our distance from the singularity; laypeople used it for dinner-table fodder; students used it—teachers feared—to write school essays.

While it was a cool toy for the average web-scroller, it was a call to arms for companies, kicking off a frenzied race to integrate their own cutting-edge technologies or be left behind. But pushing those models out into the wild was a different matter—a far more dangerous one.

“Large enterprises think there’s an opportunity to make their workforces 30 or 40% more productive,” says Adam Wenchel, whose company, Arthur AI, helps John Deere, Axios, and the U.S. Air Force, among others, safely harness the power of algorithms. “They’re taking aggressive steps I’ve never seen before in my career. . . . In the rush to deploy these things, some of the large MANGA-FAANG companies have just put them out there without really testing, knowing they were full of these really problematic capabilities.”

Problematic capabilities: You’ve seen them in the news. ChatGPT offering tips for plotting a murder or procuring ingredients for chemical weapons and unlicensed guns. Microsoft’s Bing chatbot declaring its love for a journalist, imploring him to leave his marriage. Google’s Bard scribbling disinformation about the Holocaust being a hoax.

On Thursday, Arthur AI is launching a version of its product for large language models (LLMs), dubbed Arthur Shield. It aims to be a firewall for the issues that have plagued generative AI for years: hallucinations, toxic language, malicious actors, and the like. But also, it combats critical issues that will only grow as AIs like ChatGPT become mainstream, including security lapses and data leaks.

In fact, the leaks are what companies consider most damaging—resulting in headline-grabbing scandals and multimillion dollar lawsuits, or rivals discovering trade secrets. This week, Samsung banned ChatGPT from its offices after employees uploaded proprietary source codes to the public platform, meaning they could end up being spilled to any of the platform’s millions of users worldwide.

“It’s what’s keeping people up at night,” Wenchel says.

WHO WATCHES THE WATCHMEN?

Not that AIs weren’t extremely consequential before the ChatGPT craze. Before cofounding Arthur in 2019, Wenchel worked at Capitol One for three years, building out its Center for Machine Learning.

“When you’re operating at that scale, it’s a different ballgame,” he says. “There are hundreds of millions of dollars of business revenue being affected by the performance of these models, and there are 75 million people whose financial livelihoods are being affected—whether you give them access to credit, not enough credit, too much credit.”

Wenchel’s cofounder, John Dickerson, had done academic research on machine learning in healthcare—things like organ and blood donation, literal matters of life or death. Now, health insurer Humana is one of Arthur AI’s clients, using it to fact-check algorithmic determinations of which patients—whose profiles have risk factors for various illnesses—should be flagged for preliminary intervention by doctors.

“If they’re not told to call Adam and say, ‘Hey, here’s what you need to do to lower your risk of heart disease,’ and I end up in the hospital a few months later because of heart disease, that AI’s decision was clearly wrong,” says Wenchel. Morbidly, Adam’s misfortune then becomes a data point in a feedback loop, which Arthur AI leverages to make AIs smarter. It’s too late for Adam, but if the model learns from his data, maybe it won’t happen twice.

Meanwhile, financial firms such as banks—three of the top five being Arthur AI clients—aren’t in the business of saving lives. They just want to know whom to steer clear of, like the people who get flagged for defaulting on a car loan after a couple of years.

“As these models are making decisions, there’s an agent that’s logging them, and running analytics on them,” says Wenchel. It’s like Datadog for AIs. “What are some of the anomalies? Is the accuracy rate declining? Is that happening with particular populations?”

Since 2019, Arthur has grown to a 50-person team and raised over $60 million from blue-chip venture capital firms like Greycroft. But at the moment, the work still feels like patching cracks in a wall, rather than engineering crack-proof walls.

As Wenchel explains, there are three phases in which you can better an AI. One, in the data set: You can tweak the data you feed your model to try to eliminate any inherent biases. Two, in training: You can try to tell your model to be both as accurate and fair as possible—but that’s the toughest puzzle to parse, requiring a level of sophistication that very few on Earth can achieve.

Or, three, which is where Arthur AI traffics: You can check your model at the time when it’s making decisions. “In the real world, it’s the easiest one for our customers to deploy,” says Wenchel.

In a sense, it’s simply automating what some companies have been scrambling to do piecemeal. “One of the big [tech companies] has a 100-person team whose only job is: When someone posts on Twitter—’Look at this thing I made this LLM do’—they swarm it, and put a patch in place to keep it from doing that,” says Wenchel. “What we’re working on is building a saner way to do that, so it’s not just calling in the fire department every five minutes, because whatever it is you’ve built just keeps catching on fire.”

ASKING THE BURNING QUESTIONS

LLMs, as it turned out, were ultra-flammable—nuclear apocalyptic blazes waiting to ignite. Arthur AI started working on its LLM-specific product, Arthur Shield, right away when ChatGPT dropped.

“In the industry we’ve been seeing these demos for a while,” Wenchel says, “but ChatGPT uniquely sparked people’s imagination. It catalyzed a sense of urgency to adopt, because it was easier to draw that line from ChatGPT, to, ‘What does this generative language model mean for my business?’” Suddenly, companies from semiconductor manufacturing to all kinds of startups were rushing to put LLMs in production. Move fast, break things. But they were asking Arthur AI to minimize the collateral damage.

With text-generating models, there were new worries, ones that had less to do with tabular number-crunching. A fear is that the AI will go rogue, spewing hateful agendas and parroting toxic language that can be found in abundance on the internet. The chief way to fight this is “reinforcement learning from human feedback,” which involves training a model with “rewards” for “good behavior,” as defined by human agents. With zillions of inputs and outputs flowing through AIs at any time, the humans may only be able to get to a small fraction of them. But with Arthur AI, they could prioritize the ones that are detected to be most anomalous.

Then there are the hallucinations, which Wenchel calls the “hardest problem, for sure” to solve. “These models don’t know what’s true or not true; they just know what’s probabilistically likely to be true,” says Wenchel. “There’s a number of pieces of data we can look at to give something a score of how likely it is to be true.”

And there’s the security concern, of leaking sensitive data. Some of the guardrails are fairly straightforward: Models can be trained to spot common patterns of letters and numbers, such as somebody’s home address or Social Security digits, information that consumer banks would undoubtedly feed into their AI systems. But for, say, the banks’ investing floor, whose AIs would be privy to decades of memos with insider knowledge, there’s more nuance. “They don’t want someone to interrogate the model to disclose, ‘This person is running for public office. Show me every time they’ve invested in some questionable thing that would open them up to scrutiny,’” says Wenchel.

In those cases, AI training can be specific to the companies they work for, to recognize the unique sensitivities within different domains. “We end up training an LLM to watch their LLM for them,” he says.

Click here for a larger view. [Image: Courtesy of Arthur AI]

Demand for that is huge. Just look at Bloomberg GPT, a proprietary AI model trained with the Bloomberg Terminal’s towering wealth of financial data, that can do everything from computing risk to interpreting economic sentiment, but has not been made broadly available in part due to security concern.

Beyond that, almost all companies want to roll out a “copilot,” a sidekick to human employees for rote tasks like email or research. But the risks are a moving target. As AI stays in the public eye, people will get slyer in manipulating it. For example, a new threat: When an LLM reaches out to the internet to, say, pull a biography from LinkedIn, malicious prompts can be coded into that blurb to hijack the LLM.

Arthur AI is named for a pioneer in the field, Arthur Samuel, who coined the term “machine learning” and built a program for IBM that could beat human champions at checkers in 1959, long before Deep Blue and Alpha Go. Now, the company claims to be the first to market an LLM firewall. Other developers, like Anthropic, Cohere, and Databricks, deal mostly in AI’s fundamental architecture.

But as AI’s empire grows, so will its gatekeepers. “Maybe in six months,” Wenchel says, “there will be a lot of people doing this.”

  Be in the Know. Subscribe to our Newsletters.

ABOUT THE AUTHOR

More

More Top Stories:

FROM OUR PARTNERS

Brands That Matter
Brands That Matter