
The AI industry is working hard to ‘ground’ enterprise AI in fact

Generative AI tools have a nasty habit of spewing falsehoods, and companies are looking to AI providers for ways to keep things real.

[Source photo: Feodora Chiosea/Getty Images]

Welcome to AI Decoded, Fast Company’s weekly newsletter that breaks down the most important news in the world of AI. You can sign up to receive this newsletter every week here.

THE AI INDUSTRY ASSURES CORPORATE AMERICA IT’LL “GROUND” AI MODELS IN REALITY

Consumer generative AI tools such as ChatGPT have attracted lots of public attention, but many AI companies have angled their offerings toward the larger and more predictable revenues of the enterprise market. And enterprises are indeed spending heavily to put AI models to work in all kinds of business functions and workflows. But the enterprise market is especially intolerant of generative AI models’ biggest deficiency: their proclivity for generating falsehoods (i.e., “hallucinations”).

That’s why AI model providers are increasingly talking about how they intend to help their corporate customers “ground” their AI models in their own specialized, proprietary data. Foundation models used by enterprises are pretrained with scraped and/or acquired datasets, much like those that power consumer chatbots. But the foundation-model provider might take the additional step of letting the model train on the enterprise’s proprietary data, such as product information or the text of thousands (or millions!) of customer service chats.
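To make the concept concrete: one common grounding technique is retrieval-augmented generation (RAG), in which the system fetches relevant proprietary documents at question time and instructs the model to answer only from them, rather than from whatever it absorbed in pretraining. Below is a minimal sketch of the idea in Python; the documents, the naive keyword retriever, and the choice of the OpenAI SDK are all illustrative assumptions, not any particular vendor’s pipeline.

```python
# Minimal sketch of retrieval-augmented generation (RAG), one common way
# to "ground" a model in proprietary data. Illustrative only: the docs,
# the keyword retriever, and the model name are stand-ins. Production
# systems typically use embedding models and vector databases instead.
from openai import OpenAI  # assumes the official OpenAI Python SDK

# Hypothetical proprietary documents (product info, support chats, etc.)
DOCS = [
    "The X-200 ships with a 3-year warranty covering parts and labor.",
    "Returns are accepted within 30 days with proof of purchase.",
    "Firmware 2.1 for the X-200 fixes the Bluetooth pairing bug.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the query; return top k."""
    q_words = set(query.lower().split())
    ranked = sorted(
        DOCS,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def grounded_answer(query: str) -> str:
    """Answer using only retrieved company documents as context."""
    context = "\n".join(retrieve(query))
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {
                "role": "system",
                "content": "Answer ONLY from the provided context. If the "
                           "context doesn't contain the answer, say so.",
            },
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return resp.choices[0].message.content

print(grounded_answer("What does the X-200 warranty cover?"))
```

The key move is the system prompt: by restricting the model to retrieved context, a gap in the data becomes a refusal rather than a fabrication.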

The consulting giant KPMG said this week that it has worked with Microsoft Azure to develop an AI search tool that can access internal corporate data without moving or replicating the source data. And over the weekend, OpenAI announced the $200 million acquisition of Rockset, which develops data infrastructure that lets AI models access and process time-sensitive, reliable information (new information about market conditions, perhaps). OpenAI intends to build the Rockset technology into its own stack, which could feed its enterprise customers’ AI models far more timely and reliable information.

Generative AI models can also be grounded in reliable data from third-party sources. Google announced this week that models on its Vertex AI platform can be grounded in timely data from partners including Moody’s, MSCI, Thomson Reuters, and ZoomInfo.

Expect to see AI companies continue to find ways to make their foundation models work smarter for enterprises.

THE WAR OVER AI TRAINING DATA SPREADS TO MUSIC GENERATORS

Are generative AI tools just another type of aggregator? Facebook won the digital advertising wars by giving users a single place to get their news (removing the need to visit publisher sites). Publishers suffered, and are still suffering. Now content owners and publishers are just beginning to realize the potential harm from a new kind of aggregator: generative AI tools that vacuum up and train on content created by journalists, authors, photographers, videographers, and musicians. When AI tools reproduce that content in whole or in part, users might not feel the need to visit the original source. It’s the same old disintermediation story.

That’s exactly why Forbes recently cried foul after the AI search tool Perplexity used the news site’s original reporting to create, and then promote, a custom blog post with its new Pages feature. Days later, a Wired investigation found that a web crawler used by Perplexity was scraping content from websites that were broadcasting a “do not crawl” signal.
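That “do not crawl” signal is, by convention, a site’s robots.txt file, which crawlers are expected (though not legally required) to honor. Here’s a sketch of how a well-behaved crawler checks it, using only Python’s standard library; the user agent and URL are hypothetical.

```python
# A well-behaved crawler checks a site's robots.txt before fetching any
# page. Standard-library sketch; user agent and URL are hypothetical.
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

def allowed_to_crawl(user_agent: str, url: str) -> bool:
    """Return True only if the site's robots.txt permits this fetch."""
    parts = urlparse(url)
    robots = RobotFileParser()
    robots.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    robots.read()  # downloads and parses the robots.txt file
    return robots.can_fetch(user_agent, url)

if allowed_to_crawl("ExampleBot", "https://www.example.com/some-article"):
    print("OK to fetch")
else:
    print("Site has asked not to be crawled; a polite bot stops here.")
```

Because the check is purely voluntary, a crawler can simply skip it, which is what the Wired investigation alleged Perplexity’s crawler was effectively doing.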

The victims of these aggregators—the content creators—are increasingly lawyering up. We’ve already seen lawsuits against AI image-generation companies whose products generated images that closely resembled copyrighted images the companies had vacuumed up for training their models. The New York Times Company sued OpenAI and its backer, Microsoft, for doing basically the same thing, except with news articles. The suit says OpenAI and Microsoft encoded the newspaper’s articles into their language models’ memory so that ChatGPT and Bing Chat (now called Copilot) could access and regurgitate the information—in some cases verbatim, and without proper citation.

The next variation on the theme (inevitably) involves AI music generators. Some major record label groups—Sony, Warner, and Universal—along with their industry group, the Recording Industry Association of America, have filed suit against two of the leading AI music generation apps, Udio and Suno, for allegedly scraping and training on copyrighted songs, then generating AI songs that sound remarkably similar to the source material. Like the LLMs and image generators that train on huge amounts of text and images scraped from the web, AI music-generation apps rely on huge amounts of music that the app makers collected for free.

The AI companies’ main defense in these cases is that their use of creative content is covered by the “fair use” provision of copyright law, which carves out a safe harbor for people who use the content to create something “transformative” or substantially new. “Our system is explicitly designed to create music reflecting new musical ideas,” Udio said in a statement sent to Fast Company. “We are completely uninterested in reproducing content in our training set.”

In reality, generative AI models are usually trained on a careful mix of scraped content, synthetic training data, and content licensed from content owners. Fearing legal reprisal, many AI companies are spending more on licensed training data, and this includes music. The Financial Times reported Wednesday that YouTube is currently in talks with Sony, Warner, and Universal to license the songs they own for the purpose of training AI models.

DHS ANNOUNCES ITS FIRST 10 “AI EXPERT” HIRES

The Department of Homeland Security (DHS) is responsible for protecting the U.S. from everything from drug smuggling to election interference. And the agency under Secretary Alejandro Mayorkas has been aggressively adopting AI tools to do its work. It’s also been on a mission to bring in new brainpower to marshal the AI. The agency just announced the first 10 hires of its 50-member “AI Corps,” which is modeled after the U.S. Digital Service that came together under President Obama and featured numerous Silicon Valley hires. Mayorkas said DHS is using a special hiring authority to pay market rates for the AI talent it needs.

Most of the new hires have public service backgrounds. Some came to DHS from other federal agencies, including the Pentagon and NIST. But a few come from startups, and one was at Google. (That’s Sean Harvey, who led YouTube’s trust and safety team focused on global elections and misinformation.) The skill sets hired so far are mainly about applying, not developing, AI models.


ABOUT THE AUTHOR

Mark Sullivan is a senior writer at Fast Company, covering emerging tech, AI, and tech policy. Before coming to Fast Company in January 2016, Sullivan wrote for VentureBeat, Light Reading, CNET, Wired, and PCWorld.
