Here are the companies OpenAI has made deals with to train ChatGPT
In just the past two weeks, OpenAI has inked deals with Reddit and News Corp. as it looks for fresh data to train its AI. Here are more of the deals that have been announced so far.
OpenAI’s chatbots scored a big new data source following the company’s deal with News Corp. on Wednesday. With the stroke of a pen, ChatGPT and the company’s other services added the Wall Street Journal, New York Post, MarketWatch, Barron’s, and other publications to their database.
The deal, which did not include Fox News content, was the latest in a growing series of big data sharing agreements OpenAI has signed in an effort to educate its systems and expand the technology’s expertise. Just last week, the company signed a similar deal with Reddit to incorporate its content into ChatGPT and new products.
The deals come after some media outlets, including The New York Times Company, sued OpenAI and Microsoft for using their publications’ copyrighted stories without permission in training chatbots. Filed in Federal District Court in Manhattan, the Times suit alleges that millions of New York Times articles were used to train chatbots, which have begun to compete with the news outlet as information sources. A collective of well-known authors has also sued the company, alleging “systematic theft on a mass scale.”
Inputting data is only half the battle, of course. OpenAI will have to figure out how to deal with biases in the information it incorporates and how to weed out information that’s sarcastic or pure parody. (Earlier this week, Google showed it still has a long way to go on this front, with the company’s AI Search sharing a farcical Reddit post as fact when it suggested “mix[ing] about 1/8 cup of non-toxic glue into the sauce” to keep the cheese from sliding off of your pizza slice.)
So, who all has partnered with OpenAI, giving the company access to their content libraries? Here’s a comprehensive look.
THE ASSOCIATED PRESS
Last July, the AP and OpenAI announced a deal letting the AI giant license AP’s archive of news stories going back to 1985. AP, in return, was given the opportunity to leverage OpenAI’s tech.
AXEL SPRINGER
The German publisher was the first major media outlet to partner with OpenAI and open its archives to the chatbot. Axel Springer controls a huge assortment of outlets, including Politico, Business Insider, and German outlets Bild and Welt.
DOTDASH MEREDITH
Dotdash Meredith is one of the largest digital publishers in the U.S., so its licensing deal, signed in May, gave OpenAI access to more than 40 brands, including People, Travel + Leisure, Entertainment Weekly, Allrecipes, Real Simple, Food & Wine, Parents, Investopedia, Better Homes & Gardens, and InStyle.
The deal came after the company’s parent firm IAC had pushed to create a coalition uniting big publishers as they strove to protect copyrighted materials from AI firms. That effort ultimately fell apart.
THE FINANCIAL TIMES
The FT partnered with OpenAI in April. The licensing deal gave the ChatGPT maker the ability to use FT materials to create text, images, and code. The deal also let ChatGPT respond to questions with short summaries from FT articles, with links back to FT.com.
LE MONDE
In March, the French media organization struck a multiyear licensing agreement with OpenAI for its content library. Photos were not part of the deal, and OpenAI agreed that references to Le Monde articles would be highlighted and accompanied by a logo, a hyperlink, and the titles of the articles used as references.
NEWS CORP.
News Corp.’s multiyear deal will give OpenAI access to a catalog of some of the most respected financial reporting around, with stories from the Wall Street Journal, MarketWatch, Barron’s, and more. It will also grant access to the New York Post as well as the U.K. publications The Times and The Sun plus multiple Australian publications including The Herald Sun and The Courier Mail.
The agreement does not, however, include content from Fox News or from News Corp.’s other businesses, such as its digital real estate services or HarperCollins.
PRISA MEDIA
At the same time it struck a deal with Le Monde, OpenAI also partnered with Spanish news outlet Prisa Media, which has brands in Spain, Latin America, and the U.S., including El País and El HuffPost, the Spanish version of the Huffington Post.
REDDIT
With more than 1 million posts per day, Reddit is an ongoing source of content for ChatGPT to devour. It also will give the chatbot data for a wide range of topics, from “Ask Me Anything” sessions with celebrities and people in unusual jobs to sports discussion. (The NSFW forums could provide some data as well, but we’re not going to speculate what those could be used for.)
Reddit also struck a $60 million content licensing deal with Google in February.
SHUTTERSTOCK
OpenAI’s partnership with the stock photography website goes back to 2021. In 2023, OpenAI announced it was extending its partnership for another six years, with Shutterstock giving the company a large swath of training data for its AI, including Shutterstock’s image, video, and music libraries, and associated metadata.
These deals could be just the tip of the iceberg. As OpenAI continues to grow ChatGPT, it will need more data for its large language models. Several major publishers, from book houses to news outlets, are still on the sidelines but could be swayed to sign a partnership in the months to come as their revenues fall and OpenAI offers lucrative contracts.