OpenAI says its ChatGPT voice isn’t a Scarlett Johansson rip-off. Johansson disagrees

The friendly rasp of ChatGPT’s ‘Sky’ voice is getting the AI company into hot water.

[Source photo: Getty Images]

Last week, OpenAI launched ChatGPT 4o, a new model of its chatbot assistant that converses in almost real time. Users could choose from five voices, including Sky, whose friendly intonation had a slight rasp vaguely reminiscent of Scarlett Johansson—an actor who, not coincidentally, had voiced an AI assistant in Her, a 2013 film that follows a man who falls in love with his computer’s operating system.

OpenAI has long denied that the voice was a mimic of the actor’s, but according to an official statement from Johansson, the story is a lot more complicated than OpenAI has let on.

“Last September, I received an offer from Sam Altman, who wanted to hire me to voice the current ChatGPT 4.0 system,” Johansson said in a statement first reported by NPR. “He told me that he felt that by my voicing the system, I could bridge the gap between tech companies and creatives and help consumers to feel comfortable with the seismic shift concerning humans and AI. He said he felt that my voice would be comforting to people.” However, the actress declined the offer after much consideration “for personal reasons.”

According to Johansson, “two days before the ChatGPT 4.0 demo was released, Mr. Altman contacted my agent, asking me to reconsider. Before we could connect, the system was out there.”

A day after this story was published, an OpenAI spokesperson sent a statement by Altman, answering Johansson’s statement: “The voice of Sky is not Scarlett Johansson’s, and it was never intended to resemble hers. We cast the voice actor behind Sky’s voice before any outreach to Ms. Johansson. Out of respect for Ms. Johansson, we have paused using Sky’s voice in our products. We are sorry to Ms. Johansson that we didn’t communicate better.”

Prior to Johansson’s statement, however, OpenAI appeared to be doing some preemptive damage control. On Sunday, May 19, the company quietly published an article that goes into deep detail about how it chose the voices that come with ChatGPT’s audio interface. A day later, the company posted a tweet linking to that explainer: “We’ve heard questions about how we chose the voices in ChatGPT, especially Sky. We are working to pause the use of Sky while we address them.”

OpenAI had released Sky in September 2023, along with Breeze, Cove, Ember, and Juniper—five distinct voices that, according to OpenAI, were selected in a long process that involved “industry-leading casting and directing professionals to narrow down over 400 submissions.” OpenAI says it made sure to take “the right steps” to cast these voices, with each actor receiving “compensation above top-of-market rates” that “will continue for as long as their voices are used in our products.” Nobody noticed Sky’s similarity with Johansson’s voice back then and the actress didn’t address it.

THE VALUE OF A VOICE

OpenAI’s CEO, Sam Altman, hasn’t been shy about his love for the film Her. Just last week, he tweeted a simple “her”—a not-so-subtle reference connecting the movie to the company’s new ChatGPT 4o model.

After the presentation, a deluge of articles referencing Her washed over the internet. Johansson was named in all of them. The discourse peaked when Johansson’s husband, Colin Jost, made a joke about it on Saturday Night Live.

Altman painted a target on OpenAI’s back with his tweet. It no doubt spurred the company to write its explainer to preempt whatever legal drama was brewing. Johansson is now requesting that OpenAI outline the entire process for creating Sky’s voice. “As a result of their actions, I was forced to hire legal counsel, who wrote two letters to Mr. Altman and OpenAI, setting out what they had done and asking them to detail the exact process by which they created the ‘Sky’ voice,” Johansson said in her statement.

That it would take a legal threat to force OpenAI’s hand is not surprising. After all, this is a company that claims to “respect creators” but has already been sued for allegedly infringing copyright law on more than one occasion.

Johansson is not afraid to litigate; she has already sued another AI company for using her likeness. And she’s in the right here, given the facts we know. “In a time when we are all grappling with deepfakes and the protection of our own likeness, our own work, our own identities, I believe these are questions that deserve absolute clarity,” she wrote in the final paragraph of her statement. “I look forward to resolution in the form of transparency and the passage of appropriate legislation to help ensure that individual rights are protected.”

THE CRUX IN THE NEW CHATGPT’S UX

Was Sky’s vocal similarity to Johansson’s intentional? In its blog post OpenAI claims it was merely a coincidence:

“We believe that AI voices should not deliberately mimic a celebrity’s distinctive voice. Sky’s voice is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice. To protect their privacy, we cannot share the names of our voice talents.”

Regardless, choosing a voice to represent ChatGPT was clearly a highly considered decision, and for good reason. Audio design is always an important part of user experience design. Audio design teams can spend weeks and even months creating the right sound for an app because a well-timed click, snap, and moof can make or break the overall user experience.

In the case of the new ChatGPT 4o’s voice mode, the audio design is the most important thing. It is the entire user experience and may well become the way most people interact with OpenAI’s models.

The unfortunate thing about this whole fiasco is that Johansson’s voice wasn’t even needed to make using ChatGPT 4o feel special. The predictive text model is neither sentient nor truly intelligent, but it has just enough smarts and nuance to convey emotion in its voice. The way it talks to you, confides in you, flirts with you, explains, listens, and laughs has all the elements, the cadence, and, crucially, the natural speed that our brains identify as human.

This post was updated on May 21 with a statement from Sam Altman.

ABOUT THE AUTHOR

Jesus Diaz founded the new Sploid for Gawker Media after seven years working at Gizmodo, where he helmed the lost-in-a-bar iPhone 4 story. He's a creative director, screenwriter, and producer at The Magic Sauce and a contributing writer at Fast Company.
