Perplexity CEO Aravind Srinivas responds to plagiarism and infringement accusations
Recent reports raise questions about how the answer engine works, including its use of third-party content crawlers.
The AI search startup Perplexity is in hot water in the wake of a Wired investigation revealing that the company has been crawling content from websites that don’t want to be crawled.
Perplexity’s “answer engine” works by crawling large swaths of information on the web and then creating a big database (an index) of content it grabs from web pages. Instead of typing keywords into a search box, users type or speak questions into Perplexity’s web portal or mobile apps, and receive a narrative answer with citations and links to the web content it draws upon.
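To make the idea of an index concrete, here is a minimal sketch of a toy inverted index in Python. It is an illustration only and not a description of how Perplexity builds its index; the page URLs, the sample text, and the helper names `tokenize`, `build_index`, and `lookup` are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def lookup(index, question):
    """Return URLs ordered by how many distinct question words they contain."""
    scores = defaultdict(int)
    for word in set(tokenize(question)):
        for url in index.get(word, ()):
            scores[url] += 1
    return sorted(scores, key=lambda u: (-scores[u], u))


# Hypothetical crawled pages (made-up URLs and text).
pages = {
    "https://example.com/a": "Robots exclusion protocol explained",
    "https://example.com/b": "Search engines crawl the web",
}
index = build_index(pages)
results = lookup(index, "How do search engines crawl?")
```

A real answer engine would add ranking, freshness, and a language model that writes the narrative answer on top of the retrieved pages; the sketch covers only the lookup step.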
Websites can use the Robots Exclusion Protocol, typically implemented as a robots.txt file, to tell web crawlers which content is off-limits. Bots are supposed to honor those instructions, though compliance is voluntary. Wired, along with an independent researcher, says it has proof that Perplexity has been ignoring those instructions and scraping content from off-limits sites anyway.
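For readers curious about what honoring the protocol looks like in practice, the sketch below uses Python's standard-library `urllib.robotparser` to check a URL against a set of rules before fetching it. The robots.txt contents, the bot names `ExampleBot` and `OtherBot`, and the URLs are made up for illustration; nothing here reflects the actual rules of Wired or the behavior of any real crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named bot entirely,
# and keep every other bot out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


article = "https://example.com/articles/1"
blocked_bot = is_allowed(ROBOTS_TXT, "ExampleBot", article)
other_bot = is_allowed(ROBOTS_TXT, "OtherBot", article)
other_bot_private = is_allowed(ROBOTS_TXT, "OtherBot", "https://example.com/private/x")
```

The check is entirely on the crawler's side: a bot that never runs it, or ignores the result, can still request the page, which is why compliance is described as voluntary.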
“Perplexity is not ignoring the Robot Exclusions Protocol and then lying about it,” said Perplexity cofounder and CEO Aravind Srinivas in a phone interview Friday. “I think there is a basic misunderstanding of the way this works,” Srinivas said. “We don’t just rely on our own web crawlers, we rely on third-party web crawlers as well.”
Srinivas said the mysterious web crawler that Wired identified was not owned by Perplexity, but by a third-party provider of web crawling and indexing services. Srinivas would not name the third-party provider, citing a nondisclosure agreement. Asked if Perplexity immediately called the third-party crawler provider to tell it to stop crawling Wired content, Srinivas was noncommittal. “It’s complicated,” he said.
Srinivas also noted that the Robots Exclusion Protocol, which was first proposed in 1994, is “not a legal framework.” He suggested that the emergence of AI requires a new kind of working relationship between content creators, or publishers, and sites like his.
Wired also claims that it was able to get the Perplexity answer engine to closely paraphrase Wired articles by prompting the tool with the headlines or substance of Wired articles. At times Perplexity even paraphrased the Wired stories incorrectly. In one case, the Perplexity “answer” falsely claimed that a California police officer had committed a crime.
Srinivas suggested that Wired used prompts designed to get the Perplexity tool to behave that way, and that normal users wouldn’t see those kinds of results. “We have never said that we have never hallucinated,” he added.
Earlier in June, Forbes accused Perplexity of stealing its content. Perplexity had released a new product called “Pages” in May that lets a user create an article or blog post based on a series of questions they’ve asked the answer engine, or based on a single prompt on a specific subject. Users can add AI-generated or uploaded images, then tweak the text or add formatting before publishing to the web. One of Perplexity’s own Pages used content from a Forbes scoop but didn’t credit the publisher. Perplexity even created an AI-voiced podcast based on the Forbes reporting, but again didn’t credit the site.
Being fastidious about citing sources has been one of Perplexity’s core principles since launch, which made the omission of citations in the Pages product even more glaring. Srinivas told Fast Company that after Forbes raised the issue, his company immediately pushed out an update to Pages that puts attributions within the text of the generated article.
Srinivas frequently says that his product will only be as good as the internet ecosystem that it draws from. “We are happy creating a less-market cap, lower-margin business, as long as we are profitable and successful, and [we] make sure that the whole internet wins,” he told the audience at Fast Company’s Most Innovative Companies Gala in May. “Perplexity would be useless if people were not able to create new content on the web.”
He has said that the company is now working on “revenue-sharing” agreements with selected publishers. The publishers have not been named, so there’s no telling whether Condé Nast (Wired’s owner) or Forbes is involved in the initiative. The content crawling and indexing issues that Wired turned up could force the company to accelerate its plans to cut fair deals with publishers.
Despite publishers’ wariness, there’s still a lot of goodwill for Perplexity, which is taking on the unenviable task of challenging Google with a new kind of search. But it can’t afford to squander much more of it.