It’s 2025. Let’s stop grading generative AI on a curve
Let this be the year that the tech industry holds its use of AI to the highest standard of all: mundane dependability.
Compared to other tech giants, Apple’s approach to generative AI is strikingly measured. Its new “Apple Intelligence” errs on the side of sidestepping functionality that could go awry or be misused: Image Playground, for example, produces only cheerfully synthetic Pixar-esque graphics, not anything that might be mistaken for a photograph.
But even Apple can do only so much to tamp down AI’s tendency to run wild. Another Apple Intelligence feature uses the technology to summarize notifications. Last month, its précis of a BBC article about Luigi Mangione, charged with having murdered UnitedHealthcare CEO Brian Thompson, mistakenly stated that Mangione had shot himself. Understandably, the BBC was not pleased. Reacting to both that botched summary and an earlier one inaccurately claiming that Israeli Prime Minister Benjamin Netanyahu had been arrested, a journalists’ group said that Apple should just ditch the notification summaries, period.
It’s dystopian that a new AI feature—from Apple of all companies—has spewed misinformation about some of the world’s most-watched news stories. And that’s before you get to the fact that the Apple Intelligence summaries, even when accurate, often read like they were written by an alien only vaguely familiar with the ways of human beings. Given that they’re compressing items that were usually pretty concise in the first place, it’s just not clear that the summaries are a net positive for users of Apple’s platforms. I can’t imagine the company will kill the feature, but maybe it should have held off releasing it until it worked better.
Across the tech business, the race to implement AI has led companies to ship functionality they know to be raw and erratic—not because it’s buggy in any traditional sense, but because unpredictability is baked into the system. For an industry that’s accustomed to working in ones and zeroes—making quality control literally binary—that’s a sea change. Thirty years ago, Intel’s Pentium processor turned out to have a glitch that could cause it to divide numbers incorrectly. The chances of the defect affecting any given calculation were apparently one in nine billion, which initially led Intel to downplay the flaw. After the chipmaker realized its blitheness was a bad look, it apologized and spent millions to provide debugged Pentiums to PC owners who requested them.
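To make that contrast concrete, here is the kind of check that exposed the Pentium’s division flaw, sketched in Python purely for illustration; the test values are the widely circulated ones from the time, not an official Intel diagnostic:

```python
# Classic Pentium FDIV demonstration: divide, multiply back, and compare.
# On a correct FPU the residual is essentially zero; on an affected Pentium
# the division came back slightly wrong, leaving a residual of 256.
x = 4195835.0
y = 3145727.0

residual = x - (x / y) * y
print(residual)  # ~0.0 on modern hardware; 256.0 on a flawed Pentium
```

Either the chip divides correctly or it doesn’t; that kind of binary pass/fail test is exactly what generative AI, with its probabilistic output, never has to face.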
With today’s AI, flakiness is so core to the experience that some of my conversations with Anthropic’s Claude are dominated by its self-aware acknowledgments of its limitations. In response to one recent query, it helpfully explained, “I should note that while I try to be accurate, I may hallucinate details when answering questions about very obscure topics like this.” That’s better than the maniacally overconfident personas of other chatbots, but it’s still a stopgap, not a solution.
It’s true that the current versions of Claude, ChatGPT, and their rivals are less prone to getting stuff laughably wrong than their predecessors were. But they remain very good at weaving plausible-sounding inaccuracies into mostly true material. Computer graphics long faced an uncanny valley problem, in which realistic animated people that were 90% convincing tended to leave viewers fixated on the 10% that was imperfect, such as eyes that lacked a convincingly human glint. With AI, by contrast, there’s a canny valley—the percentage of material it generates that’s faulty, but not obviously so. It’s a more pernicious issue than hallucinations that are easy to spot, and it isn’t going away anytime soon.
Now, I’m aware that I often use this newsletter to carp about AI’s imperfections. I swear I’m not a Luddite. I’m fully capable of being dazzled by AI-infused features, and I don’t think they need to attain perfection to be extraordinarily useful. For instance, this week’s newsletter implements a formatting tweak that I couldn’t quite figure out on my own. Claude handled most of the coding work with only general instructions from me. Even factoring in the time I spent wrapping up the job, it felt like a miracle. (I’d previously tried ChatGPT and found its advice unworkable.)
As tempting as it is to cut an infinite amount of slack for something as astonishing as generative AI, the best computing breakthroughs don’t require any special dispensation. Arthur C. Clarke famously said that any sufficiently advanced technology is indistinguishable from magic. With all due respect, the opposite may be true: We know that a new technology is sufficiently advanced when it’s so dependable that we regard it as mundane, not magical. When was the last time you found yourself awed by the internet? Or electricity, or air travel?
If 2025 is the year that generative AI’s novelty fades away, it might force tech companies to hold it to the same standards as all the older, more familiar technologies at their disposal. That coming-of-age can’t arrive soon enough—and in its own unglamorous way, it would be a giant leap forward.
Read/Watch/Listen/Try
At Fast Company, Jared Newman’s roundups of the year’s best new apps—both brand-new ones and meaty upgrades—are a holiday tradition. Here’s his list for 2024. It inspired me to recommend three additional productivity apps. None of them are all that well-known, but they’re as essential to my work as any of the big-name ones.
Bear. After ditching Evernote in 2023, I’ve tried countless note-taking apps. Bear—available for Macs, iPhones, and iPads—is the one that comes closest to thinking the way I do. Its interface is beautifully minimalist, and I’ve bonded with its use of hashtags as an organizational tool. Even the Pro version is an absurdly reasonable $30 a year.
Reclaim.ai. This web-based app helps ensure that I actually complete the tasks on my to-do list by intelligently slotting them into available space on my Google Calendar, where they’re impossible to ignore. There are paid plans, but the free version offers everything I need and more. Until just now, I had somehow missed that Reclaim was acquired by Dropbox in August; here’s hoping Dropbox doesn’t mess too much with a good thing.
Focus. Back in 2015, I learned—from a Fast Company article, naturally!—about a productivity technique called Pomodoro. It involves dividing your workday into 25-minute chunks, with brief breaks in between if you need them. I continue to find it a boon to my efficiency, and my favorite way to manage it is this elegant timer app for Macs, iPhones, iPads, and Apple Watches—not to be confused with a bunch of other timers also called Focus.